The 350-line #define



  • A week or so ago, I heard a complaint coming from Loren on IRC.  In particular, he was angry that you couldn't use #ifdef in #defines; he wanted to do #ifdef x64 {do 64-bit stuff} #else {do 32-bit stuff} in an assembly #define.

    Roughly a day later, I noticed a large diff in libavcodec.  He had ported the MMX assembly for motion compensation to SSE2 and SSSE3, boosting H.264 decoding performance on Core 2s by 4%.  Pretty impressive.

    I look at it, and I see something that begins to set off one of those odd gut feelings of foreboding.

    #define QPEL_H264_HL2_XMM(OPNAME, OP, MMX)

    It takes 3 arguments: the operation to do (qpel or hpel interpolation), the size of the operation (16, 8, or 4 pixels), and what instruction set to use.  Yes, it has the instruction set as its argument.

    It's a 350-line #define.

    Look upon these works, ye mighty, and despair!



  • @Dark Shikari said:

    A week or so ago, I heard a complaint coming from Loren on IRC.  In particular, he was angry that you couldn't use #ifdef in #defines; he wanted to do #ifdef x64 {do 64-bit stuff} #else {do 32-bit stuff} in an assembly #define.

    Roughly a day later, I noticed a large diff in libavcodec.  He had ported the MMX assembly for motion compensation to SSE2 and SSSE3, boosting H.264 decoding performance on Core 2s by 4%.  Pretty impressive.

    I look at it, and I see something that begins to set off one of those odd gut feelings of foreboding.

    #define QPEL_H264_HL2_XMM(OPNAME, OP, MMX)

    It takes 3 arguments: the operation to do (qpel or hpel interpolation), the size of the operation (16, 8, or 4 pixels), and what instruction set to use.  Yes, it has the instruction set as its argument.

    It's a 350-line #define.

    Look upon these works, ye mighty, and despair!

    English please.  kk thx. 



  • Actually, it's QPEL_H264_XMM that has a 350-line define; QPEL_H264_HL2_XMM is only about 60 lines.  Of course, the 350-line one also calls 60-line one, to make things more fun.



  • Yeah, I realized that, my mistake in copy-pasting.



  • I was under the impression that most modern C compilers inline functions when they can anyway (i.e., if the function is not recursive, why not inline it?). I am surprised that using a macro gave a 4% increase in speed.

     



  • @Jonathan Holland said:

    I was under the impression that most modern C compilers inline functions when they can anyway (i.e., if the function is not recursive, why not inline it?). I am surprised that using a macro gave a 4% increase in speed.

    I think the speedup came from the assembly optimization rather than the macro expansion.  The real WTF here is that we haven't, as a developer community, been brave enough to demand a reasonable and standard way of writing safe, coherent macros in C and C++.



  • Correct, the macro was just for convenience; the speedup was from days of work rewriting the motion compensation functions in SSE.

    The functions themselves got a 20-35% increase in speed over MMX.



  • @arty said:

    I think the speedup came from the assembly optimization rather than the macro expansion.  The real WTF here is that we haven't, as a developer community, been brave enough to demand a reasonable and standard way of writing safe, coherent macros in C and C++.
     

    How would you improve macro safety further than the C++ template system does? (fully accepting that templates solve a slightly different, but overlapping, set of problems to macros) 



  • @Jonathan Holland said:

    I was under the impression that most modern C compilers inline functions when they can anyway (i.e., if the function is not recursive, why not inline it?). I am surprised that using a macro gave a 4% increase in speed.

    It depends on whether you are optimizing for speed, optimizing for size, or choosing not to optimize at all.  (Some implementations of optimization can be buggy.)  Obviously, a function defined in one C file that needs to be available to other linked C files cannot be inlined into them, because inlining requires the body of the function to be visible in every compilation unit (C source file) that calls it.  You have to do extra work to ensure a shared function is inlined: you would have to put the function body in a header file (which has its own subtle issues, some of which may be compiler-specific; e.g. what happens if you are writing portable code and have to support compilers that do not support inlining?).

    IMO, it is not such a cut-and-dried issue. 



  •  Long story short, I believe a modern C compiler would only inline a given function if:

    1) It is defined as static (thus ensuring no other linked object file could ever call it).  From what I've seen, many C programmers don't even bother to use the static keyword for non-shared functions.

    2) You are optimizing for speed (e.g. gcc -O).  In GCC, I don't believe optimization is enabled by default.

    3) Any other requirements for inlining are met (e.g. it doesn't call itself recursively, as Jonathan said).
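
    A minimal sketch of the header-file approach, assuming a hypothetical helper named clamp_pixel that would live in a shared header. Whether the compiler actually inlines it still depends on the optimization level, so treat this as an illustration rather than a guarantee:

```c
/* Hypothetical helper, as it might sit in a shared header (say, clamp.h).
 * "static inline" gives every .c file that includes the header its own
 * private copy, so the body is always visible and the compiler is free
 * (but not obliged) to inline it. */
static inline int clamp_pixel(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Ordinary caller. With optimization enabled (e.g. gcc -O2) the two calls
 * below will typically be expanded in place instead of being real calls. */
int clamp_sum(int a, int b)
{
    return clamp_pixel(a) + clamp_pixel(b);
}
```

    Compiling with gcc -O2 -S and looking for a call to clamp_pixel in the output is one way to check what a given compiler decided.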

      



  • Actually, AFAIK, the assembly functions here are not inlined (the macro is used to define assembly functions, but those functions are not themselves inlined).  Because of their massive size and the number of times they are called, the change to stop inlining them gave a significant speed boost.
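
    For anyone who hasn't seen the pattern, here is a rough sketch of a macro that defines a family of functions. This is not the libavcodec code: DEFINE_ROW_OP, OP_PUT, OP_AVG and the generated function names are all made up for illustration, and plain C stands in for the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pattern: one macro body, stamped out once
 * per (operation, block size) pair. NAME, SIZE and OP are illustrative
 * parameters, not the ones libavcodec uses. */
#define DEFINE_ROW_OP(NAME, SIZE, OP)                              \
    void NAME##_row##SIZE(uint8_t *dst, const uint8_t *src)        \
    {                                                              \
        for (size_t i = 0; i < (SIZE); i++)                        \
            dst[i] = OP(dst[i], src[i]);                           \
    }

/* The per-pixel operations plugged into the macro. */
#define OP_PUT(d, s) (s)
#define OP_AVG(d, s) ((uint8_t)(((d) + (s) + 1) >> 1))

/* Four real, non-inlined functions from one definition:
 * put_row8, put_row16, avg_row8 and avg_row16. */
DEFINE_ROW_OP(put, 8, OP_PUT)
DEFINE_ROW_OP(put, 16, OP_PUT)
DEFINE_ROW_OP(avg, 8, OP_AVG)
DEFINE_ROW_OP(avg, 16, OP_AVG)
```

    The generated functions are ordinary external functions, so the macro only saves typing; it does not by itself cause anything to be inlined.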



  • The latest Microsoft compilers can do a stunt that few other compilers can: they can inline functions that are in a different source file!



  • @henke37 said:

    The latest Microsoft compilers can do a stunt that few other compilers can: they can inline functions that are in a different source file!
     

    Your ideas are intriguing and I wish to subscribe to your newsletter.



  • @Dark Shikari said:

    It's a 350-line #define.

    The mplayer code is crufty, ugly, kinda broken, and riddled with braindamaged non-solutions to the wrong problems. Not news. It's been that way pretty much forever.

    @Dark Shikari said:

    A week or so ago, I heard a complaint coming from Loren on IRC.  In particular, he was angry that you couldn't use #ifdef in #defines; he wanted to do #ifdef x64 {do 64-bit stuff} #else {do 32-bit stuff} in an assembly #define.

    Ironically enough, you never need to do that (there are better ways), and this kind of limited understanding goes a long way to explain why mplayer is such an awful mess. 
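
    One such way, as a minimal sketch: keep the #ifdef outside the big macro and let it pick the definition of small helper macros that the big macro then uses. REG_A, PTR_SZ and LOAD_NEXT_ASM are hypothetical names; __x86_64__ and _M_X64 are the macros GCC/Clang and MSVC predefine on x86-64. The assembly text is only returned as a string here so the sketch builds on any target.

```c
/* A #define body cannot contain #ifdef, but an #ifdef can choose between
 * two definitions of helper macros that the large macro refers to. */
#if defined(__x86_64__) || defined(_M_X64)
#  define REG_A  "rax"   /* 64-bit accumulator register */
#  define PTR_SZ "8"     /* pointer size in bytes, as text */
#else
#  define REG_A  "eax"   /* 32-bit accumulator register */
#  define PTR_SZ "4"
#endif

/* The large macro stays free of conditionals. Adjacent string literals are
 * concatenated, so the register name is spliced into the template text.
 * In real code the string would be the template of an asm statement. */
#define LOAD_NEXT_ASM(NAME)                                    \
    const char *NAME(void)                                     \
    {                                                          \
        return "mov (%%" REG_A "), %%mm0\n\t"                  \
               "add $" PTR_SZ ", %%" REG_A "\n\t";             \
    }

LOAD_NEXT_ASM(load_next_asm)
```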



  • Hmmm, you want to be able to call QPEL_H264_HL2_XMM(qpel,16,English) ? What would the preprocessor output even look like? Shouldn't you be using Notepad for that?

    Oh, you mean Shikari's language? More English than yours, I'd say. ;-) 



  • @henke37 said:

    The latest Microsoft compilers can do a stunt that few other compilers can: they can inline functions that are in a different source file!
     

     And the latest Cray supercomputer is so fast it can run an infinite loop in three seconds! 



  • @Dark Shikari said:

    It's a 350-line #define.

    Look upon these works, ye mighty, and despair!

     

    Awwwwww how cute <pats the little baby macro on the head> Yessums, awwww, who's a gweat big scawy macwo <pat pat> Youu are!  Yes you are!  <hides behind fingers> Peekaboo!




  • @Otterdam said:

    How would you improve macro safety further than the C++ template system does? (fully accepting that templates solve a slightly different, but overlapping, set of problems to macros) 
     

    Even simple macros might be a bit unsafe like this:

    #define MAX(x,y) ((x)>(y)?(x):(y))

    It's fully parenthesized, but the classic example of passing it MAX(x++,y++) doesn't do what the user expects.  C99 has no function templates, so you can't implement MAX as a single function over all types (double and long long, for example), and in C++, you can't use a function to yield a compile-time constant expression at all, as in this example:

    #define MIN_ALLOCATION_FOR_FILE_NAME MAX(PAGE_SIZE,MAX_PATH)

    char page_or_file_name[MIN_ALLOCATION_FOR_FILE_NAME]; 
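
    Both points can be seen in a small sketch. PAGE_SIZE_BYTES and MAX_PATH_BYTES are made-up values standing in for PAGE_SIZE and MAX_PATH, and max_postinc is a hypothetical wrapper that exists only to make the side effects observable:

```c
#define MAX(x, y) ((x) > (y) ? (x) : (y))

/* Hypothetical values standing in for PAGE_SIZE and MAX_PATH. */
#define PAGE_SIZE_BYTES 4096
#define MAX_PATH_BYTES  260

/* The macro expands to a constant expression, so it can size an array
 * at file scope, which a function call could not do. */
char page_or_file_name[MAX(PAGE_SIZE_BYTES, MAX_PATH_BYTES)];

/* MAX(x++, y++) expands to ((x++) > (y++) ? (x++) : (y++)):
 * the comparison increments both arguments once, and the larger one
 * is incremented a second time when it is selected as the result. */
int max_postinc(int *x, int *y)
{
    return MAX((*x)++, (*y)++);
}
```

    Starting from x = 1 and y = 5, max_postinc returns 6 and leaves x = 2 and y = 7, where someone reading it as a function call would expect 5, 2 and 6.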

    Macros can sometimes be used as endcaps for iteration over complex containers like this:

      FOR_EACH_FOO(container, iter) printf("%x\n", iter->name); END_FOR_EACH(iter);

    But nothing allows you to report a proper error here if END_FOR_EACH doesn't match.
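
    To make the problem concrete, here is one way such endcap macros might be written. The struct foo list and count_foos are assumptions for the sake of the example; only the macro names come from the snippet above.

```c
#include <stddef.h>

/* Hypothetical container: a singly linked list of foo nodes. */
struct foo {
    const char *name;
    struct foo *next;
};

/* FOR_EACH_FOO opens a block and a for loop; END_FOR_EACH closes both.
 * The deliberately unbalanced braces are the whole trick, and also the
 * weakness: forget END_FOR_EACH and the compiler reports a missing '}'
 * somewhere far away instead of naming the macro. */
#define FOR_EACH_FOO(container, iter)                                  \
    {                                                                  \
        struct foo *iter;                                              \
        for (iter = (container); iter != NULL; iter = iter->next) {

#define END_FOR_EACH(iter)                                             \
        }                                                              \
    }

/* Counts the nodes in the list using the endcap macros. */
size_t count_foos(struct foo *head)
{
    size_t n = 0;
    FOR_EACH_FOO(head, it) n++; END_FOR_EACH(it);
    return n;
}
```

    Note that END_FOR_EACH ignores its argument entirely, so nothing checks that it names the same iterator as the opening macro.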

    Note the way this template code could be improved by preprocessor-time iteration:

    http://www.boost.org/boost/bind/apply.hpp



  • @arty said:

    Even simple macros might be a bit unsafe like this:

    #define MAX(x,y) ((x)>(y)?(x):(y))

     

    Of course, if you are using gcc C extensions, you can write a "safe" statement expression like so:

    #define MAX(x, y) ({ int _a = (x); int _b = (y); _b > _a ? _b : _a; })

    http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html 

    Statement expressions are pretty useful when you want to write a macro that more closely approximates an inline function.  And macros have the "advantage" over inline functions in that you can do "evil" things such as using sizeof() to determine the length of an array that's passed in as an argument.
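
    A short sketch of the sizeof point, with ARRAY_LEN and size_seen_by_function as hypothetical names. The macro works because it is expanded at the call site, where the argument is still an array; once the array is passed to a function it has decayed to a pointer.

```c
#include <stddef.h>

/* Expanded at the call site, so sizeof still sees the full array type. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Inside a function the parameter is only a pointer, so sizeof reports
 * the size of the pointer, not the size of the caller's array. */
size_t size_seen_by_function(int *arr)
{
    return sizeof(arr);
}
```

    For int samples[10], ARRAY_LEN(samples) is 10, while size_seen_by_function(samples) is sizeof(int *) no matter how long the array is. The macro is also silently wrong if it is handed a pointer instead of an array, which is part of why it counts as "evil".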



  • Eh. The macro is being used to generate code. Isn't this less of a wtf than duplicating a 350-line function a bunch of times? 



  • @Isuwen said:

    Eh. The macro is being used to generate code. Isn't this less of a wtf than duplicating a 350-line function a bunch of times? 

    Oh, I agree fully; there's a perfectly good reason for the #define, and indeed code duplication would be more of a WTF in the sense that it's a bad idea.

    However, the fact that something is a good idea and makes sense code-wise doesn't stop it from making your eyes jump out of their sockets when you see it.



  • @CodeSimian said:

    Statement expressions are pretty useful when you want to write a macro that more closely approximates an inline function.  And macros have the "advantage" over inline functions in that you can do "evil" things such as using sizeof() to determine the length of an array that's passed in as an argument.
     

    You make a good point.  Having such a thing in standard C would go a long way. 



  • @CodeSimian said:

    Of course, if you are using gcc C extensions, you can write a "safe" statement expression like so:

    #define MAX(x, y) ({ int _a = (x); int _b = (y); _b > _a ? _b : _a; })

     

    Might as well go the whole hog: 

    #define MAX(x, y) ({ __typeof__(x) _a = (x); __typeof__(y) _b = (y); _b > _a ? _b : _a; })

    Using typeof means that the macro will behave according to standard C rules for type-promotion.
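
    A small comparison of the two variants, as a sketch that needs GNU C (GCC or Clang) for statement expressions and __typeof__. MAX_INT, MAX_ANY and the three wrapper functions are hypothetical names used only to show the difference:

```c
/* int-only version: both arguments are converted to int first. */
#define MAX_INT(x, y) ({ int _a = (x); int _b = (y); _b > _a ? _b : _a; })

/* typeof version: each temporary keeps the type of its argument, so the
 * comparison and the result follow the usual arithmetic conversions. */
#define MAX_ANY(x, y) \
    ({ __typeof__(x) _a = (x); __typeof__(y) _b = (y); _b > _a ? _b : _a; })

/* With d = 2.5 the int version truncates d to 2 before comparing. */
double max_as_int(int i, double d) { return MAX_INT(i, d); }

/* The typeof version compares an int with a double and yields a double. */
double max_typed(int i, double d) { return MAX_ANY(i, d); }

/* Each argument is evaluated exactly once, unlike the plain macro. */
int max_once(int *x, int *y) { return MAX_ANY((*x)++, (*y)++); }
```

    With i = 1 and d = 2.5, max_as_int gives 2.0 and max_typed gives 2.5. Starting from x = 1 and y = 5, max_once returns 5 and increments each variable exactly once.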

