It's negativ!



  • Just found this in a small Java app I have to maintain. Not a WTF but a fun way to figure out if an Integer is negativ.

    int temp = value1 - value2;
    if((temp & (1<<31)) != 0) { // if negativ
          temp = 0; // set temp to 0
    }

    How much faster can it be than if(temp < 0)?



  • If the bitmask 1<<31 is calculated at compile time, then it should be a little slower, otherwise it would be much slower (especially on architectures that only allow shifting one bit at a time)
    The code would have to generate pseudo-assembly like this

    AND regA, 1<<31
    CMP regA, 0          ; this may be unneccesary if the AND sets the ZERO flag, making both versions the same length
    JNE label
    LOAD regB, 0         ; regB = temp
    label:

    A straight if( temp < 0) would be

    CMP reg1, 0
    JG label
    LOAD regB, 0
    label:

    And here's what GCC actually produces (sans optimization)

    void shift()
    {
        int temp = 123;
        if( temp & (1<<31) != 0)
        {
            temp = 0;
        }
    }
    shift:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $123,-4(%ebp)
        movl -4(%ebp),%eax
        andl $1,%eax
        testl %eax,%eax
        je .L8
        movl $0,-4(%ebp)
    .L8:
    .L7:
        leave
        ret

    void compare()
    {
        int temp = 123;
        if( temp < 0 )
        {
            temp = 0;
        }
    }
    compare:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $123,-4(%ebp)
        cmpl $0,-4(%ebp)
        jge .L10
        movl $0,-4(%ebp)
    .L10:
    .L9:
        leave
        ret

    The less-than/greater-than operators are the clear winners here :)


  • Thank you. I don't understand assembly at all but I'll just belive you and change it to temp<0 :-)

    I have read that bitwise operations are faster than "normal" ones. That's why I assumed that the whole ANDing and Shifting is faster. In what for cases are bitwise operations faster (In Java in this case)? Just curious.



  • Well, there's a few more WTF's in the code:

     

    >>> int temp = value1 - value2;

     

     if the difference between value1 and value2 is larger than 2^31, the difference will overflow a 32-bit integer.

    for example, value1 = -2000000000; value2 = 2000000000;  the difference is 4 billion, which will wrap around to a negative number;

    then checking the 31-st bit will *kinda* work if "int" is always a 32-bit integer variable.


    > How much faster can it be than if(temp < 0)?

     better yet, compare the values directly:   if( value1 < value2 ) { diff = 0 } else {diff = value1 - value2 }

     

    That will work correctly whatever the values or types of the variables.

     

    Actually, seeing what's really going on, even better :     diff = Max( value1 - value2, 0 );


     

     

     

     

     



  • I suspect that value1 and value2 were unsigned ints, stuffed into Java ints (happens in Corba and JNDI, amongst others).  The signed subtraction was done on these values.  At that point, the programmer's head exploded -- trust me, do a lot of work with unsigneds and it might happen to you.



  • @kiriran said:

    Thank you. I don't understand assembly at all but I'll just belive you and change it to temp<0 :-)

    I have read that bitwise operations are faster than "normal" ones. That's why I assumed that the whole ANDing and Shifting is faster. In what for cases are bitwise operations faster (In Java in this case)? Just curious.


    Any optimizing compiler worth its salt will be able to recognize most of the simple, sane cases (multiplying by 2, bitshifting by one place, yay -- or on x86 in particular, XOR AX, AX instead of MOV AX,0). Unless you need to do hyper-optimizations in a loop that you run billions of times during your program, please! Preserve your sanity (and everyone else's!) by just writing the version that makes sense! Look for the BIG things you can do to speed things up. (And to figure out just how to do that, use a profiler.)

    "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth



  • @kiriran said:

    Thank you. I don't understand assembly at all but I'll just belive you and change it to temp<0 :-)

    I have read that bitwise operations are faster than "normal" ones. That's why I assumed that the whole ANDing and Shifting is faster. In what for cases are bitwise operations faster (In Java in this case)? Just curious.

     Bitwise operations are fast, because they're trivial to implement in hardware, but there's only so much you can do with them. One place where they can be used to approximate other instructions is when multiplying or dividing numbers.

    Shifting an integer value left (but not floating point!) is equal to multiplying it by two. A right-shift is the same as dividing the value by two.
    On a machine where there's no mul/div instruction in hardware, a multiplication would have to be done as a loop.

    // calculate a = 10b;
    a = b;
    for(int i = 9; i > 0; i--)
        a += b;

    That means that there is one assignment operation and 9 additions.

    Using bit-shifts, we can multiply the number by 2, 4, 8, 16, ...

    a = b << 3;     // a = 8b
    a += b;           // a = 8b + b = 9b

    That obviously saves a lot of add instructions. With fast multiplication in hardware, though, there's really no need to do this on modern CPU's though.

     
    I just realized another problem with the original code: 1<<32 assumes that integers are 32 bit long, and that the 32nd bit holds the sign. That means the code will break on any architecture where int's are shorter or longer (i.e. 64bit).



  •  Nandurius wrote the following post at 02-15-2007 3:09 PM:
    void shift()
    {
        int temp = 123;
        if( temp & (1<<31) != 0)
        {
            temp = 0;
        }
    }
    shift:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $123,-4(%ebp)
        movl -4(%ebp),%eax
        andl $1,%eax
        testl %eax,%eax
        je .L8
        movl $0,-4(%ebp)
    .L8:
    .L7:
        leave
        ret

    A quick glance at the generated code is a hint that you didn't test what you think you did. I don't see any constant in there that's close to 2 to the 31st. Your example fails to consider the precedence rules in C - comparisions have a higher precedence than logical and, so your statement is in effect:

      if( temp & ((1<<31) != 0))

    or

    if (temp & (0x80000000 != 0)) => if(temp & 1), since True in C is 1.

    If you aren't sure of the precedence rules, use parentheses. If you are sure of the precedence rules, ask yourself these two questions before omitting the parentheses:

    Will the maintainer of my code *know* that I understood the precedence rules when I wrote this, or will he/she have to spend time seeing if it needed parens and I omitted them?

    Will the maintainer of the code know the precedence rules themselves or will he/she fail to understand what my expression does?

    If there is any doubt about any of the three, use parens. Almost any expression involving bit shifting and comparision operators needs to be fully parenthesised.

     



  • Good catch. Here's the correctly parenthesized version:

     shift2:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $123,-4(%ebp)
        movl -4(%ebp),%eax
        andl $-2147483648,%eax
        testl %eax,%eax
        je .L10
        movl $0,-4(%ebp)
    .L10:
    .L9:
        leave
        ret

     


  • Considered Harmful

    From the looks of things, the original code had the correct parentheses.  The one who reduced it to assembly didn't copy it correctly.

     Edit:  He beat me to the correction.



  • @kiriran said:

    I have read that bitwise operations are faster than "normal" ones. That's why I assumed that the whole ANDing and Shifting is faster.

    This used to be true, but isn't any more.

    In a modern CPU, integer addition, integer subtraction, bit-shifting, and bitwise operations all take one clock cycle.  Integer comparisons are usually implemented as subtraction without storing the result, so those take one clock cycle.  Integer multiplication and division still take multiple clock cycles, but the compiler is a better judge of when to convert them to bitwise operations than you are.



  • @kiriran said:

    Thank you. I don't understand assembly at all but I'll just belive you and change it to temp<0 :-)

    I have read that bitwise operations are faster than "normal" ones. That's why I assumed that the whole ANDing and Shifting is faster. In what for cases are bitwise operations faster (In Java in this case)? Just curious.

     First, we are discussing java. In java, everythig is run under a virtual machine. The performance of a bit of code like that is JVM implementation dependant. Because performances are very similar for such elementary operations, you are better to assume they are the same an write clean code instead of maybe useless tricks like that. If you are at a point where you need to gain cycle in a comparaison operation, you're better leaving java to go assembly :)

     Also, it's generally cheaper to buy faster CPUs than to pay programmers for such optmization. Am sure when you read that code you spend about 10 minutes searching the web about performance and comparaisons and bitwise operation. This hack cost your company 10 minute for each developper that see it. Not to mention the fact you need to read it twice to check for errors.
     

    Last but not least, there are areas where you can really optimize in java. Out of my mind, i think about the synchronized StringBuffer (very wide spread in java code), triggered each time you have  String s = "Hello " + world; vs the unsynchronized StringBuilder (which, since it's introduction by sun, replace stringbuffer in string concatenation)

     

     



  • @Ancient_Hacker said:

     if the difference between value1 and value2 is larger than 2^31, the difference will overflow a 32-bit integer.

    @Ancient_Hacker said:

      

    Actually, seeing what's really going on, even better :     diff = Max( value1 - value2, 0 );

     

    WTF?



  • All that rigamarole of code, if you study it, and ignore the bit-twiddly aspects, it does this:  it sets temp to whatever is bigger, zero or value1 - value2. 

     

    The max() function exists in most languages and makes a dandy appearance here.

     

    sorry-- I meant "temp", not "diff" 

     

     



  • Thanks for all the replies!

    Haven't thougth of that it could overflow the Integer. Ok it's very unlike that this happens (the Integers represent the "n items in stock" (value2) and "n items that should be in stock" (value1)) but it's a good point.

    Another question I have while we are on the topic: If java runs on 64bit machine, is an int 64bit long? Or is an int in java always 32bit long?

    @tchize
    I started to learn assembly but there is so much theory, I haven't wrote a line of assembly yet ;-)


    Am sure when you read that code you spend about 10 minutes searching
    the web about performance and comparaisons and bitwise operation.
     

    I didn't search the web about that, but really wondered why he did that. I'm a junior programmer at my company so I asked the senior who wrote the code why he did that and told him temp<0 is more readable than his aproach. Performance over readability was his answer and moved on.



  • @Nandurius said:

    Bitwise operations are fast, because they're trivial to implement in hardware

    But, only if your hardware happens to implement fast bitwise operations.

    Why do I feel compelled to mention this? Well, those fast bit-shift-as-multiply methods? They aren't faster on the Pentium 4. Why not? Because the bloody chip doesn't have a barrel shifter - it implements shift operations using its multiply units.

    One of the many utterly retarded reasons why AMD kicked Intel's arse onto the other side of their face for so long. I can only imagine that everybody in Intel's chip design division went collectively insane as soon as design work began on the P4. (The post-P4 chips have undone much of the stupidity, and once again have a barrel shifter like every other high-performance chip on the planet) 



  • @jes said:

    A quick glance at the generated code is a hint that you didn't test what you think you did. I don't see any constant in there that's close to 2 to the 31st. Your example fails to consider the precedence rules in C - comparisions have a higher precedence than logical and

    & is a bitwise and. && is a logical and.

    0x0F & 0x10 = 0x1F
    0x0F && 0x00 = 0x00
    0x0F && 0x10 = TRUE (usually (unsigned int)(-1) or 1)

    I think this code is OK, because bitwise ands have a greater priority than relational operators (at least they do in C).



  • @asuffield said:

    @Nandurius said:

    Bitwise operations are fast, because they're trivial to implement in hardware

    But, only if your hardware happens to implement fast bitwise operations.

    Why do I feel compelled to mention this? Well, those fast bit-shift-as-multiply methods? They aren't faster on the Pentium 4. Why not? Because the bloody chip doesn't have a barrel shifter - it implements shift operations using its multiply units.

    One of the many utterly retarded reasons why AMD kicked Intel's arse onto the other side of their face for so long. I can only imagine that everybody in Intel's chip design division went collectively insane as soon as design work began on the P4. (The post-P4 chips have undone much of the stupidity, and once again have a barrel shifter like every other high-performance chip on the planet) 


    The Pentium IV was a decent CPU - if a bit long in the pipeline - until the bean-counters got to it.  The original design called for, among other things, a barrel shifter, two floating-point execution units, and decently large L1 and L2 caches.  But using a 180nm process, it would have taken up too much silicon and been too expensive to produce.  Hence, the quirks in the chip design: 
    * A dispatch unit capable of dispatching more instructions per cycle than the CPU has execution units
    * Only one general-purpose FPU (but a SIMD unit that can be used as an auxiliary FPU)
    * No barrel shifter - the barrel shifter was part of the address-generation unit
    * No dedicated address-generation unit - address generation is done by the ALUs



  • @Tweenk said:

    & is a bitwise and. && is a logical and.

    0x0F & 0x10 = 0x1F


    No, 0x0F & 0x10 = 0x00.



  • @Carnildo said:

    No, 0x0F & 0x10 = 0x00.

    Whoops, confused & with |. Well, I wanted to make something that wouldn't require hex maths to figure out, and I ended up with an incorrect example :)



  • @Carnildo said:

    Integer multiplication and division still take multiple clock cycles, but the compiler is a better judge of when to convert them to bitwise operations than you are.

    Not true, for division anyway. Most processors have an arithmetic right shift, but that will round down instead of toward zero (-5 / 2 = -2, but -5 >> 1 = -3). You can change division by a power of two into a shift if you know the number will never be negative, but the compiler will most likely not be able to figure that out, so if you want speed you have to write the shift manually.

    @asuffield said:

    Well, those fast bit-shift-as-multiply methods? They aren't faster on the Pentium 4. Why not? Because the bloody chip doesn't have a barrel shifter - it implements shift operations using its multiply units.

    I have a P4, and SHL/SHR are definitely still faster than MUL.



  • Tweenk wrote the following post at 02-16-2007 12:55 AM:

    </p><blockquote><p>jes:
    
    A quick glance at the generated code is a hint that you didn't test what you think you did. I don't see any constant in there that's close to 2 to the 31st. Your example fails to consider the precedence rules in C - comparisions have a higher precedence than logical and
    

    & is a bitwise and. && is a logical and.

    0x0F & 0x10 = 0x1F

    0x0F && 0x00 = 0x00

    0x0F && 0x10 = TRUE (usually (unsigned int)(-1) or 1)

    I think this code is OK, because bitwise ands have a greater priority than relational operators (at least they do in C).


     

    I think you need to learn the precedence table for C. 

    Here's a summary for part of the  precedence rules:

    comparisions have a higher precedence than logical and/or

    comparisions have a higher precedence than bitwise and/or/xor

    for completeness:

    bitwise operations have a higher precedence than logical and/or

     
    This is why I made my point about parenthesising expressions unless you actually _know_ the rules for C and you can be totally confident that the next person working on your code knows the precedence rules for C and knows that you know the precedence rules for C

    You apparently do not know the rules for C. Hint - my comment that the generated assembler lacked any immediate constant 2^31. To me, that was a WTF, which caused me to actually look at the C code and go "WTF?" again.

    Unless you think gcc doesn't know the precedence rules for C, you'd do well to look them up rather than stating that you think the code is OK. It isn't.

    The java code was OK, because it was parenthesised correctly, the C code is not an equivalent.

    Given responses like these and the ones in a previous thread "One Step Forward...", it appears that either there are a lot of readers who think they know C but actually don't
     



  • @kiriran said:

    Another question I have while we are on the topic: If java runs on 64bit machine, is an int 64bit long? Or is an int in java always 32bit long?

    @tchize
    I started to learn assembly but there is so much theory, I haven't wrote a line of assembly yet ;-)


    Am sure when you read that code you spend about 10 minutes searching
    the web about performance and comparaisons and bitwise operation.

    What ever architecture the jvm runs on, the java should behave the same. The size of variables and all rules a jvm must comply on are specified in java language specifications. For the int size, look at http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1 

     

    As for his answer, i'm happy to see you ask somewhere else than this senior programmer. He probably is senior from the age of C on 8086 and never occured to him that this kind of optimization is in contradiction to using java. If he really need speed so much, i wonder why he use java as, being an interpreted language, it is a bit slower than native language :) 



  • @tchize said:

    @kiriran said:

    Another question I have while we are on the topic: If java runs on 64bit machine, is an int 64bit long? Or is an int in java always 32bit long?

    @tchize
    I started to learn assembly but there is so much theory, I haven't wrote a line of assembly yet ;-)


    Am sure when you read that code you spend about 10 minutes searching
    the web about performance and comparaisons and bitwise operation.

    What ever architecture the jvm runs on, the java should behave the same. The size of variables and all rules a jvm must comply on are specified in java language specifications. For the int size, look at http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1 

     

    As for his answer, i'm happy to see you ask somewhere else than this senior programmer. He probably is senior from the age of C on 8086 and never occured to him that this kind of optimization is in contradiction to using java. If he really need speed so much, i wonder why he use java as, being an interpreted language, it is a bit slower than native language :) 


    Repeat after me: Java's not interpreted.


    It's compiled to bytecode and the bytecode is either interpreted Just-In-Time compiled to the local machine's architecture.

    (I mean, you could maybe make a case for saying that "Java is interpreted" meaning "Java bytecode is interpreted" but most people don't look at the bytecode and say "That's Java!" - they look at the language where they write about things like public static void main(String args[]) so this is misleading at best... and you certainly can't make run-time modifications of code like you might with a really-interpreted language like QBasic or such... and the speed of interpreting bytecode is much much better than the speed of interpreting source...)



  • @fennec said:

    Repeat after me: Java's not interpreted.


    It's compiled to bytecode and the bytecode is either interpreted Just-In-Time compiled to the local machine's architecture

     

    Java is not interpreted :)

    Sorry for misleading terms. Of course java is compiled, it's the bytecode in the jvm that gets either interpreted, either Just-In-Time compiled depending on various rules specific of the jvm. My point was that such king of ""optimization"" is pointless in an environment where execution time of a bit of code is highly dependent on surrounding architecture :)
     


Log in to reply