Even on a more basic level



  • A = A+B  in assembly means (roughly):

    load A into register_1
    load B into register_2
    move (contents of register_3 to somewhere safe)
    add (A, B, putting results in register_3)
    move (register_3 to register_1)


    A += B translates roughly to:

    load A into register_1
    load B into register_2
    add (A, B, putting results in register_1)


    You save two lines, big deal, though I've heard (from somewhat questionable authorities) that move is a slow instruction on Intel.
    In either case, I hear many compilers these days are smart enough to translate A = A+B to A+=B anyway.
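
    If you want to check that yourself, one quick way (assuming you have gcc or clang installed; the file and function names here are just made up for the example) is to compile both spellings and compare the assembly:

    // add.cpp - the two spellings of the same operation on a built-in int
    int sum_assign(int a, int b)  { a = a + b; return a; }   // "A = A + B"
    int plus_assign(int a, int b) { a += b;    return a; }   // "A += B"

    Compile with g++ -S -O0 add.cpp (or clang++ -S -O0 add.cpp) and diff the two functions in add.s; you should see essentially the same instruction sequence for both, though the exact output of course depends on compiler and target.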
     


  • If your compiler isn't smart enough to do that, you need a new compiler.
    And besides, you forgot the store instructions :)



  • This depends on the CPU.  On a RISC CPU, you are partially
    correct.  As the other guy noted, you missed the
    store.  So if A is not used again anytime soon, C = A + B might
    become the second version, overwriting A, which is already in memory
    someplace.



    On the 6502 your result ALWAYS goes in the accumulator
    register.  Generally you would load the accumulator with A,
    add B to that, storing in the accumulator, and then store the
    accumulator wherever.  The difference between C = A + B and A =
    A + B is only in the store instruction.  I can't recall if
    there are instructions for using the X or Y register for addition.
    I think you can use the X, but not the Y.



    I haven't looked at x86 assembly, but judging from the special rules I
    know of for registers, I wouldn't be surprised if it were much like the
    6502 in limiting which registers you can use for add.



    In any case, on a modern system it is a stupid compiler that cannot optimize this automatically even on the lowest setting.



  • In x86 assembly, you can use any registers...



    So



    add eax, ebx



    Would put the sum of eax and ebx in eax.



    You can also use a value from memory or specify an immediate value for the second operand, but not the first.



  • @DonMcNellis said:

    In x86 assembly, you can use any registers...



    So



    add eax, ebx



    Would put the sum of eax and ebx in eax.



    You can also use a value from memory or specify an immediate value for the second operand, but not the first.

    You can indeed use any register for any purpose, but using them for their "standard" purposes (EAX = accumulator, ECX = loop counter, etc.) had some pipelining advantage at least up to the P3... no idea if it still holds true, but it may.



  • If your compiler isn't smart enough to do that, you need a new compiler.
    And besides, you forgot the store instructions :)

    There are languages where these are not equivalent. C++ for example:

    template<class Numeric> Numeric sum1(Numeric a, Numeric b)
    {
      a = a + b;
      return a;
    }
    template<class Numeric> Numeric sum2(Numeric a, Numeric b)
    {
      a += b;
      return a;
    }
    template<class Numeric> Numeric sum3(Numeric a, Numeric b)
    {
      return a + b;
    }
    

    All of these are equivalent when given a built-in numeric type, but totally different if given a class with overloaded operators.

    Still, gcc optimizes that even at -O0 when given int. I wouldn't have expected gcc to notice it in the case of templates.
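
    To make the "totally different" part concrete, here's a minimal sketch that can sit next to the templates above (the Traced type and its messages are invented purely for illustration); each overload just announces itself:

    #include <iostream>

    struct Traced {
        int value;

        // picked by sum2: a += b, works on the object in place
        Traced& operator+=(const Traced& other) {
            std::cout << "operator+= called (in place)\n";
            value += other.value;
            return *this;
        }
    };

    // picked by sum1: a + b, which builds and returns a brand-new Traced
    Traced operator+(const Traced& lhs, const Traced& rhs) {
        std::cout << "operator+ called (temporary built)\n";
        return Traced{lhs.value + rhs.value};
    }

    int main() {
        Traced a{1}, b{2};
        sum1(a, b);   // prints the operator+ line, then copy-assigns the temporary
        sum2(a, b);   // prints the operator+= line only
        return 0;
    }

    The compiler has to call whatever those overloads actually do, so it cannot legally rewrite one spelling into the other the way it can for a plain int.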



  • > I can't recall if
    there are instructions for using the X or Y register for addition.
    I think you can use the X, but not the Y.



    No, you can't use the X or Y register for adding on a 6502. You can
    only add an immediate value or a byte from memory to the A
    register. (You could say that X and Y are mostly used as indexes, while A
    is used for heavy math like adding. :))



  • Most compilers will also optimise multiple adds, e.g.

    A = A + B + C + D + E

    ->

    mov ax, [var_a]
    add ax, [var_b]
    add ax, [var_c]
    add ax, [var_d]
    add ax, [var_e]
    mov [var_a], ax

    mov is definitely not considered a slow instruction on IA32/x86.



  • Hey guys, you do realise that this only makes a difference when you're doing some calculation about half a billion times or so, don't you? (And yeah, I've learned some basic assembler.)

    On the lowest level, you'd get this for A += B:

    mov ax, [A]
    add ax, [B]
    mov [A], ax

    However, you can also do this in reverse order:

    mov ax, [B]
    add ax, [A]
    mov [A], ax

    At a level this low, it just doesn't matter in which order the calculations are done, since the system will need to use some register for the calculations to begin with. So at the lowest level, no difference...

    The problem, however, is at slightly higher levels where the compiler is generating code that will e.g. check for numerical overflows. Something like 255 + 255 would never fit in a datatype of 1 byte, so there needs to be some bounds checking. And only in such cases might you get some better optimization results. But still, it doesn't change the fact that for any calculation there's always a need for one register for the result. Basically, any compiler will just load the first value, add the following values and then return the result.

    The same reasoning applies at higher levels. When you use an object-oriented language that allows you to overload operators, then any use of those operators on an object will just create a new object. Thus when doing something like A = B + A, the compiler would start by creating a new object C and assigning the value of B to C. It would then add the value of A to C. Finally the old A will be freed and A will be made to point to C. The same happens when you say A += B, except in a slightly different order. It will create C, first load it with the value of A and then add the value of B before freeing A and letting A point to C.

    Don't forget, objects are basically nothing more than pointers.

    However, you could try to implement a faster solution for A += B by just adding B to A directly (see the sketch after this post). That would indeed save the memory allocation for C and you wouldn't need to free the old A. But not all compilers will build code like that; some will just handle it the same as A = A + B. On the other hand, some compilers will be smart enough to recognise that A = A + B can be optimized to A += B and will optimize the final output. It all basically depends on how the compiler optimizes object-oriented code that supports operator overloading. And of course on how people implement their own operator overloading code.

    Still, at the lowest level we're talking about registers and data in memory. And at that level it's still done by first loading the first value into a register, then adding the next values and finally returning the value of the register to the data location. At this point there's really no speed gain, unless you have your data stored in registers to begin with.

    The speed gain comes from the overloaded operators, where objects need to be created to hold the result value while another object needs to be freed because it's not required anymore. Then again, working at higher levels will slow you down anyway. And if you really need to do billions of calculations in a very small amount of time, you shouldn't use things as complex as objects. (Yet if you use .NET then you have no other choice...)
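
    Here's the kind of thing I mean, as a rough C++ sketch (the BigNum class and its member names are invented for the example, not taken from any real library; carry and size handling are omitted):

    #include <cstddef>
    #include <vector>

    struct BigNum {
        std::vector<int> digits;   // heap-allocated storage, like the "objects" above

        // A += B done "directly": reuse A's existing storage, no new object
        BigNum& operator+=(const BigNum& other) {
            for (std::size_t i = 0; i < digits.size() && i < other.digits.size(); ++i)
                digits[i] += other.digits[i];
            return *this;
        }
    };

    // A + B: build a fresh result object C and leave both inputs untouched
    BigNum operator+(const BigNum& a, const BigNum& b) {
        BigNum c;
        c.digits.resize(a.digits.size());          // new allocation for the result
        for (std::size_t i = 0; i < c.digits.size(); ++i)
            c.digits[i] = a.digits[i] + (i < b.digits.size() ? b.digits[i] : 0);
        return c;
    }

    With these definitions, A = A + B builds the new object and the old contents of A are thrown away when the assignment replaces them (pretty much the create-C-then-repoint-A dance described above), while A += B works entirely in A's existing storage and allocates nothing. Whether a compiler or runtime can turn the first form into the second is up to it; the hand-written += is the only version you can count on to avoid the extra allocation.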



  • 6502??  My first computer was an Apple II+.  I fondly remember
    the X and Y accumulators.  Are people still using 6502s?



  • Yeah, and Z80s. My 95LX palmtop has a Z80.

