Unstructured optimization



  •  In my first days as a C coder at a small company, I noticed that every non-trivial function in the code base had a high number of parameters.

    12 parameters were a bare minimum, 20 params were very frequent, and I found a monster function with 37 (or was it 47?) parameters!

    When I asked my boss about this, he very seriously explained it as an optimization: "You see, if you use a structure, the program needs to take its address and add an offset each time you access a field. Thus by avoiding structures, we save a lot of time."

    He also gave this useful tip: "Since the parameters are stored on a stack, try to always specify them in the order of use in the function, to speed up access".
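
    For concreteness, here is roughly what the two styles look like side by side (the function and field names are invented, just to illustrate the pattern):

    ```c
    #include <stdio.h>

    /* "Optimized" house style: every field is a separate parameter. */
    static double total_flat(double price, double qty, double tax,
                             double discount, double shipping, double handling)
    {
        return (price * qty - discount) * (1.0 + tax) + shipping + handling;
    }

    /* Struct style: one parameter; field offsets are fixed at compile time. */
    struct order {
        double price, qty, tax, discount, shipping, handling;
    };

    static double total_struct(const struct order *o)
    {
        return (o->price * o->qty - o->discount) * (1.0 + o->tax)
               + o->shipping + o->handling;
    }

    int main(void)
    {
        struct order o = {10.0, 3.0, 0.2, 5.0, 4.0, 1.0};
        printf("%.2f %.2f\n",
               total_flat(10.0, 3.0, 0.2, 5.0, 4.0, 1.0),
               total_struct(&o));
        return 0;
    }
    ```

    Both versions compute the same thing; the only question is which one you'd rather maintain at 37 parameters.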

     



  •  YARCI, they are more prevalent than people would like to admit

     keeping the parameters on the stack means that you need to take the address of the top of the stack and add an offset, costing you the same amount of time



  • Technically he's not exactly wrong. The first parameters are passed in registers and the rest on the stack, so you can do computations with the values already in registers while waiting for the stack values to be loaded from RAM into registers. Also, you cannot pass struct parameters to functions, which means you do need to pass by address. This is the reason why some numeric code is faster when written in FORTRAN: you can use pass-by-value semantics for functions taking e.g. large arrays as parameters, but the compiler can deduce you're not modifying them, so it does a pass by reference; unlike in C, it doesn't need to care about aliasing. However, nowadays you can use the "restrict" keyword to mark your pointers as not aliased. You should show it to your boss and educate him on how making every pointer restricted will improve runtime and memory use compared to the pass-by-value functions you use now.
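
    The "restrict" point can be sketched like so (a hypothetical function; requires C99 or later). The qualifier is a promise that the two pointers never alias, so the compiler may keep values in registers instead of reloading them after every store:

    ```c
    #include <stddef.h>
    #include <assert.h>

    /* With restrict, the compiler may assume dst and src do not overlap,
       so src[i] need not be reloaded after each store to dst[i].
       Behavior is identical; only the generated code can improve. */
    static void scale(double *restrict dst, const double *restrict src,
                      double k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

    int main(void)
    {
        double in[4]  = {1.0, 2.0, 3.0, 4.0};
        double out[4];
        scale(out, in, 2.0, 4);
        assert(out[0] == 2.0 && out[3] == 8.0);
        return 0;
    }
    ```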



  • @ratchet freak said:

     YARCI, they are more prevalent than people would like to admit

     keeping the parameters on the stack means that you need to take the address of the top of the stack and add an offset, costing you the same amount of time

    YARCI? http://acronymsandslang.com/meaning-of/YARCI.html

    What does "Youth as Resources of Central Indiana" have anything to do with this?



  • @ratchet freak said:

    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?


  • Discourse touched me in a no-no place

    @gvh said:

    12 parameters were a bare minimum

    WTF!

    @gvh said:

    20 params very frequent

    WTF!

    @gvh said:

    monster function with 37

    I think there's some kind of repeating pattern here.



  • Solution: write a benchmark demonstrating both "optimization" steps and see if they actually do anything.



  • Since you're probably using the same sets of parameters every time, the obvious "C"-like solution is to #define the list of parameters so you won't have to repeat them every time.
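
    A satirical sketch of that "solution" (all names invented): one macro for the declaration, one for forwarding the same arguments along the call chain.

    ```c
    #include <stdio.h>

    /* One macro declares the parameter list, one forwards the arguments. */
    #define CONVERT_PARAMS const char *src, const char *dst, \
                           int width, int height, int depth, int quality
    #define CONVERT_ARGS   src, dst, width, height, depth, quality

    static int validate(CONVERT_PARAMS)
    {
        (void)src; (void)dst; (void)depth; (void)quality;
        return width > 0 && height > 0;
    }

    static int convert(CONVERT_PARAMS)
    {
        if (!validate(CONVERT_ARGS))   /* forward all six params unchanged */
            return -1;
        printf("converting %s -> %s\n", src, dst);
        return 0;
    }

    int main(void)
    {
        return convert("in.dat", "out.dat", 640, 480, 8, 90) == 0 ? 0 : 1;
    }
    ```

    It "works", in the sense that the WTF is now centralized.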



  • @dtech said:

    Solution: write a benchmark demonstrating both "optimization" steps and see if they actually do anything.

    here you go. Note: It's been years since I last touched C and I never had any real experience with it.

    In the few runs I did, the differences were in the milliseconds (with 1 million executions). The struct was generally faster
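
    A rough sketch of the kind of benchmark meant here (the original code isn't reproduced above; struct and function names are invented). Caveat: with optimization enabled the compiler may inline both calls and erase the difference entirely, which is rather the point.

    ```c
    #include <stdio.h>
    #include <time.h>

    struct params { int a, b, c, d, e, f; };

    static int use_struct(const struct params *p)
    {
        return p->a + p->b + p->c + p->d + p->e + p->f;
    }

    static int use_flat(int a, int b, int c, int d, int e, int f)
    {
        return a + b + c + d + e + f;
    }

    int main(void)
    {
        enum { N = 1000000 };
        volatile int sink = 0;          /* keep the calls from being elided */
        struct params p = {1, 2, 3, 4, 5, 6};

        clock_t t0 = clock();
        for (int i = 0; i < N; i++) sink += use_struct(&p);
        clock_t t1 = clock();
        for (int i = 0; i < N; i++) sink += use_flat(1, 2, 3, 4, 5, 6);
        clock_t t2 = clock();

        printf("struct: %.3fs  flat: %.3fs\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }
    ```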



  • The printfs and their general interaction with OS state will clobber any variance from the actual function call. Do some simple arithmetic computation instead and see if you can find a difference.

    If you're on x86-64, the calling convention means that the struct is essentially passed as a parameter in both the stack and in registers.



  • @gvh said:

     My first days as a C coder at a small company, I noticed that every non trivial function in the code base had a high number of parameters.

    12 parameters were a bare minimum, 20 params very frequent and I found a monster function with 37 (or was it 47) parameters !

    When I asked my boss about this, he very seriously explained it as an optimization : "You see, if you use a structure, the program needs to take its address and add an offset each time you access a field. Thus by avoiding structures, we save a lot of time."

    He also gave this useful tip : "Since the parameters are stored on a stack,  try to always specify them in the order of use in the function, to speed up access".

     

    OK, I've had my "performance first" phase as a programmer, but many years have passed and I've learned that maintainable code is always the best strategy. You can always win a few nanoseconds somewhere else or with more hardware.



  • @blakeyrat said:

    @ratchet freak said:
    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?



  • @darkmattar said:

    @blakeyrat said:
    @ratchet freak said:
    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?

    "Yoda?" Asked Really Cool Indonesians


  • Trolleybus Mechanic

    @Ben L. said:

    @darkmattar said:
    @blakeyrat said:
    @ratchet freak said:
    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?

    "Yoda?" Really Cool Indonesians, Asked

     

    YTFY

     


  • BINNED

    @Mo6eB said:

    The first parameters are passed in registers and the rest on the stack. So you can do computations with the values in the registers already, while waiting for the stack. [...] However, nowadays you can use the "restrict" keyword to mark your pointers as not-aliased. You should show it to your boss and educate him how making every pointer restricted will improve runtime and memory use compared to the pass-by-value functions you use now.

     

    While all of this may be true, I highly doubt that is what the OP's boss meant.

    Since the parameters are stored on a stack,  try to always specify them in the order of use in the function, to speed up access

    Reading this, I get the feeling he thinks of the stack as a strict LIFO queue* instead of random access, or some similar madness. Maybe he thinks a computer is a pushdown automaton.
    (*If you'd use a second stack to store away the elements that you had to pop out of the stack to get to a specific item, you'd even be Turing complete again, iirc. Who knows what he had in mind, but I'm pretty sure it wasn't fastcall calling conventions)
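
    For what it's worth, the two-stack trick in the footnote can be sketched like this (helper names invented; every access costs O(n) pops and pushes, which rather undermines the "optimization"):

    ```c
    #include <assert.h>

    /* Emulating random access with two stacks: pop items onto an
       auxiliary stack until the wanted element is on top, read it,
       then push everything back. */
    #define CAP 64

    struct stack { int data[CAP]; int top; };

    static void push(struct stack *s, int v) { s->data[s->top++] = v; }
    static int  pop(struct stack *s)         { return s->data[--s->top]; }

    /* Read the i-th element from the top of main_s (0 = top). */
    static int peek_at(struct stack *main_s, struct stack *aux, int i)
    {
        for (int k = 0; k < i; k++)
            push(aux, pop(main_s));
        int v = main_s->data[main_s->top - 1];
        while (aux->top > 0)            /* restore the original stack */
            push(main_s, pop(aux));
        return v;
    }

    int main(void)
    {
        struct stack s = {{0}, 0}, aux = {{0}, 0};
        push(&s, 10); push(&s, 20); push(&s, 30);   /* 30 is on top */
        assert(peek_at(&s, &aux, 0) == 30);
        assert(peek_at(&s, &aux, 2) == 10);
        return 0;
    }
    ```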

     


  • Discourse touched me in a no-no place

    @Mo6eB said:

    Technically he's not exactly wrong. The first parameters are passed in registers and the rest on the stack, so you can do computations with the values already in registers while waiting for the stack values to be loaded from RAM into registers. Also, you cannot pass struct parameters to functions, which means you do need to pass by address. This is the reason why some numeric code is faster when written in FORTRAN: you can use pass-by-value semantics for functions taking e.g. large arrays as parameters, but the compiler can deduce you're not modifying them, so it does a pass by reference; unlike in C, it doesn't need to care about aliasing. However, nowadays you can use the "restrict" keyword to mark your pointers as not aliased. You should show it to your boss and educate him on how making every pointer restricted will improve runtime and memory use compared to the pass-by-value functions you use now.

    Cut that out! Ignoring for the moment the register-vs-stack issue, because not all memory models work that way, nor will all compilers (x86-64 has 16 GP registers, not 32, so you have AT LEAST 18 you can use for parameters if you wanted!) this shows a fundamental misunderstanding of C. Structure field offsets are calculated at compile-time!
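
    To make that last point concrete (a minimal sketch; the struct and field names are invented): field offsets are constants baked in at compile time, so accessing p->y compiles to a single load at a fixed offset from the base pointer, not a runtime "find the field" search.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    struct point { double x, y, z; };

    int main(void)
    {
        /* offsetof yields compile-time constants; p->y is one load
           at [p + offsetof(struct point, y)]. */
        printf("x at %zu, y at %zu, z at %zu\n",
               offsetof(struct point, x),
               offsetof(struct point, y),
               offsetof(struct point, z));
        return 0;
    }
    ```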



  • @topspin said:

    Since the parameters are stored on a stack,  try to always specify them in the order of use in the function, to speed up access

    Reading this, I get the feeling he thinks of the stack as a strict LIFO queue* instead of random access, or some similar madness. Maybe he thinks a computer is a pushdown automaton.

    $deity help us, they're making managers out of old FORTH programmers!

     



  • Yuppie Asshole Refactors Code Incorrectly?



  • @Ben L. said:

    @darkmattar said:
    @blakeyrat said:
    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?

    "Yoda?" Asked Really Cool Indonesians

    Yet Another Retarded, Clueless Idiot

    @ratchet freak said:

    YARCI, they are more prevalent than people members of the designated population would like to admit

    FTFY. People who are not members of the designated class readily admit how prevalent they are.



  • @Ben L. said:

    @darkmattar said:
    @blakeyrat said:
    @ratchet freak said:
    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?

    "Yoda?" Asked Really Cool Indonesians

    I think it's actually textspeak for "you see". But what would I know? I'm so out of touch I still put a nose on my emoticons ;o)



  • @RTapeLoadingError said:

    @Ben L. said:
    @darkmattar said:
    @blakeyrat said:
    @ratchet freak said:
    YARCI, they are more prevalent than people would like to admit

    You Are Really Crud Intensive?

    Young Aardvarks Really Chase Insects?

    Yet Another Really Crappy Implementation?

    Yet Another Really Crappy Idea?

    Yet Another Retarded C Implementation?

    "Yoda?" Asked Really Cool Indonesians

    I think it's actually textspeak for "you see". But what would I know? I'm so out of touch I still put a nose on my emoticons ;o)

    They obviously misspelled weapon.



  • @Mo6eB said:

    Technically he's not exactly wrong. The first parameters are passed in registers and the rest on the stack. So you can do computations with the values in the registers already, while waiting for the stack values to be loaded from ram in registers.

    Except for two things. Registers are precious on most platforms, so unless you use those parameters relatively immediately, there's a good chance you'll have to move them out of the registers to free those registers for the computations you need to do before you need the parameters. Second, modern computers don't have to load recently-accessed data from RAM; it can be loaded from L1 cache, which is nearly free. In sum, code is more likely to prefer its parameters to be in L1 cache than in registers, unless the code is trivial.

     



  • @dtech said:

    @dtech said:
    Solution: write a benchmark demonstrating both "optimization" steps and see if they actually do anything.

    here you go. Note: It's been years since I last touched C and I never had any real experience with it.

    In the few runs I did, the differences were in the milliseconds (with 1 million executions). The struct was generally faster

    FTFY

    A sample run with 20 billion iterations, compiled with -O2 in gcc:
    41.680000 - no parameters (control)
    81.810000 - struct byref ordered
    83.140000 - struct byref unordered
    115.290000 - struct byval ordered
    121.620000 - struct byval unordered
    71.760000 - separate params ordered
    71.110000 - separate params unordered

    I'm not sure I trust that the ordered/unordered differences are above the noise floor, but I think I've got the compiler to do a semi-useful comparison between the byref/byval/params.

    Conclusion: there can be a difference. It's negligible.

    I, for one, am going to continue passing structs (because they're pretty important for writing maintainable code) by reference (because passing by value doesn't help).


  • @anonymous234 said:

    Since you're probably using the same sets of parameters every time, the obvious "C"-like solution is to #define the list of parameters so you won't have to repeat them every time.
     

    Well, the other optimization pretty much guaranteed a nearly random parameter ordering.

    I fondly remember two variants of a "convert some file to a different format" function that had nearly the same 20+ parameters, except that source and destination were swapped.

    Thanks $DEITY, I quickly moved to another job.

     



  •  "Premature optimization is the root of all evil (or at least most of it) in programming." -- Donald Knuth, Communications of the ACM 17 (12) in 1974.



  • @ubersoldat said:

    OK, I've had my "performance first" phase as a programmer, but many years have passed and I've learned that maintainable code is always the best strategy. You can always win a few nanoseconds somewhere else or with more hardware.

    I'd tend to agree. Usually applications these days are fast enough not to require extensive optimization, if the developer knows what he/she is doing. We're not in the eighties anymore, where every cycle counted. Writing maintainable code makes it easier to identify and fix any bottlenecks. In my still-short career I've found that the biggest optimization opportunities can be found in the database layer: crappy queries written by inept developers, missing indices, even choosing the wrong storage engine for a given table.



  • @toon said:

    @ubersoldat said:
    OK, I've had my "performance first" phase as a programmer, but many years have passed and I've learned that maintainable code is always the best strategy. You can always win a few nanoseconds somewhere else or with more hardware.

    I'd tend to agree. Usually applications these days are fast enough not to require extensive optimization, if the developer knows what he/she is doing. We're not in the eighties anymore, where every cycle counted. Writing maintainable code makes it easier to identify and fix any bottlenecks. In my still-short career I've found that the biggest optimization opportunities can be found in the database layer: crappy queries written by inept developers, missing indices, even choosing the wrong storage engine for a given table.

    I have a script that goes through a 4GB XML file and adds the data to a database. I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    The reasoning: by default, Ruby on Rails wraps each insert in a transaction. Top-level transactions force SQLite (the default RoR database backend) to create a file, write some data to it, read that data back, write data to another file, and delete the file that was just created.

    Turns out doing that for every element in an XML file that contains the character '<' 111382620 times can take a while.



  • @Ben L. said:

    I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    Here, I can make it run 1000 times faster by adding one line:

    # stop using Ruby, you fucking ponce


  • @morbiuswilters said:

    @Ben L. said:
    I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    Here, I can make it run 1000 times faster by adding one line:

    # stop using Ruby, you fucking ponce

    Tried it, but there didn't seem to be any change in speed.



  • @Ben L. said:

    I have a script that goes through a 4GB XML file and adds the data to a database. I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    The reasoning: by default, Ruby on Rails wraps each insert in a transaction. Top-level transactions force SQLite (the default RoR database backend) to create a file, write some data to it, read that data back, write data to another file, and delete the file that was just created.

    I would've used activerecord-import, myself, because bulk imports are faster still. But I probably wouldn't dump a 4GB XML file into a sqlite database through ActiveRecord in the first place. I sense a number of WTFs lurking nearby.



  • @bstorer said:

    @Ben L. said:
    I have a script that goes through a 4GB XML file and adds the data to a database. I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    The reasoning: by default, Ruby on Rails wraps each insert in a transaction. Top-level transactions force SQLite (the default RoR database backend) to create a file, write some data to it, read that data back, write data to another file, and delete the file that was just created.

    I would've used activerecord-import, myself, because bulk imports are faster still. But I probably wouldn't dump a 4GB XML file into a sqlite database through ActiveRecord in the first place. I sense a number of WTFs lurking nearby.

    Here, tear it apart.



  • @Ben L. said:

    @bstorer said:
    @Ben L. said:
    I have a script that goes through a 4GB XML file and adds the data to a database. I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    The reasoning: by default, Ruby on Rails wraps each insert in a transaction. Top-level transactions force SQLite (the default RoR database backend) to create a file, write some data to it, read that data back, write data to another file, and delete the file that was just created.

    I would've used activerecord-import, myself, because bulk imports are faster still. But I probably wouldn't dump a 4GB XML file into a sqlite database through ActiveRecord in the first place. I sense a number of WTFs lurking nearby.

    Here, tear it apart.

    Oh, it's only a Dwarf Fortress dump. No WTFs here!



  • @morbiuswilters said:

    @Ben L. said:
    @bstorer said:
    @Ben L. said:
    I have a script that goes through a 4GB XML file and adds the data to a database. I was able to make it run at least 10 times faster by adding two lines of code:

    ActiveRecord::Base.transaction do
      # original code here
    end

    The reasoning: by default, Ruby on Rails wraps each insert in a transaction. Top-level transactions force SQLite (the default RoR database backend) to create a file, write some data to it, read that data back, write data to another file, and delete the file that was just created.

    I would've used activerecord-import, myself, because bulk imports are faster still. But I probably wouldn't dump a 4GB XML file into a sqlite database through ActiveRecord in the first place. I sense a number of WTFs lurking nearby.

    Here, tear it apart.

    Oh, it's only a Dwarf Fortress dump. No WTFs here!

    Nah, just autism.



  • @Ben L. said:

    Nah, just autism.

    Autistic kids rock



  • @DrakeSmith said:

    @Ben L. said:
    Nah, just autism.
    Autistic kids rock
     

    ...back and forth?



  • @dhromed said:

    @DrakeSmith said:

    @Ben L. said:
    Nah, just autism.

    Autistic kids rock
     

    ...back and forth?

    +win



  • @powerlord said:

     "Premature optimization is the root of all evil (or at least most of it) in programming." -- Donald Knuth, Communications of the ACM 17 (12) in 1974.

    Except this isn't even premature optimisation, but an extremely poor attempt at optimising something that doesn't even need to be optimised. The insignificant amount of time "saved" by passing a bajillion params instead of a struct is almost certainly being negated by something else somewhere in the codebase that would benefit from actual optimisation.

    Of course that will never happen, because from the OP's description it sounds like this company is a cargo-cult code factory where the rules are/were handed down on stone tablets by The One Original Programmer and cannot be questioned, much less changed. So code gets written based on arbitrary rules and nothing is ever profiled, and eventually the codebase and the application collapse into an enormous pile of unmaintainable shit.



  • @Ben L. said:

    I have a script that goes through a 4GB XML file
     

    You'd be better off getting rid of the 3.9GB of superfluous markup and just working with the 100MB of actual data in the first place.



  • @too_many_usernames said:

    @Ben L. said:

    I have a script that goes through a 4GB XML file
     

    You'd be better off getting rid of the 3.9GB of superfluous markup and just working with the 100MB of actual data in the first place.

    Brian Krzanich just called to say "Cool it, wildman, if you know what's good for ya'.."



  • @too_many_usernames said:

    @Ben L. said:

    I have a script that goes through a 4GB XML file
     

    You'd be better off getting rid of the 3.9GB of superfluous markup and just working with the 100MB of actual data in the first place.


    Ok, I'll just use a script to remove the superfluous markup and--

    Oh I should probably write a script to remove the superfluous markup fi--

    But before that, I should remove the marku--

    StackOverflowError



  • @Ben L. said:

    @too_many_usernames said:

    @Ben L. said:

    I have a script that goes through a 4GB XML file
     

    You'd be better off getting rid of the 3.9GB of superfluous markup and just working with the 100MB of actual data in the first place.


    Ok, I'll just use a script to remove the superfluous markup and--

    Oh I should probably write a script to remove the superfluous markup fi--

    But before that, I should remove the marku--

    StackOverflowError

    $ xzcat db/input.xml.xz | wc -c; xzcat db/input.xml.xz | sed -e 's/<[^>]\+>//g' | wc -c
    4065789703
    775316349

    4065789703 is 3.78656173 GiB. I was off by a factor of 1.0563673.

    775316349 is 739.399289 MiB. You were off by a factor of 7.39399289.

    Also, for extra fun: 3.9 GiB + 100 MiB is 3.99765625 GiB.



  • @Ben L. said:

    $ xzcat db/input.xml.xz | wc -c; xzcat db/input.xml.xz | sed -e 's/<[^>]\+>//g' | wc -c
    4065789703
    775316349

    4065789703 is 3.78656173 GiB. I was off by a factor of 1.0563673.

    775316349 is 739.399289 MiB. You were off by a factor of 7.39399289.

    Also, for extra fun: 3.9 GiB + 100 MiB is 3.99765625 GiB.

    You forgot to strip out the leading/trailing whitespace and empty lines after you stripped the XML tags. Doing that, I get 531116872 characters. You owe too_many_usernames an apology.

    Also, lern2bash, n00b:

    arsenic ~ # xzcat input.xml.xz | tee >(sed -e 's/<[^>]\+>//g' | wc -c) >(wc -c) >/dev/null
    4065789703
    775316349


  •  why is it called "dev/null" instead of, like, "sys/null"



  • @dhromed said:

     why is it called "dev/null" instead of, like, "sys/null"


    Silence, heretic. The mighty Filesystem Hierarchy Standard was carved in a tablet of stone by the Elders of Unix over 3000 years ago and is not to be questioned.



  • @dhromed said:

     why is it called "dev/null" instead of, like, "sys/null"

    Because the kernel stores the information on the devices in a linked list, which is, of course, null-terminated.


  • Discourse touched me in a no-no place

    @dhromed said:

    why is it called "dev/null" instead of, like, "sys/null"

    You want devices to be mulched in with other system things like the password database?



  • null is a device?


  • ♿ (Parody)

    @dhromed said:

    null is a device?

    Just like your mom is a device. It's used like one in that whatever gets pointed its way is ignored. Commonly used to pipe the output of noisy commands whose noise you're not interested in seeing.



  • @dhromed said:

    null is a device?

    According to SQL, null is not not a device.



  •  dev/''

