Configuration files are hard



  • Our group's most senior engineer is the smartest guy in the company. Unfortunately, that doesn't mean he can write good code. This is how his application reads its configuration file:

    /* Must re-load config since setup could have changed */
    config = fopen (filename, "r");     /* Check for config file */
    if (config == NULL)
    {
        printf ("\nConfiguration file not found, exiting...\n");
        getchar();
        exit (0);
    }
    fscanf (config, "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x %x %d %d %d",port,&baud,&footype,&ftype,&retrycount,&mask,&rfl,&xyzsize,xyztype,&passcount,&verbo,&portnum,station,&keyenab,&abc_delay,&footype2,&ftype2,&prdydelay,&ppagedelay,&rst_tim,&spdabc,&(unsigned short) lptport,&silence, &abc_read, &resetable, &customid, &obit, &disptype, &hardtype);
    port[5] = '\0';
    fclose (config);
    

    First, I'd like to point out the variables footype and ftype. The 'f' in ftype stands for 'foo'. No one who has ever dealt with this code knows the difference between footype and ftype. Same goes for footype2 and ftype2.

    Since the config file is one long text string of unnamed values, it is too easy to screw up if you try to edit it by hand. So the engineer wrote another program to make it easier to change the values. You can probably guess that there's a massive corresponding fprintf in there.

    His main application actually runs in multiple processes simultaneously (basically he launches his exe multiple times). So you have multiple processes that need this configuration, and a separate process that writes the configuration while the other processes are still running. This has never worked very well: processes running with different configurations, inconsistent configurations, race conditions, etc.

    The engineer's "solution" to this problem was to re-read the configuration file every time he needs any value from the file, hence the comment "Must re-load config since setup could have changed". This block of code was copy-pasted almost two dozen times throughout his code. Need to add a new setting to the configuration file? Search-and-replace every instance of the massive fscanf and hope you don't miss one. At least all of his code is in one file (several thousand lines long).

    The one time I got involved in this mess, I suggested to the engineer that he should at least put all of this in a function so he wouldn't have to search-and-replace. He thought that was a good idea, but he didn't really get what I was saying. Now this block of code only appears 7 times.


  • I survived the hour long Uno hand

    Configuration files are Software Engineering is hard

    FTFY

    (also, holy fuck shit.)


  • Java Dev

    Count the wtfs...

    I don't like sscanf. I prefer using our strsplit() helper function or the SkipSpaces/SkipNonSpaces macros when dealing with whitespace-delimited files.



  • At least he didn't use strtok.


  • Java Dev

    strtok() is best avoided, as using strtok_r() is little extra effort. I do prefer the strsep() semantics, but that doesn't skip multiple adjacent separators which can be nice sometimes.

    These, of course, have the additional downside of writing to your input buffer so you may need to make a copy (like if you need to show the whole line in an error message later).

    So many ways to solve the same problem.



  • I've always thought it was kind of annoying that there are so many similar-but-slightly-different formats for configuration files. I didn't expect to read about someone who has never seen any of them.


  • Discourse touched me in a no-no place

    @NedFodder said:

    ```c
    fscanf (config, "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x %x %d %d %d",port,&baud,&footype,&ftype,&retrycount,&mask,&rfl,&xyzsize,xyztype,&passcount,&verbo,&portnum,station,&keyenab,&abc_delay,&footype2,&ftype2,&prdydelay,&ppagedelay,&rst_tim,&spdabc,&(unsigned short) lptport,&silence, &abc_read, &resetable, &customid, &obit, &disptype, &hardtype);

    
    http://cdn.meme.am/instances/500x/43316944.jpg
    
    It's not even checking the result of the `fscanf`…


  • I'd rather see him using the should-not-develop-new-things-on-it-now GetPrivateProfile* APIs.

    Btw, seems these * PrivateProfile * APIs are in interesting state. The MSDN documentation just say "These functions are provided only for compatibility with 16-bit versions of Windows. New applications should use the registry." but the functions themselves are neither in deprecated or obsoluted status.

    Btw again, why don't he write code that'll post a "config reload" application message to the child processes to signal the configuration change?



  • @NedFodder said:

    Search-and-replace every instance of the massive fscanf and hope you don't miss one.
    ....

        "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x %x %d %d %d"
    ```</blockquote>
    
    At least it's an easy string to search for...


  • @NedFodder said:

    &(unsigned short) lptport

    I'm struggling to figure out how this compiles.



  • @jnz said:

    @NedFodder said:
    &(unsigned short) lptport

    I'm struggling to figure out how this compiles.

    Maybe it is just a clever way to extract the lower 16 bits of lptport?

    Edit: I had to ponder a few seconds about whether it is more probable that the code is running on an architecture with little endian or big endian integer representation.



  • @Watson said:

    @NedFodder said:
    Search-and-replace every instance of the massive fscanf and hope you don't miss one.
    ....

        "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x %x %d %d %d"
    ```</blockquote>
    
    At least it's an easy string to search for...</blockquote>
    
    If only the first 25 or so settings are needed, why parse the entire file? 
    
    

    fscanf (config, "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x",port,&baud,&footype,&ftype,&retrycount,&mask,&rfl,&xyzsize,xyztype,&passcount,&verbo,&portnum,station,&keyenab,&abc_delay,&footype2,&ftype2,&prdydelay,&ppagedelay,&rst_tim,&spdabc,&(unsigned short) lptport,&silence, &abc_read, &resetable);

    is considerably shorter, after all.
    
    @Dragnslcr <a href="/t/via-quote/54562/6">said</a>:<blockquote>I've always thought it was kind of annoying that there are so many similar-but-slightly-different formats for configuration files.</blockquote>
    
    Right. Everything should be comma-separated XMSON.


  • @jnz said:

    @NedFodder said:
    &(unsigned short) lptport

    I'm struggling to figure out how this compiles.

    Huh, beats me. It's defined as unsigned int. The engineer is using Visual C++ 6 (that was the latest version when this project started). I'm not a compiler expert, but I'm guessing he's in UB territory and just getting lucky.



  • @PWolff said:

    @jnz said:
    @NedFodder said:
    &(unsigned short) lptport

    I'm struggling to figure out how this compiles.

    Maybe it is just a clever way to extract the lower 16 bits of lptport?

    Still curious, so I did a quick search: he casts it to unsigned short everywhere it's used, so he only ever uses the lower 16 bits. So why is it declared unsigned int?



  • I think that violates the strict aliasing rule and is therefore undefined behavior (assuming it's compiled as C++ and not C, dunno what the C rule is).

    Why is it declared as unsigned int? Probably dumbs, maybe an alignment thing?



  • @jmp said:

    assuming it's compiled as C++ and not C

    I'll check when I get back to work tomorrow, but I think it's compiled as C++.

    @jmp said:

    Probably dumbs

    Yes.

    @jmp said:

    maybe an alignment thing?

    No.

    It's just used as the port number to _inp() and _outp(). No reason for any alignment shenanigans.



  • @cheong said:

    Btw again, why don't he write code that'll post a "config reload" application message to the child processes to signal the configuration change?

    He has difficulty with the concept of functions, and you want him to do inter-process communication? Also, they're not child processes:

    @NedFodder said:

    His main application actually runs in multiple processes simultaneously (basically he launches his exe multiple times).

    Imagine him double-clicking the exe as many times as required.



  • @dkf said:

    It's not even checking the result of the fscanf…

    Eh, this is C programming. What could go wrong?



  • @jnz said:

    I'm struggling to figure out how this compiles.

    When you're making soup, you want to add a few dashes of salt and herbs. Casting in C is much the same, add a few here and there to impart flavour, and it makes the code much more memorable for maintainers.


  • Considered Harmful

    Reminds me of my first Perl project in the industry. That was for a company that built PowerPC accelerator cards for Amigas. To bring the system up and be able to load a decent 68k-yo-PPC JIT emulator, they needed something that worked ffrom 256k of flash, never mind the speed. One of the hardware engineers had started to write a program that was supposed to generate the PPC code, in C. He was a brilliant guy for hardware but his code was a wasteland of printfs
    printf("mov.l r28, r%d\n", src1);
    printf("addeo. r28, r%d, r%d\n", src2, src2);
    printf("dibfswcr[0]. 0,1,-1,%s\n", x);
    printf("mov.l r%d, r28\n", src2);

    I convinced him to let me replace the mess with a very small Perl script that soon grew into a monstrosity of itself as new requirements turned up, though not half as bad as the equivalent C would have been.

    [0] I suppose there was some instruction called "do inscrutable bit fiddling stuff with condition register". Or something like that.



  • @tar said:

    @jnz said:
    I'm struggling to figure out how this compiles.

    When you're making soup, you want to add a few dashes of salt and herbs. Casting in C is much the same, add a few here and there to impart flavour, and it makes the code much more memorable for maintainers.

    I want a language that has a compiler that will warn me when I do this:

    // using this brace style because I can annoy more people this way.
    
    typedef struct
    {
        int x, y, z;
                   } coord;
    
    void bar(coord pos);
    
    void foo(coord min, coord max)
    {
        for (int x = min.x; x <= max.x; x++)
        {
            for (int y = min.x /* whoops */; y <= max.y; y++)
            {
                for (int z = min.z; z <= max.z; z++)
                {
                    bar(coord{x, y, z});
                                                     }
                                                              }
                                             }
               }
    

    Until that exists, no language will be good enough.



  • @ben_lubar said:

    I want a language that has a compiler that will warn me when I do this:

    While not a "compiler", PVS-Studio may be able to catch this error. See, for example http://www.viva64.com/en/d/0126/



  • @NedFodder said:

    He thought that was a good idea, but he didn't really get what I was saying. Now this block of code only appears 7 times.

    I think I've met this guy!
    @flabdablet said:

    @Ragnax said:
    there are idiots aplenty that will gladly regale their own 'expert driver' experiences and advise you to hold the clutch disengaged while you brake and decelerate to your target speed, so that you can easily re-engage into your target gear in one go. VERY - BAD - IDEA. You lose most of the control over the car that way.

    I met one such idiot and convinced him that what he was doing was wrong. So he proceeded to hold the clutch disengaged through the entire process of moving the stick down from 5th through 4th, 3rd and 2nd, braking to a stop, and shifting into 1st. He then left his foot on the clutch pedal for the entire two minutes it took the lights to change.

    At that point I stopped trying to save his wallet from himself.



  • @NedFodder said:

    fscanf (config, "%s %d %d %d %d %x %x %d %d %d %d %d %s %d %d %d %d %d %d %d %d %x %d %x %x %x %d %d %d",port,&baud,&footype,&ftype,&retrycount,&mask,&rfl,&xyzsize,xyztype,&passcount,&verbo,&portnum,station,&keyenab,&abc_delay,&footype2,&ftype2,&prdydelay,&ppagedelay,&rst_tim,&spdabc,&(unsigned short) lptport,&silence, &abc_read, &resetable, &customid, &obit, &disptype, &hardtype);

    So I matched up the format descriptors with the rest of the parameters, and here's what we have:

    %s port
    %d &baud
    %d &footype
    %d &ftype
    %d &retrycount
    %x &mask
    %x &rfl
    %d &xyzsize
    %d xyztype
    %d &passcount
    %d &verbo
    %d &portnum
    %s station
    %d &keyenab
    %d &abc_delay
    %d &footype2
    %d &ftype2
    %d &prdydelay
    %d &ppagedelay
    %d &rst_tim
    %d &spdabc
    %x &(unsigned short) lptport
    %d &silence
    %x &abc_read
    %x &resetable
    %x &customid
    %d &obit
    %d &disptype
    %d &hardtype
    

    %d xyztype is probably an anonymization typo and not worth further comment. But %x &(unsigned short) lptport tells a story.

    @NedFodder said:

    Still curious, so I did a quick search: he casts it to unsigned short everywhere it's used, so he only ever uses the lower 16 bits. So why is it declared unsigned int?

    When πŸ‘΄ originally wrote this code, he declared it unsigned short. But there was weird breakage every time the config file got read. So he's remembered that sscanf() has no way of checking the types of its varargs, meaning it has to work out the sizes of its destinations from the format descriptors, and that %x tells sscanf() to parse and read a hexadecimal int, which is a 32 bit type; but lptport is only 16 bits, so whatever followed it in RAM was getting trashed. So he had to change the declaration of lptport to unsigned int to stop that happening. But the rest of his code contained implicit optimizations that assumed lptport was indeed an unsigned short, so he did s/lptport/(unsigned short) lptport/g across the entire code base, which also ended up in the sscanf() parameter list, though sscanf() itself doesn't care about that; it will still be parsing a 32 bit int and dumping four bytes wherever &lptport points. The only reason this actually works is because this project is running on a little-endian machine, which means that the address of the least significant half of lptport is the same as the address of the whole thing.

    Of course, he could just have changed his sscanf() format descriptor from %x to %hx but that would have involved reading the man page and nobody at πŸ‘΄'s pay grade can risk getting caught doing that. Or perhaps this is yet another POS embedded-systems "standard" C library that isn't actually ANSI conformant and doesn't support the h modifier.



  • @NedFodder said:

    @jnz said:
    @NedFodder said:
    &(unsigned short) lptport

    I'm struggling to figure out how this compiles.

    Huh, beats me. It's defined as unsigned int. The engineer is using Visual C++ 6 (that was the latest version when this project started). I'm not a compiler expert, but I'm guessing he's in UB territory and just getting lucky.

    No he's not in UB territory, he's squarely in "should be rejected by the compiler" territory. Quoth the standard: "a cast does not yield a l-value".

    He could use

    (unsigned short *) &lptport
    instead.



  • @flabdablet said:

    %d xyztype is probably an anonymization typo

    Correct, sorry about that. The rest of your analysis sounds plausible:

    πŸ’» Program randomly crashes.
    πŸ‘΄ Just cast something!
    πŸ’» Program randomly crashes.
    πŸ‘΄ CAST ALL THE THINGS!!!



  • Cast your screwdrivers to hammers and all your problem will be solved because you can then treat everything else as nails.


  • FoxDev

    @cheong said:

    Cast your screwdrivers to hammers

    The odd thing is, there are some types of screw that are meant to be hammered in; the screw head is for removing them. Also, some electric screwdrivers have a hammer function for the same reason.



  • Yup, so we can also have hexadecimal value into byte or integer. πŸ˜›

    Btw, when I need a hammer while I have electric screwdriver, sometime I'll just hammer the nail with the handle or the rear side, depending on where the wire is located. :doing_it_wrong:


  • FoxDev

    @cheong said:

    sometime I'll just hammer the nail with the handle or the rear side

    Yes, you are indeed
    @cheong said:
    :doing_it_wrong:

    πŸ˜›



  • @NedFodder said:

    πŸ‘΄ CAST ALL THE THINGS!!!

    I cast you OUT! Unclean vararg! In the name of Our Lord Dennis Ritchie: it is He who commands you!
    https://www.youtube.com/watch?v=bSxuXQCEC7M


  • BINNED

    @ben_lubar said:

    // using this brace style because I can annoy more people this way.

    Did it work?



  • @NedFodder said:

    πŸ’» Program randomly crashes.
    πŸ‘΄ CAST ALL THE THINGS!!!

    Which is remarkably like responding to a kitchen stove fire by pouring gasoline onto it. Uh, I guess you 'solved' the only-the-kitchen-being-on-fire problem.



  • #ifdef ONETRUEBRACESTYLE
    #define FINALBRACE {
    #define INITIALBRACE
    #else
    #define FINALBRACE 
    #define INITIALBRACE {
    #endif
    

    There. Now it's configurable:

    int main() FINALBRACE
    INITIALBRACE
        printf("Nice!");
    }
    

  • BINNED

    @tar said:

    There. Now it's configurable:

    Congratulations. You are now more annoying than @ben_lubar. Have a πŸͺ.



  • You forgot

    #define CLOSINGBRACE }
    

    but then again I guess you might have wanted to stop short of going full Bournegol.

    Never go full Bournegol.



  • As silly as that is, it's simple text substitution, and with sensible names. I expected way worse.


  • Discourse touched me in a no-no place

    I think it would be so much easier if you just did this:

    #define BODY(body) { body }
    
    int main() BODY(printf("Nice!");)
    

    Sorted.



  • @tar said:

    ```
    #ifdef ONETRUEBRACESTYLE
    #define FINALBRACE {
    #define INITIALBRACE
    #else
    #define FINALBRACE
    #define INITIALBRACE {
    #endif

    There. Now it's configurable:
    

    int main() FINALBRACE
    INITIALBRACE
    printf("Nice!");
    }

    
    That's just stupid. What's wrong with:
    

    #ifdef ONETRUEBRACESTYLE
    #define OPENINGBRACE {
    #else
    #define OPENINGBRACE
    {
    #endif

    
    Then you can just do:
    

    int main() OPENINGBRACE
    printf("Even better!");
    }



  • I'm afraid that won't work: the \ token tells the preprocessor to concatenate this and the next line togetherβ€”it eats the carriage return :<gah>(



  • The tokenizer eats the carriage return anyway, so why does it matter?



  • :congratulationsthatisthejoke.xlsx:



  • And what a hilarious joke it is.



  • I never claimed it was hilarious.


  • β™Ώ (Parody)

    That would've been hilarious if you had.



  • Sorry to disappoint.


Log in to reply