A problem with big numbers


  • area_pol

    In one of the projects I'm currently involved with, there is a simple templating engine. It fetches data from the db, resources, etc., and patches it together into a form that is easy for some other module to consume.

    Placeholder for data in a template looks something like this:

    "$db_id_0,db_id_1,resource_id_0,resource_id_1#"
    

    I was asked to look at the engine code and make sure it works correctly. While browsing the source I stumble upon...

    foreach(var placeholder in placeholders)
    {
        var firstId = placeholder.Substring(2, 2);
        int id = Int32.Parse(firstId);
        //fetch data from db, patch into resulting data
        //[...] similar stuff for remainder of the placeholder
    }
    

    Expecting the worst, I ask the author of the above snippet:

    • why does it assume that id has two digits?
    • when I tested this algo every id I saw had two digits.

    He's a senior developer. I have no comment.
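
    For comparison, a minimal sketch (in C rather than the engine's C#) of reading the first id up to its delimiter instead of assuming a width. The name parse_first_id, the start offset parameter, and the assumption that ids are non-negative decimal numbers ending at ',' or '#' are mine, not the engine's:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the engine's real code: parse the decimal id that
   starts at text[start] and must be followed by ',' or '#'.
   Returns 0 and stores the id on success, -1 on malformed input. */
int parse_first_id(const char *text, size_t start, int *out_id)
{
    char *end = NULL;
    long value;

    if (start >= strlen(text))
        return -1;  /* offset is past the end of the string */
    if (text[start] < '0' || text[start] > '9')
        return -1;  /* rejects empty fields, signs and leading spaces */

    errno = 0;
    value = strtol(text + start, &end, 10);
    if (errno == ERANGE || value > INT_MAX)
        return -1;  /* does not fit in an int */
    if (*end != ',' && *end != '#')
        return -1;  /* the id must end at a delimiter */

    *out_id = (int)value;
    return 0;
}
```

    With a placeholder such as "$12345,7,8,9#" and start 1, this would store 12345 whatever the number of digits, and it reports an error instead of silently guessing when the field is empty or not numeric.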



  • When you run out of ids, you just have to reuse the earlier ones, I don't see the problem.


  • Discourse touched me in a no-no place

    @MrL said:

    He's a senior developer.

    A senior developer having a senior moment…


  • area_pol

    @cartman82 said:

    When you run out of ids, you just have to reuse the earlier ones, I don't see the problem.

    And it happens automatically!

    @dkf said:

    A senior developer having a senior moment…

    Browsing other parts that he programmed, I'm coming to the conclusion that those moments start when he touches the keyboard and last until he leaves his desk.



  • And there's no way that we'll ever need more than 100 ~10 at a time!



  • @MrL said:

    He's a senior developer.

    Clearly, he's learned the value of "Undefined behavior."



  • @boomzilla said:

    Clearly, he's learned the value of "Undefined behavior."

    I can't see any undefined behavior. If I understand the code correctly, it strips all but two first digits from ID.



  • @Gaska said:

    I can't see any undefined behavior. If I understand the code correctly, it strips all but two first digits from ID.

    I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.


  • area_pol

    @boomzilla said:

    I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.

    Spec? Never heard of it.



  • @boomzilla said:

    I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.

    I was assuming undefined behavior as defined in ANSI C specification.



  • @Gaska said:

    I was assuming undefined behavior as defined in ANSI C specification.

    Is it time to stock up on nose demon spray again?



  • @DCRoss said:

    Is it time to stock up on nose demon spray again?

    No, because as I said, there's no UB of this kind in that code.



  • @Gaska said:

    I was assuming undefined behavior as defined in ANSI C specification.

    I can't imagine why. There's no way that shit compiles.



  • @boomzilla said:

    I can't imagine why. There's no way that shit compiles.

    Care to elaborate? Looks perfectly fine to my untrained eye.

    I used ANSI C definition because it's general enough to cover all technologies that exist - even though no one designs anything with explicit undefined behaviors anymore.


  • SockDev

    @Gaska said:

    I was assuming undefined behavior as defined in ANSI C specification.

    is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?

    or is that just an urban legend?



  • @Gaska said:

    Care to elaborate? Looks perfectly fine to my untrained eye.

    foreach(var placeholder in placeholders)
    

    Looks like no ANSI C I've ever seen. Unless it's from some preprocessor maniac who wants to pretend he's writing C#.



  • @accalia said:

    is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?

    Up to the comma, it's literally that.

    @boomzilla said:

    Looks like no ANSI C I've ever seen.

    I've never said anything about using ANSI C. Or any language, for that matter. I was just using the definition of undefined behavior that's contained in the ANSI C spec for convenience.


  • SockDev

    @Gaska said:

    Up to the comma, it's literally that.

    so the nasal demons are just an urban legend.

    that's sad.... but not unexpected.



  • @accalia said:

    so the nasal demons are just an urban legend.

    Not as much urban legend as a common joke. No one takes it seriously, but technically speaking, it would be valid behavior for conforming C compiler.


  • BINNED

    http://www.catb.org/jargon/html/N/nasal-demons.html claims to have the original source of the meme



  • @Gaska said:

    I used ANSI C definition because it's general enough to cover all technologies that exist - even though no one designs anything with explicit undefined behaviors anymore.

    (Emphasis added.)

    Perhaps not, but I've seen plenty of code that was clearly designed by people who either really didn't know about undefined behaviour or had no fear of it. You know, like:
    ==> Treating random patches of memory as if they were C++ objects by creating C-like char[] buffers and then ObjectType &refvar = (ObjectType &)c_like_buffer_variable;.(1) With careful set-up of the class ObjectType, you will usually get away with this, except that the fine colleagues(2) who did this also liked:
    ==> Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.

    (1) Actually, the buffers were not C-like. They were FORTRAN-like. In 2004, the code-base included (or so I was told) around 60 million lines of FORTRAN, that is, twice as many lines of code as there were in Windows 2000.
    (2) This is a euphemism. The said colleagues were prime fuckwits, evidently.



  • @Gaska said:

    Not as much urban legend as a common joke. No one takes it seriously, but technically speaking, it would be valid behavior for conforming C compiler.

    See also the DeathStation 9000...



  • @Steve_The_Cynic said:

    Treating random patches of memory as if they were C++ objects by creating C-like char[] buffers and then ObjectType &refvar = (ObjectType &)c_like_buffer_variable

    This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.

    @Steve_The_Cynic said:

    Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.

    I said designs, not writes. Because idiots are everywhere.



  • @Steve_The_Cynic said:

    Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.

    When was this? Somebody needs a sound thumping on the head with variadic templates and/or currying operators (look at the operator% Boost.Format uses for an example of the latter).

    @Gaska said:

    This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.

    Yes -- you also need to do this when transporting C++ objects through a C layer to a callback, for instance. As long as the object type coming in and the object type going out match, you're golden -- if they don't though, of course, your bits will get shredded, albeit not in a UB way.
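
    A minimal sketch of that round trip, written in C with a plain struct standing in for the object. The API (for_each_item), the callback type and the struct are all hypothetical names made up for illustration:

```c
#include <stddef.h>

/* Hypothetical C-style API: calls cb(item, user_data) for every item. */
typedef void (*item_callback)(int item, void *user_data);

void for_each_item(const int *items, size_t count,
                   item_callback cb, void *user_data)
{
    size_t i;

    for (i = 0; i < count; i++)
        cb(items[i], user_data);
}

/* State owned by the caller; the API only ever sees it as void *. */
struct running_sum {
    long total;
    size_t seen;
};

void add_to_sum(int item, void *user_data)
{
    /* Convert back to the same type that was passed in. */
    struct running_sum *sum = user_data;

    sum->total += item;
    sum->seen++;
}
```

    Calling for_each_item(items, count, add_to_sum, &sum) with a struct running_sum works because the pointer goes in and comes out as the same type. Handing the callback a pointer to anything else is where the bits get shredded.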



  • @Gaska said:

    I've never said anything about using ANSI C. Or any language, for that matter. I was just using the definition of undefined behavior that's contained in the ANSI C spec for convenience.

    Yes, that's the idea I was using, except applied to this. I just can't figure out your original response in that light:

    @Gaska said:

    I can't see any undefined behavior.



  • @tarunik said:

    As long as the object type coming in and the object type going out match, you're golden

    Not exactly - only char is guaranteed to work that way. There's whole paragraph dedicated only to specify char as an exception from the "casting is UB" rule.

    @boomzilla said:

    I just can't figure out your original response in that light:

    The code sample creates a two-character string from the 3rd and 4th characters of the placeholder, then parses it as a 4-byte signed integer. Everything works as expected every time no matter what - you get the id truncated to the first two digits.
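
    A rough C analogue of what the snippet does, to make the truncation concrete. The function name is made up, and the test strings assume the id starts at index 2, as the original Substring(2, 2) call does:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of Substring(2, 2) followed by Int32.Parse:
   copy exactly two characters starting at index 2 and parse them.
   Assumes the placeholder holds at least four characters. */
int parse_two_chars(const char *placeholder)
{
    char buf[3];

    memcpy(buf, placeholder + 2, 2);
    buf[2] = '\0';
    /* atoi stops at the first non-digit and returns 0 when there are no
       digits at all; Int32.Parse would throw on such input instead. */
    return atoi(buf);
}
```

    Any two ids that share their first two digits, say 1200 and 1299, come out as 12 here, which is the "reuse the earlier ones" effect from earlier in the thread.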



  • @Gaska said:

    Everything works as expected every time no matter what - you get the id truncated to the first two digits.

    But that's like looking at the source code of an ANSI C compiler and saying that because it calls summon_nasal_demons() there's no undefined behavior in source code fed to the compiler.



  • @accalia said:

    is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?

    The really fun part is the compiler is allowed to make the program time travel -- the compiler can make the program do anything even prior to the undefined behavior, including crossing sequence points and anything else. For example, null pointer dereferences are UB, so:

    #include <stdio.h>

    int main(void) {
        printf("Hello world\n");
        fflush(stdout);
        return * (int*) NULL;   /* null pointer dereference: undefined behavior */
    }
    

    need not actually print anything. Every execution of that program would eventually hit the UB, so every execution's semantics are undefined and can be "modified."


  • Discourse touched me in a no-no place

    On the subject of UB and the DS9000, I was in two minds about posting this to the Bad Ideas thread, but the audience here might appreciate it more.

    Basically a post yesterday from Herb Sutter, posting some obviously (which he acknowledges) UB code, and expecting his audience to say what they would like to happen:

    std::vector<int> v = { 0, 0 };
    int i = 0;
    v[i++] = i++;
    std::cout << v[0] << v[1] << std::endl;
    

    Quite what he's expecting from such a post I'm not too sure...



  • I voted for the HARDDRIVE_NOT_FOUND option.



  • @Gaska said:

    This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.

    That would be fine, except that these were fixed-size "buffers". Sorry, "buffer" was a poor choice of terminology. What I meant was that they overlaid a templated C++ array-wrapper object over the top of a C-style declaration for a FORTRAN common-block CHARACTER array. To do this, they did a horrible hand-waving flou-flah with C++ reference variables initialised with nasty-looking casts to be references to the FORTRAN variables. The pointers were never pointers to objects.

    So, of course, if the C++ object is carefully crafted so as to (on your particular combination of compiler, version, architecture, compile switches, and so on) contain nothing but the POD data, you'll (usually) get away with it, but in the given situation, there was even no-virtual-members inheritance going on, so all bets are off.



  • @PJH said:

    his audience to say

    This guy is probably here already:

    I voted for other “11” because I think C++ should troll its users more.

    @PJH said:

    Quite what he's expecting from such a post I'm not too sure...

    He says (as part of a larger comment):

    ...This particular poll was prompted by some assertions I encountered of the form “of course people would expect the answer to be X” and so I thought I’d ask to get a rough data point on one narrow corner of the OOE issue...

    Which seems like a reasonable motivation.



  • Re: evaluation order and UB, in C89, during 1996:

    unsigned char *bp = /* some initialisation */;
    unsigned char bits[16] = { /* 16 suitable values all 0-15 */ };
    unsigned int n = /* some length */;
    
    unsigned int index;
    
    for( index = 0; index < n; index++ )
        *bp++ = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);
    

    I once came across this one (the 16 suitable values were such that the 4-bit value of bits[x] was the same bits in the opposite order - bits[5] was 10, bits[13] was 11, etc. so that the intent of the code was to do an 8-bit bit-order reversal) in code someone else wrote. On every platform the company had at the time, it did the same thing, except on the one I was working on, where it did something weirdly other.

    It is, of course, UB because of the read-use of bp on the right-hand side (*bp) combined with the write-use of bp on the left (bp++). Most compilers we had evaluated the whole right-hand side, then stored the value and incremented the pointer, as if the code were:

    rhs = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);
    *bp++ = rhs;
    

    My platform did this instead:

    plhs = bp++;
    *plhs = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);

    which displaced all the bytes down by one. For complicated and partially valid reasons the code was called twice on the same zone of memory, which caused a symptom that looked sort of like what I was expecting to see for a different root cause. The practical consequence was that I lost three days hunting where in the code the other root cause was actually happening before I managed to put a printf() in between the two calls, which showed me the bit-reversed displaced-by-one memory...
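
    For what it's worth, a sketch of a version with no such ambiguity: read each byte once into a local, then write it back by index. The names are made up, the table is the nibble-reversal table described above, and it assumes 8-bit bytes:

```c
#include <stddef.h>

/* Nibble lookup table: bits4[x] is x with its four bits in reverse order,
   so bits4[5] is 10 and bits4[13] is 11. */
static const unsigned char bits4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse the bit order of each of the n bytes at bp, in place.
   Assumes 8-bit bytes, so that b >> 4 is always in the range 0-15. */
void reverse_bits_in_place(unsigned char *bp, size_t n)
{
    size_t index;

    for (index = 0; index < n; index++) {
        unsigned char b = bp[index];    /* read once, before any write */

        bp[index] = (unsigned char)(bits4[b >> 4] | (bits4[b & 0x0f] << 4));
    }
}
```

    The pointer is never modified inside the expression that also reads through it, so every compiler has to produce the same result.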



  • C doesn't seem to define LHS post incrementing very clearly...



  • @delfinom said:

    C doesn't seem to define LHS post incrementing very clearly...

    Sure it does, if the LHS sub-expression is *bp++ (store the value through the original value of bp and increment bp, in whatever order suits you) or *++bp (increment bp and store the value through the new value). Where it goes pear-shaped is if you do (*bp)++ = N;, because the ++ converts (*bp) from an l-value to an r-value, which is, by definition, not allowed as the target of an assignment.

    Remember that *bp++ is equivalent to *(bp++)... ;)



  • I mean, there's no definition of LHS or RHS subexpressions being evaluated first in the C standard (there is for a few operators but assignment is not one of them), which leads to the undefined behavior and quirkiness between platforms in this case.



  • @Gaska said:

    Not exactly - only char is guaranteed to work that way. There's whole paragraph dedicated only to specify char as an exception from the "casting is UB" rule.

    I mean -- when casting through char*, you need the incoming object type and the outgoing object type to match up in data layout -- otherwise you get the mangled bits I described.

    @Steve_The_Cynic said:

    That would be fine, except that these were fixed-size "buffers". Sorry, "buffer" was a poor choice of terminology. What I meant was that they overlaid a templated C++ array-wrapper object over the top of a C-style declaration for a FORTRAN common-block CHARACTER array. To do this, they did a horrible hand-waving flou-flah with C++ reference variables initialised with nasty-looking casts to be references to the FORTRAN variables. The pointers were never pointers to objects.

    So, of course, if the C++ object is carefully crafted so as to (on your particular combination of compiler, version, architecture, compile switches, and so on) contain nothing but the POD data, you'll (usually) get away with it, but in the given situation, there was even no-virtual-members inheritance going on, so all bets are off.


    Not quite all -- if you are on a specific platform, the platform C and C++ ABIs will make much more stringent guarantees about layout than the C and C++ standards can make. (The standards committee had to deal with wacko boxes like Cray vector machines.)

    Otherwise, how would COM ever work? It's not cross-ABI portable -- but it never was intended to be, nor is the ugly trick you're pulling here.

    @delfinom said:

    I mean, there's no definition of LHS or RHS subexpressions being evaluated first in the C standard (there is for a few operators but assignment is not one of them), which leads to the undefined behavior and quirkiness between platforms in this case.

    Assignment isn't a sequence point. Not fun, but true.



  • @boomzilla said:

    But that's like looking at the source code of an ANSI C compiler and saying that because it calls summon_nasal_demons() there's no undefined behavior in source code fed to the compiler.

    Lying is bad, but isn't a crime. That code is bad, but isn't UB.

    Also, if I was to design C++, I would make increments type void.



  • @Gaska said:

    Lying is bad, but isn't a crime. That code is bad, but isn't UB.

    Are you stupid or just trolling me?



  • @boomzilla said:

    Are you stupid or just trolling me?

    or?



  • @chubertdev said:

    or?

    Feel free to add your own options. I just can't tell any more.



  • @boomzilla said:

    Are you stupid or just trolling me?

    What exactly did I say that you consider stupid? That I refuse to acknowledge the terrible, terrible code that soon will turn into a very, very confusing bug as UB? Or maybe lying is crime in America (if so then sorry, I'm from Europe, and our customs are vastly different)? Or is there actual UB in the code that I missed?



  • @Gaska said:

    Or is there actual UB in the code that I missed?

    Fuuuuuck. I already pointed out that I assumed the datatype / size of the ids was undefined in the app's spec. But here's the undefined behavior:

    The datatype / size of ids for the app was not specified. Passing ids that are not represented by two digits is undefined behavior.

    There, now I've explained the joke at least 3 times.

    @Gaska said:

    Or maybe lying is crime in America (if so then sorry, I'm from Europe, and our customs are vastly different)?

    In some contexts, it definitely is a crime to lie. I suspect the same is true for most parts of Europe.



  • @boomzilla said:

    Feel free to add your own options. I just can't tell any more.



  • @boomzilla said:

    The datatype / size of ids for the app was not specified. Passing ids that are not represented by two digits is undefined behavior.

    It's perfectly defined - they get truncated to two most significant digits. Assuming the id starts where it should (at 3rd position in the string - which is probably defined somewhere).

    @boomzilla said:

    In some contexts, it definitely is a crime to lie. I suspect the same is true for most parts of Europe.

    The only such context I'm aware of is false testimony in court. But I never heard of any case where someone was prosecuted for false testimony. Joys of mid-eastern Europe.



  • @Gaska said:

    It's perfectly defined

    OK, I'm 89% sure you're trolling now. I'll definitely sleep better tonight knowing that.


  • SockDev

    @boomzilla said:

    @Gaska said:

    OK, I'm 89% sure you're trolling now. I'll definitely sleep better tonight knowing that.

    it's @gaska.... only 89%?

    really? :unamused:

    also :underage: because discourse tried using that for my smiley the first time.



  • I'm not trolling - I'm being literal. I think I know what you're trying to say - that such handling of IDs can put system in a totally unpredictable state like ID-duplication-related race conditions or other such niceties. I agree, it can happen, it most likely will happen, and sure as hell won't be nice to debug. But as per that definition of UB, it's not it - because the code is perfectly unambiguous for the compiler and will do exactly the thing it says.

    BTW, if the format of string that's being parsed isn't defined anywhere, they have much bigger problems.



  • @accalia said:

    it's @gaska.... only 89%?

    Since when I'm synonymous to trolling? No, actually I have bigger question: since when I'm a recognizable person on this forum <U+whatever that puts logical negation operator over conditional operator>


  • SockDev

    @Gaska said:

    Since when I'm synonymous to trolling?

    since now? i was attempting to use a very old meme that goes: It's X, You're really not sure? it's ancient, and not funny, sorry.

    @Gaska said:

    No, actually I have bigger question: since when I'm a recognizable person on this forum

    this one i've got a better answer for. since a couple of weeks ago when i first noticed you being active with other regulars.

