A problem with big numbers
-
In one of the projects I'm currently involved with, there is a simple templating engine. It fetches data from the db, resources, etc., and patches that together into data that is easy for some other module to consume.
A placeholder for data in a template looks something like this:
"$db_id_0,db_id_1,resource_id_0,resource_id_1#"
I was asked to look at the engine code and make sure it works correctly. While browsing the source I stumble upon...
foreach (var placeholder in placeholders)
{
    var firstId = placeholder.Substring(2, 2);
    int id = Int32.Parse(firstId);
    // fetch data from db, patch into resulting data
    // [...] similar stuff for remainder of the placeholder
}
With the worst expectations, I ask the author of the above snippet:
- why does it assume that id has two digits?
- when I tested this algo every id I saw had two digits.
He's a senior developer. I have no comment.
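For comparison, here is a rough sketch of reading the ids by splitting on the delimiters instead of assuming a fixed width. It is written in C++ rather than the engine's C#. The helper name parse_placeholder_ids is made up for illustration, and it assumes the ids are plain decimal integers sitting between the "$" and the "#". Neither of those is confirmed by the original code.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the engine): returns every id found between
// '$' and '#', or std::nullopt if the placeholder is malformed.
std::optional<std::vector<int>> parse_placeholder_ids(const std::string& placeholder) {
    const std::size_t start = placeholder.find('$');
    if (start == std::string::npos) {
        return std::nullopt;
    }
    const std::size_t end = placeholder.find('#', start);
    if (end == std::string::npos) {
        return std::nullopt;
    }

    std::vector<int> ids;
    std::istringstream fields(placeholder.substr(start + 1, end - start - 1));
    std::string field;
    while (std::getline(fields, field, ',')) {
        try {
            std::size_t used = 0;
            const int id = std::stoi(field, &used);
            if (used != field.size()) {
                return std::nullopt;  // trailing junk after the number
            }
            ids.push_back(id);
        } catch (const std::exception&) {
            return std::nullopt;  // empty, non-numeric, or out of range
        }
    }
    return ids;
}
```

With this shape, an id of any length comes through whole, and a malformed placeholder is reported instead of being silently misread.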
-
When you run out of ids, you just have to reuse the earlier ones, I don't see the problem.
-
When you run out of ids, you just have to reuse the earlier ones, I don't see the problem.
And it happens automatically!
A senior developer having a senior moment…
Browsing other parts that he programmed, I'm coming to the conclusion that those moments start when he touches the keyboard and last until he leaves his desk.
-
And there's no way that we'll ever need more than
100~10 at a time!
-
Clearly, he's learned the value of "Undefined behavior."
I can't see any undefined behavior. If I understand the code correctly, it strips all but two first digits from ID.
-
I can't see any undefined behavior. If I understand the code correctly, it strips all but two first digits from ID.
I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.
-
I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.
Spec? Never heard of it.
-
I'm assuming, of course, that there's no mention of datatype / size in the spec for the ids.
I was assuming undefined behavior as defined in ANSI C specification.
-
I was assuming undefined behavior as defined in ANSI C specification.
Is it time to stock up on nose demon spray again?
-
Is it time to stock up on nose demon spray again?
No, because as I said, there's no UB of this kind in that code.
-
I was assuming undefined behavior as defined in ANSI C specification.
I can't imagine why. There's no way that shit compiles.
-
I can't imagine why. There's no way that shit compiles.
Care to elaborate? Looks perfectly fine to my untrained eye.
I used ANSI C definition because it's general enough to cover all technologies that exist - even though no one designs anything with explicit undefined behaviors anymore.
-
I was assuming undefined behavior as defined in ANSI C specification.
is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?
or is that just an urban legend?
-
Care to elaborate? Looks perfectly fine to my untrained eye.
foreach(var placeholder in placeholders)
Looks like no ANSI C I've ever seen. Unless it's from some preprocessor maniac who wants to pretend he's writing C#.
-
is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?
Up to the comma, it's literally that.
Looks like no ANSI C I've ever seen.
I've never said anything about using ANSI C. Or any language, for that matter. I was just using the definition of undefined behavior that's contained in the ANSI C spec for convenience.
-
Up to the comma, it's literally that.
so the nasal demons are just an urban legend.
that's sad.... but not unexpected.
-
so the nasal demons are just an urban legend.
Not so much an urban legend as a common joke. No one takes it seriously, but technically speaking, it would be valid behavior for a conforming C compiler.
-
http://www.catb.org/jargon/html/N/nasal-demons.html claims to have the original source of the meme
-
I used ANSI C definition because it's general enough to cover all technologies that exist - even though no one designs anything with explicit undefined behaviors anymore.
(Emphasis added.)
Perhaps not, but I've seen plenty of code that was clearly designed by people who either really didn't know about undefined behaviour or had no fear of it. You know, like:
==> Treating random patches of memory as if they were C++ objects by creating C-like char[] buffers and then ObjectType &refvar = (ObjectType &)c_like_buffer_variable; (1) With careful set-up of the class ObjectType, you will usually get away with this, except that the fine colleagues (2) who did this also liked:
==> Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.
(1) Actually, the buffers were not C-like. They were FORTRAN-like. In 2004, the code-base included (or so I was told) around 60 million lines of FORTRAN, that is, twice as many lines of code as there were in Windows 2000.
(2) This is a euphemism. The said colleagues were prime fuckwits, evidently.
-
Not so much an urban legend as a common joke. No one takes it seriously, but technically speaking, it would be valid behavior for a conforming C compiler.
See also the DeathStation 9000...
-
Treating random patches of memory as if they were C++ objects by creating C-like char[] buffers and then ObjectType &refvar = (ObjectType &)c_like_buffer_variable
This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.
Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.
I said designs, not writes. Because idiots are everywhere.
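For readers who would rather not lean on the cast at all, one conservative alternative is to copy the bytes into a real object with std::memcpy. The sketch below is only an illustration. RecordHeader and read_header are hypothetical names, and the layout is invented, not taken from the API mentioned above.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-size header at the front of a variable-sized record.
struct RecordHeader {
    std::uint32_t payload_length;
    std::uint16_t kind;
};

// Copy the header bytes into a real object instead of casting the buffer.
// The caller must guarantee at least sizeof(RecordHeader) readable bytes.
RecordHeader read_header(const unsigned char* buffer) {
    static_assert(std::is_trivially_copyable<RecordHeader>::value,
                  "memcpy is only valid for trivially copyable types");
    RecordHeader header;
    std::memcpy(&header, buffer, sizeof header);
    return header;
}
```

The copy costs a few bytes of stack, and it sidesteps alignment and aliasing questions for the fixed part of the structure. The variable-sized tail still has to be handled separately.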
-
Passing C++-like objects by value to varargs parameters. This one broke when the company upgraded the Solaris C++ compiler from rev 6 to rev 8, at a time when the up-to-date version was rev 11.
When was this? Somebody needs a sound thumping on the head with variadic templates and/or currying operators (look at the operator% Boost.Format uses for an example of the latter).
This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.
Yes -- you also need to do this when transporting C++ objects through a C layer to a callback, for instance. As long as the object type coming in and the object type going out match, you're golden -- if they don't though, of course, your bits will get shredded, albeit not in a UB way.
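A minimal sketch of that round trip, assuming the C layer hands back an opaque void* (the more common spelling of this trick than char*). Counter, c_layer_invoke, on_event and count_events are all invented names, and c_layer_invoke merely stands in for whatever the real C API would be.

```cpp
// Hypothetical object we want the callback to see again.
struct Counter {
    int hits = 0;
};

// Callback shape a C API would typically expose: an opaque user pointer.
using c_callback = void (*)(void* user_data);

// Stand-in for the C layer: it never looks inside user_data,
// it only passes the pointer back to the callback.
void c_layer_invoke(c_callback callback, void* user_data) {
    callback(user_data);
}

// The callback casts back to exactly the type that went in.
void on_event(void* user_data) {
    auto* counter = static_cast<Counter*>(user_data);
    ++counter->hits;
}

// Sends a Counter through the C layer n times and reports what came out.
int count_events(int n) {
    Counter counter;
    for (int i = 0; i < n; ++i) {
        c_layer_invoke(&on_event, &counter);
    }
    return counter.hits;
}
```

The whole thing depends on on_event casting to the same type that count_events passed in. Nothing in the C layer checks that.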
-
I've never said anything about using ANSI C. Or any language, for that matter. I was just using the definition of undefined behavior that's contained in the ANSI C spec for convenience.
Yes, that's the idea I was using, except applied to this. I just can't figure out your original response in that light:
I can't see any undefined behavior.
-
As long as the object type coming in and the object type going out match, you're golden
Not exactly - only char is guaranteed to work that way. There's whole paragraph dedicated only to specify char as an exception from the "casting is UB" rule.
I just can't figure out your original response in that light:
The code sample creates a two-character string from the 3rd and 4th characters of placeholder, then parses it as a 4-byte signed integer. Everything works as expected every time no matter what - you get the id truncated to the first two digits.
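To make the truncation concrete, here is a rough C++ analog of the Substring(2, 2) plus Int32.Parse pair. The function name and the sample strings are made up. The "$$" prefix in the usage note is only a stand-in for whatever two characters really precede the first id.

```cpp
#include <string>

// Rough C++ analog of placeholder.Substring(2, 2) followed by Int32.Parse.
// Assumes the placeholder has at least four characters and that the
// characters at index 2 and 3 are digits. Note that std::stoi is more
// lenient than Int32.Parse: it stops at the first non-digit instead of failing.
int first_id_fixed_width(const std::string& placeholder) {
    return std::stoi(placeholder.substr(2, 2));
}
```

Under those assumptions, "$$12,7#", "$$1234,7#" and "$$1299,7#" all yield 12. So the behavior is perfectly deterministic, and distinct ids that share their first two digits become indistinguishable.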
-
Everything works as expected every time no matter what - you get the id truncated to the first two digits.
But that's like looking at the source code of an ANSI C compiler and saying that because it calls summon_nasal_demons() there's no undefined behavior in source code fed to the compiler.
-
is that the one that literally says in the spec that a compiler upon encountering undefined behavior is allowed to implement the behavior in any manner it wishes to, up to and including by making demons fly out of the programmer's nostrils?
The really fun part is the compiler is allowed to make the program time travel -- the compiler can make the program do anything even prior to the undefined behavior, including crossing sequence points and anything else. For example, null pointer dereferences are UB, so:
#include <stdio.h>

int main() {
    printf("Hello world\n");
    fflush(stdout);
    return * (int*) NULL;
}
need not actually print anything. Every execution of that program would eventually hit the UB, so every execution's semantics are undefined and can be "modified."
-
On the subject of UB and the DS9000, I was in two minds about posting this to the Bad Ideas thread, but the audience here might appreciate it more.
Basically a post yesterday from Herb Sutter, posting some obviously (which he acknowledges) UB code, and expecting his audience to say what they would like to happen:
std::vector<int> v = { 0, 0 };
int i = 0;
v[i++] = i++;
std::cout << v[0] << v[1] << endl;
Quite what he's expecting from such a post I'm not too sure...
-
I voted for the HARDDRIVE_NOT_FOUND option.
-
This is the only way to handle variable-sized structures. I hate them. But API we're using at work needs them from time to time. Also, C and C++ standards explicitly state that casting to/from char pointers is defined and safe as long as after casting chars to an object, the underlying memory is in valid state for those objects. And with PODs, all states are valid. So no, that has nothing to do with UB.
That would be fine, except that these were fixed-size "buffers". Sorry, "buffer" was a poor choice of terminology. What I meant was that they overlaid a templated C++ array-wrapper object over the top of a C-style declaration for a FORTRAN common-block CHARACTER array. To do this, they did a horrible hand-waving flou-flah with C++ reference variables initialised with nasty-looking casts to be references to the FORTRAN variables. The pointers were never pointers to objects.
So, of course, if the C++ object is carefully crafted so as to (on your particular combination of compiler, version, architecture, compile switches, and so on) contain nothing but the POD data, you'll (usually) get away with it, but in the given situation, there was even no-virtual-members inheritance going on, so all bets are off.
-
his audience to say
This guy is probably here already:
I voted for other “11” because I think C++ should troll it’s users more.
Quite what he's expecting from such a post I'm not too sure...
He says (as part of a larger comment):
...This particular poll was prompted by some assertions I encountered of the form “of course people would expect the answer to be X” and so I thought I’d ask to get a rough data point on one narrow corner of the OOE issue...
Which seems like a reasonable motivation.
-
Re: evaluation order and UB, in C89, during 1996:
unsigned char *bp = /* some initialisation */;
unsigned char bits[16] = { /* 16 suitable values all 0-15 */ };
unsigned int n = /* some length */;
unsigned int index;
for( index = 0; index < n; index++ )
    *bp++ = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);
I once came across this one (the 16 suitable values were such that the 4-bit value of bits[x] was the same bits in the opposite order - bits[5] was 10, bits[13] was 11, etc. so that the intent of the code was to do an 8-bit bit-order reversal) in code someone else wrote. On every platform the company had at the time, it did the same thing, except on the one I was working on, where it did something weirdly other.
It is, of course, UB because of the read-use of bp on the right-hand side (*bp) combined with the write-use of bp on the left (bp++). Most compilers we had evaluated the whole right-hand side, then stored the value and incremented the pointer, as if the code were:
rhs = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);
*bp++ = rhs;
My platform did this instead:
plhs = bp++;
*plhs = bits[*bp >> 4] | (bits[*bp & 0x0f] << 4);
which displaced all the bytes down by one. For complicated and partially valid reasons the code was called twice on the same zone of memory, which caused a symptom that looked sort of like what I was expecting to see for a different root cause. The practical consequence was that I lost three days hunting where in the code the other root cause was actually happening before I managed to put a printf() in between the two calls, which showed me the bit-reversed displaced-by-one memory...
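A sketch of how the same byte reversal could be written so that nothing depends on evaluation order: read each byte once into a local, then store the result. This is C++ rather than the original C89, and reverse_bits_in_place is a made-up name. Only bits[5] == 10 and bits[13] == 11 come from the description above. The rest of the table is filled in on the assumption that every entry is the 4-bit reversal of its index.

```cpp
#include <cstddef>

// Reverse the bit order of each byte in place using a 4-bit lookup table.
void reverse_bits_in_place(unsigned char* bp, std::size_t n) {
    // bits[x] is x with its four bits in the opposite order
    // (for example bits[5] == 10 and bits[13] == 11).
    static const unsigned char bits[16] = {
        0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
        0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};

    for (std::size_t index = 0; index < n; ++index) {
        // Read the byte once, before anything is written or incremented.
        const unsigned char value = bp[index];
        // The reversed high nibble becomes the low nibble and vice versa.
        bp[index] = static_cast<unsigned char>(
            bits[value >> 4] | (bits[value & 0x0F] << 4));
    }
}
```

Indexing with bp[index] instead of *bp++ leaves the pointer unmodified inside the expression, so the read and the write can no longer conflict.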
-
C doesn't seem to define LHS post incrementing very clearly...
-
C doesn't seem to define LHS post incrementing very clearly...
Sure it does, if the LHS sub-expression is *bp++ (store the value through the original value of bp and increment bp, in whatever order suits you) or *++bp (increment bp and store the value through the new value). Where it goes pear-shaped is if you do (*bp)++ = N;, because the ++ converts (*bp) from an l-value to an r-value, which is, by definition, not allowed as the target of an assignment.
Remember that *bp++ is equivalent to *(bp++)... ;)
-
I mean, there's no definition of LHS or RHS subexpressions being evaluated first in the C standard (there is for a few operators but assignment is not one of them) which leads to the undefined behavior and quirkiness between platforms in this case.
-
Not exactly - only char is guaranteed to work that way. There's whole paragraph dedicated only to specify char as an exception from the "casting is UB" rule.
I mean -- when casting through char*, you need the incoming object type and the outgoing object type to match up in data layout -- otherwise you get the mangled bits I described.
That would be fine, except that these were fixed-size "buffers". Sorry, "buffer" was a poor choice of terminology. What I meant was that they overlaid a templated C++ array-wrapper object over the top of a C-style declaration for a FORTRAN common-block CHARACTER array. To do this, they did a horrible hand-waving flou-flah with C++ reference variables initialised with nasty-looking casts to be references to the FORTRAN variables. The pointers were never pointers to objects.
So, of course, if the C++ object is carefully crafted so as to (on your particular combination of compiler, version, architecture, compile switches, and so on) contain nothing but the POD data, you'll (usually) get away with it, but in the given situation, there was even no-virtual-members inheritance going on, so all bets are off.
Not quite all -- if you are on a specific platform, the platform C and C++ ABIs will make much more stringent guarantees about layout than the C and C++ standards can make. (The standards committee had to deal with wacko boxes like Cray vector machines.)
Otherwise, how would COM ever work? It's not cross-ABI portable -- but it never was intended to be, nor is the ugly trick you're pulling here.
I mean, there's no definition of LHS or RHS subexpressions being evaluated first in the C standard (there is for a few operators but assignment is not one of them) which leads to the undefined behavior and quirkiness between platforms in this case.
Assignment isn't a sequence point. Not fun, but true.
-
But that's like looking at the source code of an ANSI C compiler and saying that because it calls summon_nasal_demons() there's no undefined behavior in source code fed to the compiler.
Lying is bad, but isn't a crime. That code is bad, but isn't an UB.
Also, if I was to design C++, I would make increments type void.
-
Lying is bad, but isn't a crime. That code is bad, but isn't an UB.
Are you stupid or just trolling me?
-
Are you stupid or just trolling me?
What exactly did I say that you consider stupid? That I refuse to acknowledge the terrible, terrible code that soon will turn into a very, very confusing bug as UB? Or maybe lying is crime in America (if so then sorry, I'm from Europe, and our customs are vastly different)? Or is there actual UB in the code that I missed?
-
Or is there actual UB in the code that I missed?
Fuuuuuck. I already pointed out that I assumed the datatype / size of the ids was undefined in the app's spec. But here's the undefined behavior:
The datatype / size of ids for the app was not specified. Passing ids that are not represented by two digits is undefined behavior.
There, now I've explained the joke at least 3 times.
Or maybe lying is crime in America (if so then sorry, I'm from Europe, and our customs are vastly different)?
In some contexts, it definitely is a crime to lie. I suspect the same is true for most parts of Europe.
-
The datatype / size of ids for the app was not specified. Passing ids that are not represented by two digits is undefined behavior.
It's perfectly defined - they get truncated to two most significant digits. Assuming the id starts where it should (at 3rd position in the string - which is probably defined somewhere).
In some contexts, it definitely is a crime to lie. I suspect the same is true for most parts of Europe.
The only such context I'm aware of is false testimony in court. But I never heard of any case where someone was prosecuted for false testimony. Joys of mid-eastern Europe.
-
It's perfectly defined
OK, I'm 89% sure you're trolling now. I'll definitely sleep better tonight knowing that.
-
@Gaska said:
OK, I'm 89% sure you're trolling now. I'll definitely sleep better tonight knowing that.
it's @gaska.... only 89%?
really?
also because discourse tried using that for my smiley the first time.
-
I'm not trolling - I'm being literal. I think I know what you're trying to say - that such handling of IDs can put the system in a totally unpredictable state, like ID-duplication-related race conditions or other such niceties. I agree, it can happen, it most likely will happen, and it sure as hell won't be nice to debug. But as per that definition of UB, it's not it - because the code is perfectly unambiguous for the compiler and will do exactly the thing it says.
BTW, if the format of the string that's being parsed isn't defined anywhere, they have much bigger problems.
-
it's @gaska.... only 89%?
Since when I'm synonymous to trolling? No, actually I have bigger question: since when I'm a recognizable person on this forum <U+whatever that puts logical negation operator over conditional operator>
-
Since when I'm synonymous to trolling?
since now? i was attempting to use a very old meme that goes:
It's X, You're really not sure?
it's ancient, and not funny, sorry.
No, actually I have bigger question: since when I'm a recognizable person on this forum
this one i've got a better answer for. since a couple of weeks ago when i first noticed you being active with other regulars.