Zed Shaw gets schooled on C undefined behavior


  • Discourse touched me in a no-no place

    @Kian said:

    Calling wcslen

    You mentioned it first. And in this case, my string is 5 characters, not 3, including the embedded null.

    @Kian said:

    But imagine your buffer was {'a','b','c','d'}, and SysAllocStringLen didn't add a null terminator. Then SysStringLen would give you 4, and wcslen would read past the end of the buffer: undefined behavior, quite possibly a seg fault.

    If you only used BSTRs with functions that expect BSTRs, you wouldn't need the terminator. That's a convenience, not the canonical marker-of-length, is all I was ever saying.
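    On Kian's hypothetical unterminated buffer, the counted length is the only safe length. wcslen keeps reading until it happens to find a zero, and that is undefined behavior. Here is a minimal sketch with a hypothetical helper, bounded_wcslen. It is not a Windows API. It only looks at the first `count` characters, so it stays inside the buffer whether or not a terminator exists:

```c
#include <stddef.h>

/* Hypothetical helper (not a Windows API): index of the first L'\0'
   within the first `count` characters, or `count` if there is none.
   Unlike wcslen, it never reads past the buffer, so it is safe on
   unterminated data as long as `count` is correct. */
size_t bounded_wcslen(const wchar_t *s, size_t count)
{
    size_t i = 0;
    while (i < count && s[i] != L'\0')
        i++;
    return i;
}
```

    For {'a','b','c','d'} with count 4 it returns 4. For L"abc\0d" with count 5 it returns 3, which is the same position wcslen would report.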


  • Discourse touched me in a no-no place

    @RaceProUK said:

    Thumb != finger

    Also, what if I held up the index and ring fingers? Under @kian's analogy I'm holding up one, not two.



  • I have brand new microcontrollers designed in 2014 with just 1KB of ram :D

    Hint: It costs 20 cents for a reason. If you want your cheap toys, you get cheap silicon. You want 128KB of RAM? Then you pay $10 per chip. 32KB chips are in the $2-3 price range.

    Mind you this seems weird because you can get 4GB DDR for like $20 right? Well micros do not use DDR or high frequency buses until you hit the high end $12+ chips.



  • @FrostCat said:

    If you only used BSTRs with functions that expect BSTRs, you wouldn't need the terminator. That's a convenience, not the canonical marker-of-length, is all I was ever saying.

    Everything is a convenience. The null terminator is there because it serves a function, just as the length field is there because it serves a function. COM itself is meant to make calling other languages and applications convenient. Saying "It's a convenience" is devoid of meaning. You don't like calling it a null-terminated string? Fine, it's not a null-terminated string. It's just a structure containing a length, a string, and a null terminator field.

    @FrostCat said:

    Also, what if I held up the index and ring fingers? Under @kian's analogy I'm holding up one, not two.

    You're not very good at analogies.... :P



  • @Kian said:

    You're not very good at analogies....

    It made sense to me.

    If you provide this set of chars 0x20, 0x00, 0x20, you'll get 1 for the length, even though there are two valid characters.

    But, null-termination is how most "strings" are defined, so 1 is the correct answer for string length.

    It's not however the correct value for how many valid characters exist within the space of the buffer.
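    xaade's three-byte example gives different answers depending on which question you ask. The following small sketch uses a hypothetical count_nonnull helper. For the buffer {0x20, 0x00, 0x20}, strlen stops at the first null and returns 1. The buffer itself holds 3 bytes, and 2 of those bytes are non-null.

```c
#include <stddef.h>

/* Hypothetical helper: how many of the first n bytes are not '\0'.
   This answers "how many valid characters are in the buffer",
   which is a different question from strlen's "where is the first null". */
size_t count_nonnull(const char *buf, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != '\0')
            k++;
    return k;
}
```

    Given `const char buf[3] = {0x20, 0x00, 0x20};`, strlen(buf) is 1, sizeof buf is 3, and count_nonnull(buf, sizeof buf) is 2. The strlen call is well defined here because the null at index 1 lies inside the array.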


  • FoxDev

    @xaade said:

    But, null-termination is how most "strings" are defined

    No, that's how C-strings are defined; a lot of other languages define them differently



  • yeah, ok.

    I thought C was the context.


  • Discourse touched me in a no-no place

    @Kian said:

    The null terminator is there because it serves a function, just as the length field is there because it serves a function

    We need a "you're talking past me" emoticon, apparently.

    Pay attention, I'll say it one more time: the null terminator is a convenience function in a BSTR. A BSTR, as long as you work with BSTR-aware code, doesn't need it. The way you find out how long a BSTR is is to look at the count.

    The null terminator is not, properly speaking, any such thing, in BSTR semantics, because there's no way to distinguish an embedded two-byte null from the "terminator".

    If you use non-BSTR-aware functions, they'll break with embedded nulls.

    @Kian said:

    and a null terminator field.

    Ugh. No, it's something that can, under some circumstances, be used as one, but formally it isn't.



  • @xaade said:

    It made sense to me.

    The analogy was "question to function", not "fingers to characters". If you ask a question, the correct answer is the one that answers the question. If you call a function, the correct response is whatever the function is specified to do with the input. If you want to know how many fingers are in your hand, ask how many fingers are in your hand, not how many you are holding up. If you want to check the length of a string, ask for the length of the string, not the location of the first null.

    @FrostCat said:

    Ugh. No, it's something that can, under some circumstances, be used as one, but formally it isn't.

    Really? You should tell the people that wrote the reference, which I linked when I mentioned BSTR in the first place, because they say:



  • @FrostCat said:

    If you use non-BSTR-aware functions, they'll break with embedded nulls.

    They won't break, they'll do what they're supposed to do. It may not be what you want them to do, but if you don't want them to do what they're designed to do, DON'T CALL THEM.


  • Discourse touched me in a no-no place

    @Kian said:

    You're not very good at analogies

    Sure I am. Some things are hard to shoehorn, and I was just extending the one you used instead of making another one up.


  • Discourse touched me in a no-no place

    @xaade said:

    If you provide this set of chars 0x20, 0x00, 0x20, you'll get 1 for the length, even though there are two valid characters.

    Under some circumstances, there are three. As I said upthread, it was common in older computers to use a string variable to hold assembly code and execute it dynamically. IIRC that's more or less specifically one reason BSTRs allow internal nulls.

    @xaade said:

    But, null-termination is how most "strings" are defined, so 1 is the correct answer for string length.

    Yeah, most strings. Specifically not a BSTR, as my program demonstrates!


  • Discourse touched me in a no-no place

    @Kian said:

    If you call a function, the correct response is whatever the function is specified to do with the input.

    Well, I'd argue using wcslen on a BSTR is generally the wrong thing to do. :) Unless you know the BSTR in question will never have internal nulls. Because it will give the correct answer for its contract, but it's not actually the length of the BSTR.


  • Discourse touched me in a no-no place

    @Kian said:

    Really? You should tell the people that wrote the reference, which I linked when I mentioned BSTR in the first place, because they say:

    The "terminator" CANNOT canonically be considered "the thing that tells you where the end of the string is", even though they use the word.

    What does SysStringLen do? "The returned value may be different from strlen(bstr) if the BSTR contains embedded Null characters. This function always returns the number of characters specified in the cch parameter of the SysAllocStringLen function used to allocate the BSTR."

    Hey, I guess that means that if you want to know how long a BSTR is, you use the character count, not the "terminator".


  • Discourse touched me in a no-no place

    @Kian said:

    They won't break, they'll do what they're supposed to do.

    Nitpickery accepted, because it argues my point that the null "terminator" is not the canonical way to find the end of a BSTR.



  • It's a huge WTF that BSTR, a string type with additional information, is ultimately typedefed from wchar_t in the first place.

    For reference:

    #if !defined(_NATIVE_WCHAR_T_DEFINED)
    typedef unsigned short WCHAR;
    #else
    typedef wchar_t WCHAR;
    #endif
    
    typedef WCHAR OLECHAR;
    typedef OLECHAR* BSTR;

  • FoxDev

    Only if you ignore the fact that the entire COM API is designed to be used from C



  • That doesn't explain why it isn't a struct of some sort.


  • FoxDev

    Because the string length field is not to be exposed to the developer; it's an internal-use-only field



  • @powerlord said:

    That doesn't explain why it isn't a struct of some sort.

    Can you make variable-width structs? In standard C++ you can't, and C only got them (flexible array members) in C99, well after BSTR was designed. The way to hack it is to make a struct whose last member is a zero-length array. I imagine the guy who had to design BSTR didn't like it.
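    C99 (but not standard C++) allows the last member of a struct to be an array with no declared size. This flexible array member is the standardized form of the zero-length-array hack. Here is a minimal sketch using a hypothetical wide_str type and a wide_str_new function, neither of which is a Windows API. Like a BSTR, the struct stores the count up front and still appends a terminator:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical variable-width string: the header and the characters
   share one allocation thanks to the flexible array member `data`. */
struct wide_str {
    size_t len;      /* number of characters, may include embedded nulls */
    wchar_t data[];  /* len characters followed by one L'\0' */
};

/* src must point to at least len characters; the caller frees the result. */
struct wide_str *wide_str_new(const wchar_t *src, size_t len)
{
    struct wide_str *s = malloc(sizeof *s + (len + 1) * sizeof(wchar_t));
    if (s == NULL)
        return NULL;
    s->len = len;
    wmemcpy(s->data, src, len);  /* copies embedded nulls too */
    s->data[len] = L'\0';        /* convenience terminator, not the length */
    return s;
}
```

    With L"abc\0d" and len 5, s->len is 5. wcslen(s->data) is 3, because wcslen only knows about the first null.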


  • Discourse touched me in a no-no place

    @powerlord said:

    It's a huge WTF that BSTR, a string type with additional information, is ultimately typedefed from wchar_t in the first place.

    wchar_t was almost certainly not widespread, if extant at all, when BSTR was invented, so the use of wchar_t must've been retrofitted in.


  • Discourse touched me in a no-no place

    @RaceProUK said:

    Because the string length field is not to be exposed to the developer; it's an internal-use-only field

    Which point is highlighted by the fact that a BSTR is a pointer not to the beginning of the struct, but to the first wchar_t of the string data inside it.
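    The pointer-into-the-middle layout can be sketched in portable C. This is a simplified imitation and not the real SysAllocStringLen/SysStringLen. The hypothetical counted_alloc function stores a 4-byte byte count, then the characters, then a null, and returns a pointer to the first character. The hypothetical counted_len function steps back from that pointer to read the count. That is why it sees embedded nulls that wcslen cannot.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical BSTR-like layout: [uint32_t byte count][chars][L'\0'].
   The handle points at the first character, not at the block start.
   Uses wchar_t, which is 2 bytes on Windows but often 4 elsewhere. */
typedef wchar_t *counted_str;

counted_str counted_alloc(const wchar_t *src, uint32_t len)
{
    uint32_t bytes = len * (uint32_t)sizeof(wchar_t);
    unsigned char *block = malloc(sizeof bytes + bytes + sizeof(wchar_t));
    if (block == NULL)
        return NULL;
    memcpy(block, &bytes, sizeof bytes);           /* length prefix */
    counted_str data = (counted_str)(block + sizeof bytes);
    if (src != NULL)
        memcpy(data, src, bytes);                  /* embedded nulls allowed */
    else
        memset(data, 0, bytes);
    data[len] = L'\0';                             /* terminator, not the length */
    return data;
}

uint32_t counted_len(counted_str s)
{
    uint32_t bytes;
    if (s == NULL)
        return 0;                                  /* treat NULL as empty */
    memcpy(&bytes, (unsigned char *)s - sizeof bytes, sizeof bytes);
    return bytes / (uint32_t)sizeof(wchar_t);
}

void counted_free(counted_str s)
{
    if (s != NULL)
        free((unsigned char *)s - sizeof(uint32_t)); /* free the whole block */
}
```

    Since the handle is a plain wchar_t pointer, you can pass it straight to C-string functions. The prefix sits in front of the data where those functions never look. This layout explains why the length field stays hidden from the developer.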



  • @FrostCat said:

    The "terminator" CANNOT canonically be considered "the thing that tells you where the end of the string is", even though they use the word.

    The terminator is not what determines where the ends is, it's the thing that is at the end. You don't like calling it the terminator? You can propose another term for "last thing that always has to look like this".

    The terminator in astronomy, for example, is the line where day turns to night. It's where the shadowed part of the planet starts. However, not every shadow in the daylight part marks the terminator.



  • @cartman82 said:

    @blakeyrat said:
    Writing your OWN code to copy strings is a stupid concept, you're right that it's not an ASCII thing necessarily, but that doesn't change that it's still fucking idiotic.

    Agreed. If you decide to write your own string copy code, you're doing it wrong.

    But it's still a useful exercise in a programming book.

    I'm writing an OS. I needed to do it, as those functions aren't even available (or any function for that matter).
    Can't think of any other reason though...

    Edit: Yes, I AM inspired by TempleOS's pretty marquees and stuff.
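    If there is no libc, as in the OS case above, you have to write the copy routine yourself. Here is a minimal sketch of a hypothetical k_strlcpy, modeled on BSD strlcpy semantics. It always terminates the destination when size is nonzero. It returns the source length, so the caller can detect truncation.

```c
#include <stddef.h>

/* Hypothetical freestanding copy (strlcpy-style): copies at most size-1
   bytes, null-terminates when size > 0, and returns the length of src.
   A return value >= size means the copy was truncated. */
size_t k_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = 0;
    while (src[src_len] != '\0')
        src_len++;
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        dst[n] = '\0';
    }
    return src_len;
}
```

    Returning the source length instead of a pointer follows the strlcpy design. It allows `if (k_strlcpy(buf, s, sizeof buf) >= sizeof buf)` to detect truncation without a second pass.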


  • Discourse touched me in a no-no place

    @Kian said:

    The terminator is not what determines where the ends is

    That's what I've been saying! Thanks for coming around to my POV.

    My point is that in a real null-terminated string, like the C ones, the null character IS the terminator. It's the canonical way you find the end of the string. And one more time, that's not true for a BSTR, because the count is canonical.



    We seem to have had a failure of communication then, because I never claimed otherwise. Which is why I qualified on one of my first responses that BSTRs may not be null-terminated, but they have a null terminator. The fact that you don't use the null terminator to determine the length has no bearing on the fact that they have one regardless.



  • I rather like CosmOS, myself.



  • @boomzilla said:

    He didn't say "string functions." He referred specifically to the function that @FrostCat mentioned.

    What are you going on about? The first function that FrostCat mentioned specifically was wcslen, which doesn't work properly (for any reasonable definition of properly) if you have a string with embedded NULs. He has also referred to functions that operate on NUL-terminated strings in general multiple times.

    @Kian said:

    So when you call a function that specifically operates until the first null, the first null is what determines how far into the string the function will operate. It's a tautology, it's not that hard to grasp. It doesn't mean the function is doing the wrong thing. It's simply wrong to call it if you didn't want that behavior.

    By that logic, size_t strlen(const char * s) { return 5; } isn't doing the wrong thing. If you call it with a string that's not 5 bytes, that was just your mistake for using it if you didn't want that behavior.

    Let me try to articulate precisely what I'm trying to say:

    • C is the outlier in terms of counted vs. terminated strings being the modus operandi
    • Functions, like strlen, that are designed for terminated strings do not produce reasonable answers for counted strings

    @Kian said:

    If your string is both counted AND has a null terminator, however, you can trust that if you pass it to a function that requires a null terminator, it will do the right thing

    No! I will not agree that stupid semantics are right! I will agree that it won't directly provoke undefined behavior and crash your program. That is worth putting a terminator on the end of your counted strings for, but it's not an excuse for using functions designed for terminated strings in the first place.

    @Kian said:

    Everything is a convenience. The null terminator is there because it serves a function, just as the length field is there because it serves a function. COM itself is meant to make calling other languages and applications convenient. Saying "It's a convenience" is devoid of meaning.

    If BSTRs did not have terminators, there would be no loss of information. If you wanted to (incorrectly) call a function that expects a terminated string, you could allocate some new space, copy the BSTR to it, and tack a terminator on the end. That is why it's a convenience, so you don't have to do that.

    The terminator on a C string is not a convenience because otherwise you don't know where the string ends. The count field of a counted string is not a convenience because otherwise you don't know where the string ends.
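    EvanED's alternative is to allocate fresh storage, copy the counted data into it, and add the terminator yourself. It looks roughly like the sketch below, which uses a hypothetical to_terminated helper. Embedded nulls are copied as they are, so C-string functions called on the copy still stop at the first one.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: copy `len` counted characters into a new buffer
   and append L'\0' so the copy can be passed to C-string functions.
   `data` must point to at least `len` characters; the caller frees the
   result. Embedded nulls are preserved, so wcslen on the copy still
   stops at the first one. */
wchar_t *to_terminated(const wchar_t *data, size_t len)
{
    wchar_t *copy = malloc((len + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;
    wmemcpy(copy, data, len);
    copy[len] = L'\0';
    return copy;
}
```

    The extra allocation and the free that the caller must remember to do are the bookkeeping that a built-in terminator saves you.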

    @Kian said:

    Really? You should tell the people that wrote the reference, which I linked when I mentioned BSTR in the first place, because they say:

    It's defined that way because it's convenient to be able to use a BSTR where a terminated string is expected, not because it's necessary!

    @FrostCat said:

    Well, I'd argue using wcslen on a BSTR is generally the wrong thing to do.

    I'd say it is effectively always the wrong thing to do, even if you know there are no embedded NULs. In fact, the only exception I can think of is if you think it might have embedded NULs, and you want to know where the first one is (e.g. because you want to copy it to some other thing that is a terminated string and you are OK with the potential loss of information).



  • @EvanED said:

    The first function that FrostCat mentioned specifically was wcslen, which doesn't work properly (for any reasonable definition of properly) if you have a string with embedded NULs.

    Ok, suppose you have a string with embedded nulls, and you want to know the positions of the various embedded nulls. Perhaps you want to split the string into various substrings, for example. Real text practically never contains U+0000, so you could pack various text strings into one BSTR and delimit them with nulls. What do you do? Do you write your own function, or call the function specifically designed to find the first null in a char array?

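    #include <windows.h>
    #include <oleauto.h>  // BSTR, SysStringLen
    #include <cwchar>     // wcslen
    #include <vector>
    
    // Return a pointer to each null-delimited substring packed into a BSTR.
    // The BSTR's count says where the data ends; wcslen only finds where each
    // substring ends. The trailing terminator guarantees the last wcslen stops.
    std::vector<wchar_t*> GetSubstrings(BSTR input)
    {
      std::vector<wchar_t*> result;
      if (input == nullptr)
        return result;  // a NULL BSTR is treated as an empty string
      UINT inputLength = SysStringLen(input);
      for (UINT pos = 0; pos <= inputLength;)
      {
        result.push_back(input + pos);
        pos += static_cast<UINT>(wcslen(input + pos)) + 1;  // skip substring and its null
      }
      return result;
    }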
    

    There you go, a reasonable use case for functions that handle c-strings being fed a BSTR. Not to mention all the code that simply isn't aware of BSTR and treats any pointers to chars as c-strings.
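    The same split can also be written without Windows headers by passing the count explicitly. Here is a sketch using a hypothetical split_packed function over a counted wchar_t buffer. It searches with wmemchr, bounded by the remaining count, instead of calling wcslen. As a result, it does not rely on a terminator being present at all.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical: split `len` characters of null-delimited text into pieces.
   Up to `max_out` pieces are recorded as (start, length) pairs, since the
   last piece need not be null-terminated inside the count.
   Returns the total number of pieces found (which may exceed max_out). */
size_t split_packed(const wchar_t *buf, size_t len,
                    const wchar_t **starts, size_t *lens, size_t max_out)
{
    size_t count = 0, pos = 0;
    for (;;) {
        const wchar_t *nul = wmemchr(buf + pos, L'\0', len - pos);
        size_t piece = nul ? (size_t)(nul - (buf + pos)) : len - pos;
        if (count < max_out) {
            starts[count] = buf + pos;
            lens[count] = piece;
        }
        count++;
        if (nul == NULL)
            break;          /* last piece runs to the end of the count */
        pos += piece + 1;   /* skip the piece and its delimiter */
    }
    return count;
}
```

    For L"ab\0cde" with a count of 6, it reports two pieces: "ab" at offset 0 and "cde" at offset 3. Like the loop above, a trailing delimiter yields one final empty piece.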

    @EvanED said:

    By that logic, size_t strlen(const char * s) { return 5; } isn't doing the wrong thing. If you call it with a string that's not 5 bytes, that was just your mistake for using it if you didn't want that behavior.

    What does this function do?

    int KillAllHumans(int a, int b) { return a+b;} 
    

    Does it kill all humans, or does it return the result of adding its two inputs? If the documentation says "The function KillAllHumans returns the result of adding the two parameters", is it wrong that it doesn't kill all humans? Legacy functions have unfortunate names. They were written when the conventions were different. So don't call them if you don't want what they do. If your function is documented as always returning 5, then it would be pretty stupid of me to call it for anything other than asking for a 5. It's not the function's fault if I'm an idiot that doesn't read documentation, regardless of the name.

    @EvanED said:

    No! I will not agree that stupid semantics are right!

    I'm not asking you to agree that the semantics are right! Who the fuck cares if they're right or wrong? They are what they are. They're already in your system whether you want to use them or not. A lot of legacy code uses them, so when designing your system you have to understand that these things exist. It doesn't mean you have to use them yourself if you don't want to.

    But if you use them without understanding what they do, or expecting them to do what you want them to do because that is what you think they should do, you're a terrible programmer. Functions do what they do. If they meet the spec, they're not buggy. I don't care about morality, I care about the spec. You don't like the spec? I don't care. I code to spec, not to opinions.

    @EvanED said:

    If BSTRs did not have terminators, there would be no loss of information. If you wanted to (incorrectly) call a function that expects a terminated string, you could allocate some new space, copy the BSTR to it, and tack a terminator on the end. That is why it's a convenience, so you don't have to do that.

    Ok, that has some substance. And yes, I know it's more handy, when you have memory to spare, to embed the length of the string with the string. I'm not arguing null terminators are awesome, I'm saying they serve a function, and legacy code expects them. Having to make a copy whenever you want to call one of those functions would be error-prone and a pain in the ass. That's reason enough to have it.

    @EvanED said:

    It's defined that way because it's convenient

    I don't care why it's defined that way. It's enough that it is, and that I have to work with it. My job is to understand how it works, and use it correctly. Not to critique it.

    @EvanED said:

    I'd say it is effectively always the wrong thing to do, even if you know there are no embedded NULs.

    I gave a reasonable reason you might want to above.


  • Banned

    Zed Shaw used to work for Bear Stearns, which I thought was hilarious. Up until they stopped existing.


  • 🚽 Regular


  • ♿ (Parody)

    @EvanED said:

    What are you going on about?

    What I remember from the conversation.

    @EvanED said:

    The first function that FrostCat mentioned specifically was wcslen, which doesn't work properly (for any reasonable definition of properly) if you have a string with embedded NULs.

    Yes, exactly. In the context of that function, there's no embedded null. You just have a shorter string than you thought you had.



  • @cartman82 said:

    incidentally, there's not UB listed in ANSI for 'alter the variable of a for-loop'

    @cartman82 said:

    incidentally, there's not UB listed in ANSI

    @cartman82 said:

    not UB listed in ANSI

    @cartman82 said:

    not UB listed

    @cartman82 said:

    UB listed

    My sides


  • Discourse touched me in a no-no place

    @boomzilla said:

    Yes, exactly. In the context of that function, there's no embedded null. You just have a shorter string than you thought you had.

    Just for the record I mentioned wcslen because someone else, maybe @Kian, mentioned "wstrlen" above that, and the latter doesn't exist in MSVC (if at all? When I googled it, I wound up on the MSDN page for wcslen and family).

    So I wasn't even the one to introduce "using functions that aren't appropriate for BSTRs", although I don't recall if we were talking about BSTRs per se yet.


  • ♿ (Parody)

    All I know is once that was the context, even if you pass it a BSTR, it's not a BSTR any longer. Those sorts of context switches seem like exercises in pedantic dickweedery, but so is feeding source code to a compiler.



  • @Kian said:

    It's not the function's fault if I'm an idiot that doesn't read documentation, regardless of the name.

    Typically, if I find software that's that poorly named, it's indicative of the quality and I use something else.

    I'm sure having completely unintuitive interfaces are great for maintenance.


  • Discourse touched me in a no-no place

    @boomzilla said:

    All I know is once that was the context, even if you pass it a BSTR, it's not a BSTR any longer.

    That's some kind of lemma to my point.



  • @xaade said:

    I'm sure having completely unintuitive interfaces are great for maintenance.

    The real functions in question were intuitive back when they were designed. Then "it" changed and now they're not intuitive anymore. Intuitive or not, however, there's no excuse for not knowing what functions you call do.

    The hypothetical examples simply highlight the fact that you can't let auto-complete code for you. You have to read the documentation for every function you call, not just guess. And while it would be nice to be able to change libraries when you don't like one, if you inherit something it's rare that you get the choice at all.



  • @Kian said:

    there's no excuse for not knowing what functions you call do.

    Yeah, it's called bad documentation.

    @Kian said:

    The real functions in question were intuitive back when they were designed.

    Oh really?

    strstr vs. strpbrk?
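    In case the names alone don't make it clear: strstr finds a whole substring, while strpbrk finds the first character that appears anywhere in a set. The following small sketch uses hypothetical wrappers that return an index, or -1 when nothing matches, to show the difference side by side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: index where `needle` occurs as a whole substring, or -1. */
ptrdiff_t index_of_substring(const char *hay, const char *needle)
{
    const char *p = strstr(hay, needle);
    return p ? p - hay : -1;
}

/* Hypothetical wrapper: index of the first char of `hay` found in `set`, or -1. */
ptrdiff_t index_of_any(const char *hay, const char *set)
{
    const char *p = strpbrk(hay, set);
    return p ? p - hay : -1;
}
```

    Searching "hello world" for "wor" gives 6 with index_of_substring, where the substring starts. It gives 4 with index_of_any, because 'o' is the first character that belongs to the set {w, o, r}.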



  • @xaade said:

    Yeah, it's called bad documentation.

    I wouldn't say that's an excuse. It may be a reason, but I can't say "my code is good, it's the framework's fault if it doesn't do what I want it to". It just means your work is going to be that much harder.

    To define my terms, I understand an excuse to be something that releases you of responsibility. A reason is why you do something. Looked it up, and google at least agrees:

    ex·cuse
    verb
    /ikˈskyo͞oz/
    1. attempt to lessen the blame attaching to (a fault or offense); seek to defend or justify.
    "he did nothing to hide or excuse Jacob's cruelty"
    synonyms: justify, defend, condone, vindicate
    2. release (someone) from a duty or requirement.



  • :moving_goal_post:

    @Kian said:

    there's no excuse for not knowing what functions you call do.

    Disagree

    This isn't an excuse for having broken code.

    Agreed.

    Often I use another interface. If I can't use another interface, I consider using an adapter. So that my code makes sense and is maintainable.



  • @xaade said:

    :moving_goal_post:

    How is it moving the goal post? I made a statement and I stand by it. I clarified my definition in case my point wasn't clear enough, but considering my definition is the dictionary definition, you can't even claim I redefined the words to suit me. If you don't know what words mean, that's not my fault.

    To reiterate, if you choose to type the name of a function into your editor, compile that code, and run it, without any clue as to what that code is supposed to do, that's wrong. It may be necessary, but you are still responsible for whatever happens next. You are not excused just because you had no way of knowing.



  • @Kian said:

    You are not excused just because you had no way of knowing.

    Ok, look, if the doc says returns 5, and it returns 6. I'm excused from the code failing. Am I excused if I leave it that way, no.



  • Of course I am! I wouldn't have to correct them if I didn't!
    @xaade said:

    Ok, look, if the doc says returns 5, and it returns 6. I'm excused from the code failing. Am I excused if I leave it that way, no.

    Sure, we agree on that.


  • area_pol

    @Kian said:


    Discussion between 5 people or fewer is a crime!


  • Banned

    Depends how you do it.



    Doesn't matter, Discourse tries to moralise anyway, because it doesn't know if it is legitimately right or not. Just one of the many things we have laughed at as "basically broken by design"

    Even the crusty 1990s toxic hell stew forums knew this one, because they tried it and removed it again... But that learning experience doesn't count because 1990s toxic hell stew, right?



  • @cartman82 said:

    Interesting guy. Obviously talented in many ways. But at the same time, seems to be his own greatest enemy.

    He's a ranter.

    He's not my favorite ranter.



  • @Kian said:

    FUCK OFF ALL DISCOURSE TOASTERS
    JUST FUCK OFF
    YOU ARE ALL BROKEN
    EVERY ONE OF YOU
    NONE OF YOU IS WORTH A FART IN A HIGH WIND


  • FoxDev

    @flabdablet said:

    NONE OF YOU IS WORTH A FART IN A HIGH WIND

    :rofl:

