Zed Shaw gets schooled on C undefined behavior



  • I accidentally stumbled into this old drama while looking over Zed Shaw's new mid-level Python book project. For those who don't know, Zed Shaw is the author of the excellent free online book Learn Python the Hard Way.

    Anyway, this is the tail end of a long drama that surrounds, it seems, everything Zed Shaw does. But here's the gist:

    1.

    Zed Shaw heavily critiques the famous K&R C book: https://web.archive.org/web/20141205223016/http://c.learncodethehardway.org/book/krcritique.html

    He especially has problems with this unsafe copy() function:

    void copy(char to[], char from[])
    {
        int i;
    
        i = 0;
        while((to[i] = from[i]) != '\0')
            ++i;
    }
    

    Basically, he's advocating against NULL-terminated strings and for keeping string lengths in a separate variable.

    The only way to break safercopy() is to lie about the lengths of the strings, but even then it will still always terminate. The worst possible scenario for the safercopy() function is that you are given an erroneous length for one of the strings and that string does not have a '\0' properly, so the function buffer overflows.
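
    For readers without the critique at hand, a length-carrying copy in that spirit might look roughly like the sketch below. The name bounded_copy and its signature are made up for illustration; this is not the safercopy() from the book. Its guarantees only hold when the caller passes honest lengths and valid, non-overlapping buffers.

```c
#include <stddef.h>

/* Hypothetical length-checked copy; NOT the safercopy() from the critique.
 * Copies at most dst_cap - 1 bytes and always NUL-terminates dst.
 * Returns the number of bytes copied, or -1 on bad arguments. */
long bounded_copy(char *dst, size_t dst_cap, const char *src, size_t src_len)
{
    if (dst == NULL || src == NULL || dst_cap == 0)
        return -1;

    /* Never write more than the destination can hold. */
    size_t n = src_len < dst_cap - 1 ? src_len : dst_cap - 1;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    dst[n] = '\0';
    return (long)n;
}
```

    Copying "hello" with src_len 5 into a 4-byte buffer stores "hel" and returns 3. The loop bound comes from the lengths, not from the data, which is the property being argued about.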

    2.

    A commenter on HN points out that his proposed safercopy() function doesn't offer any greater guarantee that it will terminate.

    Zed Shaw responds:

    I don't think you know what it means when someone asks whether a loop will terminate. It's an analysis of the logic of the loop to determine if it will exit. This kind of analysis has been the actual foundation of computer science since there was computer science. I believe this guy Turing was doing some stuff with it.

    However, you have a logical flaw in your statement. You say the function will not terminate if it is passed an invalid pointer, but processes terminate when they access invalid pointers. If that's true, then it will terminate. However, if processes do not terminate when passed invalid pointers, then this function will still terminate because it will hit the end of its length variable and terminate.

    So you're wrong on many counts.

    3.

    @mikeash issues a challenge.

    I take it that you're completely unfamiliar with the concept of "undefined behavior" in the context of C. You're not qualified to discuss the language at this level until you understand that. It's possible to invoke undefined behavior in your function. Undefined behavior can mean something never terminates. Thus, it's possible for your function to never terminate.

    If you don't believe me, I pose the following challenge: give me an implementation of this function. I will then provide a chunk of code that calls it and causes it to enter an infinite loop.

    Zed accepts:

    Alright my friend, here's the gist:

    The rules:

    1. You said you can make that for-loop run forever that can call it and it'll enter an infinite loop.
    2. To prove that, you can only alter the main function, then hand me back the code and I'll compile it and run it on my machines.
    3. It has to run without stopping for 24 hours. If you can do that then I'll consider that an "infinite loop".
    4. You can't call any more functions than what's in there already. So no fancy hacks to keep the OS from allowing segfaults by putting in signal handlers, linking against other libraries, or anything.

    Very curious how you do this. This is fun!

    4.

    Zed Shaw gets schooled.

    Here you go:

        int main(int argc, char *argv[])
        {
            int offset = -63;
            char input[] = { 1, 1, 1 };
            char *output = input + offset;
            safercopy(3, output, 3, input);
            
            return 0;
        }
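
    Why a main() like this can defeat the length check is easier to see in a model. The real program has undefined behavior, so the -63 offset only works for one particular compiler and stack layout. The sketch below assumes one plausible mechanism: the out-of-bounds destination happens to overlap the loop counter. It simulates the stack frame as an ordinary byte array so that everything stays well-defined. All names are hypothetical.

```c
#include <stddef.h>

/* Models a stack frame as a plain byte array so that everything here is
 * well-defined C. mem[counter_at] plays the role of the loop variable.
 * Returns the number of iterations run, or -1 if the loop would need
 * more than max_steps iterations. */
long simulated_copy(unsigned char *mem, size_t counter_at,
                    size_t dst_at, size_t src_at,
                    unsigned char len, long max_steps)
{
    long steps = 0;
    for (mem[counter_at] = 0; mem[counter_at] < len; mem[counter_at]++) {
        if (steps == max_steps)
            return -1;                      /* give up: still looping */
        unsigned char i = mem[counter_at];
        mem[dst_at + i] = mem[src_at + i];  /* may land on the counter itself */
        steps++;
    }
    return steps;
}
```

    With the source bytes {1, 1, 1} and a destination placed two bytes before the counter, the third write sets the counter back to 1, the increment makes it 2, and the loop repeats that step until the cap is hit. With a destination that does not overlap the counter, the same call finishes in three iterations.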
    

    Zed acknowledges defeat, but can't help coming off as a petulant asshole all the same.

    Alright, I did finally get this working. Pretty fucking awesome, I had not thought of that. Here's a version everyone can try:

    I officially concede that because you can work two pointers on a computer to overwrite another location of memory to alter a for-loop (incidentally, there's not UB listed in ANSI for 'alter the variable of a for-loop') that everyone should go back to writing their C code just as K&R intended.

    Please, you all should rely on only the '\0' byte terminator of all strings, don't do any bounds checking, don't check the return code of functions, and you will be totally safe.

    Because, UB means "I ain't gotta fix it."

    Enjoy, now I'm going painting.

    Some time later he removes his deconstruction of K&R and publishes this: http://c.learncodethehardway.org/book/krcritique.html


    Anyway, that seems to be par for the course with this Zed Shaw guy. Looks like he's one of those people who think everyone else is an idiot and can't take even a little bit of critique.

    For example, check out this:

    http://zedshaw.com/2015/06/01/dear-paul/

    Where Zed Shaw is demanding that the owner of YCombinator shut down Hacker News due to supposed incessant toxic attacks against Zed Shaw.

    The "vicious attack" in question?

    Some guy's blog post critiquing Zed Shaw's book in the most academic way imaginable.

    Or how about this anti-rails rant?

    Interesting guy. Obviously talented in many ways. But at the same time, seems to be his own greatest enemy.



  • gets schooled

    With a disappointingly low level of SPANKing.

    Also, Pascal does string-length-storage, and I'm pretty sure it has its own can of worms related to that.


  • BINNED

    @cartman82 said:

    Or how about this anti-rails rant?

    I’ll add one more thing to the people reading this: I mean business when I say I’ll take anyone on who wants to fight me. You think you can take me, I’ll pay to rent a boxing ring and beat your fucking ass legally. Remember that I’ve studied enough martial arts to be deadly even though I’m old, and I don’t give a fuck if I kick your mother fucking ass or you kick mine. You don’t like what I’ve said, then write something in reply but fuck you if you think you’re gonna talk to me like you can hurt me.

    I didn't know Uwe Boll is a software developer when he's not making shitty movies...


  • FoxDev

    @Maciejasjmj said:

    Pascal does string-length-storage

    IIRC so do C# and Java, it allows them to do things like in place substrings (substrings don't take extra storage for the buffer, just for the object reference, offset into the buffer, and length (massive simplification here, but you get the idea))
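
    As a rough C illustration of the offset-plus-length idea (not a description of how any particular Java or C# runtime lays out its strings), a substring can be a view into the original buffer. The str_view type and both functions are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view: a pointer into someone else's buffer plus a length. */
struct str_view {
    const char *data;
    size_t      len;
};

/* Wrap an ordinary C string; the view does not include the terminator. */
struct str_view view_of(const char *cstr)
{
    struct str_view v = { cstr, strlen(cstr) };
    return v;
}

/* Substring that shares storage: no allocation, no copy.
 * Out-of-range requests are clamped to the end of the view. */
struct str_view view_sub(struct str_view v, size_t offset, size_t count)
{
    if (offset > v.len)
        offset = v.len;
    if (count > v.len - offset)
        count = v.len - offset;
    struct str_view out = { v.data + offset, count };
    return out;
}
```

    A view produced by view_sub is generally not NUL-terminated at its own end, so it cannot be handed to functions like strlen; printing it needs something like printf("%.*s", (int)w.len, w.data).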



  • Why the FUCK is anybody talking about ASCII strings in 2015? Jesus. "OMG! This thing that's been obsolete for 15 years and nobody should ever be doing EVER, here's how you do it." "No it ain't! HERE'S how you do this thing that's been obsolete for 15 years and nobody should be ever doing EVER." "Ok I concede you are better at solving useless task nobody should ever do ever."



  • You're writing code on some microcontroller, juggling 128K of RAM. You only have a simple LED display capable of showing English characters and letters. This is where C is used nowadays and where this sort of stuff is still relevant.

    Otherwise, I don't think either C++ or any higher level languages are still even using NULL terminated strings.



  • @accalia said:

    IIRC so do C# and Java, it allows them to do things like in place substrings (substrings don't take extra storage for the buffer, just for the object reference, offset into the buffer, and length (massive simplification here, but you get the idea))

    Just about every string class/built-in type that isn't brain-dead stores a length explicitly. I'm pretty sure that std::string is required to, for starters...

    @cartman82 said:

    Otherwise, I don't think either C++ or any higher level languages are still even using NULL terminated strings.

    About the only place they show up in C++ is for string constants. And those don't have manipulation problems.

    Also, NULL-terminated strings are not an ASCII-only concept, @blakeyrat. 🚎



  • @cartman82 said:

    You're writing code on some microcontroller, juggling 128K of RAM. You only have a simple LED display capable of showing English characters and letters. This is where C is used nowadays and where this sort of stuff is still relevant.

    Right; so there's like ... 7 people it's relevant to, and like 7 billion people it's not.

    @tarunik said:

    Also, NULL-terminated strings are not an ASCII-only concept, @blakeyrat

    Writing your OWN code to copy strings is a stupid concept, you're right that it's not an ASCII thing necessarily, but that doesn't change that it's still fucking idiotic.

    Are those two magnets having sex? What is that?



  • @blakeyrat said:

    Writing your OWN code to copy strings is a stupid concept, you're right that it's not an ASCII thing necessarily, but that doesn't change that it's still fucking idiotic.

    Agreed. If you decide to write your own string copy code, you're doing it wrong.

    But it's still a useful exercise in a programming book.



  • @cartman82 said:

    But it's still a useful exercise in a programming book.

    Only if it's surrounded by a million GIANT RED BANNERS saying "NEVER DO THIS EVER YOU STUPID FUCKERS JESUS CHRIST THIS IS LIKE EVERY SECURITY FLAW OF THE LAST 20 YEARS YOU DUMBSHITS STOP STOP STOP"

    And yes, there have to be a million of these banners.

    Let's give the people learning programming exercises that, if they copy them into production code, won't result in people's computers being compromised by viruses or their credit card being stolen by some Russian dude. Ok?



  • @blakeyrat said:

    Writing your OWN code to copy strings is a stupid concept, you're right that it's not an ASCII thing necessarily, but that doesn't change that it's still fucking idiotic.

    Agreed that it's stupid to write your own code to copy strings, or memory for the most part for that matter -- the only explicit byte-copy loops I've written lately were in the reset handlers of a Cortex-M setting up .data and .bss.
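
    For context, the kind of startup loop meant here usually looks something like the sketch below. In a real reset handler the addresses come from linker-script symbols whose names vary by toolchain; they are passed in as parameters here (an assumption made so the sketch can also run on a host).

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run address (RAM). */
void copy_data_section(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}

/* Zero the .bss region so that uninitialized statics start out as 0. */
void zero_bss_section(uint32_t *dst, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = 0;
}
```

    Explicit loops are common at this point because the C library may not be usable yet, so calling memcpy or memset is not always an option.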

    @blakeyrat said:

    Are those two magnets having sex? What is that?

    Use the Tooltip, @blakeyrat!

    @blakeyrat said:

    Let's give the people learning programming exercises that, if they copy them into production code, won't result in people's computers being compromised by viruses or their credit card being stolen by some Russian dude. Ok?

    QFT!



  • @tarunik said:

    About the only place they show up in C++ is for string constants. And those don't have manipulation problems.

    Well, std::string keeps its data null-terminated. The c_str() method has to return a const pointer to a null-terminated string, so it's easiest to just make that return the data pointer rather than copy the contents and add a null terminator. I think it wasn't required before, but the new standards mandate it, I'd have to check to be certain.

    Also, COM binary strings (BSTR) are a pointer to a null terminated string that also has a four byte length header. Learned that this week.

    length | string | null terminator
           ^
           pointer points here
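
    The same layout can be sketched in portable C. This is only an illustration of the idea: a real BSTR holds wide characters and must be created and freed through the COM allocation functions, whereas the hypothetical lp_alloc, lp_len and lp_free below use plain char and malloc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string.
 * Layout: [uint32_t length][bytes...][NUL]; the returned pointer addresses the bytes. */
char *lp_alloc(const char *bytes, uint32_t len)
{
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);                /* header */
    char *s = (char *)(block + sizeof(uint32_t));
    memcpy(s, bytes, len);                          /* payload, may contain NULs */
    s[len] = '\0';                                  /* terminator for C APIs */
    return s;
}

/* Read the length from the header that sits just before the string. */
uint32_t lp_len(const char *s)
{
    uint32_t len;
    memcpy(&len, s - sizeof(uint32_t), sizeof len); /* no alignment assumptions */
    return len;
}

/* Free the whole block, starting at the header. */
void lp_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

    For the five bytes a, b, NUL, d, e, lp_len reports 5 while strlen on the same pointer reports 2, so the pointer works with both counted and terminator-based code, with different answers.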


  • @tarunik said:

    Use the Tooltip, @blakeyrat!

    I prefer to interpret the image as presented.

    I'm going with the "magnets having sex" thing.



  • @Kian said:

    Well, std::string keeps its data null-terminated. The c_str() method has to return a const pointer to a null-terminated string, so it's easiest to just make that return the data pointer rather than copy the contents and add a null terminator. I think it wasn't required before, but the new standards mandate it, I'd have to check to be certain.

    It may keep its data null terminated -- but that's an implementation detail, basically. I'm pretty sure the standard also requires std::string::length() to be constant-time, which means you must keep a length member. I find the approach taken by BSTR to be rather interesting as well, though -- stashing the length before the pointer's rather cute, as long as you aren't a total frob with pointers, that is ;)



  • @tarunik said:

    It may keep its data null terminated -- but that's an implementation detail, basically.

    If it's in the standard (which I'm pretty sure it is in now), it's not an implementation detail. But yeah, it doesn't rely on that to know the length. It's basically an inter-operability consideration. So you can pass the data directly to functions that want a null terminated string. So every c function.


  • ♿ (Parody)

    @blakeyrat said:

    I prefer to interpret ~~the image as presented~~ everything incorrectly.

    FTFY



  • Even with 8KB of RAM, I still think it's totally worth it to spend a couple extra bytes per string to store its length. C is just stupid about that.


  • Discourse touched me in a no-no place

    @Kian said:

    Also, COM binary strings (BSTR) are a pointer to a null terminated string that also has a four byte length header.

    IIRC a BSTR's null terminator is more belt-and-suspenders; the length prefix is the canonical length determinator. The string itself isn't considered a null-terminated string, as you are allowed to have 0 byte in it (for example, you can have x86 binary code in a BSTR.)


  • Discourse touched me in a no-no place

    @blakeyrat said:

    I prefer to interpret the image as presented.

    You don't consider the name of the icon or the tooltip part of the presentation? You don't type :magnets_having_sex: to display it, after all.


  • Discourse touched me in a no-no place

    @tarunik said:

    stashing the length before the pointer's rather cute, as long as you aren't a total frob with pointers

    To properly use a BSTR, you need to use the allocation API (e.g., SysAllocString as opposed to malloc()), which keeps track of that. The implementation allows you to use a BSTR where you'd use a wide character string/OLESTR without converting, which is why I think it works that way.


  • Discourse touched me in a no-no place

    @anonymous234 said:

    Even with 8KB of RAM, I still think it's totally worth it to spend a couple extra bytes per string to store its length. C is just stupid about that.

    C was written in a time when 8KB of RAM might have been considered comfortable. The Atari 2600 had, IIRC, 128 bytes.


  • ♿ (Parody)

    @FrostCat said:

    To properly use a BSTR, you need to use the allocation API (e.g., SysAllocString as opposed to malloc()), which keeps track of that.

    It also does some reference counting, IIRC.



  • He didn't realize that C++ pointers give you the ability to point anything at anything, like a gun and your foot.

    There is a reason that C# makes you mark your code "unsafe" to use them.

    I've also noticed that he does most of his posting in venues that don't allow comments.

    I suppose bad programmers need their safe spaces too.



  • @FrostCat said:

    The string itself isn't considered a null-terminated string, as you are allowed to have 0 byte in it (for example, you can have x86 binary code in a BSTR.)

    Here's the reference page https://msdn.microsoft.com/en-us/library/windows/desktop/ms221069(v=vs.85).aspx

    The string data is "A string of Unicode characters. May contain multiple embedded null characters." (Keeping in mind that for Microsoft, "Unicode characters" is UTF16 encoding). But the terminator is still two null characters. So it's not required to hold a single c-string, but it is a string with a null terminator, and you can't just put anything in there.

    @boomzilla said:

    It also does some reference counting, IIRC.

    Actually, they don't: https://msdn.microsoft.com/en-us/library/xda6xzx7.aspx "When a BSTR stays within an interface, you must free its memory when you are done with it. However, when a BSTR passes out of an interface, the receiving object takes responsibility for its memory management." If you're calling from a managed language and receiving the string, the language will convert the BSTR into its own native type and handle it for you. If you are calling from C++, for example, you need to clean it up yourself.

    EDIT - Extra fun tips: Microsoft document writers will use "one null character" and "two null characters" interchangeably to mean the same thing.



  • I hate memory management in native languages through interfaces for this very reason.
    Who bites the deallocation bullet? How do you consistently control that?

    Ideally you could provide a callback, and then let the creator deallocate. But, then you have a very bad path of execution, and that restricts you from doing anything fluidly. The only option is to then deep copy it, and let the callback owner deallocate the deep copy. That's terribly complicated, but does allow for better memory management. And it preserves stack deallocation.

    But I also hate parts of managed memory management too.
    Like, when can I be sure that it is deallocated?
    This is why I make heavy use of using statements when my managed memory is using native data.


  • 🚽 Regular

    @anonymous234 said:

    Even with 8KB of RAM, I still think it's totally worth it to spend a couple extra bytes per string to store its length. C is just stupid about that.

    On embedded not really. I frequently have to restructure the living daylights out of things to cram them into the processor. You want to use absolute smallest part you can to keep the margins up. Seeing 99% RAM/ROM used on the linker is normal.

    I suspect we will be using higher level languages soon though. When I started over a decade ago everything was in assembly. Now I only rarely use assembly (cycle-length critical stuff) and C is the norm.


  • ♿ (Parody)

    @Kian said:

    Actually, they don't:

    Huh...I could have sworn I remembered them doing something like that.



  • @xaade said:

    I hate memory management in native languages through interfaces for this very reason. Who bites the deallocation bullet? How do you consistently control that?

    Yeah, relying on good documentation and everyone doing things correctly is less than ideal.


  • Discourse touched me in a no-no place

    @Kian said:

    But the terminator is still two null characters.

    A string that can contain arbitrary bytes cannot be considered null-terminated. I present to you, by way of example, a string consisting of ten 0 bytes and a soi-disant 2-byte null terminator. No function that relies on a null terminator will be able to operate on the string.

    You gotta have the bytes, but they're not the canonical length determinant; they're a convenience feature.



  • I prefer to have the owner deallocate, but sometimes that makes the situation complicated.

    Relying on good documentation and everyone doing things correctly is less than ideal. But it's the best option for the cost.

    Owner deallocation preserves stack deallocation which automatically takes advantage of RAII, which is the ideal. But sometimes you can't do that without making the execution path overly complex, or creating hacks.

    Which means there's no real ideal. There's only a best fit.

    Passing out an object, designed using RAII, through an interface with a pointer, breaks RAII, in the sense that you have to explicitly delete the pointer in the end. Yeah, the deallocation is easier, but you lose the deterministic advantage. You now rely on a delete statement to provide the determined point of deallocation.

    If, instead the caller provides a callback to use or consume the resource, then you preserve RAII, and the owner deallocates. But that means an extra step for the caller.

    But why are you passing objects through an interface anyway?

    Well, strings (and in the greater sense, containers) are the one exception really.

    At which point, make the caller create the container, so the caller can deallocate on their own stack.


    Generally speaking, wrap heap allocation in an object that performs RAII, and do not have sides of an interface share the responsibility of initialization and deallocation.

    Either the caller provides a callback, or the caller provides the instance of the object by reference.

    That is my ideal.
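
    The post is about C++ and RAII, but the two options can be sketched in plain C as well. In the first the caller owns the buffer; in the second the producer keeps ownership and lends the data to a callback. Every name below is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Pattern 1: the caller owns the buffer. Returns the length the full text
 * needs (excluding the NUL), like snprintf, so truncation can be detected. */
size_t get_greeting(const char *name, char *buf, size_t cap)
{
    int needed = snprintf(buf, cap, "hello, %s", name);
    return needed < 0 ? 0 : (size_t)needed;
}

/* Pattern 2: the producer owns the data and lends it to a callback. */
typedef void (*text_consumer)(const char *text, size_t len, void *ctx);

void with_greeting(const char *name, text_consumer consume, void *ctx)
{
    char local[64];                       /* lives only for this call */
    size_t n = get_greeting(name, local, sizeof local);
    if (n >= sizeof local)
        n = sizeof local - 1;             /* text was truncated */
    consume(local, n, ctx);
}                                         /* local is released here, by its owner */

/* Example consumer: records the length it was shown. */
void record_length(const char *text, size_t len, void *ctx)
{
    (void)text;
    *(size_t *)ctx = len;
}
```

    In both variants allocation and release happen on the same side of the interface, so nothing has to be documented about who frees what. The cost is the one already noted: the callback form adds a step for the caller, and the consumer must not keep the pointer after it returns.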



  • @FrostCat said:

    soi-disant

    whut?

    @FrostCat said:

    I present to you, by way of example, a string consisting of ten 0 bytes and a soi-disant 2-byte null terminator. No function that relies on a null terminator will be able to operate on the string.

    Sure they can. For instance, wcslen will return, correctly, 0. For a function that relies on a null terminator, the string length is simply shorter than the buffer length. The null terminator allows the string to work with both kinds of functions, those that require a null terminator and those that require a length. The terminator is not just a decoration, it's an interoperability requirement.



  • @Kian said:

    Well, std::string keeps its data null-terminated. The c_str() method has to return a const pointer to a null-terminated string, so it's easiest to just make that return the data pointer rather than copy the contents and add a null terminator.

    This is true, but it somewhat misses the point. std::string is also required to (1) have an O(1) size function (as @tarunik says) and (2) store and operate correctly on strings with embedded NULs. Both of these mean that it has to be a counted string.



  • @Kian said:

    The null terminator allows the string to work with both kinds of functions, those that require a null terminator and those that require a length.

    Only if you define "string" in the C sense of "a bunch of non-NUL bytes followed by a NUL." If you define it as "a string of bytes", as counted strings often do, then no, terminated-string functions won't work correctly on it.


  • Discourse touched me in a no-no place

    @Kian said:

    whut?

    It's a perfectly cromulent word: http://dictionary.reference.com/browse/soi-disant?s=t


  • Discourse touched me in a no-no place

    @Kian said:

    For instance, wcslen will return, correctly, 0.

    Fine. My next example is the string consisting of the 5 UTF-16 characters a, b, null, d, e. How long will wcslen say that is? IOW, what's the output of this:

    [code]#pragma comment(lib, "oleaut32.lib")

    #define UNICODE
    #define STRICT

    #include <stdio.h>
    #include <wchar.h>
    #include <windows.h>
    #include <oleauto.h>

    int main(void)
    {
        BSTR b;
        wchar_t x[6];                 /* 5 characters plus the terminator */
        wcscpy(x, L"abcde");
        b = SysAllocString(L"abcde");

        printf("bstr is %u characters\n", SysStringLen(b));
        printf(" str is %u characters\n", (unsigned)wcslen(x));

        /* Overwrite the third character to get a, b, null, d, e. */
        x[2] = 0;
        b[2] = 0;

        printf("bstr is %u characters\n", SysStringLen(b));
        printf(" str is %u characters\n", (unsigned)wcslen(x));

        SysFreeString(b);
        return 0;
    }[/code]



  • @FrostCat said:

    You don't type :magnets_having_sex: to display it, after all.

    Sounds like a Discourse bug, since that's obviously what the image is.



  • @EvanED said:

    Only if you define "string" in the C sense of "a bunch of non-NUL bytes followed by a NUL." If you define it as "a string of bytes", as counted strings often do, then no, terminated-string functions won't work correctly on it.

    You got that exactly backwards. First, I'm obviously defining string as any arbitrary array of bytes. That's why I mention the null terminator separately.

    Your confusion stems from your belief that having a function that expects a null terminated string operate on only a small portion of your buffer is a mistake. This is not so. The behavior and use of functions expecting null terminated strings is well specified: they operate on the pointed string until they find a null. If the null is the first character, the string is effectively empty, which is valid for null terminated strings. A null in any other position simply means a string of that length.

    If you want to operate on your string as an arbitrary buffer, and not as a c-string, you call functions that operate on it as an arbitrary buffer. If you want to copy the whole buffer, call a function that copies the full buffer, not one that copies the buffer until the first null.

    However, let's say you have a counted string. Counted strings are not required to have a null anywhere in them. So if you pass a counted string to a function that expects a null terminator, the function will continue to process the string past its end. If your string is both counted AND has a null terminator, however, you can trust that if you pass it to a function that requires a null terminator, it will do the right thing.

    See, the problem isn't with arbitrary strings having nulls inside them. The problem is arbitrary strings NOT having a null in them. Which is why you ensure that every string will be null terminated, by mandating a null terminator at the end.



  • @FrostCat said:

    Fine. My next example is the string consisting of the 5 UTF-16 characters a, b, null, d, e. How long will wcslen say that is?

    2, of course. Why would you use a function that gives you the length of the first c-string pointed to, to get the length of the full buffer? Call the function that gives you the information you need.

    If you call "abort()" your program will crash, that's not the fault of the string either. It's you calling the wrong function for what you want to know.


  • Discourse touched me in a no-no place

    @Kian said:

    If your string is both counted AND has a null terminator, however, you can trust that if you pass it to a function that requires a null terminator, it will do the right thing.

    Obviously, if the function doesn't know about the count, this is untrue. See above. QED.


  • Discourse touched me in a no-no place

    @Kian said:

    2, of course. Why would you use a function that gives you the length of the first c-string pointed to, to get the length of the full buffer?

    We were talking about the string length, not "the length of the full buffer". Quit :moving_goal_post:.

    I said the null terminator in a BSTR is a convenience, and not actually a null terminator the way we think of it. You must not have run my program, because the last two lines are 2 and 5, because SysStringLen properly uses the count, not the "terminator", to tell how long the string is.



  • @FrostCat said:

    We were talking about the string length, not "the length of the full buffer". Quit :moving_goal_post:.

    If you hold up three fingers, and ask "how many fingers am I holding up?", the answer is three. Not five, even though you have five fingers in your hand. Calling wcslen is asking "how many characters are there before the first null?" The correct answer is 2. If you want to know the length of the buffer, then sure, check the count.

    But imagine your buffer was {'a','b','c','d'}, and SysAllocString didn't add a null terminator. Then SysStringLen would give you 4, and wcslen would give you a seg fault.

    Since there's a lot of code that expects naked char*, making sure that all your strings are null terminated ensures that no matter who you give your char* to, they won't run into a seg fault by mistake.
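
    Strictly speaking, scanning past the end of an unterminated buffer is undefined behavior, so a seg fault is one possible outcome, not a guaranteed one. When the buffer size is known, a bounded scan avoids the problem. The helper below is a hypothetical sketch, similar in spirit to POSIX strnlen.

```c
#include <stddef.h>
#include <string.h>

/* Length of the C string inside buf, never reading past cap bytes.
 * Returns cap if no terminator was found, which the caller should treat
 * as "not a valid C string". */
size_t bounded_strlen(const char *buf, size_t cap)
{
    const char *nul = memchr(buf, '\0', cap);
    return nul != NULL ? (size_t)(nul - buf) : cap;
}
```

    For the four bytes a, b, c, d with no terminator this returns 4 (the capacity), and for a, b, NUL, d, e it returns 2.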


  • FoxDev

    @Kian said:

    Not five, even though you have five fingers in your hand.

    Thumb != finger 😛



  • @tarunik said:

    It may keep its data null terminated -- but that's an implementation detail, basically. I'm pretty sure the standard also requires std::string::length() to be constant-time, which means you must keep a length member. I find the approach taken by BSTR to be rather interesting as well, though -- stashing the length before the pointer's rather cute, as long as you aren't a total frob with pointers, that is ;)

    I found it interesting that the underlying problem even has its own name: Shlemiel the painter algorithm.
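
    A minimal sketch of that problem with terminated strings: building a string with repeated strcat rescans the destination from the start on every call, while remembering the end position avoids the rescan. Function names are hypothetical, and both assume dst is large enough for the result.

```c
#include <stddef.h>
#include <string.h>

/* Shlemiel style: every strcat() walks dst from the start to find the NUL. */
void join_rescan(char *dst, const char *const *parts, size_t count)
{
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++)
        strcat(dst, parts[i]);
}

/* Remembering where the string ends makes each append cost only the part's length. */
void join_tracked(char *dst, const char *const *parts, size_t count)
{
    char *end = dst;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(parts[i]);
        memcpy(end, parts[i], n);
        end += n;
    }
    *end = '\0';
}
```

    Both produce the same result; the first does work that grows quadratically with the total length, the second linearly. A stored length gives the same benefit as the end pointer.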



  • @Kian said:

    First, I'm obviously defining string as any arbitrary array of bytes.

    Saying this and then saying "string functions operate until the first NUL" is certainly pretty close to being inconsistent.

    @Kian said:

    The behavior and use of functions expecting null terminated strings is well specified: they operate on the pointed string until they find a null.

    But the only reason that behavior makes any kind of sense is because you're basically not defining strings as being an arbitrary sequence of bytes, but as the portion of memory up to the first NUL byte. And my assertion is that the only reason that behavior and definition makes sense is because you're taking a very C-centric view of the definition of "string". If I have a std::string s1, s2 where the strings have embedded nulls and say s1 + s2, it will concatenate the whole thing, not just some weird Frankenstein thing with half of s1 and half of s2.

    I'm pretty sure that if you weight by language popularity, "strings" can have embedded nulls and operate over the whole object almost all of the time. It's true of C++ std::string, it's true in JavaScript, it's true in Python, it's true in Perl, it's true in Ruby, it's true in Java, it's true in Lua, it's true in Go, it's true in PHP. That's literally all the languages that I tried.
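
    The difference between the two definitions can be sketched in C itself. Both functions are hypothetical illustrations; counted_concat treats its inputs as byte sequences with explicit lengths, while terminated_concat uses the standard terminator-based calls.

```c
#include <stddef.h>
#include <string.h>

/* Counted concatenation: copies a_len + b_len bytes, NULs included.
 * dst must have room for a_len + b_len bytes. Returns the combined length. */
size_t counted_concat(char *dst, const char *a, size_t a_len,
                      const char *b, size_t b_len)
{
    memcpy(dst, a, a_len);
    memcpy(dst + a_len, b, b_len);
    return a_len + b_len;
}

/* Terminator-based concatenation: each input ends at its first NUL.
 * dst must have room for strlen(a) + strlen(b) + 1 bytes. */
size_t terminated_concat(char *dst, const char *a, const char *b)
{
    strcpy(dst, a);
    strcat(dst, b);
    return strlen(dst);
}
```

    Given the byte sequences ab\0cd and ef\0gh, the counted version yields all ten bytes, while the terminated version yields the four-character string "abef".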

    @Kian said:

    See, the problem isn't with arbitrary strings having nulls inside them. The problem is arbitrary strings NOT having a null in them. Which is why you ensure that every string will be null terminated, by mandating a null terminator at the end.

    I never said that functions that expect a terminator will work correctly with counted strings that don't "artificially" append one. They won't. But they also won't work with strings with embedded NULs, unless you define "string" in a (IMO) pretty crappy way.


  • ♿ (Parody)

    @EvanED said:

    Saying this and then saying "string functions operate until the first NUL" is certainly pretty close to being inconsistent.

    He didn't say "string functions." He referred specifically to the function that @FrostCat mentioned.



  • @tarunik said:

    I'm pretty sure the standard also requires std::string::length() to be constant-time, which means you must keep a length member. I find the approach taken by BSTR to be rather interesting as well, though -- stashing the length before the pointer's rather cute

    I don't know what MSVC's library does, but that's actually basically the representation of libstdc++'s string as well. A string object itself is just a pointer; it points to the start of the actual string, but preceding that is a struct with the length of the string, the current refcount (for copy-on-write purposes; this may go away with the recent ABI change, I'm not sure what the new representation is), and something else that I forget.



  • @EvanED said:

    But the only reason that behavior makes any kind of sense is because you're basically not defining strings as being an arbitrary sequence of bytes, but as the portion of memory up to the first NUL byte

    Said another way, at least one of the following must be true:

    1. strlen does not find the length of a string
    2. strings are not arbitrary sequences of bytes

  • ♿ (Parody)

    @EvanED said:

    strings are not arbitrary sequences of bytes

    Yes, he explicitly said that this was the case for the function in question.



  • @EvanED said:

    But the only reason that behavior makes any kind of sense is because you're basically not defining strings as being an arbitrary sequence of bytes, but as the portion of memory up to the first NUL byte. And my assertion is that the only reason that behavior and definition makes sense is because you're taking a very C-centric view of the definition of "string".

    No, it makes sense because functions that operate on strings up to the first null are things that exist, and that are available to every language that can call C functions. Which is anything above the most limited scripting language.

    So when you call a function that specifically operates until the first null, the first null is what determines how far into the string the function will operate. It's a tautology, it's not that hard to grasp. It doesn't mean the function is doing the wrong thing. It's simply wrong to call it if you didn't want that behavior.



  • Strictly speaking, in C strlen just tells you the position of the NUL byte.
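
    Put as code, the whole job of strlen is the loop below. my_strlen is a hypothetical stand-in written for illustration; real library implementations are usually optimized but compute the same value.

```c
#include <stddef.h>

/* What strlen() computes: the index of the first NUL byte. */
size_t my_strlen(const char *s)
{
    size_t i = 0;
    while (s[i] != '\0')
        i++;
    return i;
}
```

    Anything stored after that first NUL is invisible to it, which is why the bytes a, b, NUL, d, e measure as 2.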

