Injecting '\0' into STL strings



  • This rather strange "feature" of STL strings was spotted on Microsoft Visual C++ 6. In short, we'd expect that, by running the code snippet below:

    char cString[] = "\0";

    string cut = string() + "This STL string should not be cut at this (" + string(cString, 4) + ") point";

    cout << cut << endl;


    string safe(cString);

    string notCut = string() + "This STL string should not be cut at this (" + safe.substr(0, 4) + ") point";

    cout << notCut << endl;

    We'd get the following printed at the standard output:

    This STL string should not be cut at this () point

    This STL string should not be cut at this () point

    Instead, we get:

    This STL string should not be cut at this (

    This STL string should not be cut at this () point

    What is the matter here? My guess is that, in the first case, the '\0' character (which is used to mark the end of a C string) gets copied straight to the STL string's character buffer. Anything that gets written past it will be forever outside the reach of any code that watches for end-of-string markers -- particularly, I/O functions.

    In any case, the second case illustrates a way to avoid this "feature".



  • @xperroni said:

    string(cString, 4)
    It seems obvious to me that this constructor is reading in the first four addresses of the array cString, the first of which is '\0'. I bet string(cString) works the way you expect. It's not STL's fault that you're using the wrong constructor. In before mug.



  • Welbog's right. By the way, VC++ 6 is nine years old. I see no reason for I/O functions to skip the NUL character. For the first snippet I got:

    This STL string should not be cut at this (  @ ) point
    


  • @Spectre said:

    For the first snippet I got:
    This STL string should not be cut at this (  @ ) point
    It's replacing nul with space? That's weird.



  • It's not space; od tells me it is hex 00 00 40 00, i.e. NUL with random garbage, as expected. And NUL looks suspiciously not unlike space. 8=]

    (The real WTF is that TT HTML element doesn't work in this forum.)



  • Try this:

    string ok = string() + "This STL string should not be cut at this (" + string(4, 0) + ") point";

    (The constructor means, create a string of four characters with value 0.)

    Your problems probably start from misunderstanding C-style strings: cString is not a string containing one null-character. You can't put a null-character inside a C-style string because it marks the end of the char array. The std::string constructors see this array as a zero-length string.

     Seems that std::string is a bit fragile as far as it has to cope with C-style strings. :(
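
    A minimal sketch of what that fill constructor gives you, assuming any reasonably standard library. The helper names (make_padded, stored_length, c_length) are made up for illustration. It shows the two ways of measuring the same string: the stored count, and the C-style count that stops at the first NUL.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a sentence with four NUL characters in the
// middle, using the fill constructor string(count, ch).
std::string make_padded() {
    return std::string("cut at this (") + std::string(4, '\0') + ") point";
}

// Length as std::string sees it: a stored count, embedded NULs included.
std::size_t stored_length(const std::string& s) {
    return s.size();
}

// Length as C-style code sees it: counting stops at the first NUL.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

    For the string built by make_padded(), stored_length() counts all 24 characters while c_length() stops at 13, right after the opening parenthesis.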



  • char cString[] = "\0";
    This is the same as declaring char cString[] = {0x00, 0x00};

    std::string cut = "blah (" + std::string(cString, 4) + ") blah";
    Creates a std::string using four characters from your character array cString. Since cString doesn't contain the required four characters it will copy the first two 0x00's and whatever might be in memory after it, i.e. garbage. Concatenating the strings will not be affected by the string terminators but by the internal size counter of the string (=4); it's a class, remember. So what you have now is "blah (\0\0\0\0) blah"

    std::cout << cut << std::endl;
    std::cout will try to output the string but it will stop at the first string termination character, i.e. after the first opening parenthesis

    std::string safe(cString);
    Creates a new std::string from your character array. What it will do is copy every character up until the first string terminator it can find. So, basically, your string is empty. It would be the same as declaring std::string safe = "";

    std::string notCut = "blah (" + safe.substr(0, 4) + ") blah";
    This will take a substring from your std::string, but since it's a string class and not a dumb char array it will be smart enough (iterators) not to exceed the bounds of the string. So, safe.substr(0, 4) on an empty string is.. an empty string.

    std::cout << notCut << std::endl;
    This will print blah + empty string + blah

     

    So if you didn't want it to be cut off you shouldn't have explicitly told it to copy four characters out of cString (eg. abusing it, and even making it go out of range). It's all normal expected behavior to me. rtfm.
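
    The walkthrough above can be sketched with well-defined behavior, assuming a buffer that really has four readable bytes (the thread's cString has only two, so asking for four reads past the array). kBuffer, build_cut and build_not_cut are names invented for this sketch, not anything from the original snippet.

```cpp
#include <cstddef>
#include <string>

// Four readable bytes, so asking for 4 characters stays inside the array.
static const char kBuffer[4] = {'\0', '\0', '\0', '\0'};

// Counted constructor: copies exactly 4 chars, NULs included.
std::string build_cut() {
    return std::string("blah (") + std::string(kBuffer, 4) + ") blah";
}

// C-string constructor: stops at the first NUL, so `safe` is empty,
// and substr(0, 4) on an empty string is again an empty string.
std::string build_not_cut() {
    std::string safe(kBuffer);
    return std::string("blah (") + safe.substr(0, 4) + ") blah";
}
```

    build_cut() holds 16 characters (four of them NUL), while build_not_cut() is simply "blah () blah".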
     



  • The real WTF is that you try to use STL in VC6, which does not implement the real STL standard (when VC6 was made, the STL standard was not done yet).



  • Use an up-to-date Visual C++ Express edition already. It's free, and has a hell of a lot better debugger than VC6.



  • @GuntherVB said:

    char cString[] = "\0";
    This is the same as declaring char cString[] = {0x00, 0x00};

    Right.

    @GuntherVB said:

    std::string cut = "blah (" + std::string(cString, 4) + ") blah";
    Creates a std::string using four characters from your character array cString. Since cString doesn't contain the required four characters it will copy the first two 0x00's and whatever might be in memory after it, i.e. garbage.

    Actually, no – or at least, it shouldn't. Just as substr() is smart enough to stop at the end of a string that is shorter than the required length, so should string(const char*, size_t) – it can do so quite simply, just checking for the '\0' as it reads the input buffer. But perhaps you're onto something here: the constructor might take that first 0x00 value as an actual character, not the end marker of an empty C-string. Then we end up with an STL string "version" of the end-of-string marker. That would be a bit misleading (because debuggers often show both '\0' and '' characters as the empty string) but not technically incorrect.

    @GuntherVB said:

    So if you didn't want it to be cut off you shouldn't have explicitly told it to copy four characters out of cString (eg. abusing it, and even making it go out of range).


    The very point of using STL strings is relying on them to manage character buffers – which means mostly preventing overruns. If the very C string manipulation routines are smart enough to check for the end-of-string marker, why shouldn't the higher-level std::string? The only problem with my original code was assuming that the string constructor would take a '\0' for a '', which it obviously doesn't. In any case, the second part of the snippet demonstrates how to avoid that feature, while still relying on std::string to do the dirty buffer-management work.



  • @Thief^ said:

    Use an up-to-date Visual C++ Express edition already. It's free, and has a hell of a lot better debugger than VC6.

    I would if I could, pal. It's not like we've never heard of legacy systems.
     



  • Are you complaining about not being able to run the new VC++ itself on your PCs, or the executables it produces?



  • Stupid edit timeout.

    For clarification, VC 2005 will only run on Windows 2000 or later, and the executables it produces only run on Windows 98 and later. However, it does have an x86-64 compiler and debugger, along with support for a lot of mobile platforms.

    There are also apparently known problems when migrating projects from VC6, because of changes to MFC.



  • @xperroni said:

    @GuntherVB said:

    std::string cut = "blah (" + std::string(cString, 4) + ") blah";
    Creates a std::string using four characters from your character array cString. Since cString doesn't contain the required four characters it will copy the first two 0x00's and whatever might be in memory after it, i.e. garbage.

    Actually, no – or at least, it shouldn't. Just as substr() is smart enough to stop at the end of a string that is shorter than the required length, so should string(const char*, size_t) – it can do so quite simply, just checking for the '\0' as it reads the input buffer. But perhaps you're onto something here: the constructor might take that first 0x00 value as an actual character, not the end marker of an empty C-string. Then we end up with an STL string "version" of the end-of-string marker. That would be a bit misleading (because debuggers often show both '\0' and '' characters as the empty string) but not technically incorrect.

    Actually yes, it should:

    basic_string(const charT* s, size_type n, const Allocator& a = Allocator());
    

    Requires: s shall not be a null pointer and n < npos.

    Effects: Constructs an object of class basic_string and determines its initial string value from the array of charT of length n whose first element is designated by s, as indicated in Table 40.

    (Table skipped.)

    See? No terminators, predators, or last action heroes are mentioned. RTFS (Read The Friendly Stroustrup).

    @GuntherVB said:

    std::cout will try to output the string but it will stop at the first string termination character, i.e. after the first opening parenthesis

    Again, I see no reason for it. Why shouldn't you be able to output NULs? I compiled the snippet with GCC, and it outputs four characters too.

    I stand corrected, though — there are only two garbage characters, not three.
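
    A small sketch of that point, using an in-memory stream instead of the console so the result can be counted. The function names are hypothetical. Streaming a std::string writes size() characters, while streaming the same data through c_str() goes via the const char* overload and stops at the first NUL.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Streams a std::string and reports how many characters reached the stream.
std::size_t streamed_as_string(const std::string& s) {
    std::ostringstream out;
    out << s;            // std::string overload: writes s.size() characters
    return out.str().size();
}

// Streams the same data as a const char*, i.e. as a NUL-terminated C string.
std::size_t streamed_as_c_string(const std::string& s) {
    std::ostringstream out;
    out << s.c_str();    // const char* overload: stops at the first NUL
    return out.str().size();
}
```

    So a truncated line on screen says more about how the string was passed to the stream, or how the console renders NUL and garbage bytes, than about std::string itself.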



  • @Spectre said:

    @xperroni said:

    @GuntherVB said:

    std::string cut = "blah (" + std::string(cString, 4) + ") blah";
    Creates a std::string using four characters from your character array cString. Since cString doesn't contain the required four characters it will copy the first two 0x00's and whatever might be in memory after it, i.e. garbage.

    Actually, no – or at least, it shouldn't. Just as substr() is smart enough to stop at the end of a string that is shorter than the required length, so should string(const char*, size_t) – it can do so quite simply, just checking for the '\0' as it reads the input buffer. But perhaps you're onto something here: the constructor might take that first 0x00 value as an actual character, not the end marker of an empty C-string. Then we end up with an STL string "version" of the end-of-string marker. That would be a bit misleading (because debuggers often show both '\0' and '' characters as the empty string) but not technically incorrect.

    Actually yes, it should:

    basic_string(const charT* s, size_type n, const Allocator& a = Allocator());

    Requires: s shall not be a null pointer and n < npos.

    Effects: Constructs an object of class basic_string and determines its initial string value from the array of charT of length n whose first element is designated by s, as indicated in Table 40.

    Is that from an online reference? Could you send the link?

    @Spectre said:


    See? No terminators, predators, or last action heroes are mentioned. RTFS (Read The Friendly Stroustrup).

    So it was naive of me to assume that all std::string methods would check for '\0' characters while processing C-string arguments. But I do find it a bit misleading, if not incorrect in itself.

    @Spectre said:


    @GuntherVB said:

    std::cout will try to output the string but it will stop at the first string termination character, i.e. after the first opening parenthesis

    Again, I see no reason for it. Why shouldn't you be able to output NULs?

    That was never the point. The point was that, in the first part of the snippet, it seemed the string should go further than what was actually printed. It didn't because IO (as would any code that checks for end markers) ran into a '\0' and correctly stopped right there; the problem was that '\0' should have never gotten into the middle of the string in the first place. My reasoning for why it got there (and whether this behaviour was correct) may have been incomplete, but the second part of the snippet correctly avoids the problem, providing a way to concatenate STL strings and C-strings, even when some of the C-strings have starting '\0' characters.



  • string ( const char * s, size_t n );
    Content is initialized to a copy of the string formed by the first n characters in the array of characters pointed by s.

    string ( const char * s );
    Content is initialized to a copy of the string formed by the null-terminated character sequence (C string) pointed by s. The length of the character sequence is determined by the first occurrence of a null character (as determined by traits.length(s)). This version can be used to initialize a string object using a string literal constant.

    ONLY that single constructor looks for a string termination character; std::string itself does not use string terminators at all. That constructor is the reason why we can assign a character array to a string (because string s = "foobar"; is the same as string s = string("foobar"); again calling the constructor that knows about the old C string terminator). So, you told it to go past the null terminator. It's not a substring function, it's an initializer. If you wanted a substring you should use the substr function: string(cString).substr(0, 4). This will create a temporary object through the constructor and take a correct substring.

    And indeed I was wrong, cout doesn't care about the null inside the string as should be expected and simply outputs the entire string including the four characters. Proving once more that std::string doesn't use string terminators, but iterators when looping over its elements.

    Knowing all this it's a bit weird that your sentence is cut off at the first parenthesis.. but you never know what garbage (binary) characters crept into your string and how the console will react to them.

    Everything's logical in C++
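
    If what you actually wanted was "at most n characters, but never past the terminator", here is one way it could be sketched. bounded_copy is a made-up helper, not part of std::string, and it assumes s points at a properly NUL-terminated C string or at a buffer of at least max_len bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy at most max_len characters from s, stopping
// early at the first NUL. Each byte is checked before moving to the next,
// so a properly terminated C string is never read past its terminator.
std::string bounded_copy(const char* s, std::size_t max_len) {
    std::size_t n = 0;
    while (n < max_len && s[n] != '\0') {
        ++n;
    }
    // Counted constructor, but with a count that is known to be valid.
    return std::string(s, n);
}
```

    For a terminated input this gives the same result as string(s).substr(0, max_len), just without building the full temporary string first.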
     

     

     


     

     


  • It was from the C++ standard; see http://www.kuzbass.ru/docs/isocpp/lib-strings.html, 21.3.1[6,8].



  • If you absolutely must use Visual C++ 6.0, give STLport a serious look.

