How not to copy a zero



  • I inherited an app to maintain and clean up, after a previous external team managed to get it to a "we don't want to keep sinking resources into this" state. The code is a mixed bag. Some parts are reasonable, but other parts are a mess. If I had to guess, they had a couple of guys that knew what they were doing that got it up to 80%, and then the last 20% (which, as we all know, takes 80% of the time) was left to "less experienced" coders.

    Getting it to compile in VS at /W4 and treating warnings as errors was pretty simple, thankfully. I then decided to run analyze on it to see what it could find. I wanted to share what I think was the best piece of code I found. I should warn you I'm a C++ programmer, in case that's a trigger for any of you.

    So, analyze tells me some uninitialized memory is being used. I check, and I find that a function that takes a buffer and copies it into a bit of memory was being given an uninitialized buffer to copy. How was this buffer being created?

    Status SomeFunction()  
    {
      char* zeroBuf = new char[1];
      return FunctionThatCopiesBuffer(zeroBuf,1);
    }
    

    So at least three errors jump out of that one line. To start with, as the static analyzer said, the buffer is not initialized. In C++, if you just new an array, it's not initialized. You need to ask for it to be initialized, like so:

    char* zeroBuf1 = new char[1](); // C++03, all zeroes
    char* zeroBuf2 = new char[1]{}; // C++11, allows you to specify values, any not specified are zeroed
    

    The second error, slightly less obvious, is that this leaks 1 byte every time it is called, because the array is never deleted. So it would have to look like:

    Status SomeFunction()  
    {
      char* zeroBuf = new char[1]{};
      Status s = FunctionThatCopiesBuffer(zeroBuf,1);
      delete[] zeroBuf;
      return s;
    }
    

    The third error, which is more of a style issue than an outright error, is that arrays of size 1 are also known as variables. The [] syntax is unnecessary.

    Status SomeFunction()  
    {
      char* zeroBuf = new char{};
      Status s = FunctionThatCopiesBuffer(zeroBuf,1);
      delete zeroBuf;
      return s;
    }
    

    The fourth error is that you don't need to dynamically allocate variables when you know how many you need and what the value you need is. What I changed the original code into was:

    Status SomeFunction()  
    {
      char zeroBuf = 0;
      return FunctionThatCopiesBuffer(&zeroBuf,1);
    }
    

    So instead of dynamically allocating a byte to get a pointer, I just gave the function the address of my automatically allocated "buffer" of one zeroed byte.

    Not all of the code is this bad, but it does have in many parts this feel of trying to do things the most convoluted way possible.



  • and in case that FunctionThatCopiesBuffer leaks the pointer of its first argument, it had "worked" before (as it leaked pointers to allocated heap memory) and now will leak random pointers to the stack and probably cause interesting memory hazards.

    But I assume you have made sure that this does not happen before changing the code. :)

    (Yes, I've been bitten by things like that before. That's why I dont' really like C/C++ or other memory unsafe languages and really hope that Rust will be a success.)

    BTW: I doubt that it really only leaked one byte - unless you are on an embedded system with a very good allocator, you will probably leak at least 3*sizeof(void*) bytes (one void* for storing the actual byte and its length (if you are lucky and they fit both), and at least two pointers for maintaining the chain or list of allocated/free blocks).



  • Yes, I checked what it did with the pointer first. The function just has a memcpy inside, didn't take ownership or pass it along. And yes, it probably leaked a lot more memory than just 1 byte, since it probably doesn't allocate single bytes, even without all the bookkeeping.


  • Discourse touched me in a no-no place

    @Kian said:

    a trigger

    I'm triggered by trigger warnings, you insensitive clod!



  • If you find yourself allocating and freeing the same memory within the same function, you're @CodingHorrorBot


  • 🔀

    @ben_lubar Is Doing It Wrongâ„¢



  • @ben_lubar said:

    allocating and freeing the same memory within the same function

    Unless you need, say, a 1GB temporary buffer and you're worried you might not be able to fit that on the stack.

    Filed under: Doing it Wrong, only differently.


  • Discourse touched me in a no-no place

    @tar said:

    Unless you need, say, a 1GB temporary buffer and you're worried you might not be able to fit that on the stack.

    One byte, one gigabyte. Not much difference there.



  • Metric gigabytes or programmers' gigabytes? It matters.


  • Discourse touched me in a no-no place

    @ben_lubar said:

    It matters.

    Not according to my Fermi estimation, it doesn't!




  • kills Dumbledore

    Don't forget the sequels to Ocean's Ten, Ocean's Ten and Ocean's Ten. I also liked 1000: Odyssey 1



  • And instead of a memcpy for the caller, isn't there some memset function to just set the data to 0?
    Did the caller really want a zero or just copy garbage to simulate random data (result of a coin toss)?
    Now the program is always heads.


  • Java Dev

    Would be an awefully bad source of random.



  • I think the intro in the OP mentioned this was a WTF, so an awful source of random would be a minor wtf.



  • How not to copy a Zero:

    1. Don't do Pearl Harbor
    2. Don't kamikaze yourself


  • @Kian said:

    Status SomeFunction()
    {
    char zeroBuf = 0;
    return FunctionThatCopiesBuffer(&zeroBuf,1);
    }</pre

    Why is that better than

    Status SomeFunction()  
    {
      return FunctionThatCopiesBuffer("", 1);
    }</pre


  • In technical terms, it isn't. They're equivalent. I prefer my version for readability, however.

    The called function, if I recall correctly, referred to the parameter as "buffer", not "string", which implies this function isn't expecting a null terminated string. I admit, I didn't spend enough time on it to figure out what they were trying to do. They're probably doing something dumb, I just wanted to make sure they were doing something dumb properly, so if it's a bug it will at least be a predictable bug I can fix later. I prefer a program that always fails to one that sometimes works.

    Also, it's a windows app in "Unicode" mode, so it uses wchars for strings. Seeing a "" instead of an L"" would raise a red flag for anyone seeing the code, and it would probably take them a bit to figure out what the code is trying to do. My version is self documenting.



  • For your next application, please utf-8 everywhere.



  • Couple of disagreements:

    Windows C++ programmers are educated that Unicode must be done with ‘widechars’. As a result of this mess, they are now among the most confused ones about what is the right thing to do about text.
    I'm not confused, thank you, I just have to deal with a shitty API. Also, replace "the right thing to do" with "my opinion".
    At the same time, in the Linux and the Web worlds, there is a silent agreement that UTF-8 is the most correct encoding for Unicode on the planet Earth.
    What does "most correct" mean? Something is either correct, or it isn't. You may measure other factors to define your preference, like ease of transmission and storage, but all three encodings (UTF8,16 and 32) are equally correct.

    Setting that aside, I don't get to control how my company handles unicode. For my own projects sure, I prefer UTF-8 to communicate with the outside world. Although that site comes off as incredibly patronizing.


  • FoxDev

    Windows C++ programmers are educated that Unicode must be done with ‘widechars’. As a result of this mess, they are now among the most confused ones about what is the right thing to do about text.

    Really? Coulda sworn they're encouraged to use TCHAR, and let the compiler sort out whether they're wide or narrow…


  • Discourse touched me in a no-no place

    @Kian said:

    What does "most correct" mean? Something is either correct, or it isn't.

    There's mere correctness, and then there's also being good style.

    When it comes to Unicode, there's a whole level of WTFcomplexity beyond UTF-8/16/32, namely the normalization rules. If you've never heard of NFC and NFD before, be glad that you've not blown braincells on such matters.



  • @Kian said:

    What does "most correct" mean?

    Given:

    • 2 + 2 = 5
    • 2 + 2 = 6
    • 2 + 2 = 7

    The first of these is the "most correct".



  • {deleted} I keep forgetting how to post.



  • @Kian said:

    So instead of dynamically allocating a byte to get a pointer, I just gave the function the address of my automatically allocated "buffer" of one zeroed byte

    But then it's not a buffer, and your method name is wrong.

    Obviously the coder was trying to ensure the integrity of the method name.

    @Kian said:

    which implies this function isn't expecting a null terminated string

    Because if C++ ever decides to specify a different meaning for "" other than null-terminated string, you're fucked up without a paddle.



  • @xaade said:

    {deleted} I keep forgetting how to post.

    ...and how to delete?


  • Discourse touched me in a no-no place

    @xaade said:

    Because if C++ ever decides to specify a different meaning for "" other than null-terminated string, you're fucked up without a paddle.

    If that happens, there'll be a very large amount of code broken. I think there's more important stuff to panic over.



  • @dkf said:

    When it comes to Unicode, there's a whole level of WTFcomplexity beyond UTF-8/16/32

    Which makes the articlerantmanifesto's suggestion to keep utf-8 in a std::string seem pretty questionable to me: wouldn't it make far more sense to use an actual unicode library (if such a thing exists in your environment) and store strings in a unicode::string, or whatever, where you can be rest assured that all operations act on actual unicode codepoints, and there's never any confusion about the encoding?



  • When you actually need to act on codepoints, you can expand it to a 4-byte codepoint array/buffer/whatever, then when you're done you pack it back into a utf-8 string to save on memory and make it easier to output/work with.

    []rune



  • @tar said:

    The first of these is the "most correct".

    So, "most correct" means "least wrong", which implies wrongness, which means that "most correct" does not imply actual correctness. Interesting.

    @xaade said:

    But then it's not a buffer, and your method name is wrong.
    How is it not a buffer? What is your definition of buffer? It's a one byte buffer, used to store a zero for another function to consume.

    @xaade said:

    Because if C++ ever decides to specify a different meaning for "" other than null-terminated string, you're fucked up without a paddle.
    My comment wasn't concerned with the syntax for null terminated strings in C++. If you take a look at the function, I'm passing a pointer and a size (the one is the length of the buffer). Whenever you pass a pointer and a length, you're specifically not expecting a null terminated string. Otherwise you wouldn't need the length.

    As I said before, I prefer using a variable for readability. Passing "" will produce a pointer to a zero, but it's not obvious to the next person that comes along why you're doing that, in a function that doesn't expect null terminated strings. It also makes the next guy wonder what the one is for. By calling the variable zeroBuff, the next maintainer can immediately understand what the intent of the code is. It's not as if I have a shortage of characters and need to save on typing.



  • @Kian said:

    Passing "" will produce a pointer to a zero

    You sure about that, kiddo?

    #include <stdio.h>
    int main() {
        return printf("%p\n", "");
    }
    
    $ g++ tmp.cpp 
    $ ./a.out 
    0x4005d4
    


  • I prefer to ignore the contents of strings for the most part. All I care about is the length in bytes of the buffer, and sticking a null character at the end. The only moments when the contents matter are when sending them to the OS for paths, which is the only source of annoyance because it means special case code if you want to port to another system, or to display to a user, and I don't deal with that code. I don't generate the strings, I don't consume them, and despite all my training, I've never been asked to reverse one. So why care about the encoding 90% of the time?



  • I didn't say a null pointer. I said a pointer to a zero.

    [code]#include <iostream>
    int main() {
    char const* pointer = "";
    int pointee = *pointer;
    std::cout << pointee << '\n';
    return 0;
    }[/code]

    output:
    [code]0[/code]

    And you should be SPANK!ed for not adding a return statement to main. NAUGHTY! NAUGHTY! It's bad form.



  • @Kian said:

    I didn't say a null pointer. I said a pointer to a zero.

    I see. But when you said "pointer to a zero", from the context it was not clear that you actually meant to say "pointer to a single null character", and thus confusion arose. (It would also have been acceptable to have said "a pointer to an empty string".)

    @Kian said:

    And you should be SPANK!ed for not adding a return statement to main.

    Ugh. I have added it. Happy now?



  • @tar said:

    I see. But when you said "pointer to a zero", from the context it was not clear that you actually meant to say "pointer to a single null character",

    Well, thanks for making my point then. This kind of confusion is why I preferred to pass the address of a zero initialized char variable helpfully named "zeroBuffer" instead of an empty string. The intent is much clearer that way.

    @tar said:

    Ugh. I have added it. Happy now?

    Well, almost. It is expected that a program that does not experience errors will return a zero on exit. printf returns the number of characters it wrote, which in your case would be 8 (or 9? I can't recall if it includes the null character). In either case, you're signalling a faulty execution when from all appearances your program is being successful. And it would actually signal successful execution if it failed to print for some reason.



  • @Kian said:

    printf returns the number of characters it wrote, which in you case would be 8 (or 9? I can't recall if it includes the null character)
    No. (Imagine what that would mean: successive printf calls would result in multiple C strings on a receiving process instead of one long one.) If you want to print a \0, I think you have to go to write/fwrite/etc.



  • @Kian said:

    Well, almost. It is expected that a program that does not experience errors will return a zero on exit. printf returns the number of characters it wrote, which in you case would be 8 (or 9? I can't recall if it includes the null character). In either case, you're signalling a faulty execution when from all appearances your program is being successful. And it would actually signal successful execution if it failed to print for some reason.

    It's interesting to note that my complying with your first request actually made this worse. When the return statement was omitted as originally written, gcc supplied an implicit return 0 and the program signalled success on completion.
    So be careful what you ask for. Anyway you can always call the main() police and let them know about my blatant disregard for the conventions of the form...

    Edit: :moving_goal_post: anyway, you asked for a return statement, I added one. You did not specify the value it should return...



  • @EvanED said:

    If you want to print a \0, I think you have to go to write/fwrite/etc.

    Well, you could do printf("\0"), right? I don't usually use printf, so I don't remember all it's quirks off the top of my head.



  • @tar said:

    Anyway you can always call the main() police and let them know about my blatant disregard for the conventions of the form...

    Oh, we know.



  • "\0" is a double-null-terminated-empty-string.



  • Well, yeah, my question is if printf scans the string to find the end. Which I suppose it must, to be able to tell how many var args there are. Nevermind then.



  • Functions that take a NUL-terminated string with no length have no way of telling whether a NUL it encounters is a terminating NUL or embedded NUL, so you know that printf("") and printf("\0") must behave the same.

    (Edited for extra umph.)



  • What you can do is putc(0).



  • @riking said:

    What you can do is putc(0).

    Or printf("%c", '\000');


  • Discourse touched me in a no-no place

    @riking said:

    []rune

    NFD? (Unicode is TRWTF. Seriously.)



  • This reminds me a lot of Raymond Chen's article on someone trying to pass a LPDWORD to a function: There's more to calling a function than just getting the types to match



  • Sarcastic comment is sarcastic.

    Guess I have to go back to using the tags.


  • Discourse touched me in a no-no place

    @Kian said:

    And you should be SPANK!ed for not adding a return statement to main. NAUGHTY! NAUGHTY! It's bad form.

    But well defined behaviour, certainly, in C++.

    @N3690 3.6.1 §5 said:

    If control reaches the end of main without encountering a return statement, the effect is that of executing
     
    return 0;



  • @xaade said:

    Sarcastic comment is sarcastic.

    Guess I have to go back to using the tags.


    Our back-end admin/dev on my other forum just implemented a [sarcasm] tag. It's backward-slanted font. It makes my eyes hurt.


  • FoxDev

    @CarrieVS said:

    It's backward-slanted font

    :wtf:


Log in to reply