Doing it Wrong: Unions



  • While hunting down the source of some unexpected behaviour, I ran across something that made me do a double-take. First, however, the background:

    The project in question is very young (less than a year old at this point, and this was a product that will probably exist for at least the next half-decade, if not decade. My position in the group was as a temporary employee in a junior position; I was writing a test suite (necessary, but it's often booooring).

    Anyway, the WTF class looks like the following (heavily anonymised C++):

    [code]class Response {
    public:
    Response() {}
    ~Response() {}

    StatsResponse& getStatsResponse() { return _statsResponse; }
    void setStatsResponse ( StatsResponse& statsResponse )
    {
    _statsResponse.setFieldOne ( statsresponse.getFieldOne() );
    _statsResponse.setFieldTwo ( statsresponse.getFieldTwo() );
    _statsResponse.setFieldThree ( statsresponse.getFieldThree() );
    _statsResponse.setFieldFour ( statsresponse.getFieldFour() );
    }

    private:
    union {
    StatsResponse _statsResponse;
    };
    };[/code]

    If you, like me, have no idea what "union" implies, I'll save you the googling: a union groups a bunch of variables in the same way that you would expect a struct { } to, except that they all start at the same memory address. This means that when you write values to one variable, you write them to all others. Apparently this was used (and is still used) in things like YACC as a sort of poor-man's object-oriented capacity, since you can return any one of several different types.

     

    Note: the four fields shown are all the fields in StatsResponse. There are no missing entries to that union. This is not example code, or something that is going to be changed: this has been pointed out here as a WTF but ignored (apparently this was coded by someone of importance, and not to be changed).

    To summarise the WTFs:

    • _statsResponse can be accessed exactly like a private member field.
    • _statsResponse is union'd with nothing else, so it's EXACTLY like a private member field.
    • union isn't even useful here, since we're coding in C++, so we have proper OO (or at least, as proper as C++ gets)
    • using union in the first place
    • StatsRespone cannot have a constructor or destructor, as these are not supported within unions. Since it just happens to hold ints in this case (at this time), that's not a problem...yet. If the class ever has to hold some other object, it'll have to be careful about it.

     

    Also, TRWTF: I can't find any easy way of posting properly-indented code (at least, nothing looks correct in the Preview tab). I can only hope it at least got line-breaks right; apologies if it doesn't.



  • Unions made a lot more sense in plain C, where there was no OO.  In that case, you would have something like:

    struct {
        int type
        union {
            int i;
            char * s;
           // etc
            }
        }

    There 'type' would tell you what was stored.  With base and derived classes you can treat types abstractly if neccessary.

    It looks like an historical leftover (even though the project is young) mixed with some cargo-cult programming.  At one time there may have been multiple members in the union, but remvoed at a later date.  Whoever combined them into one left the union statement.  Either through negligence or worrying about touching something that worked.



  • I don't doubt that they made more sense with C, but this is C++ - the entire project. Unless this is copypasta, it's all new, hand-written code. Since you suggested it might have been modified at some point, I checked - the code I posted is the only revision, so stuff could only have been removed if this was a copy-paste. It's not like this hasn't been pointed out: everyone who's seen is has said "wtf" in response, but it remains uncorrected (perhaps the larger WTF than it existing in the first place).

    I'd remove it myself if I were both still there, and I could justify changing something that the more senior members of the team didn't (I suspect I don't know the full story about why it's still there).



  • I didn't think about copy and paste, but that makes more sense.  At least I hope it's C&P.  I would worry about someone who put it in there intentionally.

    I sometimes think we give too much deference to seniority.  It's one thing if you are not sure about the problem.  It's quite another when it's clearly wrong. However, you now have the best excuse in the world!



  • @DogmaBites said:

    Unions made a lot more sense in plain C, where there was no OO.

    C wasn't a strongly object-oriented language, but it definitely had OO abilities. 



  • @DemonWasp said:

    Note: the four fields shown are all the fields in StatsResponse.

    Then TRWTF is that they didn't just assign to it.

    @DemonWasp said:

    Also, TRWTF: I can't find any easy way of posting properly-indented code (at
    least, nothing looks correct in the Preview tab).

    I find that when using the raw editor, the least troubling way to insert code is to wrap it in <pre>[]code]...[]/code]</pre>. You still need to C-escape \ as \, though.

    [code]
    class Response {
    public:
        Response() {}
        ~Response() {}
    
        StatsResponse& getStatsResponse() { return _statsResponse; }
        void setStatsResponse ( StatsResponse& statsResponse )
        {
            _statsResponse.setFieldOne ( statsresponse.getFieldOne() );
            _statsResponse.setFieldTwo ( statsresponse.getFieldTwo() );
            _statsResponse.setFieldThree ( statsresponse.getFieldThree() );
            _statsResponse.setFieldFour ( statsresponse.getFieldFour() );
        }
    
    private:
        union {
            StatsResponse _statsResponse;
        };
    };
    [/code]
    


  • @morbiuswilters said:

    @DogmaBites said:

    Unions made a lot more sense in plain C, where there was no OO.

    C wasn't a strongly object-oriented language, but it definitely had OO abilities. 

    I agree. MFC and GTK+ might be some examples of OO in C. And structs could be considered "method-less" objects as well.


  • @morbiuswilters said:

    @DogmaBites said:

    Unions made a lot more sense in plain C, where there was no OO.

    C wasn't a strongly object-oriented language, but it definitely had OO abilities. 

    OK, I was speaking loosely.  Still, the only thing it has for OO is defining your own structures and treating them as different types.  You couldn't pass structure A to a function that expected structure B as a parameter.  Everything else had to be done by hand.  If you wanted virtual functions,  you created your own function table.


  • @danixdefcon5 said:

    I agree. MFC and GTK+ might be some examples of OO in C. And structs could be considered "method-less" objects as well.

    I'm not familiar with GTK+, but MFC is a C++ library (assuming we are talking about Microsoft Foundation Classes)


  • @DogmaBites said:

    You couldn't pass structure A to a function that expected structure B as a parameter

    Of course you could, you would just cast it ;-)



  • I read your entire post and did not find anything about fair wages OR vacation time.

    Sir, you fail!



  • @DogmaBites said:

    You couldn't pass structure A to a function that expected structure B as a parameter.

    Sure you can:

    [code]
    struct Mommy
    {
    int a;
    int b;
    };

    struct Baby
    {
    struct Mommy;
    int c;
    };[/code]

    Now the compiler will accept Baby wherever Mommy is expected. Doing vtables and stuff requires a bit more magic, but it's still fairly simple.

    (Note that the C standard guarantees casting back and forth between a struct and its first element only, meaning that implementing multiple inheritance this way wouldn't work.)



  • @DogmaBites said:

    @danixdefcon5 said:

    I agree. MFC and GTK+ might be some examples of OO in C. And structs could be considered "method-less" objects as well.

    I'm not familiar with GTK+, but MFC is a C++ library (assuming we are talking about Microsoft Foundation Classes)
    GTK+ runs solely in C, which gives the advantage of being able to build your GUI app in pure C. I prefer Qt, as that one was built on C++.

    I had little experience with MFC, but it seemed to be a lot like the GTK+ stuff, so I assumed it kind of worked like the GTK+ "OO" system.



  • @bstorer: Would that solution let you pass, for example, Baby[ ] to something expecting Mommy[ ], or would it cause problems with sizeof ( Baby ) > sizeof ( Mommy ) ?

    But yes, that's pretty cool that you can do nonsense like that. Still, I'd prefer to write in a nicer OO environment.

     

    (Edited to put the closing brackets back in, albeit with a space between them -- Jeff)



  •  I'll have to check the C standard some time.  I didn't know that this specific cast was implicitly done.  However, I just tried it on both GCC 4.1.3 and MSVC 6.0 and both complain about incompatible types, along with "declaration does not declare anything" for the inner Mommy.  I tried putting a specific member name for the inner Mommy, but that did not help with the incompatible type warning.

    It would definitely be a cool thing If the casting was implicit  (that is the way I read your post).  But my admittedly cursory check says no :(.



  • @DemonWasp said:

    @bstorer: Would that solution let you pass, for example, Baby[ to something expecting Mommy

    Yes.
    @DemonWasp said:
    or would it cause problems with sizeof ( Baby ) > sizeof ( Mommy ) ?
    No, because for all intents and purposes, the compiler would treat Baby as a Mommy.



  • @bstorer said:

    No, because for all intents and purposes, the compiler would treat Baby as a Mommy.

    Just like the Liberal education system that hands out condoms to first-graders. 



  • Ah poop, the forum deleted the square brackets. I meant to ask whether Baby-array versus Mommy-array would make any difference, but apparently square brackets get eaten by the software. My line of thinking here is that since arrays of non-pointers are stored sequentially, a function expecting Baby (size 8, say) and getting Mommy (size 12) would index incorrectly into the array - after all, it's pointer arithmetic on the CPU.

    @morbius: Handing out condoms to highschoolers is a better solution than preaching abstinence-only sex-ed. "Neither" is probably a better bet, and that's what I got in high school.



  • @DemonWasp said:

    @morbius: Handing out condoms to highschoolers is a better solution than preaching abstinence-only sex-ed. "Neither" is probably a better bet, and that's what I got in high school.
     

    Morbius is joking/trolling .... any further discussion on this will get deleted ... feel free to discuss offline.  

    Thanks!


  • Considered Harmful

    I was very disappointed when I discovered that I could not union a BYTE with 8 bools so I could access each bit individually. Each bool takes a byte so I could really only set the highest bit of the BYTE. I ended up writing a class to do it. Of course, C/C++ don't actually fully define how the different types are stored in memory, so in some architectures there may have been more than 8 bits to a byte. That's the real WTF in my opinion.

    I didn't care about portability because the platform it was running on was well-defined.



  • @joe.edwards said:

    I was very disappointed when I discovered that I could not union a BYTE with 8 bools so I could access each bit individually. Each bool takes a byte so I could really only set the highest bit of the BYTE.

    lern2bitfield 



  • @DemonWasp said:

    I meant to ask whether Baby-array versus Mommy-array would make any difference, but apparently square brackets get eaten by the software. My line of thinking here is that since arrays of non-pointers are stored sequentially, a function expecting Baby (size 8, say) and getting Mommy (size 12) would index incorrectly into the array - after all, it's pointer arithmetic on the CPU.
    You are correct; that won't work. But if you had arrays of pointers to Baby and Mommy objects you can. Consider:
    [code]
    #include <stdio.h>

    struct Mommy
    {
    int a;
    int b;
    };

    struct Baby
    {
    struct Mommy mommy;
    int c;
    };

    void foo(struct Mommy * [ ]);

    int main ()
    {
    struct Baby * baby[3];
    baby[1] = malloc(sizeof(struct Baby));
    baby[1]->mommy.a = 1;
    baby[1]->mommy.b = 2;
    baby[1]->c = 3;
    foo((struct Mommy **)baby);
    return 0;
    }

    void foo (struct Mommy * mom[ ])
    {
    printf("%d", mom[1]->b);
    return;
    }[/code]

    Excuse the spaces in the square brackets. As you say, BBCode eats them otherwise.



  • @joe.edwards said:

    I was very disappointed when I discovered that I could not union a BYTE with 8 bools so I could access each bit individually. Each bool takes a byte so I could really only set the highest bit of the BYTE. I ended up writing a class to do it. Of course, C/C++ don't actually fully define how the different types are stored in memory, so in some architectures there may have been more than 8 bits to a byte. That's the real WTF in my opinion.

    I didn't care about portability because the platform it was running on was well-defined.

     

    In addition to morb's post, you should try to understand the purpose of language specs.  By not specifying memory layout or word size, C and C++ can be compiled for a large number of platforms.  An important part of writing specifications is to understand what you should not specify.

    Your problem with bool and bytes is that you forget the need that each value be independently addressable and writable.  If bools were truly implemented as individual bits, a bool* would have to larger than any other data pointer to include the bit position.  Also, writing one bit would necessarily involve reading and writing the other 7 bits.  This would make writing multithreaded code impossible.

    I'll simply leave the RWTF unmentioned. 



  •  C still exists, no need for past tense. But it never had any support for OOP what so ever, at least none that machine code doesn't have. What are you thinking of?



  • @DogmaBites said:

    If bools were truly implemented as individual bits, a bool* would have to larger than any other data pointer to include the bit position.  Also, writing one bit would necessarily involve reading and writing the other 7 bits.

    This is why bit fields are pretty much unused now.  Accessing a single bit is way more of a PITA than accessing a single byte, even if it does save space. 



  • @bstorer said:

    Excuse the spaces in the square brackets. As you say, BBCode eats them
    otherwise.

    Use []]] to say []].

    @joe.edwards said:

    I was very disappointed when I discovered that I could not union a BYTE with 8 bools so I could access each bit individually.

    [code]
    union
    {
      unsigned char byte;
      struct {
        unsigned bit0 : 1;
        unsigned bit1 : 1;
        unsigned bit2 : 1;
        unsigned bit3 : 1;
        unsigned bit4 : 1;
        unsigned bit5 : 1;
        unsigned bit6 : 1;
        unsigned bit7 : 1;
      } bits;
    } uber_byte;
    [/code]

    Clumsy hack, I know, but hey, it works.



  • @ahnfelt said:

    C still exists, no need for past tense.

    I was just following the style established by the post I was replying to.

     

    @ahnfelt said:

    But it never had any support for OOP what so ever, at least none that machine code doesn't have. What are you thinking of?

    Yes it did.  Read the rest of the thread before you reply.



  • @Spectre said:

    Clumsy hack, I know, but hey, it works.

    That's exactly what I said above, if you would read before posting.  Besides, it's slow and obnoxious -- just use chars for boolean values. 



  • @Spectre said:

    Use []] to say [] ... Clumsy hack, I know, but hey, it works.
    Tell me about it!



  • @morbiuswilters said:

    That's exactly what I said above, if you would read before posting. 

    Ah, but there were no code samples! And what is the use of a post, thought I, without lolcats or code samples?



  • Putting something inside union is the idiom to show that you never expect that something to grow constructors or destructors. I can see two wtfs:

    1) You didn't know the idiom.

    2) They didn't explain the idiom well enough when you asked about it.



  • @Fister said:

    Putting something inside union is the idiom to show that you never expect that something to grow constructors or destructors. I can see two wtfs:

    1) You didn't know the idiom.

    2) They didn't explain the idiom well enough when you asked about it.

     

     Well, with 0x, you will be able to put non-POD types into a union so that would break the idiom.  IMO it's a pretty stupid idiom to begin with.



  •  @DogmaBites said:


    I'll simply leave the RWTF unmentioned. 

     

     

    I think using a pointer to a type smaller than the actual pointer is TRWTF (unless you're doing some weird C-voodoo, which is in turn usually a WTF).



  •  

    @Fister said:

    Putting something inside union is the idiom to show that you never expect that something to grow constructors or destructors. I can see two wtfs:

    1) You didn't know the idiom.

    2) They didn't explain the idiom well enough when you asked about it.

    I've never seen that idiom before, and to check I asked Google about it and looks like Google didn't hear about it either.

    I could see the need if the programmer was somehow creating their own large scale union type of behavior.  But then you wouldn't create a larger class with a single member union.  The construction in the OP really makes no sense.  In the OP, there is just the single data member of the class. 

    If say I really wanted to store arrays of two different types in the same memory without directly declaring a union, something like this would be better:

    struct Unionable1 {
        union {
            type1 t; };};

     struct Unionable2 {
        union {
            type2 t; };};

    Now it's known they could be unioned.  I'm wondering if you understood the meaning and use of this idiom.  I'm also curious to see if this is really a common idiom.  I'm still of the opinion that it would be better to just properly manage the lifetime of the objects rather than do weird crap like this.  I think constructors and destructors are too beneficial to give them up.  RAII and deterministic destructors make code much more clear and less error prone.



  • @Voidpointer said:

     @DogmaBites said:

    I'll simply leave the RWTF unmentioned. 

     

    I think using a pointer to a type smaller than the actual pointer is TRWTF (unless you're doing some weird C-voodoo, which is in turn usually a WTF).
     

    You've never had an array of items?  Or an optional instance?  Or passed in a pointer so the function can modify the parameter?  Where the hell have you been?

    For the difficult of understanding, I was trying to be diplomatic. 

     



  • @bstorer said:

    Stuff about Baby Momma and derived structs

    I couldn't find anything in the C99 standard that says anything about automatic casting.  It might be guaranteed that the layout matches (I didn't see that specifically, but I'm reasonably sure it's the case).  Still, to make use of the layout, you would still have to cast the derived struct as the base struct.  At that point, just reference the member.  So again, you have to do most everything manually.  I think a better way to state it is C allows the programmer to do OO on his own.

     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf



  •  If, indeed, the intent was to force using only POD types with the object wouldn't it make more sense to use boost::has_trivial_destructor and boost::has_trivial_constructor with a static assert or something?  Either way, the union encapsulating a single member seems sort of sloppy.  I'd be moreinclined to believe it's just something from copy/paste coding.

     



  • @DogmaBites said:

    I couldn't find anything in the C99 standard that says anything about automatic casting.  It might be guaranteed that the layout matches (I didn't see that specifically, but I'm reasonably sure it's the case). 
    And you won't find anything. If you read my posts as implying that the compiler casts automatically, I apologize for that. No, you'll always have to cast. All C guarantees is that you can safely cast between a struct and the first member of that struct. But that's all we really need.
    @DogmaBites said:
    Still, to make use of the layout, you would still have to cast the derived struct as the base struct.  At that point, just reference the member.  So again, you have to do most everything manually.  I think a better way to state it is C allows the programmer to do OO on his own.
    No, I just supplied a really simple example. Things get more interesting when you start using a vtable, but it takes up more space than I felt like wasting here. If you're interested in a slick OO solution in C, check out OOPC.



  • @morbiuswilters said:


    @ahnfelt said:

    But it never had any support for OOP what so ever, at least none that machine code doesn't have. What are you thinking of?

    Yes it did.  Read the rest of the thread before you reply.

     

    Uh, I read the thread before replying. And I just reread it. And no indication of any support for OOP magically appeared. Should I read it again?

    For example:

    Mother* isn't a Baby* (and Mother certainly isn't a Baby). The only way to get around it is by bypassing the type system with an unchecked cast.

    With casts, you can obtain polymorphism by creating a custom vtable, but you still have to pass "this" explicitly. And the syntax for private methods and methods that shouldn't be in the vtable will be different.

    Maybe we just disagree on what constitutes capabilities for a concept. Sure it's possible to program object oriented in C, as it is in any turing complete language. But to say it supports object oriented programming has no meaning if you define it to be true for all such languages.


  • I feel like picking some nits. Who's with me?
    @ahnfelt said:

    Mother certainly isn't a Baby
    Mother is never a Baby. If anything, Baby is a Mother (kids these days!).
    @ahnfelt said:
    The only way to get around it is by bypassing the type system with an unchecked cast.
    But that's the only way to accomplish anything in C!
    @ahnfelt said:
    With casts, you can obtain polymorphism by creating a custom vtable, but you still have to pass "this" explicitly.
    And you have to make self a parameter to all object methods in Python. That's irrelevant.
    @ahnfelt said:
    Sure it's possible to program object oriented in C, as it is in any turing complete language. But to say it supports object oriented programming has no meaning if you define it to be true for all such languages.
    I entirely agree with you here: C is not an object-oriented programming language. Saying that it "supports" OOP is to make the word meaningless. But, by the same token, it's unfair to claim it can't do OOP at all. The example I linked to before shows a pretty solid approximation of C++'s basic OOP design. Still, if you want C and OOP, I say use Objective C.



  • @bstorer said:

    @ahnfelt said:
    The only way to get around it is by bypassing the type system with an unchecked cast.
    But that's the only way to accomplish anything in C!

    C has a type system???

     

    @bstorer said:

    I entirely agree with you here: C is not an object-oriented programming language. Saying that it "supports" OOP is to make the word meaningless. But, by the same token, it's unfair to claim it can't do OOP at all. The example I linked to before shows a pretty solid approximation of C++'s basic OOP design. Still, if you want C and OOP, I say use Objective C.

    I win again!



  • @morbiuswilters said:

    C has a type system???
    It has pointers and a memory model that doesn't ask questions. Does that count?



  •  @bstorer said:

    @ahnfelt said:
    Mother certainly isn't a Baby
    Mother is never a Baby. If anything, Baby is a Mother (kids these days!).
     

    Whoops, yeah, that's what I meant to say.

    @bstorer said:

    I entirely agree with you here: C is not an object-oriented programming language. Saying that it "supports" OOP is to make the word meaningless. But, by the same token, it's unfair to claim it can't do OOP at all.

    Agreed. 

    @bstorer said:

    Still, if you want C and OOP, I say use Objective C.

    Except then you're no longer using C. Maybe that's your point: if you want to use C and OOP, don't ;-)



  • @ahnfelt said:

    Except then you're no longer using C. Maybe that's your point: if you want to use C and OOP, don't ;-)

    Pretty much. You can do it in C if you have to, but you might as well go to a language better capable of it. Work smart, not hard.



  • @Voidpointer said:

    I think using a pointer to a type smaller than the actual pointer is TRWTF (unless you're doing some weird C-voodoo, which is in turn usually a WTF).

    So then all my use of [code]void *[/code] back in my C days was a WTF (given that void is the smallest type ever, taking no space. (But, of course, void pointers could and usually did point to memory blocks large enough to hold infinitely many voids, and generally a few chars, ints, floats, structs, or similar data types. So void pointers were more useful than voids - especially since they could be cast to anything without translation.))

    @DogmaBites said:

    FWIW, most of the union code I've seen was much more about doing type translations *without* a cast, rather than trying to do poor-mans OO. For example, if you had a long long, and you wanted to save it raw to a file, you could:

    • implement a loop, where you grab 8 bits at one end, and then shift the number 8 bits, and repeat until you're out of bits (note: need to deal with negative numbers, as they can shift funny.) Note: this requires you deliberately choose your file's endianness.
    • Implement a function with one prototype, but use a slightly different prototype in the file every other module in your program uses (of course, the file with said function cannot use that header file, for obvious reasons.)
    • use a [code]union { long long i; char c[8]; }[/code]. Note: this uses whatever form of endian the system you're on uses, so the data's not as portable. It is, on the other hand, much faster than the other.

    That all having been said, any union with only one member is a WTF. It's also RFC 1925.2.6a compliant.


Log in to reply