Arrays in a C union and memory alignment.

topspin

This question is really newbie-like but I don't want to do anything wrong, so it's better to just ask someone. I'm working on this C codebase that has lots of structures like:

typedef struct {
	[ snip other members... ]
	double x, y, z;
	[ snip ... ]
} Foo;

where x, y, z are semantically related. There's duplicate code that I could get rid of by looping over them if they were in an array. But I don't want to refactor everything (I don't want to introduce bugs); I only want to write the new parts better. I've thought of putting them in a union like this:

typedef struct {
	double x, y, z;
} Bar;

typedef struct {
	union {
		Bar a;
		double b[3];
	};
} Foo;

Foo foo;

But now I have to change foo.x to foo.a.x in the existing code, which is not what I want. How should I do this instead?

And even then, is it safe / correct to do so?

Obviously I've tried it and it works. But is it correct, or does it somehow depend on memory alignment? The elements of the array will of course not be padded, so b will be sizeof(double)*3 bytes long. But the members of Bar could be padded depending on the architecture, so that foo.a.y and foo.a.z are not at the same memory locations as foo.b[1] and foo.b[2].

Spectre

The compiler isn't likely to put any padding between the doubles, but I don't think anything prohibits it from doing so. You can do the following, which is less risky but potentially less efficient:

[code]
#define FooArray(foo) { &(foo).x, &(foo).y, &(foo).z }

void IUseFoo(Foo * f)
{
  double * a[ ] = FooArray(*f);
  /* iterate */
}
[/code]
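
A self-contained sketch of the pointer-array idea above, assuming C99 or later (the array initializer is not a constant expression). The Foo layout with its id member, the macro name FOO_AXES and the helper foo_scale are made up for illustration; they are not from the original codebase.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real Foo; only x, y, z matter here. */
typedef struct {
    int id;
    double x, y, z;
} Foo;

/* Expands to an initializer list of pointers to the three members.
   The argument is parenthesized so that FOO_AXES(*f) expands correctly. */
#define FOO_AXES(foo) { &(foo).x, &(foo).y, &(foo).z }

/* Multiply x, y and z by the same factor using a single loop. */
void foo_scale(Foo *f, double factor)
{
    double *axes[] = FOO_AXES(*f);
    size_t i;

    for (i = 0; i < sizeof axes / sizeof axes[0]; i++)
        *axes[i] *= factor;
}
```

This makes no assumption about padding, at the cost of building a small array of pointers on each call.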

Daid

For doubles it's reasonably safe: they are 64 bits wide, and I'd guess the compiler aligns them on 64-bit or 32-bit boundaries. Unless the compiler somehow starts aligning structure members to 128-bit boundaries, it will work.

You could add a sanity check inside some initialization code:

/* needs <stddef.h>, <stdio.h> and <stdlib.h> */
if (offsetof(Foo, a.z) != offsetof(Foo, b[2]))
{
	fprintf(stderr, "Alignment failure in union 'Foo'\n");
	exit(EXIT_FAILURE);
}

At least then you will get notified when it fails.
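
If a C11 compiler is available, the same kind of check can probably be moved to compile time, so a layout mismatch stops the build instead of the program. A minimal sketch, assuming C11 (static_assert from <assert.h> and an anonymous union); the error messages are illustrative:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

typedef struct {
    double x, y, z;
} Bar;

typedef struct {
    union {
        Bar a;
        double b[3];
    };
} Foo;

/* Compilation fails here if Bar has padding between or after its members. */
static_assert(sizeof(Bar) == 3 * sizeof(double),
              "Bar has unexpected padding");
static_assert(offsetof(Bar, y) == 1 * sizeof(double),
              "Bar.y does not line up with b[1]");
static_assert(offsetof(Bar, z) == 2 * sizeof(double),
              "Bar.z does not line up with b[2]");
```

The checks are written against Bar alone because both union members start at offset 0 of the union, so matching offsets inside Bar imply matching offsets inside Foo.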

topspin

Thanks for the replies.

I hoped there would be a safe way to do this that I am not aware of. Spectre's suggestion may be worth the hassle in some cases though.

joeyadams

You may be able to do this:

typedef struct {
	double x, y, z;
} Bar;

typedef struct {
	union {
		Bar;
		double b[3];
	};
} Foo;

Now you can say Foo foo; foo.x = 5;

In GCC, you'll need to enable the compiler option -fms-extensions for it to work. Intuition tells me that this should work fine in Visual C++, as -fms-extensions means it includes MS (Microsoft) extensions. However, this may not work in most other compilers.
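
For what it's worth, C11 standardized anonymous structs and unions as long as the struct is declared inline, without a tag or typedef name, so a variant of the above may compile without -fms-extensions. A minimal sketch, assuming a C11 compiler (e.g. -std=c11); the id member and the helper foo_sum are hypothetical, and the overlay still assumes there is no padding between x, y and z:

```c
typedef struct {
    int id;                 /* hypothetical stand-in for the snipped members */
    union {
        struct {
            double x, y, z; /* existing code keeps using foo.x, foo.y, foo.z */
        };
        double b[3];        /* new code can loop over foo.b[i] */
    };
} Foo;

/* Sum the three components through the array view. */
double foo_sum(const Foo *f)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 3; i++)
        sum += f->b[i];
    return sum;
}
```

With this layout, Foo foo; foo.x = 5; and foo.b[0] refer to the same storage, and the existing foo.x accesses do not need to change.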

OzPeter

@topspin said:

x, y, z are semantically related. There's duplicate code that I could get rid of if they were put in an array simply by looping over it

While your solution and joeyadams's solve the problem of overlaying the struct and the array, I don't think you have solved the problem you initially alluded to: that you want to be able to iterate over x, y and z in order to simplify additional code. Yes, you both come up with a nice array of 3 doubles, but the relationship between x, y, z and b[0], b[1], b[2] is purely coincidental and not explicitly defined. What happens if the order of x, y, z is changed by a coworker who doesn't see the implicit relationship to the b array?

I think that creating the union in order to simplify new code will actually worsen the overall code base by increasing confusion (x, y, z is used in one part, but b[0], b[1], b[2] is used in another part). To me, either you justify refactoring x, y, z globally into an array to get your iteration benefits, or you leave the code base alone and just put up with the coding inefficiencies.
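
If the worry is that the link between x, y, z and the array is only implicit, one possible middle ground is a table that names the members explicitly, so reordering them in the struct does not silently break the loop. A sketch only; the Foo layout, foo_axis_offset and foo_axis are hypothetical names, not part of the original code:

```c
#include <stddef.h>

/* Hypothetical stand-in for the real Foo. */
typedef struct {
    int id;
    double x, y, z;
} Foo;

/* Byte offsets of the three components, listed by member name. */
static const size_t foo_axis_offset[3] = {
    offsetof(Foo, x), offsetof(Foo, y), offsetof(Foo, z)
};

/* Return a pointer to component i (0 = x, 1 = y, 2 = z). */
double *foo_axis(Foo *f, size_t i)
{
    return (double *)((char *)f + foo_axis_offset[i]);
}
```

New code can then loop with *foo_axis(&foo, i) while existing code keeps using foo.x, foo.y and foo.z, and no union or padding assumption is involved.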

joeyadams

I think that by creating the union in order to simplify new code will actually worsen the overall code base by increasing confusion (x,y,z is used in one part, but b[0], b[1], b[2] is used in another part).

You could add a comment stating that the x,y,z entries should never be reordered. I think it's ridiculous to expect someone to come along and rearrange x, y, and z.

Using unions to make data access more convenient has been around for quite a while. Take this example from Xlib (the X11 client library):

typedef struct {
	int type;		/* ClientMessage */
	unsigned long serial;	/* # of last request processed by server */
	Bool send_event;	/* true if this came from a SendEvent request */
	Display *display;	/* Display the event was read from */
	Window window;
	Atom message_type;
	int format;
	union {
		char b[20];
		short s[10];
		long l[5];
	} data;
} XClientMessageEvent;

The union is intended to make it easier for library users to access the data in different ways without casting.

TGV

@joeyadams said:

You could add a comment stating that the x,y,z entries should never be reordered. I think it's ridiculous to expect someone to come along and rearrange x, y, and z.
Using unions to make data access more convenient has been around for quite a while.

Don't. It's an abomination. You can't be sure that they will overlap. There is no guarantee for that in the language. One day, your code will meet a compiler that implements unions slightly differently. Try a different solution if you really want to use x, y, and z, e.g. this:

#define x 0
#define y 1
#define z 2
struct Foo { ...; double v[3]; ... };
...
struct Foo var;
var.v[x] = 0; var.v[y] = 0; var.v[z] = 0;

This way you can still see whether you're working with x, y or z, and you've got your array. If you are desperate, you could also try:

#define x v[0]
#define y v[1]
#define z v[2]
...
struct Foo var;
var.x = 0; var.y = 0; var.z = 0;

Note that these macros can cause strange error messages when someone tries to use x, y, or z in another context.
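
One way to keep the named indices while avoiding that macro problem might be an enum, since enumerators with longer names do not rewrite every identifier called x, y or z. A small sketch; the names AXIS_X, AXIS_Y, AXIS_Z, AXIS_COUNT and foo_clear, and the id member, are made up for illustration:

```c
/* Named indices into the component array. */
enum { AXIS_X, AXIS_Y, AXIS_Z, AXIS_COUNT };

/* Hypothetical stand-in for the real Foo, with the array layout. */
typedef struct {
    int id;
    double v[AXIS_COUNT];
} Foo;

/* Set all three components to zero with one loop. */
void foo_clear(Foo *f)
{
    int i;

    for (i = 0; i < AXIS_COUNT; i++)
        f->v[i] = 0.0;
}
```

Individual components are still readable as var.v[AXIS_X], var.v[AXIS_Y] and var.v[AXIS_Z].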

Faxmachinen

As much as I'd like to use unions as a smart way to avoid typecasting or for packing bytes, it's not the right way to do it. I believe the ISO specification says that in a union, the variable X should be treated as undefined after writing to Y.