X * f(x) semantics

Captain

Is value of x*f(x) unspecified if f modifies x?

I've looked at a bunch of questions regarding sequence points, and haven't been able to figure out if the order of evaluation for x*f(x) is guaranteed if f modifies x, and is this different for f(x...

You nutjobs actually use languages like these?

rad131304

No, this guy is just stupid. If it's simply f(x), by definition, x is independent.

LB_

Yes, we nutjobs use languages where the compiler is free to choose the most efficient way to evaluate an expression for us.

Captain

Hint: the value of that expression is not defined.

rad131304

In what way, and for what non null space?

Captain

There are two distinct orders in which to evaluate x * f(x), and if f(x) modifies x, then the two orders do not produce the same result. And the language spec does not specify an order.

rad131304

@Captain said:

There are two distinct orders in which to evaluate x * f(x), and if f(x) modifies x, then the two orders do not produce the same result. And the language spec does not specify an order.

#Original post:
Well, no. Given mathematical order of operations, even if f(x) could modify x, the modification is contained within f(x). However, since f(x) produces a delta(x), this implies g(f(x)) such that delta(x) is 0 iff x is independently (in the frequentist statistical sense) measurable.

It's kind of how functions work, and it's a big deal in math. Anything that cannot satisfy 1:1 for the independent variables is, by definition, a relation. Relations are not functions.

#Edit:

So, yes, there are many mathematical systems where multiplication is not commutative; however, the order of operations must always be specified. If x * f(x) is ambiguous, then the operation is always undefined, no matter the operands.

bb36e

I believe cpn meant ‘undefined’ in the computing sense, and not the mathematical one.

rad131304

@bb36e said:

I believe cpn meant ‘undefined’ in the computing sense, and not the mathematical one.

I'm sure he did, but then x * f(x) being undefined is a consequence of violating the idempotent property. IOW - spaghetti code is spaghetti.

Or worse, your processor can't assume either multiplication or division ... meaning the math of the processor is not a closed space.

ben_lubar

@Captain said:

You nutjobs actually use languages like these?

Go doesn't have pass-by-reference-without-anyone-telling-you, so you'd at least have to do x*f(&x), and even then you'd have a well-defined order that actually makes sense.

LB_

IMO the compiler should reject the code and force you to write it in a less ambiguous manner.

Kian

I think there not being a specified order is a good thing, exactly for the reason that this fails. If you have a specified order, shit like this becomes defined, and shitty programmers will feel justified in using it. By keeping the evaluation order undefined, you are telling programmers "when you call a function, your arguments can't modify each other when being evaluated". This is thus a bug until it gets fixed, and overall reduces surprises.

As LB said, it would be nice if the compiler would reject this. I would imagine the reason it doesn't (assuming this doesn't raise a warning when compiled with warnings on) is because C allowed it, or because it was difficult or impossible to detect reliably. After all, a function might not modify a non const parameter, so outright forbidding this example would be wrong. The compiler would have to see into the function and determine it's problematic, which is not always possible (if it was actually a function pointer set at runtime, for example).

LB_

Part of undefined behavior means the compiler could reject the code if it wanted to, but I don't know of any implementation that does. A warning is probably as good as it gets.

flabdablet

@Kian said:

I would imagine the reason it doesn't (assuming this doesn't raise a warning when compiled with warnings on) is because C allowed it,

C++, maybe. Sane languages like C don't disguise indirection and always pass function parameters by value, so f(x) can never modify the x that's visible to its caller's scope.

The canonical C case of this kind of thing is x * f(x++), which is indeed undefined.

Maciejasjmj

No obtuse, multi-paragraph rant on how Haskell does it better? I'm thoroughly disappointed.

dkf

@Kian said:

I think there not being a specified order is a good thing, exactly for the reason that this fails.

That's quite a mental contortion! Here we have an apparently simple expression, and C++ is in such a state as a language that it is not practical to tell at a glance whether the expression is legal or something that violates all sorts of good practice. You need to dig into both the signature of f and its implementation to figure that out. And possibly also the implementation of * for whatever this type actually is.

You might think this is great, but I disagree.

martijntje

If it were simple anyone could do it and they wouldn't pay me such a good salary for it.

Planar

@flabdablet said:

C++, maybe. Sane languages like C don't disguise indirection and always pass function parameters by value, so f(x) can never modify the x that's visible to its caller's scope.

QFT

As for Haskell, it does it just as well as C.

gwowen

@flabdablet said:

C++, maybe. Sane languages like C don't disguise indirection and always pass function parameters by value, so f(x) can never modify the x that's visible to its caller's scope.

Uh huh. So what does the following code do?

[code]
#include <stdio.h>

/* Returns the old a[0] and overwrites it with a[1]. */
int f(int a[2])
{
    int tmp = a[0];
    a[0] = a[1];
    return tmp;
}

int main(void)
{
    int x[2] = {3, 4};
    /* "C" arguments are passed by value, except when they're not */
    printf("%d\n", x[0] * f(x));
    return 0;
}
[/code]

Steve_The_Cynic

@gwowen said:

@flabdablet said:
C++, maybe. Sane languages like C don't disguise indirection and always pass function parameters by value, so f(x) can never modify the x that's visible to its caller's scope.

Uh huh. So what does the following code do?

[code]
int f(int a[2])
{
int tmp=a[0];
a[0] = a[1];
return tmp;
}

int main()
{
int x[2] = {3,4};
/*"C" arguments are passed by value, except when they're not */
printf("%d\n", x[0] * f(x));
}
[/code]

One of two things, at the compiler's whim, but I have a vague and possibly erroneous recollection that code that depends on whether, in a * b, a is evaluated before or after b will be guilty of UB. It is certainly unspecified.

There are sequence points that are effectively immediately before and immediately after the call to f() (between argument evaluation and call, and between the end of the return statement and the use of the returned value), so there is no issue about reading the value of a variable to do something that is not calculating a modified value of the variable being modified.

You still don't know whether the compiled code will retrieve the old value of x[0] before or after evaluating the function call, but there are enough sequence points that the result will be either 3*3 or 4*3 (f returns the old a[0], which is 3); you just don't know which.

And I'd say that any code that creates so much discussion, violent agreement, and so on should probably be nuked from orbit.

PJH

@gwowen said:

/*"C" arguments are passed by value, except when they're not */

Nope. They're still being passed by value, just in this case the 'value' x being passed to f() is the address of the array.

@Steve_The_Cynic said:

I have a vague and possibly erroneous recollection that code that depends on whether, in a * b, a is evaluated before or after b will be guilty of UB.

In the code

printf("%d\n", x[0] * f(x));

the f(x) is its own sequence (because it's a function call), thus it is well defined. What is unspecified is whether it gets called before or after x[0] is examined, prior to the multiplication happening.

Steve_The_Cynic

@PJH said:

@Steve_The_Cynic said:
I have a vague and possibly erroneous recollection that code that depends on whether, in a * b, a is evaluated before or after b will be guilty of UB.

In the code
printf("%d\n", x[0] * f(x));
the f(x) is its own sequence (because it's a function call), thus it is well defined. What is unspecified is whether it gets called before or after x[0] is examined, prior to the multiplication happening.

Which part of what I wrote suggested that I thought the code itself was UB? In fact, I even explained in more detail than you why the multiplication itself is defined but unspecified. No, I said that I vaguely and possibly incorrectly remembered that depending on the order being a particular way is a short road to UB.

Steve_The_Cynic

And I stand by my assertion that this class of code should be nuked from orbit.

Steve_The_Cynic

@Steve_The_Cynic said:

No, I said that I vaguely and possibly incorrectly remembered that depending on the order being a particular way is a short road to UB.

The pink thing can fucking fuck the fuck off and fucking well fucking die. And the stupid sodding wanker who wrote it can follow it there.

I'm getting more of this memory now, and the thing I remembered was in the context of expressions like f()*g(). If either of those functions has side effects, you're in the doodoo. If both of them have side effects, the doodoo is over your head. If the side effects interact in any way, the Moon is plowing through the upper layers of the doodoo it's so deep.

gwowen

@PJH said:

Nope. They're still being passed by value, just in this case the 'value' x being passed to f() is the address of the array.

Well, there's a distinction without a difference. Instead of passing a copy of the array object x[2], you (silently) pass the address of x (i.e. a [i]reference to[/i] x).

When the prototype looks like f(T x), "the object x is passed by reference" and "the address of the object x is passed by value" mean the same thing.

It makes no sense that f(T) and f(&T) are the same for one type of object (arrays), and different for every other type of object. It's a bizarre historical quirk that means that arrays are passed in a way that is fundamentally different to all other objects. You can call it "arrays decay to pointer" if you like, but it is semantically identical to pass-by-reference.

flabdablet

@gwowen said:

You can call it "arrays decay to pointer" if you like, but it is semantically identical to pass-by-reference

except that it isn't, because the type of the receiving parameter per spec, is pointer to T, not array of T, even if it's declared as array of T in the parameter list:

stephen@debian-usb:/tmp$ cat demo.c
#include <stdio.h>

void test(int parm[2]) {
    printf("Value of parm: %p\nAddress of parm: %p\n", (void *)parm, (void *)&parm);
    parm += 1; // would modify arg if truly pass-by-ref
    printf("Value of parm: %p\nAddress of parm: %p\n", (void *)parm, (void *)&parm);
}

int main(int argc, char *argv[]) {
    int arg[2] = {3, 4};
    printf("Value of arg: %p\nAddress of arg: %p\n", (void *)arg, (void *)&arg);
    test(arg);
    printf("Value of arg: %p\nAddress of arg: %p\n", (void *)arg, (void *)&arg);
    return 0;
}
stephen@debian-usb:/tmp$ gcc demo.c -o demo
stephen@debian-usb:/tmp$ ./demo
Value of arg: 0x7fff9e6ebfe0
Address of arg: 0x7fff9e6ebfe0
Value of parm: 0x7fff9e6ebfe0
Address of parm: 0x7fff9e6ebfb8
Value of parm: 0x7fff9e6ebfe4
Address of parm: 0x7fff9e6ebfb8
Value of arg: 0x7fff9e6ebfe0
Address of arg: 0x7fff9e6ebfe0
stephen@debian-usb:/tmp$

It may well be a bizarre historical quirk that the value of a C array is the address of its first element, but that treatment shows up consistently everywhere such a value is used, not just in function parameter lists.

gwowen

@flabdablet said:

It may well be a bizarre historical quirk that the value of a C array is the address of its first element, but that treatment shows up consistently everywhere such a value is used, not just in function parameter lists.

Well, sort of consistently.

[code]
int foo(int parm[16])
{
int parm2[16];
assert(sizeof(parm) == 16 * sizeof(int)); // fails
assert(sizeof(parm2) == 16 * sizeof(int)); // succeeds!
}
[/code]
[code]
parm += 1; // would modify arg if truly pass-by-ref
[/code]
No, it should be a compiler error. Just like
[code]
{
int parm[2] = {3,4};
parm += 1;
}
[/code]
is a compiler error. This doesn't prove that you're passing by value, it proves that C's object model is broken with respect to arrays. So invisible-decay-to-pointer isn't quite the same as pass-by-reference, but it's a hell of a lot closer than it is to pass-by-value.

By any reasonable definition, if you pass an object by value, your local copy cannot change. That's the defining characteristic of pass-by-value.
[code]
int x[2] = {1,2}; // x is NOT a pointer, it's an array object.
foo(x); // The function call is on an array object, not a pointer
[/code]
C treats function calls on array objects with silent conversions to pointers, and it's idiotic. If it's possible that x is not an array containing {1,2} here then X HAS NOT BEEN PASSED BY VALUE.

flabdablet

@gwowen said:

Well, sort of consistently.

int foo(int parm[16])
{
int parm2[16];
assert(sizeof(parm) == 16 * sizeof(int)); // fails
assert(sizeof(parm2) == 16 * sizeof(int)); // succeeds!
}

That's quite consistent with what I pointed out above (array declarations in function parameter lists actually declare pointers). Inside foo(), parm is pointer to int, parm2 is array of 16 ints.

flabdablet

@gwowen said:

C treats function calls on array objects with silent conversions to pointers

No, C defines the value of an array variable as the address of the array's first element. That value does not have array type; it has pointer type.

Array variables cannot be lvalues. Expressions containing array variables can.

The only special handling for arrays in the context of functions is that declaring one as a function parameter actually declares a pointer.

gwowen

@flabdablet said:

That's quite consistent with what I pointed out above (array declarations in function parameter lists actually declare pointers). Inside foo(), parm is pointer to int, parm2 is array of 16 ints.

Yes, I know HOW POINTER DECAY WORKS. I've known that for 20+ years.

But you said

that treatment shows up consistently everywhere such a value is used, [b]not just in function parameter lists[/b].

which is bollocks.

[code]
#include <assert.h>

typedef int Type1[32];

void crash(Type1 t1) {
    assert(sizeof(Type1) == sizeof(t1)); // nope: t1 is really an int *
}

typedef struct { Type1 t1; } Type2;

void fine(Type2 t2) {
    assert(sizeof(Type2) == sizeof(t2)); // perfectly fine
}
[/code]

flabdablet

@gwowen said:

If it's possible that x is not an array containing {1,2} here

It's not possible that x is an array containing {1, 2} anywhere, because x is just a constant value: the address of such an array.

You might not like that, and you might consider it idiotic, but it's consistent and it has nothing to do with the function call mechanism in particular.

flabdablet

@gwowen said:

Yes, I know HOW POINTER DECAY WORKS. I've known that for 20+ years.

Clearly you don't understand what I learned 30 years ago, which is that pointer decay applies to declarations in function parameter lists as well as to array values. When you declare a function parameter using array syntax, what you actually get is identical to what you'd get if you'd declared that parameter as a pointer to the array's base type.

Function parameters cannot be actual arrays, because actual arrays have constant values defined on creation (address of first element) and can't be lvalues, even the implied lvalue of a passed-by-value argument being copied to a parameter.

You might not like this particular piece of syntactic sugar, but it is what it is.

Filed under: belt onions at ten paces

gwowen

@flabdablet said:

It's not possible that x is an array containing {1, 2} anywhere, because x is just a constant value: the address of such an array.

No. It's not. It decays to that in some contexts (specifically parameter lists) but this is [b]not[/b] one of them. In this context, x [b]is an array object[/b], that's why you can do
[code]
int x[72];
int nelems = sizeof(x)/sizeof(x[0]); // nelems == 72
[/code]

flabdablet

sizeof is a compile-time operator, and has nothing to do with the runtime values of the variables or types whose size it yields.

FWIW, my personal preference for that idiom has always been nelems = sizeof x / sizeof *x;

Kian

@flabdablet said:

The canonical C case of this kind of thing is x * f(x++), which is indeed undefined.

That explains why it's easy to miss it. But C would still allow x * f(&x) which does the same thing. And it has the same issue, it's possibly undefined behavior the compiler will not reject. And it's only undefined because the order of evaluation is not specified. So everything I said applies to C as well.

@dkf said:

You might think this is great, but I disagree.

The syntax is awful, I agree. But the decision to keep the order unspecified is good. It carries some costs, but the consequences are less bad than the consequences of specifying the order and letting parameters to a function alter each other based on their position, which would make code even more intractable.

Or do you think having the result of (f(x) == g(x)) be different to (g(x) == f(x)) would be a good thing? The syntax could be better, but highlighting that the parameter may be modified (as C does) still doesn't fix it. It makes the bug easier to catch once you've seen it happen, which is good, but (f(&x) == g(&x)) could still be a perfectly legal operation, you'd still need to look at the function to tell.

@Steve_The_Cynic said:

And I stand by my assertion that this class of code should be nuked from orbit.

Of course it should. The problem lies in detecting it. C makes it easier, but the fact remains that (f(&x) == g(&x)) may or may not be unspecified depending on what f and g do. And it's not always possible for the compiler to know what they do at compile time, so the best it can do is sometimes offer a warning.

flabdablet

@Kian said:

C would still allow x * f(&x) which does the same thing

But the presence of the & inside the argument list is a visual flag that something more than a simple function invocation on x is going on, as is the [1] in x[1] * f(x).

C doesn't stop you doing stupid things - far from it - but they're more often visually distinguishable from non-stupid things than they are in C++.

dkf

@Kian said:

It carries some costs, but the consequences are less bad than the consequences of specifying the order and letting parameters to a function alter each other based on their position, which would make code even more intractable.

Well, right now the compiler probably just makes an arbitrary decision and you have no idea whether that's right or wrong. Perhaps the compiler ought to detect these sorts of things and issue a warning, but I guess that might be very difficult to do in practice.

Languages other than C and C++ either have side-effect free expressions or define the evaluation order (so that the compiler has to prove the safety of the reordering, rather than the other way round) which seems to be a bit better as it means that they define the semantics of all syntactically valid statements. It's also definitely better when it comes to the really evil subtleties of floating point.

gleemonk

C is funny because it provokes the nerd in us to argue endlessly about something which C specifies as forbidden knowledge. I was trying to explain this concept of undefined behaviour to a student friend once, writing i=1; i = i++ * i++; or somesuch to explain that we cannot know what the compiler would do. While he's looking at it seemingly lost in thought about this deep concept, another bloke walks past and mentions that i == 3 obviously. My friend agrees about the value 3 but disagrees on how the end result would be achieved. Turns out he was not contemplating the ramifications of UB, he was trying to guess what it does!

It's not i = 2 * 1; i++; it's i = 1 * 1; i++; i++; he argues. After all it's called post-increment so it's not applied until after the statement, he knows that. Then a third guy chimes in that obviously i == 1 because the assignment would happen after the increments. This guy then promptly gets called an idiot by the other two. And the argument kept going. All my attempts at explaining that we couldn't know because it's undefined were dismissed as ignorant. Someone had to be right.

The expectation that reality takes a certain path and no other is very deeply set with programmers. This trait is actually very important and people who lack it cannot program well until they develop it. And here I was, a heretic, telling them that C is not like that.

PleegWat

My money's on 4.

EDIT: and I was wrong. Both gcc and clang return 2 from this code:

#include <stdio.h>

int main(int argc, char * argv[])
{
    int i = 1;
    i = i++ * i++;
    printf( "%d\n", i );
}

gwowen

@flabdablet said:

sizeof is a compile-time operator, and has nothing to do with the runtime values of the variables or types whose size it yields.

... but everything to do with their types. So if sizeof(x) != sizeof(an address) then x is not an address. It is an array type. The value of an array type is not its address.

In C, the definition
[code]int x[2] = {1,2};[/code]
defines an object of type array-of-int with value {1,2}.
It does not declare a pointer, it does not declare an address.

It declares an array object (whose token decomposes to a pointer-to-its-first-element in certain contexts, in a puff of bad design worthy of PHP).

Captain

No obtuse, multi-paragraph rant on how Haskell does it better? I'm thoroughly disappointed.

It's implied.

Captain

Or do you think having the result of (f(x) == g(x)) be different to (g(x) == f(x)) would be a good thing?

No, that would be horrible.

Which is why I prefer languages where values don't have side-effects unless I say they can. And then,

f(x) == g(x)

if and only if

g(x) == f(x)

Steve_The_Cynic

@gwowen said:

It makes no sense that f(T) and f(&T) are the same for one type of object (arrays), and different for every other type of object.

But they aren't the same.

[code]void f( int *p ); /* pointer to int */
void g( int a[] ); /* pointer to int written as if it is a sizeless array. */
void gg( int an[30]); /* pointer to int written as if it is a sized array. */
void h( int (*pa)[20] ); /* pointer to array of 20 ints */
void ff( int (*pa)[30] ); /* pointer to array of 30 ints */

int A[20];

f(A); /* OK */
f(&A[0]); /* OK, exactly the same as the f(A) */
f(&A); /* NOT OK, pointer type mismatch. */
g(A); /* OK */
g(&A[0]); /* OK, exactly the same as g(A) */
g(&A); /* NOT OK, pointer type mismatch. */
h(A); /* NOT OK, pointer type mismatch. */
h(&A[0]); /* NOT OK, pointer type mismatch. */
h(&A); /* OK */
ff(&A); /* NOT OK, pointer type mismatch. */
[/code]

If you take the address of an array type (not the address of one of its elements, but of the array itself), you form a pointer-to-array-of-N-items. In function h(), the parameter must be dereferenced twice to get to an int.

dkf

@gleemonk said:

i=1; i = i++ * i++;

If you really want to hurt your head, try to predict it with << instead of *, i.e.:

i = 2; /* because it is more interesting */
i = i++ << i++;

(I have no idea what this will produce for C or C++. But Java produces 16 and I'd expect C# to do the same; they both lock down the semantics of expressions in the same way.)

tar

@PleegWat said:

Both gcc and clang return 2 from this code:

Do different compiler flags (-On in particular) have any effect on the result?

PleegWat

Good question. Nope, all 2. For the i=2, i = i++ << i++, all 16 on both compilers.

dkf

OK, so that means that compiler (that you used) consistently chose a LTR evaluation policy when doing the initial construction of the virtual machine code. All the optimisation steps after that preserve the semantics that are baked in at that point normally (because otherwise you are so screwed when doing floating point code; evaluation order really matters a lot there). Indeed, most optimisations are done without knowledge of where the language states the sequence points are; they can only reorder when they can prove it is safe to do so (i.e., not in this case) and don't actually know whether this is all one expression or many with some local variables to carry intermediate results.

Optimising compilers didn't use to work that way. They do now because it is much less crazy, particularly for the compiler authors. But C and C++ continue to permit such things whereas most other languages prohibit them.

flabdablet

@gleemonk said:

The expectation that reality takes a certain path and no other is very deeply set with programmers.

In my experience, this is more true of programmers who have not ever really got their hands dirty with hardware. Watching a machine operation fail because a D flip-flop entered a metastable state because an input edge just happened to violate a setup or hold time wrt its clock edge is a powerful reminder that digital logic is, ultimately, just another leaky abstraction.

ben_lubar

I want to have a compiler that always does the wrong thing when it can.

Also, in C, is undefined behavior allowed to act at compile time or only when the program is run?

HardwareGeek

@flabdablet said:

machine operation fail because a D flip-flop entered a metastable state because an input edge just happened to violate a setup or hold time wrt its clock edge

*TWITCH*

I have on occasion spent weeks debugging such things, proving that that was actually the cause of the failure, figuring out why STA didn't predict that failure, and what needed to change in silicon to prevent future occurrences, or at least reduce them to an acceptable error rate.