Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby)
-
@boomzilla said in Breach in the defenses:
Apparently, Microsoft forked that version of unrar and incorporated the component into its operating system's antivirus engine. That forked code was then modified so that all signed integer variables were converted to unsigned variables, causing knock-on problems with mathematical comparisons. This in turn left the software vulnerable to memory corruption errors, which can crash the antivirus package or allow malicious code to potentially execute.
This strikes me as some developer with a pet peeve against signed integers getting his way and making life worse for everyone. Not that I've ever been that guy. Nuh uh.
But ssize_t is so much harder to type than size_t.
Or, as Joël Romijnsen (PBUH) used to like to do, actually explicitly state the signedness of his chars, causing chain effects when other code tries to incorporate his, and pain when trying to reverse it.
Now, in general, this isn't just a problem half the time, it can be a problem three thirds of the time, since there are in fact three types of signedness for char in C. For example...

pjh@hpdesktop:/tmp$ cat -n ./char.c
     1  #include <stdio.h>
     2
     3  int unadorned_char(char* c){
     4      return printf("%c\n", *c);
     5  }
     6  int signed_char(signed char* c){
     7      return printf("%c\n", *c);
     8  }
     9  int unsigned_char(unsigned char* c){
    10      return printf("%c\n", *c);
    11  }
    12
    13  int main(void){
    14      char c='c';
    15      signed char s='s';
    16      unsigned char u='u';
    17
    18      unadorned_char(&s);
    19      unadorned_char(&u);
    20      signed_char(&u);
    21      signed_char(&c);
    22      unsigned_char(&c);
    23      unsigned_char(&s);
    24
    25      return 0;
    26  }
    27
Spoilers ahead...
pjh@hpdesktop:/tmp$ gcc -Wall -Wextra char.c -o char
char.c: In function ‘main’:
char.c:18:17: warning: pointer targets in passing argument 1 of ‘unadorned_char’ differ in signedness [-Wpointer-sign]
     unadorned_char(&s);
                    ^
char.c:3:5: note: expected ‘char *’ but argument is of type ‘signed char *’
 int unadorned_char(char* c){
     ^~~~~~~~~~~~~~
char.c:19:17: warning: pointer targets in passing argument 1 of ‘unadorned_char’ differ in signedness [-Wpointer-sign]
     unadorned_char(&u);
                    ^
char.c:3:5: note: expected ‘char *’ but argument is of type ‘unsigned char *’
 int unadorned_char(char* c){
     ^~~~~~~~~~~~~~
char.c:20:14: warning: pointer targets in passing argument 1 of ‘signed_char’ differ in signedness [-Wpointer-sign]
     signed_char(&u);
                 ^
char.c:6:5: note: expected ‘signed char *’ but argument is of type ‘unsigned char *’
 int signed_char(signed char* c){
     ^~~~~~~~~~~
char.c:21:14: warning: pointer targets in passing argument 1 of ‘signed_char’ differ in signedness [-Wpointer-sign]
     signed_char(&c);
                 ^
char.c:6:5: note: expected ‘signed char *’ but argument is of type ‘char *’
 int signed_char(signed char* c){
     ^~~~~~~~~~~
char.c:22:16: warning: pointer targets in passing argument 1 of ‘unsigned_char’ differ in signedness [-Wpointer-sign]
     unsigned_char(&c);
                   ^
char.c:9:5: note: expected ‘unsigned char *’ but argument is of type ‘char *’
 int unsigned_char(unsigned char* c){
     ^~~~~~~~~~~~~
char.c:23:16: warning: pointer targets in passing argument 1 of ‘unsigned_char’ differ in signedness [-Wpointer-sign]
     unsigned_char(&s);
                   ^
char.c:9:5: note: expected ‘unsigned char *’ but argument is of type ‘signed char *’
 int unsigned_char(unsigned char* c){
     ^~~~~~~~~~~~~
This basically stems from the fact that
- beyond the usual 'you're using signed when you should be using unsigned' warnings...
- compilers are free to choose what signedness to apply to unadorned chars, so...
- they typically tend to warn when an explicit sign is stated where an unadorned one is expected...
- or vice-versa...
- irrespective of whether or not the underlying types happen to match during this phase of the moon.
-
@pjh said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
But ssize_t is so much harder to type than size_t.
Huh. Before your post it never would have occurred to me to type that.
-
That's OK. Most people forget that there are three kinds of signedness for int in C.
- Explicitly unsigned
- Explicitly signed
- Unspecified, which is (usually) exactly equivalent to and interchangeable with explicitly signed.

When is "unspecified" NOT exactly equivalent to and interchangeable with explicitly signed?

struct funtimes {
    unsigned int uibf:4; /* explicitly unsigned */
    signed int   sibf:4; /* explicitly signed */
    int          ibf:4;  /* unspecified == ambiguous */
} ft;

ft.ibf is ambiguous because the standard leaves it to the implementation whether plain-int bitfields are signed or unsigned.
-
This is why I never use plain types in my code. I always #include <cstdint> and use std::int8_t / std::uint32_t / std::intmax_t / etc. Except for char, which is for UTF-8 string data only.
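Roughly this sort of thing, for illustration (a made-up sketch, not code from any real project; the struct and field names are invented):

#include <cstdint>
#include <string>

// Fixed-width types for anything numeric; plain char appears only inside
// std::string, i.e. as UTF-8 text.
struct SampleRecord {
    std::uint32_t offset;   // exactly 32 bits, unsigned, everywhere
    std::int8_t   delta;    // exactly 8 bits, signed, everywhere
    std::intmax_t total;    // widest signed integer the platform offers
    std::string   label;    // UTF-8 string data only
};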
-
@lb_ said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
This is why I never use plain types in my code. I always #include <cstdint> and use std::int8_t / std::uint32_t / std::intmax_t / etc. Except for char, which is for UTF-8 string data only.
This is why I avoid the problem entirely by not using C++.
-
@steve_the_cynic said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
the standard leaves it to the implementation whether plain-int bitfields are signed or unsigned
The standard leaves a lot more to the implementation in relation to bitfields too. :(
-
C++ and signedness is a mess. For instance, who on earth decided that bitwise & between two unsigned chars shall produce an (implicitly signed) int? By definition, bitwise "and" treats all of the bits in the operand the same way, so it shouldn't even be defined on signed types.
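You don't have to take anyone's word for it, either; a minimal sketch (assuming the common case where int is wider than char) that asks the compiler what type the expression ends up with:

#include <type_traits>

int main() {
    unsigned char a = 0x0F, b = 0xF0;
    // Both operands are promoted before the &, so the result of a & b
    // is a (signed) int on platforms where int is wider than char.
    static_assert(std::is_same<decltype(a & b), int>::value,
                  "a & b is an int, not an unsigned char");
    return 0;
}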
-
@ixvedeusi said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
C++ and signedness is a mess. For instance, who on earth decided that bitwise & between two unsigned chars shall produce an (implicitly signed) int? By definition, bitwise "and" treats all of the bits in the operand the same way, so it shouldn't even be defined on signed types.

Except that by the time they reach the bitwise &, they have already been promoted to signed int by the default promotion rules. Um. Unless int and char are the same size(1), in which case I don't remember whether it would promote to unsigned int.(2)

(1) All those "at least as big as"es in the standard do permit this, but it does imply that char would have to be at least 16 bits, and the amount of code that won't cope with a not-eight-bit char is ... scary.

(2) A quote from the standard that I found does indeed say that the behaviour is ... implementation specific, since the promotion from char and short to int types goes to signed int if the actual value can be preserved, but to unsigned int if it cannot. If char and int are the same size, then unsigned char promotes to unsigned int, while in the more common cases where char is 8 and int is 16 or 32 bits, it will be signed int.

So ... Just by looking at the code given, you cannot inherently tell which data types it will use. Nice.
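And if anyone wants to see the promotion actually bite at run time, a small sketch (assuming the common 8-bit char / 32-bit int case):

#include <cstdio>

int main(void) {
    unsigned char uc = 0xFF;
    // uc is promoted to (signed) int before the ~ is applied, so the result
    // is -256, not the 0x00 you might expect from flipping eight unsigned bits.
    printf("~uc as an int               = %d\n", ~uc);
    printf("~uc stuffed back in a uchar = %u\n", (unsigned)(unsigned char)~uc);
    return 0;
}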
-
@ixvedeusi said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
C++ and signedness is a mess. For instance, who on earth decided that bitwise & between two unsigned chars shall produce an (implicitly signed) int? By definition, bitwise "and" treats all of the bits in the operand the same way, so it shouldn't even be defined on signed types.

Bitwise operators should completely ignore signedness. How the caller uses the operands and results doesn't matter.

Except that by the time they reach the bitwise &, they have already been promoted to signed int by the default promotion rules. Um. Unless int and char are the same size(1), in which case I don't remember whether it would promote to unsigned int.(2)

(1) All those "at least as big as"es in the standard do permit this, but it does imply that char would have to be at least 16 bits, and the amount of code that won't cope with a not-eight-bit char is ... scary.

(2) A quote from the standard that I found does indeed say that the behaviour is ... implementation specific, since the promotion from char and short to int types goes to signed int if the actual value can be preserved, but to unsigned int if it cannot. If char and int are the same size, then unsigned char promotes to unsigned int, while in the more common cases where char is 8 and int is 16 or 32 bits, it will be signed int.

So ... Just by looking at the code given, you cannot inherently tell which data types it will use. Nice.
Char being a numeric type makes no sense. The signedness of char makes no sense either. Char should be a completely separate type with no implicit conversions to or from real numeric types. C/C++ types are a mess, with their platform-dependent sizes.
-
@khudzlin Are there actually implicit conversions with chars? Explicit ones I could get behind, but implicit char conversions make no damn sense whatsoever.
-
@pie_flavor There are fortunately no implicit conversions to char in any language I know (even in languages where narrower integral types exist, such as Java and C#). But there are implicit conversions from char to wider types. Those shouldn't exist either, because you shouldn't do math with chars.
-
@pie_flavor said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
@khudzlin Are there actually implicit conversions with chars? Explicit ones I could get behind, but implicit char conversions makes no damn sense whatsoever.
As far as C (and thus C++) is concerned, 'char' is just another(1) numeric type, with all the implicit conversions which go with it.
(1) Almost. It is the only type for which the 'plain' char is neither a signed char nor an unsigned char, and there's the exception concerning aliasing.
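For illustration, a quick sketch of those implicit conversions in action; every line below compiles without a cast (and typically without a warning):

#include <cstdio>

int main() {
    char c = 'A';
    int    n   = c + 1;     // char silently promoted to int: 66
    double d   = c * 1.5;   // and on to double: 97.5
    long   big = c;         // plain assignment widens it, too
    if (c < 100) {          // compared directly against an int literal
        printf("%d %f %ld\n", n, d, big);
    }
    return 0;
}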
-
@khudzlin said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
Bitwise operators should by definition completely ignore signedness.
Which is why they should not be allowed on signed types, because ignoring signedness on signed types does not make sense.
-
@ixvedeusi Signedness is interpreting the bits. Bitwise operators don't need to care about how you came up with the operands you give them, nor what you do with the result they give you.
-
@khudzlin said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
@ixvedeusi Signedness is interpreting the bits. Bitwise operators don't need to care about how you came up with the operands you give them, nor what you do with the result they give you.
Yes, and declaring a value as signed means "The bits in this value have a special meaning". Thus doing bitwise operations on them makes no sense. Which is my point. If you want to do bitwise operations on a signed type, tell the compiler "I don't give a fuck about the special meaning of bits" by casting it to unsigned first.
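Something like this, say (a sketch, assuming 32-bit int; the values are made up):

#include <cstdio>

int main() {
    int raw = -5;                              // some signed value we got handed
    unsigned u = static_cast<unsigned>(raw);   // "I don't care about the sign bit's special meaning"
    unsigned low_nibble = u & 0xFu;            // do the bit twiddling on the unsigned view
    printf("low nibble of %d is 0x%X\n", raw, low_nibble);
    return 0;
}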
-
The three chars is one of the LESS crazy things about C++. They all serve their purpose:
- signed char is a signed 1-char-wide number
- unsigned char is an unsigned 1-char-wide number
- char is a character and if you're treating it as number, you're …
This is an attempt to fix the original C's terrible, terrible decision to make a character type and a byte-sized number type one and the same. Not the best fix, but better than nothing IMO.
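And to be fair to C++, the compiler really does keep them as three distinct types, whichever signedness plain char happens to share on a given platform; a small sketch:

#include <type_traits>

// char is a distinct type from both signed char and unsigned char, even though
// its representation matches one of them.
static_assert(!std::is_same<char, signed char>::value,   "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Which is also why all three of these overloads can coexist.
void f(char)          {}
void f(signed char)   {}
void f(unsigned char) {}

int main() { f('x'); return 0; }  // picks f(char) exactly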
-
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
char is a character and if you're treating it as number, you're …
Except that the compiler itself treats it as a freaking number, happily casting it all over the place implicitly. And because that's not enough of a mess, on most platforms it's signed, meaning you do have to consider it a number because signedness is a property specific to numbers.
-
@khudzlin said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
@pie_flavor There are fortunately no implicit conversions to char in any language I know (even in languages where narrower integral types exist, such as Java and C#). But there are implicit conversions from char to wider types. Those shouldn't exist either, because you shouldn't do math with chars.
As far as C# is concerned, it will happily promote char to ushort or int without a cast, but not to short. I haven't exactly been inconvenienced by this, but I have been inconvenienced by C# obstinately promoting byte and ushort to (u?)int before applying bitwise operators, which are not defined for types smaller than int.
-
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
This is an attempt to fix the original C's terrible, terrible decision to make a character type and a byte-sized number type one and the same.
That was freaking inspired for 1970, given the awfulness in many other languages of the era…
-
@medinoc said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
As far as C# is concerned, it will happily promote char to ushort or int without a cast, but not to short. I haven't exactly been inconvenienced by this, but I have been inconvenienced by C# obstinately promoting byte and ushort to (u?)int before applying bitwise operators, which are not defined for types smaller than int.
char is unsigned, which is why .Net will convert it to ushort, but not short. I'm assuming C# promotes to int, because signed is the default for int in .Net, and changing signedness seems more likely to be inconvenient (just checked, it's indeed int rather than uint).
-
@ixvedeusi said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
char is a character and if you're treating it as number, you're
Except that the compiler itself treats it as a freaking number, happily casting it all over the place implicitly. And because that's not enough of a mess, on most platforms it's signed, meaning you do have to consider it a number because signedness is a property specific to numbers.
These problems are problems only if you use char as a number. If you don't, they all go away. Signedness doesn't matter if you only ever copy it unchanged or use lookup tables.
-
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
These problems are problems only if you use char as a number. [...] Signedness doesn't matter if you only ever copy it unchanged or use lookup tables.
One does not imply the other, here. There are other things you might conceivably want to do with raw bytes, such as assembling them into an integer, checking the value of specific bits, matching for specific patterns etc. This is not "treating them as a number", except in the buggered-up C type system, where everything is a freaking number and all available operations assume you are working on numbers.
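For instance, something along these lines (a sketch that keeps the raw bytes in unsigned char and the assembled value in a cstdint type; the buffer contents are invented):

#include <cstdint>
#include <cstdio>

// Assemble four raw bytes (little-endian) into a 32-bit value and test a bit.
std::uint32_t load_le32(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

int main() {
    const unsigned char buf[4] = {0x78, 0x56, 0x34, 0x12};   // made-up wire data
    std::uint32_t v = load_le32(buf);
    printf("value = 0x%08X, bit 4 is %s\n",
           static_cast<unsigned>(v), (v & (1u << 4)) ? "set" : "clear");
    return 0;
}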
-
@ixvedeusi said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
There are other things you might conceivably want to do with raw bytes
Raw bytes are unsigned char (or uint8_t), not char.
-
@khudzlin said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
Char being a numeric type makes no sense. The signedness of char makes no sense either. Char should be a completely separate type with no implicit conversions to or from real numeric types. C/C++ types are a mess, with their platform-dependent sizes.
This is one of my big gripes about C#: it chose to inherit this madness from C/C++, where you can subtract a char from an int and get an answer, even though in a strongly-typed language that should obviously be an error.
At least C# properly considers char to be a character, and it contains enough bytes to hold a reasonable selection of unicode characters. Unlike C/C++, which are hopelessly out-of-date, where sizeof(char) has NOTHING to do with the size of a character.
-
@medinoc said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
As far as C# is concerned, it will happily promote char to ushort or int without a cast, but not to short.
If you're going to treat char as a number (which is retarded), you should at least treat it as an unsigned number.
In which case, that behavior makes sense -- you can't put an unsigned 2-byte value in a signed 2-byte value without information loss. There ain't enough bits. You can however put it in an int without information loss.
-
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
Raw bytes are unsigned char (or uint8_t), not char.
I thought you'd say that. However for one,
@gąska said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
unsigned char is unsigned 1-char-wide number
A raw byte is not a freaking number.
Also, any and all standard library functions conflate the two and use char for raw bytes (malloc comes to mind, what does that return if not raw bytes?). C and C++ have no notion of a character (whenever they say "character" what they mean is "raw byte") and thus using unsigned char for raw bytes opens up a whole other mess. And of course, iostreams treat all three of char, signed char and unsigned char as characters, so using any of them for anything else will produce unintended effects when used with iostreams.
-
@steve_the_cynic said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
the amount of code that won't cope with a not-eight-bit char is ... scary.
You can rest assured that I check CHAR_BIT in my code.
@khudzlin said in Sign your char, across my code (Was Re: Breach in the defenses, with apologies to Terence Trent D'Arby):
Char should be a completely separate type with no implicit conversions to or from real numeric types.
I'd like to introduce you to std::byte.
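For anyone who hasn't met it: std::byte (C++17, in <cstddef>) is pretty much the type being asked for; a minimal sketch of how it behaves, plus the CHAR_BIT check mentioned above:

#include <cstddef>   // std::byte, std::to_integer
#include <climits>   // CHAR_BIT
#include <cstdio>

static_assert(CHAR_BIT == 8, "this sketch assumes eight-bit bytes");

int main() {
    std::byte b{0x2A};

    // Bitwise operators are defined...
    std::byte masked = b & std::byte{0x0F};

    // ...but there are no implicit conversions to or from the numeric types:
    // int n = b;            // error: no implicit conversion
    // b = 42;               // error: must spell it std::byte{42}
    // std::byte s = b + b;  // error: no arithmetic on bytes at all

    // Getting a number back out has to be explicit:
    printf("masked = %d\n", std::to_integer<int>(masked));
    return 0;
}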