An amusing rant about C



  • Now I have no expertise here. But it was an interesting read. And if it's half as bad as he represents... wow. Glad I'm just a dirty high-level application developer who doesn't have to worry about most of that.


  • BINNED

    @Benjamin-Hall I’ve not even gotten halfway through (I’m tired and my eyes are getting weak, but I’ll get back to the rest), but it starts right off pretty stupid. I mean, it sounds like he knows what he’s talking about, but it’s still stupid.
    What do you mean, Linux doesn’t have language bindings for the language you just made up? And the Linux programmers aren’t going to provide one for you either? Wow, guess what, you’ll have to provide that yourself. :surprised-pikachu: What a stupid scenario.
    And then he complains that FFIs have to speak C because that’s the “protocol”, but C doesn’t have an ABI. (I think he uses “protocol” here to avoid the obvious use of ABI, because that’d be immediately self-contradictory.) Um, no, you don’t have to speak C, you have to speak the system library’s ABI. Which it obviously has. And C is used everywhere because it’s pretty simple. Makes me wonder if he’d prefer it if the system spoke something more language-agnostic, like COM. Eww.

    Now back to seeing if the rest of the article makes my post look redundant or dumb…



  • @topspin I think, and I'm no expert, that what he means is

    • You have to use C-compatible FFIs, because that's all the OSs allow for. Which makes C a protocol for interacting.
    • But C sucks at this job. Horribly. Horribly. Because
    1. It's impossible to parse the headers without basically just delegating this job to the system's compiler. And you have to pick the right one.
    2. Without an ABI (which C doesn't really have), you end up having to hard-code type mappings (see the sketch after this list). Which then fails as soon as you do anything significant or someone uses it unexpectedly. Which means that the C types can never change (or that any change is breaking).
    3. If you do everything through opaque types, great. But there are too many types that aren't opaque and can't be opaque.
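
    To illustrate point 2 (a hypothetical header, not from the article): the same C declaration pins down a different layout on every data model, so every binding has to hard-code one mapping per target.

    ```c
    /* Hypothetical header a foreign-language binding has to mirror
       bit-for-bit; none of these sizes are knowable from the C source
       alone, they depend on the target's data model. */
    struct stat_like {
        long  st_size;  /* 4 bytes on LLP64 (Windows x64), 8 on LP64 (Linux x64) */
        short st_mode;  /* 2 bytes, padding after it depends on the next field   */
        char *st_name;  /* 4 bytes on ILP32 targets, 8 on 64-bit ones            */
    };
    /* A binding that re-declares this hard-codes one of those layouts;
       if the header ever changes, the binding silently breaks. */
    ```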

    Basically, the point is that if C is going to be the de facto standard for talking to the OS, C needs to change (to add an actual ABI and fixed, standardized types) and then never change again. It's not a language at that point, it's a protocol for other languages to speak. And as such it needs to be treated differently.

    If I'm understanding him right. But I'm just a dumb application dev without any formal training. Who knows just enough about C and C++ to never want to get it anywhere near me if I can avoid it.



  • I wonder how much processor hardware design is influenced by the expectation that it's going to be running something that behaves like it came out of a C compiler? "That's the way C programs work so we'll build our chips around it for performance."



  • @Benjamin-Hall said in An amusing rant about C:

    You have to use C-compatible FFIs, because that's all the OSs allow for.

    Technically, you can bypass the C layer in Linux - you can do syscalls yourself, e.g. via interrupts/syscall or whatever mechanism the underlying architecture provides. (There's the special case of using the vDSO to bypass that for performance reasons, in which case you're back to calling a C function...)

    How you do a syscall is documented e.g., here (or man 2 syscall). That just talks about instructions and registers.

    Now, yeah, you don't get rid of C entirely there either, since you might need to pass data through memory (e.g., structs), and I'm actually not sure if those are documented in any way other than by providing C declarations. But you can definitely do stuff like basic IO without ever touching a C structure.
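
    Something like this, roughly (my sketch; assumes GCC/Clang inline assembly on x86-64, with the registers and syscall number straight from the kernel's documented ABI):

    ```c
    #include <stddef.h>

    /* write(2) on x86-64 Linux without libc: syscall number in rax,
       args in rdi/rsi/rdx, exactly as the kernel ABI documents. */
    static long raw_write(int fd, const void *buf, size_t len) {
        long ret;
        __asm__ volatile (
            "syscall"
            : "=a"(ret)              /* result comes back in rax         */
            : "a"(1L),               /* __NR_write == 1 on x86-64        */
              "D"((long)fd),         /* arg 1 in rdi                     */
              "S"(buf),              /* arg 2 in rsi                     */
              "d"(len)               /* arg 3 in rdx                     */
            : "rcx", "r11", "memory" /* the syscall insn clobbers these  */
        );
        return ret;                  /* >= 0: bytes written, < 0: -errno */
    }
    ```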

    Linux devs have been quite :pendant:ic about keeping the kernel's userspace API/ABI (e.g., via the syscalls) stable. (And at that level, at least one part of the triple that they talk about is gone -- namely the userspace libc part.)



  • @Watson I think it has more to do with the expectation that it needs to run existing software; therefore, the hardware has to support existing ABIs. Or else there has to be an emulation layer, which effectively means the hardware will run more slowly, which means fewer people will want to use your nifty new hardware. If you introduce a well-optimized compiler that can compile and link existing source with zero changes, and you introduce it at the same time as the hardware, that might not matter for some use cases. But if your hardware is intended for general-purpose computing, people are most certainly going to want to run existing applications on it, and your new architecture had better offer some major improvement over x64, including enough speed to run the emulation layer if it needs one.


  • Banned

    @Benjamin-Hall said in An amusing rant about C:

    Now I have no expertise here.

    I do. This is all true. It all really is that horrible. Actually, it's worse. The post skips over many tiny yet important details.

    And then he complains that FFIs have to speak C because that’s the “protocol”, but C doesn’t have an ABI. (I think he uses “protocol” here to avoid the obvious use of ABI, because that’d be immediately self-contradictory.) Um, no, you don’t have to speak C, you have to speak the system library’s ABI. Which it obviously has.

    The point is that there isn't a C ABI. There are - at a minimum - 176 different ABIs. And FFI protocols (as in, the canonical POSIX spec, the canonical WinAPI spec, the canonical OpenGL spec, etc.) aren't defined in terms of an ABI; they're defined in terms of C headers. Which can mean any of the 176 ABIs depending on the platform and the particular compiler used, and which are also virtually unparseable. And the x86_64-pc-windows-msvc ABI has 4 sub-ABIs in the form of calling conventions. And other platforms aren't any better. Did you know you can change the default calling convention through compilation flags?
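
    For illustration, here's what those calling-convention sub-ABIs look like on 32-bit x86, where they actually differ (the keywords are MSVC-specific; /Gd, /Gr, /Gz and /Gv are the flags that flip the default):

    ```c
    /* The same signature under four different 32-bit x86 calling
       conventions; which one a bare "int f(int, int);" means depends
       on compiler flags. */
    int __cdecl      f_c(int a, int b);  /* args on stack, caller cleans up (/Gd, the default) */
    int __stdcall    f_s(int a, int b);  /* args on stack, callee cleans up (/Gz)              */
    int __fastcall   f_f(int a, int b);  /* first two args in ecx/edx (/Gr)                    */
    int __vectorcall f_v(int a, int b);  /* more args in registers, incl. SSE (/Gv)            */
    ```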

    @Watson said in An amusing rant about C:

    I wonder how much processor hardware design is influenced by the expectation that it's going to be running something that behaves like it came out of a C compiler?

    It doesn't matter because the whole point of compilation to machine code is for it to not matter.

    @cvi said in An amusing rant about C:

    Linux devs have been quite :pendant:ic about keeping the kernel's userspace API/ABI (e.g., via the syscalls) stable.

    Lolnope. There's a reason they insist(ed?) that all drivers be integrated into the kernel's main code repo. Though they still have one of the stablest ABIs in the world. The #1 stablest one is probably Windows. Which is quite an achievement, since I'm not talking just about the kernel - I'm talking about the whole of Windows. (At least until 2015 it was.)



  • @Benjamin-Hall said in An amusing rant about C:

    Now I have no expertise here. But it was an interesting read. And if it's half as bad as he represents... wow. Glad I'm just a dirty high-level application developer who doesn't have to worry about most of that.

    This is a stupid complaint by an author who doesn't understand what she's actually complaining about.

    The "problem" she's describing is simply that C is very highly portable, and that you can compile C to any of its targets using any hardware architecture (hence the "huge" list of 176 ABIs).

    OS ABIs are all in C? That's because they're written in C. If you want a way to call the OS's functions, you have to use the OS's method of calling them. If you want your new language (and its compiler) to pass parameters some other way, that's fine; you can do so. But you can't then complain that your language isn't compatible with any of the modern OSes.

    Integer types aren't well-defined in C? That's because integer types aren't well-defined across all processor architectures. The "problems" with int and its relatives are not with the original definition of the language and its types; the problem lies more with programmers who wrote their code assuming that int would be the same across all architectures and platforms, and then didn't want to fix their code's portability problem when int on a different architecture turned out to be a different size. In actuality, those programmers were writing Assembly-in-C code, not C code.
    (Edit: this ability to write Assembly-in-C is also why there's so much "undefined behavior" and "implementation-defined behavior" in C: those behaviors will change on different architectures/compilers because different architectures/compilers work differently as their developers made different choices from one another.)
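
    A tiny sketch of the trap (what this prints depends entirely on the data model the compiler targets):

    ```c
    #include <stdio.h>

    int main(void) {
        /* The standard only guarantees minimum ranges (int at least 16 bits,
           long at least 32, long long at least 64); exact sizes are the ABI's call. */
        printf("int:    %zu\n", sizeof(int));    /* 2 on some MCUs, 4 on ILP32/LP64/LLP64 */
        printf("long:   %zu\n", sizeof(long));   /* 4 on Windows x64, 8 on Linux x64      */
        printf("void *: %zu\n", sizeof(void *)); /* 4 on 32-bit, 8 on 64-bit targets      */
        return 0;
    }
    ```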

    You want to write a language as portable as C? Well, then either you have to write all the different possible compilation targets that C supports or you have to write your "transpiler" to output C code and pass it to a C compiler. One of these involves a whole lot more work than the other, so guess which one nearly everyone picks?

    Just to be absolutely clear: these are not C's problems! They are exactly why C has become the de facto standard in programming since it came out in the early '70s.


  • Banned

    @djls45 said in An amusing rant about C:

    This is a stupid complaint by an author who doesn't understand what she's actually complaining about.

    This author literally wrote a tool to automatically detect ABI incompatibilities. Beyond that, they're a long-time contributor to the Rust compiler and the Rust project in general; they wrote most of the official documentation of unsafe Rust features; as far as I can tell they're a paid Mozilla employee (Mozilla is in charge* of Rust); and they seem to have a lot of friends close to the C and C++ standards committees.

    But sure, go on.

    Integer types aren't well-defined in C? That's because integer types aren't well-defined across all processor architectures.

    Didn't stop Rust :mlp_shrug: There's u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, and that's it. No shorts, no longs. Just fixed size integers + one pointer-sized type.
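
    (C99 eventually bolted roughly the same menu onto C, opt-in, via stdint.h -- with the caveat that the exact-width types are technically optional:)

    ```c
    #include <stdint.h>   /* exact-width types (optional in the standard!) */
    #include <stddef.h>   /* size_t, ptrdiff_t */

    /* Roughly the menu Rust starts from, opt-in in C: */
    uint8_t  a; uint16_t b; uint32_t c; uint64_t d;  /* ~ u8..u64 */
    int8_t   e; int16_t  f; int32_t  g; int64_t  h;  /* ~ i8..i64 */
    size_t    n;  /* pointer-sized unsigned, ~ usize */
    ptrdiff_t m;  /* pointer-sized signed,   ~ isize */
    ```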

    the problem lies more with programmers who wrote their code assuming that int would be the same across all architectures and platforms, and then didn't want to fix their code's portability problem when int on a different architecture turned out to be a different size. In actuality, those programmers were writing Assembly-in-C code, not C code.

    Look - when I make a program, I want it to behave in a certain way. Ideally, I want the behavior to stay the same, regardless of where, when, or how I compile it. The more things are left up for the compiler to decide, the less trust I have in my own code and the more effort I have to spend making sure identical code actually works identically on all platforms. The more my tools get in the way of getting the actual work done, the shittier they are. And C is at the extreme end of getting in the way of cross-platform coding.

    Is this shittiness justifiable? Yeah, for the most part it is. But justifiable shittiness is still shittiness. C is the worst language for cross-platform development specifically because of all those features you mentioned that were intended for cross-platform development. Unfortunately, the designers of C missed the mark completely and in nearly every case did the exact opposite of what would actually be helpful for cross-platform development. "I want my file format's version indicator to take twice as many bytes on 64-bit architectures," said no one ever.

    Sure, I can move everything to uint32_t and friends. In my code. I can't do the same with the standard library. And almost nothing in the standard library uses fixed-size types. So I have to deal with variable-sized integers whether I want to or not. (Edit: also, a recommendation to use uint32_t wherever possible is itself an admission that variable-sized integers were a mistake.)
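
    A concrete instance of that: ftell is specified to return long, so its range shifts under your feet depending on the data model (hypothetical file name below):

    ```c
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("big.bin", "rb");  /* hypothetical >2 GiB file */
        if (!f) return 1;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);  /* long: 8 bytes on LP64 Linux, still 4 on
                                  LLP64 Windows x64, where offsets past
                                  2 GiB simply don't fit */
        printf("%ld\n", size);
        fclose(f);
        return 0;
    }
    ```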

    You want to write a language as portable as C? Well, then either you have to write all the different possible compilation targets that C supports or you have to write your "transpiler" to output C code and pass it to a C compiler. One of these involves a whole lot more work than the other, so guess which one nearly everyone picks?

    Out of popular commercially-viable compilers, every single one went with the former. I wonder why that is.

    Just to be absolutely clear: these are not C's problems!

    No, strictly speaking they aren't. But they are problems of every interface that relies on a C API. Every single one of them. And there are a lot of very important C APIs in use. So they became everybody's problems instead.



  • @Gąska said in An amusing rant about C:

    Lolnope. There's a reason they insist(ed?) that all drivers be integrated into the kernel's main code repo. Though they still have one of the stablest ABIs in the world. The #1 stablest one is probably Windows. Which is quite an achievement, since I'm not talking just about the kernel - I'm talking about the whole of Windows. (At least until 2015 it was.)

    I am specifically talking about the kernel interface via syscalls -- and Windows doesn't make any guarantees about that. I.e., on Windows you have to go via the provided .dlls (kernel32.dll / ntdll.dll or whatever), as they consider themselves free to change the syscalls around.

    So, the article is correct when it comes to Windows -- you really need to deal with a C API. Not so much on Linux. (But you might want to anyway, since it actually makes your life easier.)



  • @Gąska said in An amusing rant about C:

    Edit: also, a recommendation to use uint32_t wherever possible is itself an admission that variable-sized integers were a mistake.

    "Variable-sized" integers have their uses. It's more that C/C++'s particular choices of sizes are flaming garbage. Case in point: the type that people are normally told to go for, and that is one of the shortest to type, int is shit, because it doesn't actually represent an unseful choice for the most common platforms these days. (And long isn't a very useful type: it can be both too small (32-bits) or too large (32-bits), depending on what you target.)

    (The other mistake that C/C++ makes is tying overflow/underflow behaviour to signedness. Why? 😢.)



  • @Gąska said in An amusing rant about C:

    @Watson said in An amusing rant about C:

    I wonder how much processor hardware design is influenced by the expectation that it's going to be running something that behaves like it came out of a C compiler?

    It doesn't matter because the whole point of compilation to machine code is for it to not matter.

    So the processor manufacturer creates chips with registers for things like "stack pointer" and "string operation support" and "accumulator" without any expectation about how they are to be used (so their operation is completely orthogonal)? And if they do have expectations, where did those come from, and why do they think those things are worth having? Is there really that much variation in what different compilers produce for a given processor (the very existence of which is a matter of convention rather than necessity), or are their options constrained by the architectures of their targets, which brings us back to the question of why those architectures are the way they are?



  • @Watson FWIW- I've seen ISA manuals that specifically mention that some of the addressing modes exist to support C more efficiently. IIRC it was just accesses through a pointer with a hard-coded offset, so nothing super crazy. (You'd probably end up with something similar if it were for a different modern-ish language with something like C structs.)


  • Java Dev

    @cvi said in An amusing rant about C:

    (The other mistake that C/C++ makes is tying overflow/underflow behaviour to signedness. Why? 😢.)

    As I understand it, because two's complement was not universal yet when the spec was originally written.


  • Banned

    @cvi said in An amusing rant about C:

    @Gąska said in An amusing rant about C:

    Edit: also, a recommendation to use uint32_t wherever possible is itself an admission that variable-sized integers were a mistake.

    "Variable-sized" integers have their uses.

    Nowadays? I don't think there's a single platform combo where you're likely to deploy code on both and actually benefit from having a smaller int on one of them.

    It's more that C/C++'s particular choices of sizes are flaming garbage. Case in point: the type that people are normally told to go for, and that is one of the shortest to type, int, is shit, because it doesn't actually represent a useful choice for the most common platforms these days.

    4-byte int is still the most performant to work with on x64. Caches are a hell of a drug.

    (The other mistake that C/C++ makes is tying overflow/underflow behaviour to signedness. Why? 😢.)

    Because there's only one way to implement unsigned integers, but many to implement signed. And saturating unsigned math is rarely useful.


  • Discourse touched me in a no-no place

    @PleegWat said in An amusing rant about C:

    @cvi said in An amusing rant about C:

    (The other mistake that C/C++ makes is tying overflow/underflow behaviour to signedness. Why? 😢.)

    As I understand it, because two's complement was not universal yet when the spec was originally written.

    There were some seriously weird architectures out there originally, and IBM weren't responsible for all of them. Also, a platform is free to define the behaviour of signed integer overflow/underflow.

    You still want to avoid it most of the time.


  • Discourse touched me in a no-no place

    @Gąska said in An amusing rant about C:

    And saturating unsigned math is rarely useful.

    I've got code that uses it to implement a kind of accumulator in fixed point. (Fixed point has a lot in common with integer math; it's implemented as integer math with some shifts.)
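
    Something like this, presumably (a hypothetical saturating add; unsigned wraparound is well-defined in C, which is exactly what makes the overflow check legal):

    ```c
    #include <stdint.h>

    /* Hypothetical saturating unsigned add, the kind of primitive a
       fixed-point accumulator builds on. */
    static inline uint32_t sat_add_u32(uint32_t a, uint32_t b) {
        uint32_t sum = a + b;                 /* unsigned overflow wraps, by definition */
        return (sum < a) ? UINT32_MAX : sum;  /* clamp at the ceiling instead */
    }
    ```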


  • Banned

    @dkf fixed point math implementation is on my list of things I hope to never have to do in my career.


  • Discourse touched me in a no-no place

    @Gąska We use it in an embedded context where we don't have hardware floating point. It's not much more difficult than floats, except you need to take care to ensure that all the numbers involved have the same sort of scale. And they're both faster and energy-cheaper than floats, which matters a lot to us.

    The place you're most likely to see that sort of thing in consumer devices is deep in the guts of video decoders. Yes, those are probably in hardware, but that hardware probably uses fixed point.
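
    For the curious, a minimal Q16.16 sketch (hypothetical; real decoders pick whatever Q format fits their dynamic range):

    ```c
    #include <stdint.h>

    /* Q16.16: 16 integer bits, 16 fractional bits, in an int32_t. */
    typedef int32_t q16_16;
    #define Q16_ONE (1 << 16)  /* 1.0 in Q16.16 */

    static inline q16_16 q16_from_int(int x) {
        return (q16_16)(x * Q16_ONE);  /* scale up; overflows past +/-32767 */
    }

    static inline q16_16 q16_mul(q16_16 a, q16_16 b) {
        /* widen so the 32x32 product can't overflow, then drop the
           extra 16 fraction bits */
        return (q16_16)(((int64_t)a * b) >> 16);
    }
    ```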


  • Discourse touched me in a no-no place

    @topspin said in An amusing rant about C:

    And then he complains that FFIs have to speak C because that’s the “protocol”, but C doesn’t have an ABI.

    It's one of these things that varies quite a bit. Platforms have ABIs, and libraries have ABIs, but languages do not (in general). That said, some of the types in C are more ABI friendly than others; uint32_t is pretty great, and intmax_t is not. Structure packing is another thing that can vary, but fortunately not too much; the need to handle working with networked binary protocols (formally not ABIs but with a lot in common) has eased a lot of the pain by tying implementers' hands.
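
    Structure packing in one picture (typical alignment rules; a packed pragma or a different ABI changes the answer):

    ```c
    #include <stdint.h>

    struct wire_msg {
        uint8_t  tag;    /* offset 0                                   */
                         /* offsets 1-3: padding on most ABIs, so that */
        uint32_t value;  /* this field lands on a 4-byte boundary      */
    };
    /* sizeof(struct wire_msg) is typically 8, not 5 -- which is why
       binary network protocols pin their layouts down explicitly. */
    ```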

    When someone says that they want their code to work exactly the same everywhere, they're in for a bad time when they find out just how variable “everywhere” really is. Even if you leave out the truly crazy architectures, such as ternary computers. 😉


  • Java Dev

    @dkf said in An amusing rant about C:

    fixed point

    Part of what we do involves extracting timing measures. These are implemented using integer math on milliseconds rather than floats on seconds or the fiddliness involved in using struct timeval or struct timespec.


  • BINNED

    I've read the rest now, and I stand by my earlier impression. The author definitely knows what they're talking about (I even linked to their article about the Swift ABI, which the rant mentions, a few weeks ago), but it's still a confused article with no clearly laid out problems or solutions, IMO.

    What I think I've identified as complaints:

    1. System library calls need to be ABI stable (to support user-space programs)
    2. C needs to be ABI stable (to support C programs)
    3. System library ABI is usually described using its C API
    4. Different platforms have different ABIs

    Based on my maybe incomplete understanding I then got:

    1. You don't say :surprised-pikachu:. Not a C problem. Also, the author praises some mechanisms for forward compatibility written in C.
    2. The same problem exists for different languages.
      C++ is very hesitant to break ABI, and that's completely independent of FFI, because nobody sane does FFI with C++. That mostly seems to be made a problem because gcc shies away from breaking (libstdc++) ABI, as all hell broke loose when they broke the string ABI once for C++11; on Windows, MSVC actually broke ABI every major release and only recently stopped doing that. That's because on Linux the assumption is that there's only one system library and you're allowed to pass objects over library boundaries (or rather: what even is a library boundary?), whereas on Windows it wasn't. Both have advantages and disadvantages.
    3. That seems to be the major gripe here. I'm not sure how overblown it is, but I guess technically you could describe the library ABI with a different description. And you'd only have to do this once, too, because it's supposed to be stable. So then every other language could use that description. But I don't really see, e.g., Python's FFI using P/Invoke signatures, or Swift using Rust's generated bindings. :mlp_shrug:
    4. Well, obviously. There's no way around that unless you want to decree that "everyone do as x86_64 does". Or as ARM does. The reason why gcc lists so many triplets is because it supports so many platforms. Your compiler for Babbyscript will have to deal with that shit anyway if it's targeting a 16-bit segmented memory architecture with weird function call syntax and whatnot. Your saving grace is probably "we don't support those anyway", but then it's an unfair complaint.
      What's probably true is that there are unnecessary differences between platforms that stem from C's variable integer sizes, but you need to have some of those anyway, at the very least because pointers are differently sized. But that's only the surface of platform-specific ABI.

    And the remark that Windows does so much better seems weirdly tangential, since it's also described using C headers, which are pretty abominable too. That, or COM/MIDL/whatever shit, which isn't exactly nice to use either and is still based on Windows' interpretation of C types. Also @Gąska's remark that Windows is more stable: that's because they have like only 2 of those triplets total, x86 and x64 (not counting arm), the rest is already fixed to be windows and msvc. And you already have to special-case "Windows" vs "Unix" anyway in your language's platform abstraction.


  • Discourse touched me in a no-no place

    @topspin said in An amusing rant about C:

    the rest is already fixed to be windows and msvc

    Oh dear sweet summer child!

    There are other runtime libraries and they're quite different, though not normally at the type/structure layout level (because any sane C runtime on Windows needs to interact with the OS, and that's pretty fixed). But command line parsing is very much more variable than that, so if you want to make a function that just lets you pass a bunch of arbitrary strings to a subprocess, you've got some really weird problems ahead of you as you try to navigate the forest of bugs. That's something where the Unix approach — caller identifies the argument boundaries, not callee — is just so much easier.



  • @Watson said in An amusing rant about C:

    So the processor manufacturer creates chips with registers for things like "stack pointer" and "string operation support" and "accumulator" without any expectation about how they are to be used (so their operation is completely orthogonal)?

    No, not completely, anyway. Even in a general register architecture (r0-r15, or whatever, not accumulator-based), at least the stack pointer register is always special, because it's automatically incremented/decremented by CALL/RET. There are different architectures (e.g., stack-based), but the manufacturer certainly has expectations about how they'll be used. I suppose there could be an architecture in which you can tell the processor to use a general-purpose register as the stack pointer, but that would be unusual, and I'm not familiar with any such processors.



  • @Gąska said in An amusing rant about C:

    4-byte int is still the most performant to work with on x64.

    Not everything is x64. 8-bit microcontrollers are still rather common, because that's all the computing power some applications need.


  • Java Dev

    @HardwareGeek Yup, I learned recently (well, last year or so) that famous processors from the 80s like the 6502 still have close variants in production to this day.


  • Banned

    @HardwareGeek said in An amusing rant about C:

    @Gąska said in An amusing rant about C:

    4-byte int is still the most performant to work with on x64.

    Not everything is x64.

    I know. But the machine-word-sized int isn't the fastest on every architecture, and x64 is an example of that. It also happens to be a very common architecture.


  • Discourse touched me in a no-no place

    @HardwareGeek said in An amusing rant about C:

    @Watson said in An amusing rant about C:

    So the processor manufacturer creates chips with registers for things like "stack pointer" and "string operation support" and "accumulator" without any expectation about how they are to be used (so their operation is completely orthogonal)?

    No, not completely, anyway. Even in a general register architecture (r0-r15, or whatever, not accumulator-based), at least the stack pointer register is always special, because it's automatically incremented/decremented by CALL/RET. There are different architectures (e.g., stack-based), but the manufacturer certainly has expectations about how they'll be used. I suppose there could be an architecture in which you can tell the processor to use a general-purpose register as the stack pointer, but that would be unusual, and I'm not familiar with any such processors.

    ARM is such an architecture; the return address goes into a register (r14) when you call, and it is up to the function to save that. (Interrupts don't use the stack.)

    I can't remember what RISC-V does.


  • Banned

    @topspin said in An amusing rant about C:

    What I think I've identified as complaints:

    1. System library calls need to be ABI stable (to support user-space programs)
    2. C needs to be ABI stable (to support C programs)
    3. System library ABI is usually described using its C API
    4. Different platforms have different ABIs

    You forgot 5) C is ridiculously hard to work with when all you care about is having a particular ABI, which wouldn't be a problem except that the world had settled on using C as the specification language for all cross-language APIs. Much like with JS, they could've chosen anything they wanted, and they chose the absolutely worst possible option.

    And the remark that Windows does so much better seems weirdly tangential

    Yeah, that's how tangents work.



  • @Gąska said in An amusing rant about C:

    And the remark that Windows does so much better seems weirdly tangential

    Yeah, that's how tangents work.

    You know what's really weirdly tangential?

    The road sign on my way to the freeway, announcing "Entering Tangent". I didn't think tangents worked that way. 🏆



  • @dkf said in An amusing rant about C:

    ARM is such an architecture; the return address goes into a register (r14) when you call, and it is up to the function to save that. (Interrupts don't use the stack.)

    Huh. I probably knew that at one time, but I'd forgotten it. It's been years since I had to deal with ARM at the register level, and even then I was mostly concerned with the ip (where in the code is the CPU executing?) and ld instructions (what is the CPU reading (usually memory-mapped I/O) that is putting bad data into the system?).

    And even so, r14 is special, because the hardware automatically puts the return address there.


  • Discourse touched me in a no-no place

    @Gąska said in An amusing rant about C:

    they chose the absolutely worst possible option

    Except too many of the other options look even worse.

    In terms of an ABI, what you care about really are calling conventions, memory layouts of types, and protocols for use (when they're non-trivial such as with callbacks or memory cleanup). The main types of compatibility are backward compatibility and forward compatibility: the difference is in which side is updated. (There are more convoluted scenarios when there's more entities being updated, but they get stupid complicated and the usual fix there is to not allow arbitrary combinations of things.)

    Calling conventions tend to be the easiest, as you generally just have to match whatever the platform dictates. Using alternates there is just painful.

    Memory layouts are trickier. The article was a complaint about these.

    Protocols are the trickiest however; handling them correctly either requires non-trivial (typically custom-written) stuff in the binding library or ends up with being outright visible to the user. It's stuff like this that makes COM extremely tricky to work with in some languages.


  • Banned

    @dkf said in An amusing rant about C:

    @Gąska said in An amusing rant about C:

    they chose the absolutely worst possible option

    Except too many of the other options look even worse.

    After you exclude all the non-starters because they don't cover all the platforms you need, what remains doesn't look so bad. Note that the reason C supports so many platforms is that by the 90s everybody and their dog had settled on C as the "lingua franca", so support for C everywhere was critical. If something else had been chosen, then that something else would have been made to compile on all platforms.

    In terms of an ABI, what you care about really are calling conventions, memory layouts of types, and protocols for use (when they're non-trivial such as with callbacks or memory cleanup).

    And C is super finicky with all three.



  • @HardwareGeek said in An amusing rant about C:

    But if your hardware is intended for general-purpose computing, people are most certainly going to want to run existing applications on it, and your new architecture had better offer some major improvement over ~~x64~~ x86, including enough speed to run the emulation layer if it needs one.

    See: Itanium, a.k.a. Itanic


  • Discourse touched me in a no-no place

    @Gąska said in An amusing rant about C:

    And C is super finicky with all three.

    All languages are finicky with all of them.

    You can hide details in the FFI mapping layer, but some things don't hide nicely. Doing a good mapping often includes writing some of the code yourself, especially for non-trivial interfaces. A typical criticism of a generated interface is that yes, you've got an FFI, but it still feels like calling C; fixing that requires some actual understanding and nobody's put that much smarts in their tooling because debugging Malfunctioning Maximum Magic is for fools.

    This loop has been round a number of times. ABIs designed for long term support have to be designed for it (often through the various techniques for structure versioning) and the result is still often really quite awkward. The way to deal with the awkwardness is a thicker binding layer (including hand-written bits; that's what many scripting languages do) but that comes at some cost to use of the ABI, and very often you can't hide everything. It's very hard to get right. Changing how the ABI is described is unlikely to make things much better.
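
    One of those versioning techniques, sketched out (the cbSize idiom that WinAPI leans on; the struct and field names here are made up):

    ```c
    #include <stdint.h>

    /* The "size field first" idiom: the caller records how big it
       thinks the struct is, so the callee can tell which fields exist. */
    struct widget_params_v1 {
        uint32_t size;   /* caller sets this to sizeof(its struct) */
        uint32_t flags;
    };

    struct widget_params_v2 {
        uint32_t size;   /* still first, still set by the caller   */
        uint32_t flags;
        uint32_t extra;  /* new in v2; callee checks size before
                            reading this field                     */
    };
    ```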


  • Considered Harmful

    The protobuf descriptor language is about the closest thing to an alternative for definitions of this specificity.



  • @Gąska said in An amusing rant about C:

    Nowadays? I don't think there's a single platform combo where you're likely to deploy code on both and actually benefit from having a smaller int on one of them.

    Android? Still plenty of 32-bit devices out there, but 64-bit devices aren't uncommon either.

    4-byte int is still the most performant to work with on x64. Caches are a hell of a drug.

    That's an optimization, and you should make a conscious decision to perform this optimization as a programmer, knowing that by doing so you declare that the value is never going to exceed 2G or 4G.

    Besides, there are plenty of cases where

    • the integer is never going to touch memory, but just lives in a register, so who cares about caches
    • the integer is already 64-bits in memory (e.g. container sizes etc), and you're just unnecessarily downsizing it to 32-bits
    • the integer should actually be 64-bits when possible, because you're dealing with something like file IO, where a few giga-elements isn't uncommon
    • the integer is stored in memory, but followed by a 64-bit element (e.g., a pointer), so you get an extra 32 bits of padding regardless
      etc.
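
    The last bullet in miniature (typical LP64 layout; other ABIs may differ):

    ```c
    #include <stdint.h>

    /* On a typical 64-bit ABI both structs occupy 16 bytes: the space
       "saved" by the smaller count just turns into padding before the
       8-byte-aligned pointer. */
    struct big   { int64_t count; char *data; };  /* 8 + 8           = 16 */
    struct small { int32_t count; char *data; };  /* 4 + 4 (pad) + 8 = 16 */
    ```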

    If you're designing a data structure that is in memory and that potentially has a few million elements, yeah, then keeping the size down makes a ton of sense. You're probably still better off using something that isn't int, e.g., something like the int32_t that you suggested earlier.

    Because there's only one way to implement unsigned integers, but many to implement signed. And saturating unsigned math is rarely useful.

    Unsigned integers can

    • wrap
    • saturate
    • signal

    on over-/underflow (maybe more). All three behaviours are useful on occasion. Same for signed integers. Not sure what makes unsigned integers different in this respect.


  • Banned

    @dkf said in An amusing rant about C:

    @Gąska said in An amusing rant about C:

    And C is super finicky with all three.

    All languages are finicky with all of them.

    Some significantly less than others. For example, there are very few, if any, languages developed after 2000 where the size and alignment of int isn't always the same. And controlling size and alignment of objects is like 98% of the work involved in keeping ABIs stable.



  • @topspin said in An amusing rant about C:

    Also @Gąska's remark that Windows is more stable: that's because they have like only 2 of those triplets total, x86 and x64 (not counting arm), the rest is already fixed to be windows and msvc.

    You might want to count /MT, /MTd, /MD, /MDd et al. as different runtimes. (Not sure if this is what @dkf meant in his reply.)


  • BINNED

    @cvi said in An amusing rant about C:

    @topspin said in An amusing rant about C:

    Also @Gąska's remark that Windows is more stable: that's because they have like only 2 of those triplets total, x86 and x64 (not counting arm), the rest is already fixed to be windows and msvc.

    You might want to count /MT, /MTd, /MD, /MDd et al. as different runtimes. (Not sure if this is what @dkf meant in his reply.)

    I wasn't talking about runtimes but platform triplets. And the fact that Windows has fewer because it's 1) always Windows, 2) always MSVC (or compatible), 3) almost always x86 or x64 isn't actually that remarkable an achievement.



  • @dkf said in An amusing rant about C:

    Except too many of the other options look even worse.

    I appreciate that e.g. Vulkan lists its API in a machine readable format (XML). Means you can generate bindings easily. Think OpenGL does the same thing these days.



  • @topspin The last component of the platform triples in the article refers to the runtime?

    E.g. armv7-unknown-linux-musleabi... musl is an alternative libc.


  • BINNED

    @cvi overlooked that, my bad.


  • Banned

    @cvi not to mention -mingw32 and -mingw64 variants, and whatever clang spits out. Also, not what I was talking about at all. I meant that they did an excellent job of making sure that over 30 years and thousands of updates, all WinAPI, COM, and other interfaces across all Windows releases have kept the same ABI for all their millions of functions, so that Windows 95 apps can still be run on Windows 10 machines, even ones that dig deep into Windows internals.



  • @Gąska said in An amusing rant about C:

    even ones that dig deep into Windows internals.

    And do unspeakable things to them.

    As opposed to Apple, which goes out of its way to break backward compatibility/stability. Upgrade or die.



  • @Gąska said in An amusing rant about C:

    Also, not what I was talking about at all. I meant that they kept excellent job of making sure that over 30 years and thousands of updates, all WinAPI, COM, and other interfaces across all Windows releases have all their millions of functions stay with the same ABI, so that Windows 95 apps can still be run on Windows 10 machines, even ones that dig deep into Windows internals.

    Yes, I'm aware. I'm saying the same is true for Linux on the syscall level. E.g. this table is for Linux 2.2 (the oldest I could easily find) and the first 190 syscalls are still valid on x86 today. Hence, if you were to target Linux on a syscall level, you'd shave off the need to deal with a C API and C's calling conventions.

    (In contrast, yes, Windows does provide a very stable C interface, but it does not -as far as I've read- guarantee stability w.r.t. syscalls. So, on Windows, you indeed do need to deal with going through a C interface to do anything.)


  • BINNED

    @Gąska said in An amusing rant about C:

    @cvi not to mention -mingw32 and -mingw64 variants, and whatever clang spits out. Also, not what I was talking about at all. I meant that they did an excellent job of making sure that over 30 years and thousands of updates, all WinAPI, COM, and other interfaces across all Windows releases have kept the same ABI for all their millions of functions, so that Windows 95 apps can still be run on Windows 10 machines, even ones that dig deep into Windows internals.

    Yes, but they do all of this with C.


  • Java Dev

    @cvi said in An amusing rant about C:

    Android? Still plenty of 32-bit devices out there, but 64-bit devices aren't uncommon either.

    I know all the instruction transpiling in x86 schedulers is pretty good at handling unaligned loads/stores, and the 64-bit architecture retains 32-bit load/store instructions.
    I know ARM just rejects unaligned loads and stores outright. Does 64-bit ARM have 32-bit load/store instructions, or does this have to be coded out in a longer sequence every time?


  • Banned

    @topspin said in An amusing rant about C:

    @Gąska said in An amusing rant about C:

    @cvi not to mention -mingw32 and -mingw64 variants, and whatever clang spits out. Also, not what I was talking about at all. I meant that they did an excellent job of making sure that over 30 years and thousands of updates, all WinAPI, COM, and other interfaces across all Windows releases have kept the same ABI for all their millions of functions, so that Windows 95 apps can still be run on Windows 10 machines, even ones that dig deep into Windows internals.

    Yes, but they do all of this with C.

    People used to make dynamic websites in C. Doesn't make C any good for web development. With enough effort, you can do anything in anything. I've seen object-oriented programming with inheritance in BAT scripts. Slow as hell but it worked.



  • @PleegWat I think it has load/store instructions for various sizes (8, 16, 32 and 64; and larger ones for the vector stuff). Not 100% sure about the alignment requirements, but I'd strongly suspect they are tied to the size of what you're reading/writing.

