Zed Shaw gets schooled on C undefined behavior

flabdablet

Zed accepts:

Alright my friend, here's the gist:

...

Interesting guy. Obviously talented in many ways. But at the same time, seems to be his own greatest enemy.

Indeed.

Here's the safercopy() function copied from that github link:

void safercopy(size_t to_length, char to[], size_t from_length, char from[])
{
    int i = 0;

    // if you're butthurt I put this if-statement here you can remove it to show me how to 
    // break the for-loop and make it run forever
    if(to != NULL && from != NULL && (int)to_length > 0 && (int)from_length > 0) {
        for(i = 0; i < to_length && i < from_length && from[i] != '\0'; i++) {
            to[i] = from[i];
        }
    } else {
        // normally you'd then have an error here, but I'm keeping the function call
        // the same as in the book for the challenge
    }
}

If that had come to me in a code review, I would be raising the following points:

1. Given that size_t is an unsigned type, casting to_length and from_length to int before testing the sign of the result is broken if sizeof (int) <= sizeof (size_t), or pointless otherwise.
2. Likewise, comparing i (which is int) against to_length and from_length (which are size_t) is unsafe. You don't need to do clever memory-overwriting tricks to make that loop run forever, just pass to_length and from_length values > INT_MAX. Fixing this is as simple as declaring i to be size_t instead of int.
3. The loop stops before copying the null terminator, so if from[] terminates early then to[] will end up with garbage from that point onward.
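
For illustration, here is a minimal sketch of how those three review points could be addressed. It is not the code from the github link: the name safercopy_fixed, the const qualifier on from, the size_t return value and the decision to truncate so that to[] is always terminated are all my own assumptions.

```c
#include <stddef.h>

/* Hypothetical rework of safercopy(); not the original author's code.
 * Copies at most to_length - 1 bytes from from[] and always terminates
 * to[] when to_length > 0. from_length is the number of readable bytes
 * in from[]. Returns the number of bytes copied, excluding the '\0'. */
size_t safercopy_fixed(size_t to_length, char to[],
                       size_t from_length, const char from[])
{
    size_t i = 0;   /* same type as the lengths: no int/size_t mixing */

    if (to == NULL || to_length == 0) {
        return 0;   /* nowhere to write, not even a terminator */
    }
    if (from != NULL) {
        /* stop one short of to_length to leave room for the terminator */
        while (i < to_length - 1 && i < from_length && from[i] != '\0') {
            to[i] = from[i];
            i++;
        }
    }
    to[i] = '\0';   /* to[] never ends in leftover garbage */
    return i;
}
```

Copying "hello" into a 4-byte buffer with this version leaves "hel" plus the terminator. Whether silent truncation is the right policy is a separate design question; the point is only that no casts and no signed index are needed.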

This coder is not qualified to critique K&R.

PleegWat

I'd also tell him off for having the NULL checks on to and from. Passing NULLs in for those arguments (or at least doing so without also passing 0 for the corresponding length value) would be a breach of the function contract which in most cases you shouldn't check - letting it segfault will result in an easier debugging experience.

flabdablet

@PleegWat said:

breach of the function contract

To be fair, the only way to be explicit about function contracts in C is to document them in comments. It may well be the case that safercopy() is supposed to do nothing if handed a null pointer.

PleegWat

That or GCC's nonnull attribute, but that's nonstandard. And yes, there may be cases where ignoring incoming NULLs is desirable, but here you're passing buffer lengths as well. If the buffer pointer is NULL, then the only appropriate value for the buffer length is 0, and the for loop is a noop if one of the lengths is 0.
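
A rough sketch of what that could look like, assuming GCC or Clang (the macro ARGS_NONNULL and the function copy_bytes are made-up names for illustration). Note that once the attribute is applied, passing NULL is off-limits even together with a zero length, so this is a stricter contract than "NULL is fine if the length is 0".

```c
#include <stddef.h>

/* nonnull is a GCC/Clang extension, so hide it behind a macro that
 * expands to nothing on other compilers. */
#if defined(__GNUC__)
#define ARGS_NONNULL(...) __attribute__((nonnull(__VA_ARGS__)))
#else
#define ARGS_NONNULL(...)
#endif

/* Contract: to and from (arguments 2 and 4) must not be NULL.
 * The compiler can warn when it sees a literal NULL passed here. */
size_t copy_bytes(size_t to_length, char *to,
                  size_t from_length, const char *from) ARGS_NONNULL(2, 4);

/* Copies min(to_length, from_length) bytes; returns the count copied. */
size_t copy_bytes(size_t to_length, char *to,
                  size_t from_length, const char *from)
{
    size_t i;

    /* If either length is 0 the body never runs, so no extra
     * zero-length check is needed. */
    for (i = 0; i < to_length && i < from_length; i++) {
        to[i] = from[i];
    }
    return i;
}
```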

flabdablet

Fuck you. I'm Zed Shaw. I'll check whatever I fucking well feel like checking and you can just shut the fuck up. How about that for a contract?

PleegWat

Long as you keep your dirty paws off my codebase.

flabdablet

I actually think the whole premise of his project ("fixing" K&R) is broken. K&R-style C is fine for what it is: a more portable alternative to assembly language for code that has to run close to the bare metal, like OS kernels or application programs with resource constraints similar to those you'd find on a 1970s-era PDP-11 (such as many modern microcontrollers).

The fault with C, such as it is, lies not with the language but with those who think of it as a reasonable alternative to less bare-metal, more abstract languages for general line-of-business applications. It's not that, and never will be that. C is a good choice for niches where the cost of hardware and execution time exceeds the cost of skilled and careful programmers.

Cutting and pasting web-derived snippets of C without understanding exactly why they do what they do is just going to cause trouble. Reacting to that simple fact by trying to plaster published C snippets with safety features is pointless. ZS should stick to promoting Python, which is way less likely to cause grief for the C&P brigade than C will ever be.

Kian

The big issue, which people tried to explain to him pretty reasonably (he accused them of ganging up on him and ragequit C), is that he mistakes safety for some kind of static analysis. His big objection is that looking at the function, you can't guarantee it doesn't loop forever. He doesn't care about what would actually happen in a real computer executing the code, which is the whole point of C in the first place. He cares about the "purity" of the code.

In his view, then, his version is better because supposedly, the explicit checks for length "ensure" the function will end.

What he refuses to acknowledge is that even if he was right, his function is no safer than K&R's. Who cares that the function will stop looping if you've already overwritten memory you shouldn't have? You've already fucked up your program, and either segfaulted or handed control of your machine to malicious code. Looping forever is not the worst thing ever that can happen to a program, but he's focused on that one thing as a measure of quality.

Also, he continuously says that you can't check that a C-string is valid, because the only way to do it is to find the null-terminator, and if there isn't one you "loop forever". But then, it's also impossible to check that a pointer and length are a valid buffer.
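
To make that symmetry concrete, here is a small sketch using the standard memchr(). The helper name terminated_within is hypothetical. The search is bounded, so it cannot run forever, but it is only as trustworthy as the cap the caller hands in: if s does not really have cap readable bytes, the check itself reads out of bounds.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a '\0' occurs within the first cap bytes of s, else 0.
 * Hypothetical helper: it has to trust the caller that s really points
 * to at least cap readable bytes, which C gives it no way to verify. */
int terminated_within(const char *s, size_t cap)
{
    return memchr(s, '\0', cap) != NULL;
}
```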

flabdablet

@Kian said:

His big objection is that looking at the function, you can't guarantee it doesn't loop forever.

So he cracks the sads because his C compiler can't solve the Halting Problem?

kyrias

@Kian said:

Unicode doesn't allow embedded nulls in text

Ignoring everything else, this is completely wrong, in multiple ways. But for starters, embedded NULLs are just fine in Unicode/UTF-8 texts, and anything that does not support them is not fully Unicode/UTF-8 compliant.

Examples being Java and TCL which use an overlong sequence instead of NULL bytes, which any conformant UTF-8 decoder would reject.

HardwareGeek

@flabdablet said:

ZS should stick to promoting Python, which is way less likely to cause grief for the C&P brigade than C will ever be.

For some reason I didn't see, or at least get around to reading, this topic before it was necroed just now.

Learn Python the Hard Way (LPTHW) is my only exposure, prior to this thread, to ZS. While I think it's a reasonably good intro to Python, I have two problems with LPTHW. First, and not really a fault of the book or ZS per se, it's written for somebody who's never programmed anything before, not for an experienced programmer trying to learn a new language.

More damning, IMHO, is that it seems specifically focused, intentionally or otherwise, on enabling the C&P brigade. It's one thing to write, "Don't worry if you don't understand this right now; we'll do it enough that you'll get it eventually," which he does, but it's something else to write, "Don't ask why it didn't work. If it didn't work, it's because you typed it wrong; just make your program look exactly like what I wrote." This is how you train code monkeys, not teach programmers. At one point he writes, "I can't explain why this¹ doesn't work; it just doesn't." I'm just learning Python, and I can explain it in one sentence containing words of no more than two syllables that (I think) even a non-programmer can understand; if ZS can't, he shouldn't be writing about Python.

¹~~For some value of this that I don't remember off-hand and CBA to search for at the moment.~~ I remember; it was two statements on one line of source.

Weng

My meta-system is designed to intake ill-specified arbitrary data files. Our customers COULD tell us what the encoding actually is, but not one of them has ever actually managed to answer the fucking question.

Our default is 'It's fucking ASCII until someone says otherwise/complains about garbage characters'.

I wrote ASCII specific string handling code TODAY.

Scarlet_Manuka

ASCII is the C of character encodings: decades old, primitive, with much more expressive and capable replacements readily available – but it will never, ever go away.

inb4 several posts doing a FTFY on "C"

Kian

@kyrias said:

Ignoring everything else, this is completely wrong, in multiple ways.

Yes, ignoring the context you can completely miss the point and qualify for a pendant badger. Although I deduct points because even with just your quote I am still technically correct.

So let me qualify that sentence. In UTF-8, which is indeed the encoding we were talking about, the only way in which you will find a byte with every bit set to 0 is in the NUL character. Every other code point, no matter how many bytes long it may be, never uses that specific sequence of bits in a byte. So your clarification that null bytes can be embedded in a UTF-8 string would have been correct had that been what you said.
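
A quick way to check that claim is to encode every code point and look for zero bytes. This is a sketch with made-up function names, covering standard UTF-8 only (no overlong forms, surrogates skipped).

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes cp as standard UTF-8 into out[]. Returns the number of bytes
 * written, or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;   /* surrogates are not encodable in UTF-8 */
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Counts the code points whose encoding contains a 0x00 byte. */
size_t count_code_points_with_zero_byte(void)
{
    size_t count = 0;
    uint32_t cp;

    for (cp = 0; cp <= 0x10FFFF; cp++) {
        unsigned char out[4];
        size_t n = utf8_encode(cp, out);
        size_t i;

        for (i = 0; i < n; i++) {
            if (out[i] == 0x00) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

Every lead byte of a multi-byte sequence has its top bits set to 11 and every continuation byte to 10, so the only encoding that can contain a zero byte is the single byte for U+0000.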

Note however that not every valid UTF-8 string is text. Text is generally understood to mean the actual visible characters. A sequence of invisible control characters that don't show up on the screen is not text. If I make the pc speaker beep, I doubt anyone would say that's text.

A text string, one that only has text in it and not invisible control characters, will not have any embedded nulls. So even my out of context quote is still correct.

dkf

@kyrias said:

Examples being Java and TCL which use an overlong sequence instead of NULL bytes, which any conformant UTF-8 decoder would reject.

That's only the internal encoding, you mouth-breathing drongo. The outside world doesn't see any of those; a “problem” that you cannot observe any consequences from is not a problem. (And it's a NUL and not a NULL.)

The advantage of this? You can squish the strings through a classic C API without having the world fall in on you.
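
A tiny sketch of that advantage, assuming the overlong pair in question is the two bytes 0xC0 0x80 standing in for U+0000 (the array and function names are invented for the example):

```c
#include <stddef.h>
#include <string.h>

/* "a", U+0000, "b" with U+0000 written as the overlong pair C0 80.
 * The literal is split so that 'b' is not parsed as part of \x80. */
const char with_overlong_nul[] = "a\xC0\x80" "b";

/* The same three characters with U+0000 as a plain zero byte. */
const char with_plain_nul[] = "a\0b";

/* Returns 1 if a strlen()-based C API would see all full_len bytes. */
int survives_c_api(const char *s, size_t full_len)
{
    return strlen(s) == full_len;
}
```

Here survives_c_api(with_overlong_nul, sizeof with_overlong_nul - 1) holds, because strlen() sees all four bytes, while the plain-NUL version is cut off after the first byte.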

gleemonk

@flabdablet said:

He's a ranter.

He's not my favorite ranter.

Interesting, haven't watched either of these talking heads before.

The difference I see here is that the one ranter explains to us how a fictional piece should not be mistaken for reality while the other one drudges on about how reality sucks. And the advice? Well in the one case you just don't watch the movie! (And learn to compare stories against reality.) In the other case? Why of course! Reject reality! (Oh and demand other people follow your simple prescriptions.)

cartman82

You thought this was over?

http://zedshaw.com/2015/09/28/taking-down-tim-hentenaar/

Nope.

Over this next week I’m going to systematically take down more of my detractors as I’ve collected a large amount of information on them, their actual skill levels, and how they treat beginners. Stay tuned for more.

Oh boy, and there's more to come. This guy is like the Uwe Boll of programming.

Come on Zed Shaw, it's been weeks! Bring out the next offender! You can't let them just slander you and get away with it!

Maciejasjmj

Oh man, that's a trainwreck. They both reinvented the wheel, they both reinvented the wheel badly, and are now arguing whether one guy's square wheel is better than the other's triangle wheel.

dkf

@cartman82 said:

You thought this was over?

Well, I had sort of hoped…

What Zed needs to be ultra-clear about is whether he's dealing with the size of the buffer or the length of the string it contains. These are different things in C. Good string handling libraries track them separately, as that allows quite a lot faster code. I include in the definition of “good string handling libraries” here the implementation of C++'s std::string, Java's StringBuilder, C#'s StringBuilder (these two languages' plain string classes are immutable, making the difference between buffer size, err… capacity and length largely moot; their builder classes are what have meaningful modification semantics), and the string code in just about every good scripting language I can think of.
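
As a rough illustration of keeping the two apart in C, here is a minimal sketch. The struct strbuf and its functions are hypothetical and not taken from any of the libraries named above; storage is supplied by the caller and never grows.

```c
#include <stddef.h>
#include <string.h>

/* Capacity and length are tracked as separate fields.
 * Invariant: len < cap, so there is always room for the terminator. */
struct strbuf {
    char  *data;   /* caller-owned storage */
    size_t len;    /* bytes in use, excluding the '\0' */
    size_t cap;    /* total size of data[] */
};

/* Sets up an empty string in storage[]; cap must be at least 1. */
void strbuf_init(struct strbuf *sb, char *storage, size_t cap)
{
    sb->data = storage;
    sb->len = 0;
    sb->cap = cap;
    sb->data[0] = '\0';
}

/* Appends n bytes from src. Returns 1 on success, or 0 (leaving the
 * buffer unchanged) if n bytes plus the terminator would not fit. */
int strbuf_append(struct strbuf *sb, const char *src, size_t n)
{
    if (n >= sb->cap - sb->len) {
        return 0;
    }
    memcpy(sb->data + sb->len, src, n);   /* no scan for '\0' needed */
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 1;
}
```

Because the length is already known, appending never has to walk the existing string looking for its end, which is where most of the speed comes from.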

Some people say that C's string code sucks. It does. But it is really just the tools to make something that doesn't suck.

cvi

I would argue for the triangle wheel. At least it's always planar.

Rhywden

@cvi said:

I would argue for the triangle wheel. At least it's always planar.

Um.

cvi

Still planar in a spherical / hyperbolic coordinate system.

Rhywden

I think "planar" and "spherical" are not words that go together well :)

flabdablet

@Maciejasjmj said:

arguing whether one guy's square wheel is better than the other's triangle wheel

Triangle wheel is clearly better. It eliminates one bump.

cvi

Locally planar everywhere then? (And please, no jokes about missing the big picture.)

Rhywden

Might as well.

dkf

@cvi said:

Locally planar everywhere then?

One test for what the curvature of space is is to draw a true triangle and see what the angles add up to. If they're 180° then you're in flat space. More than that and space is spherical; less and it's hyperbolic. If your instruments are accurate enough, you don't need a large triangle at all…
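
For reference, the standard relation behind that test, for a geodesic triangle with interior angles α, β, γ and area A on a surface of constant Gaussian curvature K (the symbols are my own choice, not from the post):

```latex
\alpha + \beta + \gamma = \pi + K A,
\qquad
K = \begin{cases}
  +1/R^{2} & \text{spherical (sum exceeds } 180^\circ\text{)} \\
  0        & \text{flat (sum equals } 180^\circ\text{)} \\
  -1/R^{2} & \text{hyperbolic (sum falls short of } 180^\circ\text{)}
\end{cases}
```

The deviation from 180° is proportional to the triangle's area, so a small triangle shows only a small deviation, hence the need for accurate instruments.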