NULL: the worst mistake of computer science

cartman82

Paul Draper / Sep 24, 2018 / Architecture

The worst mistake of computer science - Lucidchart

In 1965, Sir Tony Hoare introduced the null reference. Fifty years later, we look at its drawbacks, its presence, and the better way.

####What is wrong with NULL?

The short answer: NULL is a value that is not a value. And that’s a problem.

It has festered in the most popular languages of all time and is now known by many names: NULL, nil, null, None, Nothing, Nil, nullptr. Each language has its own nuances.

Some of the problems caused by NULL apply only to a particular language, while others are universal; a few are simply different facets of a single issue.

NULL…
subverts types
is sloppy
is a special case
makes poor APIs
exacerbates poor language decisions
is difficult to debug
is non-composable

He then goes on to elaborate on each point. All good arguments on why null creates problems.

He then goes on to propose the solution:

NULL has become so pervasive that many just assume that it’s necessary. We’ve had it for so long in so many low- and high-level languages, it seems essential, like integer arithmetic or I/O.

Not so! You can have an entire programming language without NULL. The problem with NULL is that it is a non-value value, a sentinel, a special case that was lumped in with everything else.

Instead, we need an entity that contains information about (1) whether it contains a value and (2) the contained value, if it exists. And it should be able to “contain” any type. This is the idea of Haskell’s Maybe, Java’s Optional, Swift’s Optional, etc.

Basically, his nirvana is something like C#'s Nullable<T> container.

Meh. Too complicated for my taste. No one's gonna use that in its current verbose form.

I guess unless someone comes up with a better Nullable syntax, null is here to stay for the time being, despite all its faults.

dkf

@cartman82 said:

Meh. Too verbose for my taste. No one's gonna use that in its current verbose form.

I quite like annotating a variable as being known nullable or guaranteed non-null, and then having the compiler enforce that. I've found a number of evil bugs in my code from doing this…

ijij

@dkf said:

I quite like annotating a variable as being known nullable or guaranteed non-null, and then having the compiler enforce that. I've found a number of evil bugs in my code from doing this…

That would be sweet.

R - my current learning target - does no such thing.

NULL is at best a simple placeholder for many subtler ideas....

I think NULL will only go away when missing data goes away.

NedFodder

@dkf said:

I quite like annotating a variable as being known nullable or guaranteed non-null,

C++ is actually quite nice for doing this. All my function parameters are pointers if they are known to be nullable (optional parameters, basically), or references to guarantee they are non-null. I haven't forgotten to check for null in a long time.

rc4

I'm not much of a dev, so correct me if I'm wrong:

Is he saying "the problem isn't developers writing code incorrectly, it's the language itself!!1"?

LB_

@cartman82 said:

Instead, we need an entity that contains information about (1) whether it contains a value and (2) the contained value, if it exists. And it should be able to “contain” any type.

http://en.cppreference.com/w/cpp/experimental/optional

Boost.Optional - 1.59.0

sloosecannon

@dkf said:

@cartman82 said:
Meh. Too verbose for my taste. No one's gonna use that in its current verbose form.

I quite like annotating a variable as being known nullable or guaranteed non-null, and then having the compiler enforce that. I've found a number of evil bugs in my code from doing this…

Yeah, that's really nice. With IntelliJ it will tell you when you might be passing in a null to a @NotNull method too.

blakeyrat

... what does he propose relational databases use instead?

EDIT: he doesn't even ONCE mention SQL, and there's just a comment in the comments section saying, "hey, yo, buddy, this is a really stupid topic to discuss without once mentioning SQL."

cartman82

@blakeyrat said:

... what does he propose relational databases use instead?

I remember someone on WTDWTF advocating getting rid off NULLs from databases as well. We had a big argument and everything.

blakeyrat

I remember several conversations about the demented way Oracle handles null strings, but I don't recall any discussion about getting rid of NULL in databases altogether.

I mean. It could be done, but then what do you use to, for example, represent that a record from only one table is present when a LEFT JOIN is used? If you get rid of NULL you have to either have some incredibly clever solution for this, or you need to get rid of the concept outer joins altogether, which is really throwing a lot of really good stuff right in the toilet.

PWolff

ChristianNeumanns has a point:

ChristianNeumanns / Jun 18, 2014

Why We Should Love 'null'

This article looks at popular techniques used to avoid null pointer errors and answers the question: Which is the best solution you should use in your code?

loose

NULL is a value that is not a value. And that’s a problem.

Got as far as that and stopped reading /taking it seriously. Why?

Zero is a number that is not a number. And that's a problem

Filed under:

cartman82

@blakeyrat said:

I mean. It could be done, but then what do you use to, for example, represent that a record from only one table is present when a LEFT JOIN is used? If you get rid of NULL you have to either have some incredibly clever solution for this, or you need to get rid of the concept outer joins altogether, which is really throwing a lot of really good stuff right in the toilet.

Yeah that was it.

The problem is that, as you do outer join, the NULLS you get are ambiguous. You have no idea whether the NULL is from the tuple in the joined table, or from there not being a tuple to join with.

And the solution is, of course, to go the javascipt route and have two NULL-like types, instead of zero.

blakeyrat

@cartman82 said:

The problem is that, as you do outer join, the NULLS you get are ambiguous. You have no idea whether the NULL is from the touple in the joined table, or from there not being a touple to join with.

I need you to elaborate on this, and also spell "tuple" correctly.

Do you mean a scenario when you have:

FOO LEFT JOIN BAR LEFT JOIN BAZ

Where BAR joins to a value in FOO, and BAZ joins to a value in BAR? Therefore if BAZ is null, you aren't certain whether it's because the FOO has no BAR, or there's no BAR to join a BAZ to?

If my guess is correct, follow-up question: is there any circumstance in which that matters?

Still, the real point is: the fact that SQL is missing from that article makes it amazingly fucking useless. And also makes me wonder what this guy codes-- video games? Even video games, except cheap indies, usually have databases, though...

loose

@ijij said:

I think NULL will only go away when missing data ~~goes away~~returns.

FTFY

cartman82

@blakeyrat said:

Do you mean a scenario when you have:

FOO LEFT JOIN BAR LEFT JOIN BAZ

Where BAR joins to a value in FOO, and BAZ joins to a value in BAR? Therefore if BAZ is null, you aren't certain whether it's because the FOO has no BAR, or there's no BAR to join a BAZ to?

If my guess is correct, follow-up question: is there any circumstance in which that matters?

SELECT * FROM foo

  id | bar_id
  ------------
   1 | 'A'
   2 | 'B'
   3 | NULL

SELECT * FROM bar

    id | value
  ----------------------
   'A' | 'Value exists'
   'B' | NULL

SELECT foo.id, bar.value
FROM foo
LEFT JOIN bar ON foo.bar_id = bar.id

  foo.id | bar.value
  --------------------
       1 | 'Value exists'
       2 | NULL
       3 | NULL

In the final result set, there are two NULL-s. One is from the value field of the joined tuple from bar. The other is from there not being a tuple to JOIN with id 3. You can't tell them apart. Because NULL is used for both occasions.

It's a similar problem to the "4. NULL makes poor APIs" section from the OP. The solution is, IMO, to have two types of NULL instead of having some kind of special verbose interface like Nullable<T> (which wouldn't work in SQL anyway).

blakeyrat

@cartman82 said:

In the final result set, there are two NULL-s. One is from the value field of the joined tuple from bar. The other is from there not being a tuple to JOIN with id 3. You can't tell them apart. Because NULL is used for both occasions.

Ok, but who cares? How would knowing why it's NULL help you in any way?

cartman82

@blakeyrat said:

Ok, but who cares? How would knowing why it's NULL help you in any way?

Err... yeah? You lose information. That's a pretty big deal when working with an information storage system.

Sure you can work around it (eg. do two queries or replace one type of NULL with a placeholder value), but you shouldn't have to.

blakeyrat

@cartman82 said:

Err... yeah? You lose information.

But if the information you "lose" useful in any way? You keep dodging the question; either answer it or send me all the moneys.

@cartman82 said:

Sure you can work around it (eg. do two queries or replace one type of NULL with a placeholder value), but you shouldn't have to.

Why would you ever want to?

svieira

The solution there is actually just to add a NOT NULL constraint to bar.value - either it exists, or it doesn't. If you want to track that it existed at some point, record the changes into another table (or use a data storage solution designed to handle such things).

Buddy

Select foo.id, bar.id, bar.value?

cartman82

@blakeyrat said:

But if the information you "lose" useful in any way? You keep dodging the question; either answer it or send me all the moneys.

Sigh. Fuck it. Here are your monkeys. Hope you're happy.

blakeyrat

Blakeyrat wins again.

cartman82

@blakeyrat said:

Blakeyrat wins again.

Brick wall wins against head. Someone alert the media.

PWolff

Do you mean with this, if the second monkey doesn't shit, whether that's due to the first or the second monkey's constipation?

Buddy

On the original topic:

a) C# nullables already do have pretty good sugar, don't they? Something like T? as shorthand for Nullable<T>? Between that and Elvis operator, C# option syntax is already pretty good. The problem C# has is they can't remove nullable-by-default from reference types without breaking everything.
b) The problem isn't null itself, it's that every single variable could be null. Do you honestly ~~want to~~ check every argument to every function you ever write?
c) There is a better syntax already, but it's not in widespread use yet. Enumerated types ala Rust (and some other languages that are hardly worth mentioning 🍨) are the way of the future.

Filed under: pattern-matching syntax is kind of headache-inducing though

dkf

@cartman82 said:

You lose information.

If you wanted that information kept, you wouldn't have asked for that type of join.

Dreikin

[quote=" Paul Draper"]

What is wrong with NULL?

The short answer: NULL is a value that is not a value. And that’s a problem.[/quote]

This sounds like a problem for...
###NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
#Batman!

Rhywden

@dkf said:

@cartman82 said:
You lose information.

If you wanted that information kept, you wouldn't have asked for that type of join.

Yeah, it's a bit like: "Oh, converting from float to int will lose information!"

In this case one didn't need the information to begin with or you'd have made the column NOT NULL.

antiquarian

@cartman82 said:

The problem is that, as you do outer join, the NULLS you get are ambiguous. You have no idea whether the NULL is from the tuple in the joined table, or from there not being a tuple to join with.

And the solution is, of course, to go the javascipt route and have two NULL-like types, instead of zero.

Or you could just check the primary key of the joined table, which is not allowed to be NULL.

PWolff

Or, we could simply give the null object a counter that indicates how many times it has been passed on.

Filed under: The Evil Ideas Thread is

ashkante

@PWolff said:

ChristianNeumanns has a point:

ChristianNeumanns / Jun 18, 2014

Why We Should Love 'null'

This article looks at popular techniques used to avoid null pointer errors and answers the question: Which is the best solution you should use in your code?

I read both articles today and I find myself siding with Neumanns (on most points).

Take null away from a language and replace it with something like Java's Optional<T>, and not much changes:

Still need to consciously use it.
Still need to check if it has a value.
Still throws an exception if misused.

I'll admit, it is a better way to show that your function returns (or accepts) 'possibly nothing', than comments.
Or am I missing a glaringly obvious use case for Optional where it really shines?

Maciejasjmj

The problem with not having nulls is that most programmers also want a way to declare a variable without initializing it.

You can force people to write a default initializer, but in cases you want a proper object and not an empty one, you'd still need to write checks - so congratulations, you just reinvented nulls, except ten times as fragile, since it doesn't throw right away.

You can force a function declaration not to accept nulls, but the compiler has no way to verify with 100 percent certainity that a variable has been initialized. So that's kinda out too.

Finally, you can force people to initialize on declaration, and either condemn them to ternary hell, or have them construct empty throwaway objects. Or rather, whole object trees, since the internal fields can't be empty either.

So yeah, nulls might not be too great, but I think it's the best we can do.

antiquarian

@ashkante said:

Take null away from a language and replace it with something like Java's Optional<T>, and not much changes:

Still need to consciously use it.
Still need to check if it has a value.
Still throws an exception if misused.

So having the error show up at compile time rather than run time has no value?

Captain

Never mind the compile time error -- you can reason about "optional" values by looking at the source code and realizing that a value can be 'nothing' instead of having to read documentation to realize that it could be 'null'.

This makes your life easier.

data Maybe a = Just a | Nothing

thing :: Maybe Object
thing = ...

queryWrittenAgainstThing :: Maybe Object -> Bool
queryWrittenAgainstThing (Just thing) = ...
queryWrittenAgainstThing Nothing = ...

In other words, 'null' is an opaque error you have to actively guard against, in nearly all circumstances, while 'nothing' is just a possibility when you have a 'Maybe' type.

ScholRLEA

It was my understanding that the biggest advantage of Haskell's Maybe monad is that it forces you to address to case of missing data - anything that has a possible Nothing state must have a case analysis that handles that case. Not having used Haskell much, I am uncertain if I have that right, though.

Captain

Almost. It throws a warning if you miss a case. You can force the compiler to throw errors for warnings, but it's not fine-grained enough to just throw errors for missing cases.

That said, the bare fact that the maybe-ness is right there in the source code increases your productivity, since you know that everything that isn't a maybe is never null.

dkf

@Maciejasjmj said:

the compiler has no way to verify with 100 percent certainity that a variable has been initialized

That's an entirely wrong assertion when it comes to local variables. You can also do it for globals/instance-/class-variables, though I'm not quite as familiar with the analysis required.

Maciejasjmj

int x;
if (very_difficult_problem(constant_input)) 
{
    x = 42;
}

I dunno. Maybe it would work, but it would either require some insanely powerful static analysis tools, or the compiler to chew you off on missing cases that are impossible anyway.

dkf

That's an easy case, actually. It would chew you off for not defining it on all code paths. It doesn't try to second guess whether a boolean test passes (except for the most utterly obvious cases) but just assumes that it's sometimes true and sometimes false. Static analysis doesn't view things in quite the same way as programmers normally do. :)

blakeyrat

True; but I can certainly debate that C# (for example) shouldn't make object types nullable by default. (Which, incidentally, is being rectified in the next major .Net update, from what I understand.)

xaade

Why is county null in this join?

Select p.addressId, CASE WHEN a.addressId IS NULL THEN False ELSE True END as DoesAddressRecordExist, a.county from person p join address a where p.addressId = a.addressId

Bam, now you know the difference.

Ragnax

@Maciejasjmj said:

You can force a function declaration not to accept nulls, but the compiler has no way to verify with 100 percent certainity that a variable has been initialized. So that's kinda out too.

It doesn't need to. Like dkf mentioned:

@dkf said:

That's an easy case, actually. It would chew you off for not defining it on all code paths.

Generally speaking it works that way, because static analysis tools mostly operate on the level of preconditions and postconditions. They're not interested in what happens inside a method, as much as they are interested in what a method describes must go into it and what a method describes as guaranteed to come out of it.

Good static analyzers can infer missing conditions based on pre- and postconditions from other methods and can handle simple branching logic as well, giving you the illusion that they're quite a bit more intelligent than they reeally are (.NET Data contracts are quite amazing at this, actually, especially for a free solution) but in the end it still boils down to 'all logical branches must fullfill all guarantees needed by the (inferred or not) post-conditions.

PleegWat

That's not where it's annoying. Where it's annoying is:

bool x = some_external_value;
int y;

if( x )
{
    y = 42;
}

/* some code which does not touch x or y */

if( x )
{
    printf( "%d\n", y );
}

dkf

@xaade said:

CASE WHEN a.address IS NULL THEN False ELSE True END

Some kind of allergy to IS NOT NULL here?

(And you say nothing explicitly about county so I've no idea even which table it's coming from. I'm not going to play random guess and debug the schema for you.)

xaade

If county is NULL, you check the other flag in the query to see if the address record exists.

Point being, there are ways to setup a join to where you can tell if the values are null because the outerjoin produced null values, vs. the record existing and having null values.

loopback0

Does that matter? You aren't getting the value either way.

xaade

I was entertaining cartman's concern that you can't tell the difference in an outer join.

But, you can, if you setup the query to tell you that.

I'm not taking it any more seriously than.

"I wonder if you can do that."

loopback0

Of course you can, you just wouldn't.

dkf

@xaade said:

If county is NULL, you check the other flag in the query to see if the address record exists.

Why does it matter? With that join, you are asking for a result set where each result is a tuple that might (or might not) have information about the county associated with a person's address. For the uses where a join makes sense, worrying about why the county is NULL indicates a dedication to Doing Things The Discourse Way.

SQL gives you what you actually ask for, not what you think you asked for. Pesky computers, doing what they're told!