Super-secret question

GettinSadda

It seems that to checkmy credit report I need to answer a question that even I may not be allowed to know!

[img]http://img177.imageshack.us/img177/8277/whatisyournullhs0.png[/img]

Fortunately leaving the answer blank was accepted

DogmaBites

I'm surprised it even said "What is". Normally that part changes with the specific question.

Vechni

protip: null = null

ammoQ

@Vechni said:

protip: null = null

Not in SQL.

Welbog

@ammoQ said:

@Vechni said:
protip: null = null
Not in SQL.

Yeah, but why the fuck not? I hate languages in which you need special checks for null since null != null. I hate it when I have a test like foo = bar and another test like foo != bar and encounter conditions when they're both false. Fucking shit, foo = bar OR foo != bar should be a tautology, damn it!

Digitalbath

@ammoQ said:

@Vechni said:
protip: null = null

Not in SQL.

SET ANSI_NULLS OFF FTW!

morbiuswilters

@Welbog said:

Yeah, but why the fuck not? I hate languages in which you need special checks for null since null != null. I hate it when I have a test like foo = bar and another test like foo != bar and encounter conditions when they're both false. Fucking shit, foo = bar OR foo != bar should be a tautology, damn it!

Agreed. It's even more annoying when you have to do "IS NULL" or "IS NOT NULL" all over the place.

@Welbog said:

Yeah, but why the fuck not? I hate languages in which you need special checks for null since null != null. I hate it when I have a test like foo = bar and another test like foo != bar and encounter conditions when they're both false. Fucking shit, foo = bar OR foo != bar should be a tautology, damn it!

Ostensibly it's because null is not a value, it's the absence of a value and therefore does not have any known datatype. Thus, comparing null to anything is always undefined, even when comparing to null. If you have an Int variable that's NULL, and a VarChar(50) variable that's also null, should they test positive for equality?

Personally, I think the developers were just lazy and decided they could get away with writing a blanket "return null" instead of actually finding a solution to the datatype issue.

danixdefcon5

@Welbog said:

Fucking shit, foo = bar OR foo != bar should be a tautology, damn it!

Um... nope, foo = bar XOR foo != bar should be. Otherwise you might get foo = bar -> true and foo != bar -> true.

PileOfMush

@GettinSadda said:

Notice: Phorm, and it's agents including ISPs collecting data on Phorm's behalf, are specifically forbidden from performing any processing or monitoring of the content of the above post. Hence, under the Regulation of Investigatory Powers Act 2000 any such attempt to profile this page by Phorm or it's agents is illegal.

Phight the power.

Oliver_Klozoff

@Welbog said:

@ammoQ said:
@Vechni said:
protip: null = null
Not in SQL.
Yeah, but why the fuck not? I hate languages in which you need special checks for null since null != null. I hate it when I have a test like foo = bar and another test like foo != bar and encounter conditions when they're both false. Fucking shit, foo = bar OR foo != bar should be a tautology, damn it!

A rather intelligent and skilled DBA once explained to me why NULL == NULL isn't true in ANSI SQL. The confusion arises from what people think NULL means.

The vast majority of computer programming languages, and by extension the vast majority of computer programmers, say: null == no value.

SQL, however, says: null == unknown value.

Once you grasp that concept, things start making a lot more sense.

NULL == NULL => "Some unknown value is equal to some other unknown value" => NULL (it is unknown whether or not this is true).

A + NULL => "Some number plus an unknown value" => NULL ("unknown value")

A=A AND NULL => "A true thing and an unknown thing are both true" => NULL ("unknown")

A=A OR NULL => "A true thing is true OR an unknown thing is true" => TRUE

havokk

@Welbog said:

foo = bar OR foo != bar should be a tautology, damn it!

Not in three-predicate logic.

Where most people fail to understand Nulls is their use of misleading language. The term "null value" is bad because there is no such thing as a "null value". The better term is "null marker".

If a column in a database contains a null marker then that means that the database does not know what the value of that attribute is at this point in time. The attribute does have a value. For example, consider a table storing details about people. One of the rows has a null marker in the height column. Does that mean the person in question does not have a height? No, it doesn't. That person does have a height, its just that the database doesn't know it right now.

Imagine pushing your hand into a jar of lollies and grabbing a handful. You have no idea how many lollies are in your clenched fist (i.e. a null marker). If you compare your clenched fist with an open hand full of lollies (so you know how many there are) and say "are the number of lollies in these two hands the same?" then the answer has to be null. ( A = Null --> Null ). If you ask "is the number of lollies in this hand plus 5 equal to 10?" then the answer has to be null. ( Null+5 > 10 --> Null ). If you grab an unknown number of lollies in both hands and ask "are the number of lollies in these two hands the same?" then the answer has to be null. ( Null = Null --> Null ).

In my opinion, the real WTF is the use of Null in programming languages. How would the value of a variable be unknown to the program? I can understand "Uninitialised" but I don't understand how a variable can be unknown. Is the CPU unable to follow a pointer and interrogate the bit of memory it is pointing to?

Null markers are required in databases (thanks to Codd's rules and the realities of storing information on real-world entities) but not in programming languages.

B

morbiuswilters

@havokk said:

Not in three-predicate logic.

Circular argument -- obviously Welpog is criticizing the use of 3 value logic for relational databases.

@havokk said:

If a column in a database contains a null marker then that means that the database does not know what the value of that attribute is at this point in time. The attribute does have a value.

No it doesn't. That is one particular meaning of NULL but it can also indicate that a value is not applicable. In fact, there are multiple actual uses of NULL which means that its definition is not internally consistent. Meanwhile, in the real world this adds a bunch of complexity to relational databases that is of very little use to business applications. If you actually have a need for these various types of NULL you can just implement magic values to represent unknown, unknowable or inapplicable values. This would not burden the majority of developers with dealing with NULL while still providing the possibility of rolling your own multi-value logic system. Additionally, the magic values would actually be indexed whereas NULLs frequently are not.

@havokk said:

In my opinion, the real WTF is the use of Null in programming languages. How would the value of a variable be unknown to the program? I can understand "Uninitialised" but I don't understand how a variable can be unknown. Is the CPU unable to follow a pointer and interrogate the bit of memory it is pointing to?

This is completely silly. For one, you're ignoring uses of NULL other than unknown. Essentially, though, nulls in programming languages can be treated the same as NULLs in SQL. You seem to be confusing the meaning of null within the abstract language and the actual implementation in hardware.

@havokk said:

Null markers are required in databases (thanks to Codd's rules and the realities of storing information on real-world entities) but not in programming languages.

Nonsense. NULLs can be represented any number of ways. The thing is, there is very little reason to make three value logic the default for databases. Multi-value logic can be implemented on top of a standard boolean system. Yes, it's more work than having it enshrined in the language standard, but it's more work for people who are using three state logic who are in the minority. Most of the time two state logic is all that is needed and forcing developers to mess with all three value logic just to write business apps is absurd.

ammoQ

@morbiuswilters said:

Essentially, though, nulls in programming languages can be treated the same as NULLs in SQL.

Not really. For example, in java, (variables pointing to) objects are compared like that: o1.equals(o2). If o1 happens to be null, this crashes, so you need to check for null, too.

havokk

@morbiuswilters said:

That is one particular meaning of NULL but it can also indicate that a value is not applicable.

Good point. Codd requires that nulls be supported for "missing information and inapplicable information". I argue that "inapplicable" means you haven't normalised your tables enough.

A null marker in a column could indicate many things. For example, a null marker in a telephone column in a person table could mean (1) this person has no phone, or (2) I don't know if this person has a phone or not, or (3) I know this person does have a phone but I don't know what the number is, or (4) I don't care if this person has a phone. If an app needs to distinguish between these meanings then the database design should accomodate it.

I'm not sure I agree or disagree with "of very little use to business applications". I believe that supporting unknown and inapplicable is simply "required for business applications in the real world" and we have to deal with it. We are not always going to know every attribute of the entities we are modelling.

Magic numbers are not necessarily the way to deal with unknown data. Imagine using $-1.00 to indicate "unknown price". What implication does this now have on averages and totals? What implication does this have on comparing prices? What implication does this have on tax calculations?

Regardless of how they are dealt with, we have to write more code to deal with these exceptional cases.

@morbiuswilters said:

Essentially, though, nulls in programming languages can be treated the same as NULLs in SQL. You seem to be confusing the meaning of null within the abstract language and the actual implementation in hardware.

Not at all. A null pointer, a pointer that is not currently pointing to anything, is not "unknown or inapplicable". It is a known state. Maybe we should be calling it something other than "null" to avoid the confusion? An unitialised variable, one that has been created but not assigned any value, is not "unknown or inapplicable", it is logically unavailable. We should not be using the term "null" for this, we should be using the correct term "uninitialised".

Three predicate logic (true, false, null) is not really anything specific to logic - nulls apply to all data types so they apply to boolean data types.

B

Welbog

@ammoQ said:

o1.equals(o2). If o1 happens to be null, this crashes, so you need to check for null, too.

But in this case you compare o1 to null using the standard equality operator ==. (o1 == null) not (o1 is null) or something. Moreover, o1 == null <=> ! (o1 != null) is a tautology, which is the basis of my complaint. Your comparison operations are reversible in Java, but not in SQL.

Based on what havokk said, I'd say he's someone who has trouble differentiating between meaning and rules. Logic has certain natural and intuitive features that are very important to preserve. One of those is operation reversibility, i.e. A op B is equivalent to not (A notop B) where op is the operation in question and notop is its opposite. This equivalence doesn't hold in SQL when nulls are involved and that's really fucking confusing the first few times you have to deal with it because it doesn't make any goddamned sense. Even after you deal with it, you still have to write special cases in situations you'd expect to just work.

Like say you want to get all rows whose "city" field isn't "Denver". SELECT * FROM Doug WHERE City <> 'Denver' doesn't return any rows in which City is null. So you have to add a OR City IS NULL part to it. And then if you want to parameterize the thing to accept any criterion for the City field in place of Denver, you basically have to write a whole different clause in the case that the parameter you passed is null, since WHERE City <> NULL returns false for every field, and the OR City IS NULL part makes the thing return every null-City row, which is the opposite of what you want!

I don't know what kind of intuition drives this kind of insanity. I don't think I want to know.

morbiuswilters

@Welbog said:

Your comparison operations are reversible in Java, but not in SQL.

Precisely. This is a limitation of any multi-value logic system. That doesn't mean such systems aren't useful but to have it be the default logic model of SQL is pretty absurd.

@Welbog said:

I don't know what kind of intuition drives this kind of insanity. I don't think I want to know.

It's not insanity or a matter of intuition. It's the fact that one special case for values is enshrined in the standard and is required for interaction with the data. NULL isn't special in and of itself, it just represents an unknown, unknowable or inapplicable quantity. Why not have other special values like FILE_NOT_FOUND? The problem is that it forces the methods of edge-cases into the main program flow which is more complex. Complexity is the enemy of efficiency, verifiability, validity and maintainability.

morbiuswilters

@havokk said:

I argue that "inapplicable" means you haven't normalised your tables enough.

And any software engineer will tell you that normalization is sometimes the enemy of simplicity and performance. The relational model is all well and good in theory but in practical terms it is sometimes limited.

@havokk said:

If an app needs to distinguish between these meanings then the database design should accomodate it.

Yes, the database design, not the query language.

@havokk said:

I'm not sure I agree or disagree with "of very little use to business applications". I believe that supporting unknown and inapplicable is simply "required for business applications in the real world" and we have to deal with it. We are not always going to know every attribute of the entities we are modelling.

You're misunderstanding. Unknown and inapplicable are certainly useful, but enshrining one particular case in the language is what I am arguing against.

@havokk said:

Magic numbers are not necessarily the way to deal with unknown data. Imagine using $-1.00 to indicate "unknown price". What implication does this now have on averages and totals? What implication does this have on comparing prices? What implication does this have on tax calculations?
Regardless of how they are dealt with, we have to write more code to deal with these exceptional cases.

SELECT avg(total) FROM table WHERE (total >= 0). You have to write more code anyway to deal with NULL cases (IS NOT NULL). True, functions like avg() are smart enough to ignore NULLs, but in and of itself that is a small benefit. Meanwhile, addition, concatenation and comparison are all different anyway. NULL itself is nothing special, it's just a magic value that has its own keyword. The problem is that it changes the fundamental logic system in use which has far-reaching consequences that are confusing for new developers and annoying to experienced ones.

@havokk said:

Not at all. A null pointer, a pointer that is not currently pointing to anything, is not "unknown or inapplicable". It is a known state.

null is anything you want it to be. It's essentially just another value that can be used to represent whatever is convenient. The difference is that SQL pretty much forces you to deal with NULLs all over the place and most programming languages leave that up to the developer. So in Java null can mean uninitialized, unknowable, unknown, inapplicable or even an error state. It's all how you want to leverage that value. You once again seem to be confusing the meaning of null in the higher-level language with what it is commonly used to represent. Essentially it's just another magic value but most programming languages do not enforce the strict "NULL does not equal anything else" idiom that is found in SQL. Those languages maintain the simplicity and ease of boolean logic while still allowing the developer to define whatever magic values he would like for the data types in use. What's more, those magic values can be sensitive to the context of the data type which helps even more.

@havokk said:

Three predicate logic (true, false, null) is not really anything specific to logic - nulls apply to all data types so they apply to boolean data types.

NULLs don't always apply to data types. In my experience, NULL values are the extreme exception which makes it annoying that SQL is modeled around them.

emurphy

@Welbog said:

Like say you want to get all rows whose "city" field isn't "Denver". SELECT * FROM Doug WHERE City <> 'Denver' doesn't return any rows in which City is null. So you have to add a OR City IS NULL part to it. And then if you want to parameterize the thing to accept any criterion for the City field in place of Denver, you basically have to write a whole different clause in the case that the parameter you passed is null, since WHERE City <> NULL returns false for every field, and the OR City IS NULL part makes the thing return every null-City row, which is the opposite of what you want!

This is probably close enough:

WHERE coalesce(City,'') <> coalesce(@City,'')

unless you actually care about the difference between '' and null.

Probably better is:

WHERE NOT (City = @City)

SpoonMeiser

@Welbog said:

Like say you want to get all rows whose "city" field isn't "Denver". SELECT * FROM Doug WHERE City <> 'Denver' doesn't return any rows in which City is null. So you have to add a OR City IS NULL part to it. And then if you want to parameterize the thing to accept any criterion for the City field in place of Denver, you basically have to write a whole different clause in the case that the parameter you passed is null, since WHERE City <> NULL returns false for every field, and the OR City IS NULL part makes the thing return every null-City row, which is the opposite of what you want!

I don't know what kind of intuition drives this kind of insanity. I don't think I want to know.

What about the situation where, say, you have a person tables. You have incomplete data, say that locating people isn't the main purpose of your app. But anyway, you want to know all the people near Dave.

SELECT p2.name FROM person p1, person p2 WHERE p1.city = p2.city AND p1.name = 'Dave';

So what if We don't have a city field for Dave?

What is the correct answer here? We can't determin the correct answer.

What is definatly the wrong thing to do, is to return all the other people in the DB who we don't know where they live. That would be very broken.

In SQL, you're generally checking relationships between objects - joins and the like, which is why it uses a logic that is correct in these situations.

henke37

field <=> othervalue

ammoQ

@emurphy said:

unless you actually care about the difference between '' and null.

In Oracle, there is no such difference, so such a statement would not be portable.

@emurphy said:

WHERE NOT (City = @City)

Wouldn't work in Oracle (and probably not in MSSQLServer either), since not(NULL='DEVER') = not(null) = null

Welbog

@SpoonMeiser said:

What about the situation where, say, you have a person tables. You have incomplete data, say that locating people isn't the main purpose of your app. But anyway, you want to know all the people near Dave.

SELECT p2.name FROM person p1, person p2 WHERE p1.city = p2.city AND p1.name = 'Dave';

So what if We don't have a city field for Dave?

See, this is what morbo is talking about. It's the particulars of the situation that determine whether you want to treat nulls as a special value or if you want to treat them as just another value. Both of us can come up with contrived examples of either without end, because we both understand what SQL nulls are and why they are the way they are and because we're both intelligent people.

The problem is that SQL forces us to always treat null as a special value, even in situations where it doesn't make sense to do so. The other, larger problem is that having three-state logic is non-intuitive at best and a WTF at worst (see: FILE_NOT_FOUND). I don't mind nulls clobbering values when added or concatenated, I don't mind nulls bitching when used in certain nonsense operations. That's what nulls do.

What I mind is simple logic being warped into a horrible three-state monster because someone somewhere decided that NULL == NULL isn't true. It means that the equality comparison doesn't form an equivalence class, which means that pretty much every piece of intuition one can have regarding logic is thrown out the window.

We come to this site because people fuck up IT and we find it hilarious and strive to overcome such WTFs ourselves. Many WTFs are nothing more than overcomplications of things that really ought to be simpler. This is one of those things. People expect == (or whatever symbol is used to indicated it) to be an equivalence relation, which, among other things, guarantees that for all A in the operation's domain, A == A is a true statement. The simple fact is this is not true in SQL, and that's a problem.

morbiuswilters

@ammoQ said:

@emurphy said:
unless you actually care about the difference between '' and null.

In Oracle, there is no such difference, so such a statement would not be portable.

Which is just ludicrous in-and-of itself. I'm all in favor of removing the three-value logic from SQL but if you're going to have it, it should at least be consistent with the definition.

@ammoQ said:

Wouldn't work in Oracle (and probably not in MSSQLServer either), since not(NULL='DEVER') = not(null) = null

Are you serious? Is there anything Oracle can do right? NULL should never = anything.

WeatherGod

In the example about locating all records where the city isn't Denver, I would argue that if such a query is essential to the application, then it should be essential that the City column be NOT NULL. If NULLs were allowed in the City column, then you can't be sure that the records you get back aren't from Denver. You can only be sure about the ones you know. The same issue would be true for date/time fields as well. If it is important enough, you should have all of them.

Whenever I deal with databases, I rarely use NULLs for strings. Most of the time, it is used for numerical values, where the interpreatation that NULL means unknown value is perfect. It often occurs that an automated weather station will have a broken pressure sensor or thermometer. Doesn't mean that there was no pressure or that temperature was non-existent, I just don't know what it is. The use of magic numbers for missing values is very bad for numerical data. If I wanted to do averages or sums or whatever, I have to add in additional logic to exclude the magic value records (which might not be very reliable or efficient for floating point comparisons).

Ultimately, I treat SQL NULLs like NaNs. Because I knew about NaNs before I learned SQL, I never had a problem with understanding NULLs for SQL. I never equated SQL NULLs with programming NULLs, either.

ammoQ

@morbiuswilters said:

Which is just ludicrous in-and-of itself. I'm all in favor of removing the three-value logic from SQL but if you're going to have it, it should at least be consistent with the definition.

You might not know it (though the IHOC has several posts about it), but the empty string is the same as NULL in Oracle. The shortest possible non-null string has a length of 1. Having to deal with empty strings causes all kinds of WTFs in Oracle (especially in PL/SQL).@morbiuswilters said:

@ammoQ said:
Wouldn't work in Oracle (and probably not in MSSQLServer either), since not(NULL='DEVER') = not(null) = null
Are you serious? Is there anything Oracle can do right? NULL should never = anything.

Sorry, I didn't make it clear enough. Should have read ... since not(NULL='DEVER') == not(null) == null, where == is not an SQL operator, but the metaconcept of "evaluates to".

something=NULL evaluates to NULL as it should, but not (NULL=something) also evaluates to NULL, so the proposd query not(city='DENVER') would not return rows where city is null.

morbiuswilters

@ammoQ said:

You might not know it (though the IHOC has several posts about it), but the empty string is the same as NULL in Oracle. The shortest possible non-null string has a length of 1. Having to deal with empty strings causes all kinds of WTFs in Oracle (especially in PL/SQL).

I understood that, I just found it rather ridiculous. An empty string is not the same thing as NULL and should not be treated the same. So essentially Oracle has a flawed implementation of three value logic.

@ammoQ said:

Sorry, I didn't make it clear enough. Should have read ... since not(NULL='DEVER') == not(null) == null, where == is not an SQL operator, but the metaconcept of "evaluates to".
something=NULL evaluates to NULL as it should, but not (NULL=something) also evaluates to NULL, so the proposd query not(city='DENVER') would not return rows where city is null.

Okay, that is correct then. Where it gets annoying is when you want everything not matching Denver because you have to do (city <> 'Denver') or (city IS NULL).

ole_gustie

@morbiuswilters said:

I understood that, I just found it rather ridiculous. An empty string is not the same thing as NULL and should not be treated the same. So essentially Oracle has a flawed implementation of three value logic.

Try working in coldFusion. One of the many WTF's of that piece of shit language is that queires returned from the database automatically concert NULL values to empty strings. Couldn't agree more. An empty string is NOT the same as a NULL value.

@ammoQ said:

the empty string is the same as NULL in Oracle.

That's pretty messed up. There's no valid definition of null that makes such a result correct.

If you define null as:

Nothing - then it's obviously incorrect because an empty string is something.
Unknown/undefined - then it's still wrong because the empty string is both known and well-defined.
Not in use / not applicable - clearly wrong because there is no way to indicate something that is in use but is blank.

The only conceivable scenarios I could ever see this property being useful for are hideously bad designs, like generating tables from fixed-width text files or having every column a varchar(50).