PL/SQL fuzzy logic



  • Yes, null does, for some cases, equal to an empty string in Oracle. That's old news.

    What's more interesting is that in boolean context null is neither true nor false, so we have a very nice fuzzy logic engine at our perusal. Combined with some PL/SQL features, this becomes entertaining.

     Consider the following example:

    declare
    -- declare a table of single-character elements
    type tab_t is table of varchar2(1);
    -- and declare a variable of that table type
    tab tab_t;
    -- write a procedure to tell whether an element exists in the table;
    -- check the result both with straightforward logic and with negated logic;
    -- it is expected that both checks print out the same result
    procedure check_member(elem varchar2, tab tab_t) as
    cond varchar2(20);
    begin
    cond := 'Non-negated';
    if elem member of tab
    then dbms_output.put_line(cond||': Is member');
    else dbms_output.put_line(cond||': Is not member');
    end if;
    cond := 'Negated';
    if not (elem member of tab)
    then dbms_output.put_line(cond||': Is not member');
    else dbms_output.put_line(cond||': Is member');
    end if;
    dbms_output.put_line('');
    end check_member;
    begin
    -- first test; this is fine
    dbms_output.put_line('Is "a" member of list ("a")');
    check_member('a',tab_t('a'));

    -- second test; this is fine
    dbms_output.put_line('Is "a" member of list ("b")');
    check_member('a',tab_t('b'));

    -- third test; looks good
    dbms_output.put_line('Is "a" member of list ("a",null)');
    check_member('a',tab_t('a',null));

    -- fourth test; not so good
    dbms_output.put_line('Is "a" member of list ("b",null)');
    check_member('a',tab_t('b',null));
    end;

    For the first three test cases the results for both negated and non-negated tests are equal and as expected; "Is member" for the first and third, and "Is not member" for the second test case. The fourth one is the oddity; it'll print out:

    Is "a" member of list ("b",null)
    Non-negated: Is not member
    Negated: Is member

    After some perusal of Oracle manuals, this appears to be caused by the null handling in boolean Oraclean logic. Result of a boolean operation where one of the operands is null, will be null. So, apparently, the presence of a null element in the list will, for the case where the searched element does not exist in the list, causes the "member of" operation to return null. And "not null" is "null" as well, and as a result for both tests the execution falls in the "else" branch (because null is not true). Curiously, for the case where the list does contain the searched element, the presence of null does not have any unexpected effects. And yes, I did also test different ordering of the list members, as well as lists with more than just two members; these things do not seem to change the results.

    Let's say it took some time to dig this out from a piece of misbehaving code.



  • It does make sense, and the code is only 'misbehaving' because you have a boolean test for tri-state logic. i.e. you are missing some conditions.

    If you consider null to be 'unknown value' and not some kind of 'special zero or blank' it makes more sense.

    Is 'a' in the list ('a',null) - definately yes.

    Is 'b' in the list ('a') - definately no.

    Is 'b' in the list ('a',null) - well I'm not sure, because even though it's not in the ones I know about there is an unknown value.

    You are missing "is null" in your tests.  The behaviour is logical, you're just not checking correctly.



  • In a way your explanation does make sense.

    And that would bring a kind of solution, so instead of plain

    if not (elem member of list)

    it would be needed to write

    if ((elem member of list) is null) or (not (elem member of list))

    which just doesn't seem that much readable any more - unless I wrap it again into a function that really does return a boolean and not null or some other odd creature. Especially when I can write the reverse of the same condition without any special clause.I would accept failure of the comparision if the list itself was null, but not when a member of the list is null - because that's what the list operators are for - to allow me to work on sets of data without knowing in advance what is inside the list.



  • I'm glad my explanation makes sense 'in a way'.  Unfortunately for you it actually makes sense 'in a way that reflects the reality of SQL', so you have to live with it.

    If you have unknowns in your data then you have to handle them.  I think you'll find that you cannot dodge the issue by writing the reverse, assuming you mean get a non-null answer to "is this in my list" (see my last example).

    What you've done so far is to establish if an element is not in a list or the list contains an unknown value, and this is the best you will be able to do.

    What you need to do is determine what happens when the answer is 'I don't know' or completely remove this possibility by using a NOT NULL constraint on the underlying data. 

    Whether you wrap it up in a neat little function or not is irrelevent until you get past this part. 



  • I think jlaiho's confusion is warranted. Booleans behave very consistently across nearly all programming languages (and indeed nearly all formal systems). SQL is the odd on out with the way it handles nulls and booleans. SQL's three-state logic is the the WTF, although that has nothing to do with Oracle specifically. The same issues exist in every SQL implementation.



  • "I think jlaiho's confusion is warranted"

    no it isn't.  Why not?  Because:

    "The same issues exist in every SQL implementation"

    Tri-state is not illogical or stupid or anything, he just doesn't know it (which is quite a gap for a developer expected to use SQL but that's another story).

    As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).  If there is no requirement for unknown then simply ensure the column is not null and you're done.



  • @LoztInSpace said:

    Tri-state is not illogical or stupid or anything, he just doesn't know it (which is quite a gap for a developer expected to use SQL but that's another story).
    Illogical, no. Stupid, I would say yes. But that's subjective.

    @LoztInSpace said:

    As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).  If there is no requirement for unknown then simply ensure the column is not null and you're done.
    I would do it the way nearly every other formal system does it: true and false. Two-state logic, two-state operations. No special logical operators that apply to nulls differently compared to other operands.

    In the OP's example, `'a' member of ('b',null)` should return false. Because it's false. {'b',null} is a set, and the element 'a' is not a member of that set. That is how logic works in nearly every other formal system, because that is the intuitive and natural way logic works. If an operator is applied to a null and that operation doesn't make any sense, then the operation should either take a parameter to tell it what it should do if it sees a null (i.e. treat as true, treat as false, skip, fail) or have an exception mechanism. But if an operation does make sense to be applied to nulls (like member of), then regular two-state boolean return values should be used.

    SQL's logic not only goes against established formal system tradition but also against human intuition regarding operations. That's an utter failure at usability, and I have yet to see a real benefit from treating logic and nulls this way in my years of dealing with SQL. In fact, I have seen quite the opposite: queries that are much longer than they would have been if nulls weren't treated as special magical values with their own operations and return values.



  • @LoztInSpace said:

    "I think jlaiho's confusion is warranted"

    no it isn't.  Why not?  Because:

    "The same issues exist in every SQL implementation"

    This is not a defense.  Stupidity is not validated by repetition.

    @LoztInSpace said:

    Tri-state is not illogical or stupid or anything
    It's illogical in the sense that it behaves contrary to the way everything else does.@LoztInSpace said:
    As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB?
    First of all, I wouldn't use a word like "NULL", which has distinct connotations, when I really ought to use "UNKNOWN".  Hey, I could even use "UNK" and save everyone a character!  Plus, UNK is fun to say.  UNK, UNK, UNK.

     Second, I'd recognize that programmers want a yes/no answer.  It's all well and good to have a ternary logic, but then why aren't the control structures designed for it?  It's like they went out of their way to make SQL difficult to use.

    @LoztInSpace said:

    You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).
    Or, here's a novel idea, design your language on behalf of the developers, not the mathematical theory.  Yes, unknowns are all well and good, but making the default behavior of a language tedious and prone to error isn't a good way to go about it.  The times where I want a yes/no answer far outweigh the times when I want a yes/no/unknown answer, so design with that in mind.  The time to be tedious is when I really need that extended functionality, not the simple, every-day stuff.


  • :belt_onion:

    @Welbog said:

    Illogical, no. Stupid, I would say yes. But that's subjective.

    @LoztInSpace said:

    As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB? You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).  If there is no requirement for unknown then simply ensure the column is not null and you're done.
    I would do it the way nearly every other formal system does it: true and false. Two-state logic, two-state operations. No special logical operators that apply to nulls differently compared to other operands.
    As always, two-state and tri-state logic have their place and the OP needs to know about the difference. SQL offers both options - make your column non-nullable if you want two-state instead of complaining about it.

    I'm currently working on an application that processes insurance claims. Unlike bstorer, I have to deal with many questions that are not just yes/no but often are yes/no/don't know. Or not all questions are mandatory. If the answer to a question is unknown, you can't assume that it is false by default.

    Two-State driving under influence

    isDriving? | and | isDrunk?
    -----------+-----+---------
    Yes | Yes | Yes
    Yes | No | No
    No | No | Yes
    No | No | No

    Tri-State driving under influence adds:

    isDriving? | and | isDrunk?
    -----------+-----+---------
    ? | ? | Yes
    ? | ? | No
    Yes | ? | ?
    No | ? | ?
    ? | ? | ?

    If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence.

    No fuzziness here - please move along :-)



  • @bstorer said:

    @LoztInSpace said:

    "I think jlaiho's confusion is warranted"

    no it isn't.  Why not?  Because:

    "The same issues exist in every SQL implementation"

    This is not a defense.  Stupidity is not validated by repetition.

    That's not whay I am saying.  Confusion is not warranted because all databases behave like this.  You don't have to like it but that's the fact and that's why it should not be confusing.

    @bstorer said:

    @LoztInSpace said:

    Tri-state is not illogical or stupid or anything
    It's illogical in the sense that it behaves contrary to the way everything else does.@LoztInSpace said:
    As for your assertion that SQL is odd, well how would you handle yes/no/unknown values is Java/C++/VB?
    First of all, I wouldn't use a word like "NULL", which has distinct connotations, when I really ought to use "UNKNOWN".  Hey, I could even use "UNK" and save everyone a character!  Plus, UNK is fun to say.  UNK, UNK, UNK.

    Agree.  I'd love to see the distinction between unknown, not supplied, empty string (I kid - I kid!) etc.  It's a bit overloaded. 

    @bstorer said:

    Second, I'd recognize that programmers want a yes/no answer.  It's all well and good to have a ternary logic, but then why aren't the control structures designed for it?  It's like they went out of their way to make SQL difficult to use.

    Of you want yes/no then use NOT NULL constraints.  If you allow nulls in boolean then you need to deal with tri-state.  How can you not?  You have 3 values!  As for the control structures, they are there - you just don't like them (assuming you mean using IS NULL)

    @bstorer said:

    @LoztInSpace said:

    You have to store and handle it somehow, it just happens SQL does it for you (once you understand it).
    Or, here's a novel idea, design your language on behalf of the developers, not the mathematical theory.  Yes, unknowns are all well and good, but making the default behavior of a language tedious and prone to error isn't a good way to go about it.  The times where I want a yes/no answer far outweigh the times when I want a yes/no/unknown answer, so design with that in mind.  The time to be tedious is when I really need that extended functionality, not the simple, every-day stuff.

    Same as above.  You use NOT NULL when you want a yes/no answer.  This stuff is driven by business requirements, data models and designs that reflect those requirements.  You have to choose - always.  It's no more tedious or error prone than deciding whether a column or variable has to be a number, date, varchar or whatever (although some seem to get that wrong pretty regularly too).  Are you seriously asserting that all these 'problems' are actually down to some arbitrary syntax of the CREATE TABLE statement?

    By the way, the GUI table creator in SQL Server seems to default the 'allow nulls' tick box to be off, which tends to be correct.



  • @LoztInSpace said:

    Of you want yes/no then use NOT NULL constraints.  If you allow nulls in boolean then you need to deal with tri-state.  How can you not?  You have 3 values!  As for the control structures, they are there - you just don't like them (assuming you mean using IS NULL)
    The OP's example wasn't storing a null in a boolean-type column, it was a return value from a built-in function. I think that it should be up to the designer to decide how nulls in boolean columns should be handled using some kind of flag as part of the query (or some other mechanism) instead of globally saying "null has the meaning 'might be true or false'", which is an assumption that SQL forces upon you. Null doesn't always mean that, and in my experience it normally means something more like "I don't give a damn what the value is" I don't like that my database software imposes assumptions regarding the meaning of data this way.

    But more in line with the OP's confusion, his "member of" function does not return false when asked if 'a' is a member of the set {'b',null}. This isn't rocket science, it's basic set theory: 'a' is not a member of {'b',null}. SQL is imposing the assumption that null "might be" 'a', so it returns null instead of false. That is a confusing imposition that very few formal systems make, because the majority of formal systems let you define for yourself what null means. Just because the imposition is well documented and easy to understand once you know what it is doesn't make it a good idea or even a straightforward idea. It's confusing and unnecessary. It's counterintuitive and usually results in larger, more complicated queries.

    That's what this is. It's perfectly logical, understandable, sound, unnecessary and counterintuitive. Null, intuitively, has no meaning other than "no other value is stored". SQL is designed under the assumption that null means "this could represent anything". It is my opinion that this is a bad assumption, because that's not what null means in every scenario. You could argue that since SQL was designed this way, that if you're using null to mean something other than "this could represent anything" you're using SQL incorrectly. That's fair, but all of my other systems use nulls differently so then I would have to overengineer a solution to store my meaningless nulls meaningfully in SQL. To me, SQL nulls are designed wrong, plain and simple.



  • @Welbog said:

    But more in line with the OP's confusion, his "member of" function does not return false when asked if 'a' is a member of the set {'b',null}. This isn't rocket science, it's basic set theory: 'a' is not a member of {'b',null}. SQL is imposing the assumption that null "might be" 'a', so it returns null instead of false. That is a confusing imposition that very few formal systems make, because the majority of formal systems let you define for yourself what null means. Just because the imposition is well documented and easy to understand once you know what it is doesn't make it a good idea or even a straightforward idea. It's confusing and unnecessary. It's counterintuitive and usually results in larger, more complicated queries.
    This, I think, is the key point.  Mathematically valid or not, consistent across all DBs or not, it's still unacceptable because it makes our jobs harder.



  • @Welbog said:

    That's what this is. It's perfectly logical, understandable, sound, unnecessary and counterintuitive. Null, intuitively, has no meaning other than "no other value is stored". SQL is designed under the assumption that null means "this could represent anything". It is my opinion that this is a bad assumption, because that's not what null means in every scenario. You could argue that since SQL was designed this way, that if you're using null to mean something other than "this could represent anything" you're using SQL incorrectly. That's fair, but all of my other systems use nulls differently so then I would have to overengineer a solution to store my meaningless nulls meaningfully in SQL. To me, SQL nulls are designed wrong, plain and simple.

     

    It's the OP here; I've been silent, and listening - erm, reading. Thanks for all about the discussion so far, it's been an educating one.

    When reading the above, it brought to my mind some claims done before y2k that there are systems where "no year given" could be stored as "99". Somehow it starts to sound as an echo of the above: if you have a "not null" numeric column, then you'll have to use one of your real data values to signify "no data", because the null is forbidden, and even if allowed would only mean "there might or might not be data". But now I fear that I'll be attacked by data design purists insisting to tell me that there should not be that kind of column at all, but instead the data that does not occur on each row should be split off to a separate table - at which point lack of row in the other table would indicate lack of data.

    But this, of course, would be again futzed by some apps programmer doing an outer join of these two tables, and being able to retrieve a null for the row where there was no value. Oh my.. "no value" just changed into "might or might not be a value".

    Perhaps there really should also be a token for a real "no value - this variable has intentionally been left blank", in addition to the current "perhaps someone just forgot to jot down the value". And thanks for whoever it was who wrote the example where the "don't know" answer truly made sense; it helped to illustrate the value of three-value logic.



  • @bjolling said:

    Tri-State driving under influence adds:

    isDriving? | and | isDrunk?
    -----------+-----+---------
    ? | ? | Yes
    ? | ? | No
    Yes | ? | ?
    No | ? | ?
    ? | ? | ?

    If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence.

    No fuzziness here - please move along :-)

     

     In fact, the 3-state logic according to relational theory is:<u3:p></u3:p><o:p></o:p>

    isDriving? | and | isDrunk?
    -----------+-----+---------
    ?          | ?   | Yes
    ?          | No  | No
    Yes        | ?   | ?
    No         | No  | ?
    ?          | ?   | ?<u3:p></u3:p>
    <o:p></o:p>

    and it's absolutely correct (if driver was not driving, who cares if he was drunk, and if he's not drank he's not drunk driver - driving or not).

    I believe the same logic is implemented in Oracle (SQL serve has it implemented since SQL 2000). <u3:p></u3:p><o:p></o:p>

    The basic and most common misunderstanding of the meaning of NULL lays in the fact that most of the people tend to apply one meaning to it. Unfortunately that's not right. Null was introduced into relational theory to cover temporary unavailable information (because touple - or row if you prefer - can not contain null values, but null is necessary to allow existence of temporary incomplete touples). Later on even Codd started experimenting with idea of having two types of null - one representing simple missing information and other one that will represent unknown and/or not applicable. Wrong, in my opinion. Null should be avoided as much as possible, and should be used only to allow temporary missing values of the attribute for witch we are sure has a value. E.g. DateOfBirth - it could be unknown, but we are sure everyone was born on some date. Personally, I would recommend to keep incomplete rows in a separate table, if it is possible of course.<u3:p></u3:p><o:p></o:p>

    If you have column where null means not aplicable (like SSN for non-US citizens), you have table that is not normalized. If your table is fully normalized, you should have tables Person(PersonID, Name, Surname,...) and PersonSSN(PersonID,SSN). But this does not solve the problem, because view that join (outer) those two tables would still have null for a non-US and we could not know if it means "not applicable" or "missing". Or, maybe, we can?<u3:p></u3:p><o:p></o:p>

    PersonID | Name  | Surname      PersonID | SSN
    -----------+-----+---------    ----------+-----
    1        | John  | Smith        1        | aaa
    2        | John  | Doe          2        | NULL
    3        | Johan | Strauss      5        | bbb
    4        | Jean  | Morris
    5        | George| West<u3:p></u3:p>
    <o:p></o:p>

    It's easy to make a query that will tell you PersonID 3 and 4 are non-US with not applicable SSN, but for 2 SSN is missing.<u3:p></u3:p><o:p></o:p>

    What happens if you want to list all the people with SSN different than given one (or the range of SSN). This is usual complain against three-state-logic, but it actually tells more about question. Do you realy know what you want with this question?<u3:p></u3:p><o:p></o:p>

    • I want list of the people with SSN different than given one, and I do not want foreigners in the list
    • I want list of the people with SSN different than given one, with all the foreigners in the list<u3:p></u3:p><o:p></o:p>

    But there is another question: how do I want to treat Mr. John Doe? <u3:p></u3:p><o:p></o:p>

    • I do not know his SSN  and I do want him to appear in the list because I want list of all the people that do not have this SSN together with a people that maybe don't have it (Mr. John Doe IS on the list), or 
    • I do want a list of a people I'm 100% sure do not have this SSN (in which case Mr. John Doe will not appear on the list)<u3:p></u3:p><o:p></o:p>

    Combining previous two cases, we can write 4 different queries, providing 4 different result sets. Which one is right, depends on the question. <u3:p></u3:p><o:p></o:p>

    You see, proper using of NULL values gives you opportunity to provide right answer on the right (well defined) question. What kind of two-state-logic could provide it? No one, except if you define "special case SSN" with a meaning on "UNKNOWN" and "NOTAPPLICABLE". That's fine, but you will still need to understand question because you can provide 4 different answers even in this case. <u3:p></u3:p><o:p></o:p>

     <u3:p></u3:p><o:p></o:p>
     <u3:p></u3:p><o:p></o:p>

    @Welbog said:

    But more in line with the OP's confusion, his "member of" function does not return false when asked if 'a' is a member of the set {'b',null}. This isn't rocket science, it's basic set theory: 'a' is not a member of {'b',null}. SQL is imposing the assumption that null "might be" 'a', so it returns null instead of false. That is a confusing imposition that very few formal systems make, because the majority of formal systems let you define for yourself what null means. Just because the imposition is well documented and easy to understand once you know what it is doesn't make it a good idea or even a straightforward idea. It's confusing and unnecessary. It's counterintuitive and usually results in larger, more complicated queries.
    <u3:p></u3:p><o:p></o:p>

    u3:p</u3:p>

     <u3:p></u3:p><o:p></o:p>

    Wrong. <u3:p></u3:p><o:p></o:p>

    I mean, I agree with "This isn't rocket science, it's basic set theory", but I do not agree with the statement: 'a' is not a member of {'b',null} .<u3:p></u3:p><o:p></o:p>

     Truth is - we do not know if a is member of {b, null} because null, as I said before, means unknown. You have set (basket?) containing 1 apple and one box. We don't know what is in the box (aka null).Can you tell you are 100% sure there's no pear in the box? You can not, because you don't know if inside box is another apple, pear, plum or box is empty. <u3:p></u3:p><o:p></o:p>

    Why this is not obvious? Simple, because people mix null with the empty - those two terms are not synonyms. The statement "Pear is not a member of set containing apple and empty box" is true. But, statement "Pear is not a member of set containing apple and box" (missing word is empty, and box represents null, meaning we do not know what's inside box) is neither true nor false, because we do not know. <u3:p></u3:p><o:p></o:p>

    Three-state-logic. Like it or not, but you need it if you want to give right answer on right question. Don't blame RDBMS if you didn't understand question or customer didn't know what he wants. <u3:p></u3:p><o:p></o:p>

    By the way, could anyone explain me how all previous cases will be easy and simple to solve in any other programming language? <u3:p></u3:p><o:p></o:p>

    I apologize if you find my grammar and/or vocabulary is poor and
    wrong, but I'm not a native English speaker, and I do my best to improve.


  • :belt_onion:

    @niik said:

    @bjolling said:

    Tri-State driving under influence adds:

    isDriving? | and | isDrunk?
    -----------+-----+---------
    ? | ? | Yes
    ? | ? | No
    Yes | ? | ?
    No | ? | ?
    ? | ? | ?

    If someone isDriving but you don't know if he isDrunk, then you also don't know if he was driving under influence.

    In fact, the 3-state logic according to relational theory is:
    isDriving? | and | isDrunk?
    -----------+-----+---------
    ?          | ?   | Yes
    ?          | No  | No
    Yes        | ?   | ?
    No         | No  | ?
    ?          | ?   | ?
    Heh, how did I miss that? Next time I'll engage my brain before posting, I promise.


  • @niik said:

    Why this is not obvious? Simple, because people mix null with the empty - those two terms are not synonyms. The statement "Pear is not a member of set containing apple and empty box" is true. But, statement "Pear is not a member of set containing apple and box" (missing word is empty, and box represents null, meaning we do not know what's inside box) is neither true nor false, because we do not know.
    Spoken like a true SQL programmer.

    In most programming languages, null has the meaning "no value is associated with this variable or expression". Empty has the meaning "this variable of expression has an empty value". The distinction exists and is very real (because it is an important distinction). In parsing theory, the DFA that accepts the null string is different from the DFA that accepts the empty string. The DFA that accepts the null string doesn't accept any string. It not that it "maybe" accepts some string, it just doesn't accept any string. That is the nature of null as I see it. Null doesn't mean that this expression "might" have a value, it means that the expression doesn't have a value. If, like in your examples, null does mean that it "might" have a value, then it's up to you to treat null that way. In my experience null means "no value" more often than it means "might have a value".

    It should not be the RDBMS that decides what null means; it should be up to me.



  • @niik said:

    The basic and most common misunderstanding of the meaning of NULL lays in the fact that most of the people tend to apply one meaning to it. Unfortunately that's not right. Null was introduced into relational theory to cover temporary unavailable information (because touple - or row if you prefer - can not contain null values, but null is necessary to allow existence of temporary incomplete touples). Later on even Codd started experimenting with idea of having two types of null - one representing simple missing information and other one that will represent unknown and/or not applicable. Wrong, in my opinion. Null should be avoided as much as possible, and should be used only to allow temporary missing values of the attribute for witch we are sure has a value. E.g. DateOfBirth - it could be unknown, but we are sure everyone was born on some date. Personally, I would recommend to keep incomplete rows in a separate table, if it is possible of course.

    If you have column where null means not aplicable (like SSN for non-US citizens), you have table that is not normalized. If your table is fully normalized, you should have tables Person(PersonID, Name, Surname,...) and PersonSSN(PersonID,SSN).

    Look, I know the mathematics behind the relational model, and I'm certain Welbog does, too.  But you're missing the point entirely: nobody cares what the mathematical basis is.  The adherence to the mathematics are an impedence to getting work done.

    We have real jobs to do and we aren't interested in things that make those jobs harder.  I can count on one hand the number of times I've had a situation where I was looking for '1234' and would accept NULL in its place.  On the other hand, the number of times I need '1234' and only '1234' are too numerous to count. Over-normalized data and the plethora of joins that result are more often a hinderance than a help.

    There's a reason we don't write our applications in Prolog.  If you want to have this behavior in some academic database package, knock yourself out, but get it out of my production system.



  • @bstorer said:

    Look, I know the mathematics behind the relational model, and I'm certain Welbog does, too.
    I remember when I took database design in university and we covered tri-state logic. I thought, "Hmm, that's neat, I guess," when it was first introduced. Then we had an assignment that dealt with it and nulls and I quickly learned that all of the extra checks I had to do were annoying and tedious. Nothing about tri-state logic is difficult to use in practice, it's just annoying as shit.



  • @Welbog said:

    It should not be the RDBMS that decides what null means; it should be up to me.

    Look, we practicaly agree about everything, except the small issue you stated at the end. In fact, I agree with that sentence too, but it seems we read it in different way.

    It is up to you,

    You want all peeple with SSN different then 'aaa'  including the people without SSN, you write
    WHERE SSN is Null or SSN<>'aaa'.

    You want list of people that have SSN, but different than 'aaa' you write
    WHERE SSN is Not Null and SSN<>'aaa'.

    It's not a big deal, it's clear on first sight, and at the end it was you that gave meaning to the NULL.

    You have opportunity to choose meaning, thanks to the 3-state-logic and RDBMS

     



  • @niik said:

    It is up to you,

    You want all peeple with SSN different then 'aaa'  including the people without SSN, you write
    WHERE SSN is Null or SSN<>'aaa'.

    You want list of people that have SSN, but different than 'aaa' you write
    WHERE SSN is Not Null and SSN<>'aaa'.

    It's not a big deal, it's clear on first sight, and at the end it was you that gave meaning to the NULL.

    You have opportunity to choose meaning, thanks to the 3-state-logic and RDBMS

    Earlier in this thread I pointed out that it's asinine to use a ternary logic internally and then only provide binary control structures.  Thanks for demonstrating that so nicely.


  •  @bstorer said:

    nobody cares what the mathematical basis is.  The adherence to the mathematics are an impedence to getting work done.

    It's hard to imagine something more wrong that those sentencies. It's pure example of, somethimes very skilled and briliant, developers which creates applications that runs just thanks to the rawprocessor power, huge amount of RAM and fast IO. Because they are ignorant, because they tend to think that only application code and developement speed counts and mostly because they don't want to waste their precious time to stupid and uslessthings like design and modeling. They can do everything on the fly, just give them an IDE and they'll make tables as they need them They are wizards. Andat the end they blame RDBMS, underscaled hardware, slow OS or incompetent DBA. Typical.

    Work done and work well done are two  different things.

    @bstorer said:

    We have real jobs to do and we aren't interested in things that make those jobs harder

    I have a real job, and my duty last 25 years was and is to have jobs well done.

    @bstorer said:

    I can count on one hand the number of times I've had a situation where I was looking for '1234' and would accept NULL in its place. 

     Where SSN is null orSSN='1234'

    @bstorer said:

    On the other hand, the number of times I need '1234' and only '1234' are too numerous to count.

     Where SSN='1234'

    Where is the problem, you have shorter  where clause in your more often case

     @bstorer said:

    Over-normalized data and the plethora of joins that result are more often a hinderance than a help.

    Wrong. You can always make views that will hide "plethora of joins" from the application. Sentence "having lot of joins directly means worse performance" is wrong too (except in the case of incorrect joins and joins not supported with proper indexes). I had to prove that hundreds of times until now, and never failed, believe me.

    It's a wrong design, wrong understanding of requirements and customer needs and wrong implementation (including joins, views, queries, indexes etc)  that is cause of hinderance in most of the cases, not "overnormalization".

    Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.



  • @bstorer said:

    it's asinine to use a ternary logic internally and then only provide binary control structures.
     

    Here, I absolutely and fully agree with you :-)

     

     



  • @niik said:

    Wrong. You can always make views that will hide "plethora of joins" from the application. Sentence "having lot of joins directly means worse performance" is wrong too (except in the case of incorrect joins and joins not supported with proper indexes). I had to prove that hundreds of times until now, and never failed, believe me.
    Your whole post underlies a complete misunderstanding of getting work done.  I'm not interested in code performance.  Hell, I use Ruby, how could I be?  I'm concerned about the amount of time developers spend doing the tedious and the redundant.  SQL is tedious, and that's a black mark against it in my book.  When I speak of a hinderance, I speak of a hinderance to me, the developer.

     @niik said:

    Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
    That depends: last I checked there were approximately fourteen million levels of normal forms (That might be a slight overestimation.).  At what level does it become "normalized"?


  • Discourse touched me in a no-no place

    @bstorer said:

    @niik said:
    Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
    That depends: last I checked there were approximately fourteen million levels of normal forms (That might be a slight overestimation.).  At what level does it become "normalized"?

    At the point where you've examined the current schema/schemas/schemata*, and decided that 40 tables per record, and 12,000 'join tables' are a bit of an overkill, and effectively denormalized 13,999,995 levels to improve the performance of your application?

    * where appropriate. While not an invitation for a sub-thread....



  • @PJH said:

    [quote user="bstorer"]@niik said:

    Btw. There's not such thing as "overnormalization" because your schema is normalized or not. OK, could be de-normalized, but this term has different meaning than not-normalized.
    That depends: last I checked there were approximately fourteen million levels of normal forms (That might be a slight overestimation.).  At what level does it become "normalized"?

    At the point where you've examined the current schema/schemas/schemata*, and decided that 40 tables per record, and 12,000 'join tables' are a bit of an overkill, and effectively denormalized 13,999,995 levels to improve the performance of your application?

    * where appropriate. While not an invitation for a sub-thread....

    [/quote] 

    Huh, I'm a bit confused now. Sorry for off-topic, but discussion went in that direction.

    First, talking about normalization there are 6 normal forms only, as I know. And I've never seen in my whole life database that is normalized to the 6th normal form. Or we are not talking about the same forms :-) Definitively not.  Could you explain me what is denormalized level, please (taking in account that could be over 14 millions of them)?

    I've never heard about term "tables per record". Usualy it goes vice versa. Do you create new tables for each record? If you do, that could be defined as a kind of partitioning rather than normalization. Personally, I can't imagine reason why it should be done in that way, but this does not mean it's wrong - I just don't know "full story". The same is valid for 12000 joined tables.

    Second, I've made a crucial mistake when I mentioned performance, but this is a part of my background. Performance counts as well as resource utilization, hardware requirements, software cost and development time. And, as they are usually on opposite sides of an equation (mats again ;) there's no universal truth. In some cases I'm using plain text files to store and retrieve data. In some other cases I'll  use data in the spreadsheet and perform just order by column and visual search to find the data I need, rather than create tables, indexes, views. It depends on needs.

    If performance is not your concern that's just fine, but you can not generalize based on this starting premise. If you buy Ferrari you can't complain it consumes too much fuel, maintenance fees are high and insurance is killing just because you use it once per week to go to the supermarket. Relational databases are, like it or not, the most efficient systems to store and retrieve data, at least today (excluding some special cases). And there are bunch of them with different prices, characteristics, support options etc. You should choose one that fits you. And there are still other options that could be evaluated if performance is not an issue.

    The problem with "job done" approach is that it is used too often as an excuse for making application without any planning - just decide: we will use this for data persistency and rush into the code. We will make design on the fly, we will decide structure when need it, someone else will be in charge to tune it, anyway. And, after a while, we start complaining about that not being friendly to us. Of curse is not. You can't imagine how many times "the huge development problem" that involved a lot of people writing complex code with nested cursors and recursions (or loops, doesn't really matter), involving functions and procedure calls - or say it in simple way - spending a lot of developers time was solved by simple redesign and set instead of row oriented operations. Once complex code, with thousands and thousands lines, hard to debug, prone to errors, resource consuming with slow response, became short, fast and reliable piece of code. I did that. A lot of times. And developers, by accepting those changes and recommendations,  were increasing efficiency a lot. Every next job was done faster. Because it was designed and created on the rock solid base of a proper application development approach. First think, then code, and of course, lern from mistakes :-)  

    But this also depends on needs, as I said before. I would not spend my time developing state-of-the art application when it doesn't need to be, and I'll use small city car to go to a shopping mall, not Ferrari.


  • Discourse touched me in a no-no place

    @niik said:

    First, talking about normalization there are 6 normal forms only, as I know. And I've never seen in my whole life database that is normalized to the 6th normal form. Or we are not talking about the same forms :-) Definitively not. 
    The 14 million was a joke.

     @niik said:

    Could you explain me what is denormalized level, please
    Denormalization is where, having taken your schema to, say, 5th NF, you take a positive decision for performance or other reasons, to take some of it back to 4th NF undoing some of the steps you took to take it to 5th NF.

    Note that this is totally different to simply stopping at 4th NF and not going any further.

     @niik said:

    I've never heard about term "tables per record". Usualy it goes vice versa.
    Again, a joke.

     



  • @PJH said:

    Denormalization is where, having taken your schema to, say, 5th NF, you take a positive decision for performance or other reasons, to take some of it back to 4th NF undoing some of the steps you took to take it to 5th NF.

    Note that this is totally different to simply stopping at 4th NF and not going any further.

    Thanks for better explanation of what I said:

    @niik said:

    OK, could be de-normalized, but this term has different meaning than not-normalized.



  • How to minimize usage of NULL in DB

    Just to try to clarify some thinghs, if I can (in fact I've just finished writing of the document (below) when I found this thread). 

    How to minimize usage of NULL

    Even NULL is allowed in both relational theory and practice its usage should be avoided as much as possible. As NULL is defined as representation of unknown piece of data it should be treated in that way. That means, strictly following relational theory, if an attribute can be omitted it doesn't belong to the tuple because tuple contains values and NULL is not a value.  Why NULL is defined in the relational theory then? Because NULL represents missing data at the moment of processing. The question is does sentence "tuple does not have this attribute" is the same with "I do not know value of the attribute". 

    Put it in simpler way, if we have basic personal data having just two attributes: name and job, what is the meaning of null assigned to the job attribute? Does it mean we don't know the person's job or he is unemployed? Or could be this specific person belongs to category where employment status is not applicable, e.g. school and preschool kids. If we have to send job offer to the unemployed we can't clearly distinguish it as well as we can't distinguish missing attribute values if we want to make a list of people with incomplete data.

    How we can solve that problem?

    Introducing predefined values for "unemployed" and "not applicable" makes picture much clearer. Introducing "unknown" can further make a difference between really unknown and "not entered yet". Now we can do all the queries we need, even we can sent a note to the relevant employees to keep up with entering data if there are still a lot of NULL jobs, meaning there are a lot of data pending to be entered. This could be particularly valuable in case of parent-child relations such as catalogue tables (countries, organizations, locations etc.). It also adds value to the information provided to the end user - displaying, on example, values for location "not applicable" and "unknown" or "not entered yet" for a NULL in most cases is better than displaying just NULL or empty string. Of course, the "instead of NULL" values should be selected based on the business requirements, e.g. "unknown" could be omitted and null can be used to show not entered and unknown at the same time, or we can define column as a not null and introduce separate value for unknown.

    Advantage of covering NULL with predefined values, apart from more descriptive presentation of data, lays in better coverage of two valued logic. In case of filters like "not equal" the result set will never contain rows with a NULL.

    <font face="courier new,courier">WHERE Location<>'Loc1'</font> will return all rows with locations different from 'Loc1' but not the rows with NULL locations. To avoid that we have to write filtering clause like: <font face="courier new,courier">WHERE Location is null or Location<>'Loc1'</font>.

    If we don't care about what is a meaning of NULL (we assume null means "no job" for whatever reason - unemployed, not applicable or unknown) and percentage of the rows containing NULL is high (usually high is more than 10% of rows containing NULL) then we have to normalize our table into two related tables - persons and person_jobs. This approach, even if it looks like an unnecessary step that leads to more complex queries on a first sight, in fact provides much efficient retrieval of data, especially in the case when there is a filter based on the job column. And it could easily solve the problem when some of them decide to get second job.  

    As there is not such thing as universal solution, I would recommend defining all columns as NOT NULL except in the case of:

    - Columns that could have missing information during some period of time (e.g. forms that could be saved temporarily with still missing values, tables that accept imports with not mandatory fields that will be entered afterwards, etc.), which effectively covers "temporarily missing data". This doesn't really violate normalization rules as those rows could be treated as rows still in the process of inserting data, although I would personally prefer to keep incomplete rows in separate table and send them to the main one once data are complete;

    - Columns where NULL always means "do not have a value" or "not applicable" and amount of rows containing nulls does not go over 10% of total number of rows (it's against basic relational principles but this is an acceptable de-normalization step). Note that 10% is not a magic number and it depends on the total number of rows. As number of rows goes up, the percentage of "allowed null rows" should decrease in order to maintain efficiency of the queries.  Having tens of thousands of additional "is null" operators or isnull(), coalesce() or similar functions will always bring unnecessary load to the database server (as well as additional application code);

    - Columns representing hierarchy structure could have null representing top of the hierarchy, although this could cause problem if it could mean also "unknown" because, on example, list of top managers defined as ManagerID is null will contain people that are unassigned at the moment. A solution for it could be a rule: "Top manager has his own id as ManagerID" In that case null is avoided and logic to select top manager(s) is just shifted from  <font face="courier new,courier">Where ManagerID is Null </font>to <font face="courier new,courier">Where ManagerID=EmpID</font>, which is not a big deal.

     


  • :belt_onion:

    Thanks for sharing the article.

    @niik said:

    - Columns where NULL always means "do not have a value" or "not applicable" and amount of rows containing nulls does not go over 10% of total number of rows (it's against basic relational principles but this is an acceptable de-normalization step). Note that 10% is not a magic number and it depends on the total number of rows. As number of rows goes up, the percentage of "allowed null rows" should decrease in order to maintain efficiency of the queries.  Having tens of thousands of additional "is null" operators or isnull(), coalesce() or similar functions will always bring unnecessary load to the database server (as well as additional application code);
    I don't feel too comfortable wih this section. Your queries don't care that only a certain percentage of rows can have columns where NULL could be "not yet entered" or "not applicable". If you allow this for just one row, you'll still have to write the convoluted version of your query.

    And if I hear someone giving advice on how to keep up the performance of something, I always expect to read about test results. It would've been nice if you had run some test cases and actually proved that for a certain amount of records there is indeed a threshold (like the 10% that you mention) where performance begins to drop.

     



  • @niik said:

    If performance is not your concern that's just fine, but you can not generalize based on this starting premise. If you buy Ferrari you can't complain it consumes too much fuel, maintenance fees are high and insurance is killing just because you use it once per week to go to the supermarket. Relational databases are, like it or not, the most efficient systems to store and retrieve data, at least today (excluding some special cases). And there are bunch of them with different prices, characteristics, support options etc. You should choose one that fits you. And there are still other options that could be evaluated if performance is not an issue.
      The criticism isn't of relational databases, but of SQL.  To extend your analogy, I'm not complaining about my Ferrari; it's a beautiful machine.  I'm complaining about the fact that I have to operate it using a trackball.  It's needlessly difficult to control, regardless of how easy it is to understand.  Still, the Ferrari is better than all the other notable cars because they all use the stupid trackball interface.  I suppose I could take a bike, but then it's hard to carry groceries.  I could walk, but that's awfully slow.  At the end of the day, I'm forced to opt for a car, despite the trackball, because the alternatives are worse.  But that doesn't mean I have to like the trackball.  Don't fool yourself: the best tool readily available is not the same as the best tool.

     @niik said:

    The problem with "job done" approach
    I need to stop you right here: I've said nothing about having a "job done" approach.  I've never advocated a strategy of getting the app out the door as quickly as we can.  I'm talking about getting work done.  "Work" here means real progress in converting a need for software into the actual application.  That's not simply coding, either; it's the entire development process.  In fact, a great deal of coding doesn't really qualify as doing work at all.  For example, writing code to deal with SQL's tri-state logic.

    @niik said:

    But this also depends on needs, as I said before. I would not spend my time developing state-of-the art application when it doesn't need to be, and I'll use small city car to go to a shopping mall, not Ferrari.
    You're really wedded to this concept that I care about what car I'm driving, or whether I need a car at all.  At some point I've already decided whether I need a car, and what kind of car I need.  What I'm complaining about is that driving is already a hard task, and while I should be focusing on that truck swerving up ahead, I'm instead trying to get this goddamn trackball to make the car do what I need.



  • @bjolling said:

    I don't feel too comfortable wih this section.

    Neither do I.

    I would like to delete this paragraph immediately, but in most application I've seen exist at least one table with a lot of columns where lot of columns could be null for whatever reason. Keeping all of them nullable was and is not a solution as well as making 50 or more additional tables just to enforce full normalization.

    For every new design I'm using "normalize first - de-normalize later" rule, but opportunities to start database design from scratch are rare unfortunately. Most of the time you have to deal with strong opposition from developers and managers, budget constraints and not enough time to perform all things that are really necessary. Then, you have to make compromise, like it or not. And compromise I suggest is what I said in my previous post.

    Performance related advices are tricky and have to be tested for every new case. Some of them have to be revised again and again for the same database (as data change and grow over time). And, yes, I did testing on select queries where filter criteria was defined over nullable column against the same data divided in two related tables. And I've got a bunch of different results in range of "one table is much better" to "two tables are much better" with all the gray levels in between. It doesn't depend only on percentage of rows with nulls, but on existence of index over that column (often columns don't have indexes because of faster insert/update operations), column statistic (aka index statistic), overall size and number of rows, and what is a primary operation over the table - select, insert or update (including the ratio between them).And, let's say - I had to come with some rule as a starting point - and I did it. I came with some numbers which were not correct but were the nearest approximation of my testing results. I'm not happy with that and if I have opportunity I rethink every of those decisions for introducing non mandatory columns and in some of the cases I'm not really assured I did a good job, whatever decision I made. But, that's life.

    @bjolling said:

    If you allow this for just one row, you'll still have to write the convoluted version of your query.

    Sad, but true. Life is not fair. That's the reason I do not like non mandatory columns. That's te reason why I wrote the article, too.

     



  • @bstorer said:

    to make the car do what I need
     

    Let's finish this discussion. Just explain me what's your proposal how to avoid ternary logic if  null's are allowed? What Should be result of

    true and null

    false or null

    Solution could be to forbid nulls. I'd like that, but even with all mandatory columns you still could end up with null in outer joins.

    Let's forbid outer joins. Huh...

    How other, better than SQL, languages solve that problem?

    Whats happen in any other language when you try to do boolean comparison against variable with no assigned value (or has null if you prefer)?

    Do you want SQL to return you something like: "Execution canceled because attempt of comparison against null."? Of course not.

    You addvocate that null should be treated as NOVALUEASSIGNED. Fine with me.

    And you want to get all the people that have SSN different than '1234'? Great.

    From what group of the people? Just from people with SSN or from all of them?

    If you force boolen logic to have predefined answer to NULL state you don't allow developers to give meaning to NULL - you enforce meaning - one or another.

    And, as I could remember you were asking to have right to choose meaning, but it seems you want to choose meaning and to make all of us accept it as "the only truth".  Amen.

     

     




  • @niik said:

    Just explain me what's your proposal how to avoid ternary logic if  null's are allowed? What Should be result of

    true and null

    false or null

    What's 9 + pizza?  What do you get when you concatenate a string to a man riding a unicycle?  Those questions make as much sense as the ones you ask do in binary logic.  The domain of values in boolean logic is {true, false}.  I don't see null in that set, do you?

    @niik said:
    Solution could be to forbid nulls. I'd like that, but even with all mandatory columns you still could end up with null in outer joins.

    Let's forbid outer joins. Huh...

    In my experience outer joins are frequently a way to do something that is more correctly done with multiple queries, but I'm sure others have found a good use for them.  How about a simpler solution: don't forbid null.  Just treat it as a indicator value, completely separate from any other value that type can hold.  NULL == NULL but nothing else does, just as 'a' == 'a' and nothing else does.

    @niik said:

    How other, better than SQL, languages solve that problem?

    Hugh Darwen has a bold solution (see the handouts and lecture notes for "How to Handle Missing Information Without Using NULL").  It's fairly interesting, but perhaps a bit of overkill (Although distributed key constraints really ought to exist.).  But you gotta admit, that's some normalized data, huh?  I'm not sure how much of this is presently possible in any Tutorial D implementation.

    @niik said:

    Whats happen in any other language when you try to do boolean comparison against variable with no assigned value (or has null if you prefer)?

    Do you want SQL to return you something like: "Execution canceled because attempt of comparison against null."? Of course not.

    'a' == NULL returns false, because 'a' is clearly not null.  I don't see your problem with this.

    @niik said:

    You addvocate that null should be treated as NOVALUEASSIGNED. Fine with me.

    And you want to get all the people that have SSN different than '1234'? Great.

    From what group of the people? Just from people with SSN or from all of them?

      SELECT * FROM people WHERE (ssn <> '1234') returns every row with a value other than '1234', including NULLs.  It's what I asked for, it's what I get.  If I don't want NULLs, I should be able to do something along the lines of SELECT * FROM people WHERE (ssn NOT IN {'1234', NULL}).

    @niik said:

    If you force boolen logic to have predefined answer to NULL state you don't allow developers to give meaning to NULL - you enforce meaning - one or another.
    I'm enforcing no meaning.  I'm leaving it up to the user to ascribe their own meaning to NULL.  I'm just providing an optional symbol that exists within the domain of every type, but outside the value set of that type. Whether you take it to mean unknown or the color blue is completely up to you.



  • @bstorer said:

    @niik said:
    From what group of the people? Just from people with SSN or from all of them?
      SELECT * FROM people WHERE (ssn <> '1234') returns every row with a value other than '1234', including NULLs.  It's what I asked for, it's what I get. 
     

    Huh, why I feel like talking with my son (teenager, age 17, thinks he knows everything)?

    Is it

    SSN<'1234'  UNION  SSN>'1234' 

    same as

    SSN <> '1234' ?

    In what result set rows with NULL belong?

    Is 9 bigger than 'pizza'?

    In fact, we agree about most of the things, at least according to your previous post. NULL is not value and can not be compared.

    Again:

    @bstorer said:

    SELECT * FROM people WHERE (ssn <> '1234') returns every row with a value other than '1234', including NULLs.

    It's what I asked for, it's what I get. 


     It's vice versa, you have to include NULL rows if you want to use not equal, with UNION or with  WHERE ssn is null or ssn<>'1234'

    Of course, it would hellp all of us if there is some more elagant way, something like

    SSN<>'1234'   returns  SSN<'1234'  UNION  SSN>'1234'

    and

    SSN != '1234' returns  SSN<'1234'  UNION  SSN>'1234' UNION SSN is null

    If this is what you prefere to have, I would like to have it too. Maybe could cause confusion in the beginning, but in long terms or with better choosen operators it would be quite useful.

    About usfulness of NULLs in database, read my post 'How to minimize usage of NULL'. It's not complete and is a kind of compromise, but explains my point of view.

    Allowing nulls in database created probably more application problems then C pointers. I would not say pointers are wrong, esspecialy if your primary goal is performance. In the exactly same way I would not say NULL is useless, but it is dangerous and should be avoided as much as possible. And should be treated as NULL.



  • @niik said:

    How other, better than SQL, languages solve that problem?

    That's what I asked and nobody took up the challenge.  No matter what your situation and why, C++, Java, C# or VB or SQL, if your variable might be null you can't use it like you can when it's not null.

     

    @bstorer said:

    'a' == NULL returns false, because 'a' is clearly not null.  I don't see your problem with this.

    Null is not a value as such.  Does a == something that is unknown ?.  I don't and can't know.  I don't see your problem with this.

    Let me put it this way:  Is your age the same as mine?



  • @niik said:

    If this is what you prefere to have, I would like to have it too. Maybe could cause confusion in the beginning, but in long terms or with better choosen operators it would be quite useful.
      I'm not going to address the entirety of your points, because they boil down to one thing: SQL, be it the standard version or mine, needs to be scrapped.  Frankly, we ought to be able to work directly with the database programmatically instead of using an intermediate langauge anyway.

    @niik said:

    About usfulness of NULLs in database, read my post 'How to minimize usage of NULL'. It's not complete and is a kind of compromise, but explains my point of view.
      I read it.  You really should read Darwen's stuff I posted before.  He gives some interesting ideas on how this could work.

     



  •  @LoztInSpace said:



    @bstorer said:
    'a' == NULL returns false, because 'a' is clearly not null.  I don't see your problem with this.


    Null is not a value as such.  Does a == something that is unknown ?.  I don't and can't know.  I don't see your problem with this.


    My problem with this is that null is treated as if it were in a superposition, but with none of the advantages of such an arrangement, but with many of the logical inconsistencies it creates.  Consider, for example, a query like "SELECT * FROM some_table WHERE (a = 0 or a = 1 or a = 2 or ..." wherein I enumerate every possible value for the type of column a.  I won't get back the rows where a is null, despite the fact that they have to be one of those values.  This is an outlandish example, though, so let's consider a simpler one: a = 0 or not a = 0.  According to tri-state logic, this query doesn't give back nulls.  That clause covers the entire domain of a's values.  Those nulls are supposed to be unknown values, not impossible values.

    What I propose is something subtly different: instead null being an unknown value in the domain, it is completely outside the domain.  Null becomes inapplicable to ordering and clauses that cover the domain.  Granted, this isn't a perfect system, and inconsistencies are still possible.  One of the big problems is that there has to be some way to handle ordering with nulls.  There are a few different solutions, none of them ideal.  The real solution, again, is to leave SQL behind and move on to something better.



  •  @bstorer said:

    ...

    I know Hugh Darwen's work, as well as work of Data, Codd a lot of other theoreticians and practitioners of relational model. And I have my oppinion, which doesn't neccessary fits in one frame that fits all (as well as all others).

    The problem with you is, although you red some theory, you don't understand it. You just use it to show "your knowlege".

    I'll try to explain you how it looks like using relational theory (not SQL), once more.

    There is relation "people" that covers name, surname, date of birth - all mandatory fields.

    There is another relation with SSN numbers. It has less tuples because not every person has SSN.

    There's no nulls at all.

    There is relationship established between people and SSNs.

    There's still no nulls.

    Based on those two relations, two additional projections can be created:

    - people with SSN

    - people without SSN

    This could be done in different way too. You can have two relations: people with SSN and people without SSN, and two projections: people and SSNs - it doesn't matter. Projections and relations by relational theory must be interchangable without affecting nor data neither users of data. But this is not important.

    Important is that when you ask "I want list of people with SSN different than '1234' you are addresing relation or projection that contains only and only people with SSN.

    Why? Because tuple with null inside is not a tuple (check Darwen's work, or Codd or Data). You can not expect that people from other relation will appear.

    If you want all of them, you have to make union of two projections - People with SSN different than '1234' and people without SSN.

    There's no nulls.

    There's no ternary logic.

    There's no missinterpretations.

    You can do that in all existing RDBMSes.

    Where is the problem? It is in the fact that you are allowed to use nulls and you use them. And you do that even you are against them. Much worse is that you use them in a completly wrong manner and without understanding basic principles of relational theory.

    Computers are stupid. They do exactly what they are told to do. They can't read mind and they can't read between lines. They can not know there's difference in command "Give me the list of the people with SSN different than '1234'" based on who's asking. You or me. Or sameone else. They answer just following the logic: If you asked me to check people's SSNs, I'll check only relation representing people that has SSN. And that's the only right relation/projection/set to be checked. People without SSN does not belong here. They are "pizzas".Don't mix them with 9s.

    And, as you don't understand theory, your answer is - I need something better. There are flaws in implementations of relational model, especially in implemantations of SQL. But to say

     @bstorer said:

    Frankly, we ought to be able to work directly with the database programmatically instead of using an intermediate langauge anyway.

     just shows once more your deep not understanding of relational theory and Hugh Darwen's work.

    No one should be allowed to work directly with the database programmatically, because in that case that's not database anymore. It's a file with bunch of data inside. If you want to do that way, you can. You just don't need database.

     

     

     

     


  • Discourse touched me in a no-no place

    @niik said:

    Based on those two relations, two additional projections can be created:

    - people with SSN

    - people without SSN

    Along with

     - People with more than more than one SSN, and

     - SSNs used by more than one person

    (Google 078-05-1120)



  • @niik said:

     

    I know Hugh Darwen's work, as well as work of Data, Codd a lot of other theoreticians and practitioners of relational model. And I have my oppinion, which doesn't neccessary fits in one frame that fits all (as well as all others).

    The problem with you is, although you red some theory, you don't understand it. You just use it to show "your knowlege".

    @niik said:

     

    You can do that in all existing RDBMSes.

    I'm well aware of the theory, and I'm well aware that you can do that in existing systems.  None of that excuses the manner in which SQL handles NULLs.  That you don't need to use an aspect of a language doesn't defend said aspect from criticism.  Further, I don't know where you came to the conclusion that my dislike of NULLs and my dislike of SQL are interchangable.  NULLs are only a single complaint of many I have with SQL.  By the way, his name is Date, not Data.

    @niik said:

    Where is the problem? It is in the fact that you are allowed to use nulls and you use them. And you do that even you are against them.
    When did I say that I use them?  I said I don't like them.  You asked me how I'd fix them, I offered two solutions: a middle ground, revampling NULL so that it better dovetails with the way programming languages handle NULL, and one drastic but ideal, specifically moving to a language with no concept of NULL.
    @niik said:
    And, as you don't understand theory, your answer is - I need something better. There are flaws in implementations of relational model, especially in implemantations of SQL.
    So why can't I say we need something better?  We've moved away from ASM, we've moved away from C, we're moving (thank God) away from C++.  Is it so much to suggest that we move away from SQL?@niik said:

    But to say

     @bstorer said:

    Frankly, we ought to be able to work directly with the database programmatically instead of using an intermediate langauge anyway.

     just shows once more your deep not understanding of relational theory and Hugh Darwen's work.

    No one should be allowed to work directly with the database programmatically, because in that case that's not database anymore. It's a file with bunch of data inside. If you want to do that way, you can. You just don't need database.

    There seems to be a misunderstanding on your part.  I'm not advocating the violation of constraints, a lack of transactions, or the removal of privileges.  I'm not proposing ad hoc access to the data with disregard for its structure.  What I mean is that the constructs to manipulate relational data should exist in the programming languages.  Instead of building my query into another language, it should be provided so as to fit the style and structure of the native language.  LINQ is half-way to what I want.  It still outputs SQL to be interpreted by the RDBMS, which is the step I want eliminated.



  • The handling of nulls in databases is the first thing most people are taught. It beggars belief that some so called professional programmers do not understand how to deal with them! I always get the impression that programmers that do not like sql and nulls are straight out of school and have not experienced proper business systems, if you want to earn real money from this career you have to deal with real rdms'es, SQL server,Oracle and the like, null handling is one of the simplest things you are going to have to deal with. SQL might not be perfect but it is loads better than doing the whole data storage, retrieval thing yourself. Its consistent and widely understood by nearly all professional developers, the real wtf is threads like this where people open display their lack of professional skill in front of their piers!

    If you do not like SQL create data access layers, but for gods sake let someone that actually understands databases create them!



  • @Jimmy Savile said:

    The handling of nulls in databases is the first thing most people are taught. It beggars belief that some so called professional programmers do not understand how to deal with them!
    Nobody is complaining that they don't know how to deal with them, but that having to deal with them at all is tedious and potentially error-prone.@Jimmy Savile said:
    I always get the impression that programmers that do not like sql and nulls are straight out of school and have not experienced proper business systems, if you want to earn real money from this career you have to deal with real rdms'es, SQL server,Oracle and the like, null handling is one of the simplest things you are going to have to deal with.
    I always get the impression that programmers who just accept their lot and have no interest in questioning anything are little more than glorified code monkeys.  It boggles my mind that you're so accepting of the numerous flaws that you have just acknowledged.  If I can't rely on you to think critically about your tools, how can I rely on you to think critically about the task at hand?  Yes, I'm pigeonholing, but you started it.@Jimmy Savile said:
    SQL might not be perfect but it is loads better than doing the whole data storage, retrieval thing yourself.
    This is a false dichotomy.  There exists a third alternative: creating something better.  I honestly don't understand what makes you think SQL is sacred.  We've been through a plethora of programming languages in SQL's lifetime, each an attempt to improve upon the past.  Why is it proscribed to suggest that we think about replacing SQL with something better?@Jimmy Savile said:
    Its consistent and widely understood by nearly all professional developers, the real wtf is threads like this where people open display their lack of professional skill in front of their piers!
    This is a moronic statement.  QBASIC is consistent and widely understood, but that doesn't justify its use.  I understand that there is nothing better, but I don't have to like it.@Jimmy Savile said:
    If you do not like SQL create data access layers, but for gods sake let someone that actually understands databases create them!
    I'm sick of being accused of not understanding databases.  If I didn't understand them, I'd be hard-pressed to recognize their flaws, wouldn't I?  You see the problems, but you're too content to change anything.  I'm glad I don't have to work with you; that complacency disgusts me. 



  • SQL is a strange language. I mean, if you've never seen SQL, and only have normal programming experience, and find yourself in front of a significant query (with a few joins, orders, subselects, ins, groups, havings) you cannot flawlessly predict the parse/process sequence of the elements in a significant query. One has to learn by heart that any query is executed in the almost -- but not quite -- opposite order from the syntax. Having it look like human-ish language sentences was a mistake. Before you know the order of every clause, many queries seem ambiguous.

    I've always found SQL cumbersome to use, but strangely delightful as well, as it appealed to my sense data control: you are like a god creating meaningful datasets from raw information. Ahem.

    SQL, be it the standard version or mine, needs to be scrapped.  Frankly, we ought to be able to work directly with the database programmatically instead of using an intermediate langauge anyway.

    Agreed. It seems redundant that I'm writing code that writes code that performs an operation. That middle step can, and should, be eliminated. The same goes for HTML, really. I don't really feel I'm getting things done when I'm building strings of HTML. I should be able to have my business code construct a tree of Things, and have a separate formatter turn it into ML.

    I once made a small half-assed attempt to convert SQL syntax for common queries into a proper program structure, but as it was half-assed and my time was short, and my knowledge of SQL is not big enough (it's a pretty expansive language), it didn't go very far. Same for HTML output. Both attempts are filed away. Perhaps someday, something will come of it.

    ===

    As far as the NULL discussion goes: I'm in the NULL = 'empty' camp, where we can say with absolute certainty that 'a' is not in {'b', null}. The fact that in SQL this is not the case is certainly explainable to new programmers, and usable in many practical circumstances, but that doesn't mean it really makes sense in a broader perspective.

    It's an odd presupposition on the part of its developers, because it adds arbitrary subjectivity to the meaning of NULL. The fact that SQL appears to have no effective constructs to deal with it just adds insult to injury.


Log in to reply