Attempted JOIN on NULL



  • Reading through some production code written before I came to work at my present position, I found this query:

    SELECT p.purchase_id FROM tbl_purchase AS p
    LEFT JOIN tbl_history AS h ON h.purchase_id = p.purchase_id
    LEFT JOIN tbl_item_purchase AS ip ON ip.purchase_id = p.purchase_id
    LEFT JOIN tbl_code AS c ON c.purchase_id = p.purchase_id
    WHERE p.purch_type IS NULL
    AND p.inactive = 1
    AND p.date_time < NOW() - INTERVAL '10 minutes'
    AND h.purchase_id IS NULL
    AND ip.purchase_id IS NULL
    AND c.purchase_id IS NULL

    Even if joining on NULL could somehow work, tbl_purchase.purchase_id is a primary key and the other references are foreign keys. I honestly can't even guess what results the original coder was hoping for.


  • Considered Harmful

    He wants purchases that aren't referenced by those other tables. Could potentially use p.purchase_id NOT IN( [subquery] ) for the same effect.

    A left (outer) join will have NULL for those fields in tables that didn't join anything.



  • You clearly need to learn more database theory and SQL before you post things like that here because that is one perfectly valid query.
    NOT IN or NOT EXISTS might have made it easier to read, but they're sementically nearly identical. There might be some performance differences though.



  • Okay, fine, he was looking for records in tbl_purchase that weren't matched tbl_history, tbl_item_purchase, or tbl_code (apparently, a matching record needs to be present in all 4 tables to be valid). Then what's the reasoning for, effectively, writing this...

    (h.purchase_id=p.purchase_ID AND ip.purchase_id = p.purchase_id AND c.purchase_id = p.purchase_id)
    AND h.purchase_id IS NULL AND ip.purchase_id IS NULL AND c.purchase_id IS NULL

    instead of, say, this...

    (h.purchase_id=p.purchase_ID AND ip.purchase_id = p.purchase_id AND c.purchase_id = p.purchase_id)
    AND p.purchase_id IS NULL

    I mean... if they're all equal (per the LEFT JOIN conditions), and any one of them is NULL, aren't they all NULL, and why do we need to test more than one of them?


  • ♿ (Parody)

    @anotherusername said:

    Then what's the reasoning for, effectively, writing this...

    (h.purchase_id=p.purchase_ID AND ip.purchase_id = p.purchase_id AND c.purchase_id = p.purchase_id)
    AND h.purchase_id IS NULL AND ip.purchase_id IS NULL AND c.purchase_id IS NULL

    instead of, say, this...

    (h.purchase_id=p.purchase_ID AND ip.purchase_id = p.purchase_id AND c.purchase_id = p.purchase_id)
    AND p.purchase_id IS NULL

    I mean... if they're all equal (per the LEFT JOIN conditions), and any one of them is NULL, aren't they all NULL, and why do we need to test more than one of them?

    Think about what a LEFT JOIN means as far as null data. Of course null = null is guaranteed to be false, so your idea that it's "effectively the same" isn't right, either.



  • The POWER of NULL

    @anotherusername said:

    I mean... if they're all equal (per the LEFT JOIN conditions), and any one of them is NULL, aren't they all NULL, and why do we need to test more than one of them?

    Because they don't have to be equal. Null is not equal to anything.

    This looks like cleanup code looking for inactive purchases which don't have any items in ANY of the other tables. If the item has an entry any one of those, then it won't be returned by the query.



  • @anotherusername said:

    I mean... if they're all equal (per the LEFT JOIN conditions), and any one of them is NULL, aren't they all NULL, and why do we need to test more than one of them?
     

    Well, that's SQL. Thus, if they are all NULL, they are not equal.



  • @don said:

    Reading through some production code written before I came to work at my present position, I found this query:

    SELECT p.purchase_id FROM tbl_purchase AS p
    LEFT JOIN tbl_history AS h ON h.purchase_id = p.purchase_id
    LEFT JOIN tbl_item_purchase AS ip ON ip.purchase_id = p.purchase_id
    LEFT JOIN tbl_code AS c ON c.purchase_id = p.purchase_id
    WHERE p.purch_type IS NULL
    AND p.inactive = 1
    AND p.date_time < NOW() - INTERVAL '10 minutes'
    AND h.purchase_id IS NULL
    AND ip.purchase_id IS NULL
    AND c.purchase_id IS NULL

    Even if joining on NULL could somehow work, tbl_purchase.purchase_id is a primary key and the other references are foreign keys. I honestly can't even guess what results the original coder was hoping for.

    Considering that this looks like perfectly valid SQL to find orphans to clean up, I hope you will post some of your own SQL because I think it will be more entertaining for us!



  • Ok, I see it now. LEFT JOIN will select all rows from the left table, including rows where no linked row exists in the right table. For those records, the linked field in the right table will be NULL, and those are the rows that are being selected.

    Still, it seems odd that he's trying to find records that aren't matched in all 3 of the other tables. If there's supposed to be a correspondence between the tables, wouldn't you be interested in records that aren't matched in any one of the other 3 tables? E.g. if it's in 3 tables but not the 4th, isn't that a problem? This will only find rows that are in the first table but none of the other 3.



  • Ho hum. Anti join. Welcome to Introduction to Relational Databases. Don't forget to pick up your textbook.



  • @darkmattar said:

    Considering that this looks like perfectly valid SQL to find orphans to clean up, I hope you will post some of your own SQL because I think it will be more entertaining for us!
     

    And since he joined here in 2005, I guess I has had at least 8 years to create a ton of WTFs!


     


  • ♿ (Parody)

    @db2 said:

    Ho hum. Anti join. Welcome to Introduction to Relational Databases. Don't forget to pick up your textbook.

    <INSERT BLAKEYRANT ABOUT DEVELOPERS AND RDBMS HERE>



  • @anotherusername said:

    Ok, I see it now. LEFT JOIN will select all rows from the left table, including rows where no linked row exists in the right table. For those records, the linked field in the right table will be NULL, and those are the rows that are being selected.

    Still, it seems odd that he's trying to find records that aren't matched in all 3 of the other tables. If there's supposed to be a correspondence between the tables, wouldn't you be interested in records that aren't matched in any one of the other 3 tables? E.g. if it's in 3 tables but not the 4th, isn't that a problem? This will only find rows that are in the first table but none of the other 3.

    This seems reasonable to me, although I'm on my first cup of coffee still and of course, I don't know the database in question.

    WHERE p.purch_type IS NULL

    AND p.inactive = 1

    AND p.date_time < NOW() - INTERVAL '10 minutes'

    AND (h.purchase_id IS NULL OR ip.purchase_id IS NULL OR c.purchase_id IS NULL)

    Really depends on the design. Does an entry have to exist in all 3 tables to be valid, or does it only have to exist in 1 of the other 3. I'd try to guess based on the names, but anyone who names a table "tbl_purchase" (Unless it happens to contain data about what tables have been purchased) might not have the best naming conventions.



  • @cdosrun said:

    @anotherusername said:

    Ok, I see it now. LEFT JOIN will select all rows from the left table, including rows where no linked row exists in the right table. For those records, the linked field in the right table will be NULL, and those are the rows that are being selected.

    Still, it seems odd that he's trying to find records that aren't matched in all 3 of the other tables. If there's supposed to be a correspondence between the tables, wouldn't you be interested in records that aren't matched in any one of the other 3 tables? E.g. if it's in 3 tables but not the 4th, isn't that a problem? This will only find rows that are in the first table but none of the other 3.

    This seems reasonable to me, although I'm on my first cup of coffee still and of course, I don't know the database in question.

    WHERE p.purch_type IS NULL

    AND p.inactive = 1

    AND p.date_time

    AND (h.purchase_id IS NULL OR ip.purchase_id IS NULL OR c.purchase_id IS NULL)

    Really depends on the design. Does an entry have to exist in all 3 tables to be valid, or does it only have to exist in 1 of the other 3. I'd try to guess based on the names, but anyone who names a table "tbl_purchase" (Unless it happens to contain data about what tables have been purchased) might not have the best naming conventions.

    It's probably Access. That naming convention seems common (frm* for forms, tbl* for tables, mcr* for macros, qry* for queries, rpt* for reports).

    At least they're actually tables. The Access tables that I inherited were designed by people who weren't too careful about keeping the prefixes, so I have reports that are named qry*, forms that are named qry*, queries that are named tbl*, reports that are named tbl*, forms that are named tbl*... some are clarified by adding "subreport" or "query" at the end, but of course that isn't consistent either.



  • @anotherusername said:

    The Access tables that I inherited were designed by people who weren't too careful about keeping the prefixes, so I have reports that are named qry*, forms that are named qry*, queries that are named tbl*, reports that are named tbl*, forms that are named tbl*... some are clarified by adding "subreport" or "query" at the end, but of course that isn't consistent either.

    That's called Inconsistent Hungarian Notation, also known as Careless Indian Notation.



  • Oh, believe me, I have tons of my own WTFs. You have none?

    Once you guys put me on to the fact that this is doing something I would have done with NOT EXISTS or NOT IN, I started looking at it more closely and found that, at least in this enviornment, this approach runs significantly faster. Huh. It doesn't just work, it works really well.

    If you think being in a business for eight years means you ought to have nothing left to learn, and never make another mistake, let us know how that attitude is working for you in, say, another eight years.@beginner_ said:

    @darkmattar said:

    Considering that this looks like perfectly valid SQL to find orphans to clean up, I hope you will post some of your own SQL because I think it will be more entertaining for us!
     

    And since he joined here in 2005, I guess I has had at least 8 years to create a ton of WTFs!


     


     



  • @don said:

    Oh, believe me, I have tons of my own WTFs. You have none?

    Once you guys put me on to the fact that this is doing something I would have done with NOT EXISTS or NOT IN, I started looking at it more closely and found that, at least in this enviornment, this approach runs significantly faster. Huh. It doesn't just work, it works really well.

    If you think being in a business for eight years means you ought to have nothing left to learn, and never make another mistake, let us know how that attitude is working for you in, say, another eight years.

    Huh. That's a good response to this. You seem like a semi-stable human being, able and willing to learn.

    What are you doing here?



  • @cdosrun said:

    What are you doing here?
     

    Learning is not incopatible with pointing finger at other people and laughing.

    EDIT: And now that I think of it, it not incompatible with trolling either. But publicaly assuming that you are willing to learn may be.



  • @cdosrun said:

    @don said:

    Oh, believe me, I have tons of my own WTFs. You have none?

    Once you guys put me on to the fact that this is doing something I would have done with NOT EXISTS or NOT IN, I started looking at it more closely and found that, at least in this enviornment, this approach runs significantly faster. Huh. It doesn't just work, it works really well.

    If you think being in a business for eight years means you ought to have nothing left to learn, and never make another mistake, let us know how that attitude is working for you in, say, another eight years.

    Huh. That's a good response to this. You seem like a semi-stable human being, able and willing to learn.

    What are you doing here?

    Seriously, what the hell is his problem? Where's the overreaction at how we dare challenge his expert knowledge of all things relational and database-y.


    Honestly, I probably would have used the NOT EXISTS or NOT IN methods myself. What type of database are you working in, and what size are the tables involved in this?



  • @darkmattar said:

    @cdosrun said:
    @don said:

    Oh, believe me, I have tons of my own WTFs. You have none?

    Once you guys put me on to the fact that this is doing something I would have done with NOT EXISTS or NOT IN, I started looking at it more closely and found that, at least in this enviornment, this approach runs significantly faster. Huh. It doesn't just work, it works really well.

    If you think being in a business for eight years means you ought to have nothing left to learn, and never make another mistake, let us know how that attitude is working for you in, say, another eight years.

    Huh. That's a good response to this. You seem like a semi-stable human being, able and willing to learn.

    What are you doing here?

    Seriously, what the hell is his problem? Where's the overreaction at how we dare challenge his expert knowledge of all things relational and database-y.


    Honestly, I probably would have used the NOT EXISTS or NOT IN methods myself. What type of database are you working in, and what size are the tables involved in this?

    Fool. It's not a real person, it's a clever spam engine patiently building his reputation until Launch Day where he will tell us all about fake passports on sale.



  • @Ronald said:

    Fool. It's not a real person, it's a clever spam engine patiently building his reputation until Launch Day where he will tell us all about fake passports on sale.

    Once you guys put me on to the fact that this is not a real person, it's a clever spam engine patiently building his reputation until Launch Day where he will tell us all about fake passports on sale, I started looking at it more closely and found that, at least in this environment, this approach runs significantly faster. Huh. It doesn't just work, it works really well.


Log in to reply