That'll teach you to assume that tautologies are true!



  • Okay, so a little bit ago I needed to join some tables on a varchar field for which some values start with an ID number and some don't (and before you ask, no, we didn't design this, this gets pulled from client data files). I need to apply a mapping between rows in the tables where this field starts with the same ID, regardless of what else is in the string. Like a good little data monkey, I run a select statement first to make sure my joins are correct and so forth.

    Here's a portion of this statement, slightly anonymized:

    select *
          from Item i
          inner join Glossary g on i.GlossID = g.GlossID
          inner join LastYearItem lyi
                on
                case 
                      when isnumeric(left(g.TransText,1))=1
    ...

    It goes on from there, but that's the important part. Anyway, I run this statement and it does exactly what I want. Now I just change the first line to turn it into an update statement...

    update lyi set lyi.NewItemID = i.ItemID
          from Item i
          inner join Glossary g on i.GlossID = g.GlossID
          inner join LastYearItem lyi
                on
                case 
                      when isnumeric(left(g.TransText,1))=1
    ... 

    Here SQL Server refuses and throws an exception: there's an invalid cast.

    Wait, what? All of the casts were run in the joins, which are completely unchanged from the previous statement. Why is it now broken?

    After looking at the values it was complaining about, I realized something unusual: It was now running through every entry in Glossary. So, just for the sake of trying something out, I add this:

    update lyi set lyi.NewItemID = i.ItemID
          from Item i
          inner join Glossary g on i.GlossID = g.GlossID
          inner join LastYearItem lyi
                on
                case 
                      when isnumeric(left(g.TransText,1))=1 <font color="red">and i.GlossID = g.GlossID</font>
    ... 

    This works absolutely fine.

    I'm not entirely sure how SQL Server implements inner joins, or why it does so differently between select and update statements, but I guess that'll teach me for assuming i.GlossID is equal to g.GlossID, especially after inner-joining on i.GlossID being equal to g.GlossID.



  • Wow. Try to run explain on the update with and without the <font color="red">i.GlossID = g.GlossID</font>

     
     


  • BINNED

    Is one (but not both) of the GlossID columns numeric?



  • @PedanticCurmudgeon said:

    Is one (but not both) of the GlossID columns numeric?

    Yeah it's doing an implicit conversion on one of the joined columns. You can get away with fuzziness like that with SELECTs but when you need to do UPDATES, SQL's gonna bitch. That's normal, if nothing you'd ever see on a normal database schema.



  • @PedanticCurmudgeon said:

    Is one (but not both) of the GlossID columns numeric?

    I just double-checked, they're definitely both ints.



  • @curtmack said:

    I'm not entirely sure how SQL Server implements inner joins, or why it does so differently between select and update statements, but I guess that'll teach me for assuming i.GlossID is equal to g.GlossID, especially after inner-joining on i.GlossID being equal to g.GlossID.

    Filed under: a cookie for anyone who knows why it does this
    You have a faulty assumption. Just because you put the second join after the first doesn't mean it'll run second. I imagine if you try to get the query engine to EXPLAIN, it'll show that it's joining g to lyi first (probably something to do with indexes or lyi.COUNT() < i.COUNT()), and only by duplicating the join condition does that force it to run that first.


  • ♿ (Parody)

    @TwelveBaud said:

    You have a faulty assumption. Just because you put the second join after the first doesn't mean it'll run second. I imagine if you try to get the query engine to EXPLAIN, it'll show that it's joining g to lyi first (probably something to do with indexes or lyi.COUNT() < i.COUNT()), and only by duplicating the join condition does that force it to run that first.

    I had a similar thought, but then I don't use SQL Server, so the UPDATE...FROM construct was weird enough already. But isn't TRWTF that the SELECT does an implicit cast at all? Why there and not other places? And isn't that the OP's real question?



  • @TwelveBaud said:

    @curtmack said:
    I'm not entirely sure how SQL Server implements inner joins, or why it does so differently between select and update statements, but I guess that'll teach me for assuming i.GlossID is equal to g.GlossID, especially after inner-joining on i.GlossID being equal to g.GlossID.

    Filed under: a cookie for anyone who knows why it does this
    You have a faulty assumption. Just because you put the second join after the first doesn't mean it'll run second. I imagine if you try to get the query engine to EXPLAIN, it'll show that it's joining g to lyi first (probably something to do with indexes or lyi.COUNT() < i.COUNT()), and only by duplicating the join condition does that force it to run that first.

    Yeah, this is almost certainly it. You can also throw OPTION (FORCE ORDER) at the end of your UPDATE query.



  • @db2 said:

    @TwelveBaud said:
    @curtmack said:
    I'm not entirely sure how SQL Server implements inner joins, or why it does so differently between select and update statements, but I guess that'll teach me for assuming i.GlossID is equal to g.GlossID, especially after inner-joining on i.GlossID being equal to g.GlossID.

    Filed under: a cookie for anyone who knows why it does this
    You have a faulty assumption. Just because you put the second join after the first doesn't mean it'll run second. I imagine if you try to get the query engine to EXPLAIN, it'll show that it's joining g to lyi first (probably something to do with indexes or lyi.COUNT(*) < i.COUNT(*)), and only by duplicating the join condition does that force it to run that first.

    Yeah, this is almost certainly it. You can also throw OPTION (FORCE ORDER) at the end of your UPDATE query.

    That makes sense then. Cookies for everyone!



  • @blakeyrat said:

    @PedanticCurmudgeon said:
    Is one (but not both) of the GlossID columns numeric?

    Yeah it's doing an implicit conversion on one of the joined columns. You can get away with fuzziness like that with SELECTs but when you need to do UPDATES, SQL's gonna bitch. That's normal, if nothing you'd ever see on a normal database schema.

     

    And... Did you notice something strange in your statement? Like a programming language that doesn't complain about type mismach just because there is no imediate harm (don't mind looking at the long term - only losers think one statement ahead).

     WTF!



  • @Mcoder said:

    And... Did you notice something strange in your statement? Like a programming language that doesn't complain about type mismach just because there is no imediate harm (don't mind looking at the long term - only losers think one statement ahead).

    WTF!

    It's SQL. It's a very very old language. I'm sure it would be possible to build a significantly better "structured query language" if you built one today, using the experience of all we know from 30 years of using SQL, but nobody's really taken that on except the NoSQL guys, and I'm not all that impressed by their efforts. (MongoDB queries are written as JSON objects, to give you an idea.)



  • @blakeyrat said:

    @Mcoder said:
    And... Did you notice something strange in your statement? Like a programming language that doesn't complain about type mismach just because there is no imediate harm (don't mind looking at the long term - only losers think one statement ahead).

    WTF!

    It's SQL. It's a very very old language. I'm sure it would be possible to build a significantly better "structured query language" if you built one today, using the experience of all we know from 30 years of using SQL, but nobody's really taken that on except the NoSQL guys, and I'm not all that impressed by their efforts. (MongoDB queries are written as JSON objects, to give you an idea.)

     

    SQL is not "very old". At least, it is not old enough for people to not know about types and the advantages of building a consistent semantic model. It's newer than nearly all the old current languages like C, Fortran, Pascal, Lisp etc.

    Anyway, SQL does not have that semantic inconsistency. It is an exclusive feature of the Microsoft SQL Server. Any other DBMS would treat both expressions with the same rigor, raising an error (Oracle, Postgres, DB2) or not (MySQL) at both ocasions.

     



  • Okay this is crazy, but what if isnumeric doesn't return a number and the second query is interpreted as isnumeric(left(g.TransText,1))=((1 and i.GlossID) = g.GlossID)


  • Considered Harmful

    @Ben L. said:

    Okay this is crazy, but what if isnumeric doesn't return a number and the second query is interpreted as isnumeric(left(g.TransText,1))=((1 and i.GlossID) = g.GlossID)

    Actually booleans can't be directly represented in T-SQL, you have to use bit (which have possible values of 0 and 1); boolean expressions like "1=1" don't directly yield a value that can be used for comparison, but you can convert them over to a bit (eg "case when [condition] then 1 else 0 end").



  • @boomzilla said:

    @TwelveBaud said:
    You have a faulty assumption. Just because you put the second join after the first doesn't mean it'll run second. I imagine if you try to get the query engine to EXPLAIN, it'll show that it's joining g to lyi first (probably something to do with indexes or lyi.COUNT() < i.COUNT()), and only by duplicating the join condition does that force it to run that first.

    I had a similar thought, but then I don't use SQL Server, so the UPDATE...FROM construct was weird enough already. But isn't TRWTF that the SELECT does an implicit cast at all? Why there and not other places? And isn't that the OP's real question?

    I thought UPDATE...FROM was actually one of the places where SQL Server followed the ANSI SQL standard for updating in a join. Your post prompted me to go look that up and discover that the standard doesn't even address updating a join. I learned a thing. Thank you.


  • Considered Harmful

    @kilroo said:

    Your post prompted me to go look that up and discover that the standard doesn't even address updating a join.

    You could try:

    update example set foo = 'bar' where exampleid in (
    select exampleid from bar inner join baz on qux=quux -- etc
    )
    

    Provided all the updates affect the same table.



  • @Mcoder said:

    Anyway, SQL does not have that semantic inconsistency. It is an exclusive feature of the Microsoft SQL Server. Any other DBMS would treat both expressions with the same rigor, raising an error (Oracle, Postgres, DB2) or not (MySQL) at both ocasions.


    Yeah, SQL Server's type system is a bit "special" in many ways (y'know, the kind of special that needs you to speak to it slowly and calmly). There are conversions that work in one direction and not the other, there are some things you can do to an "image" column, but not a "varbinary" column, it has no fucking unsigned integer types, WTF IS WITH THAT, multiple date/time types, like "DateTime" (poor precision and a screwie epoch) and "DateTime2" (the one that works properly but now can't be easily joined to an existing DateTime column).


    My pet hate is that SQL Server seems to make basic collations far more complicated than they need to be. It has two basically identical collations, one called "Latin1_General_CI_AS" and another called "SQL_Latin1_General_CP1_CI_AS", which are subtly incompatible for certain operations, which isn't immediately apparent since most things work, except when you try to join on them, or try to do a concat() operation on them. For reasons I can't explain, these two collations seem to be the "default" on various installations of SQL server about equally, so when you end up with mixed collations (cause someone blindly copy-pasted the create script produced by their copy of management studio into the app's sql script) things randomly fail... sometimes... on some computers...


    I guess all sufficiently mature DBMS systems seem to develop the same problem. I'll never forget the first time I was mocked for using a varchar column in an oracle system, then complaining of problems. "I mean, everyone knows you want varchar2, varchar is basically broken." Then why not rename it to "varchar_shit" or "varchar_legacy" or something to indicate to new users that you shouldn't use it. just like "DateTime" in sql server should be renamed now that they've made a defuckifyed version of it. .


    As a complete aside, I don't think even Blakeyrat would attempt to defend the SQL server installer. Every version they release makes it harder and harder to use. I'm sure it's a running gag over at MS. The thing takes forever to run, and every single time, it introduces some new gotcha causes the installer to run for 3 hours and at the end of it either accomplish absolutely nothing, or leave you with an unusable install. For example, try running the "management studio 2012" installer after having already installed SQL server and just nexting through it. Hint: It won't install management studio.



  • @caffiend said:

    As a complete aside, I don't think even Blakeyrat would attempt to defend the SQL server installer.

    Oh God no. It's by far the worst installation experience I've ever had. I don't understand how the company which makes the Office 2007+ installer and the Zune installer could somehow also make the SQL Server installer-- like doesn't one of the Office guys just walk over there and say, "guys. Just stop. Here. Use this. Seriously. Please."



  •  It's the software equivilent of gullwing doors.



  • @blakeyrat said:

    It's by far the worst installation experience I've ever had.
    Fun bug I just discovered (after trying to find out why the SQL Server installer was crashing for several hours): if your OS language doesn't match SQL Server installer language, the installation will ... crash. With a simple "Tell Microsoft about this problem" dialog. Also, trying to do a repair of a failed install of Full-text search tells you that you can't repair features that weren't installed successfully. I wish I was making this up.



  • The SQL Server installer is brilliant compared to the old Reporting Services 2000 installation and configuration process. Spoilers: there practically wasn't one. Also, it won't install if the server has ASP.Net 2.0 installed. Good times.


Log in to reply