Maximum redundancy



  • I was reviewing stored procedures today for an application my team is building when I saw one that didn't match our naming convention. "Hmm, probably another mistake from our junior programmer." Rather than just fix the name I decided to see what she'd actually written.

    CREATE PROCEDURE [dbo].[visitor_get_maxid]
    AS
    BEGIN
    select id
    from visitor
    where id in (select top 1 id from visitor order by id desc)
    END



  •  I'll reply Jeopardy style.

    "Why do we always keep a close eye on the junior programmer?"



  • Who are pedophiles, Alex?



  • Is she hot?



  • @sorpigal said:

    I was reviewing stored procedures today for an application my team is building when I saw one that didn't match our naming convention. "Hmm, probably another mistake from our junior programmer." Rather than just fix the name I decided to see what she'd actually written.

    CREATE PROCEDURE [dbo].[visitor_get_maxid]
    AS
    BEGIN
    select id
    from visitor
    where id in (select top 1 id from visitor order by id desc)
    END

    Wow. Is there a slower way to write that query?

    And of course the real WTF is that I'm sure they were going to use that retrieved ID, add one to it, and insert a new record... which would mean the column isn't set up to autonumber either. I can't think of any reason to have a query like that.



  • @DOA said:

     

     I'll reply Jeopardy style.

    "Why do we always keep a close eye on the junior programmer?"

    It's an unfortunate situation. When I have time I check up on her, which leads to one part amazement one part amusement and eight parts getting drunk. Somewhere in there I try to educate, too, but after three years I've more or less given up on further improvement.

    @dhromed said:

    Is she hot?

    Sadly, no, and she's twice my age, too.

    @blakeyrat said:

    Wow. Is there a slower way to write that query?

    And of course the real WTF is that I'm sure they were going to use that retrieved ID, add one to it, and insert a new record... which would mean the column isn't set up to autonumber either. I can't think of any reason to have a query like that.

    It's not as bad as all that since the table (and entire database) was mine originally, so I know it's not completely broken (or if it is it's my fault). This is, sadly, not the craziest SQL I've seen from this person, just the most obviously incorrect.



  • @sorpigal said:

    @blakeyrat said:
    And of course the real WTF is that I'm sure they were going to use that
    retrieved ID, add one to it, and insert a new record... which would mean the
    column isn't set up to autonumber either. I can't think of any reason to have a
    query like that.
    It's not as bad as all that since the table (and entire database) was mine
    originally, so I know it's not completely broken (or if it is it's my fault).
    Incorrect in that 'select max(id) from visitor;' is not what was required where the function was used (re: your comment about function names)? If so, you're not giving us a lot of context to divine how it is wrong.



    If not, then while it's not exactly the most speedy of queries(understatement), it does do what one would expect.



    (btw - I take it no profiling is being done on this DB?)



  • @PJH said:

    @sorpigal said:
    @blakeyrat said:
    And of course the real WTF is that I'm sure they were going to use that
    retrieved ID, add one to it, and insert a new record... which would mean the
    column isn't set up to autonumber either. I can't think of any reason to have a
    query like that.
    It's not as bad as all that since the table (and entire database) was mine
    originally, so I know it's not completely broken (or if it is it's my fault).
    Incorrect in that 'select max(id) from visitor;' is not what was required where the function was used (re: your comment about function names)? If so, you're not giving us a lot of context to divine how it is wrong.



    If not, then while it's not exactly the most speedy of queries(understatement), it does do what one would expect.



    (btw - I take it no profiling is being done on this DB?)
     

    While this is a WTF, I want to know why the max ID is needed to determine if this is just a symptom of the real WTF.

    If this is a primary key, she'd better not be querying it to insert the next higher value, because this column would certainly be an IDENTITY column or something similar.



  • @sorpigal said:

    CREATE PROCEDURE [dbo].[visitor_get_maxid]
    AS
    BEGIN
    select distinct id
    from visitor
    where id in (select top 1 id from visitor order by id desc)
    END

     

    FTFY.



  • @fatbull said:

    @sorpigal said:

    CREATE PROCEDURE [dbo].[visitor_get_maxid]
    AS
    BEGIN
    select distinct id
    from visitor
    where id in (select top 1 id from visitor order by id desc) AND id IS NOT NULL
    END

     

    FTFY.

     

    FTFY II



  • @blakeyrat said:

    Wow. Is there a slower way to write that query?

    Of course there is!

    Create a local variable for max_id, open a cursor for "select * from visitor", loop through it comparing the current record's id to max_id and update max_id if required, return max_id.

    I was going to write this up in PL/SQL but I couldn't be bothered.

    In defense of the "get max id and add 1" process, I use it all the time in our data warehouse. But:

    • This is in Oracle so there are no autonumber columns; granted I could set up a sequence and trigger for every table I want to populate, but that's pretty cumbersome.
    • The only process that updates the tables is under my control, and runs once daily.
    • Where multiple tasks insert data into the same table, the task scheduler ensures that each task must run to completion before the next task in the group is allowed to start.

    The second and third points ensure that "get max id and add 1" doesn't cause conflicts, and then the first point makes it the preferable option. (Also contributing to this is the fact that when this data warehouse was initially being configured - shortly before I joined the team - we started by modifying the pre-packaged routines, and that was how they did it.)

    Obviously, it's not a suitable option for any circumstance in which two processes may be accessing the same table at the same time. Of course, that's [b]not[/b] obvious to a lot of people which is why this technique is so often condemned. But I'm just trying to make the point that if it's locked down hard enough, it's not really a problem.



  • @Scarlet Manuka said:

    But I'm just trying to make the point that if it's locked down hard enough, it's not really a problem [b]yet[/b].

    FTFY.

    Yes, Oracle sequences suck ass, and yes, right now the process may be under your control, but you can't guarantee how that will be next year or next month or this afternoon at 16.30. It may work now, but it's a bit like saying "I don't wear a seatbelt because I only (insert excuse here)".



  • @b_redeker said:

    Yes, Oracle sequences suck ass, and yes, right now the process may be under your control, but you can't guarantee how that will be next year or next month or this afternoon at 16.30. It may work now, but it's a bit like saying "I don't wear a seatbelt because I only (insert excuse here)".

    It's a data warehouse. Allowing the end users to modify data in it would be silly. The data warehouse updates itself from the source database and nobody else touches it. Otherwise, what's the point?



  • @blakeyrat said:

    @sorpigal said:

    I was reviewing stored procedures today for an application my team is building when I saw one that didn't match our naming convention. "Hmm, probably another mistake from our junior programmer." Rather than just fix the name I decided to see what she'd actually written.

    CREATE PROCEDURE [dbo].[visitor_get_maxid]
    AS
    BEGIN
    select id
    from visitor
    where id in (select top 1 id from visitor order by id desc)
    END

    Wow. Is there a slower way to write that query?
    @PJH said:
    If not, then while it's not exactly the most speedy of queries(understatement)
    I don't know about any other DBMS, but MSSQL runs "SELECT TOP 1 id FROM visitor ORDER BY id DESC" exactly as fast as it runs "SELECT MAX(id) FROM visitor".  It uses the same index and reads the same number of pages.  The outer query is stupid, but it only reads the page already in memory and can use the same index.  I would force the developer to rewrite the sp, but for readability, not performance reasons.  The complete lack of formatting bothers me much more than any performance implications.


  • @Jaime said:

    @PJH said:
    If not, then while it's not exactly the most speedy of queries(understatement)
    I don't know about any other DBMS, but MSSQL runs "SELECT TOP 1 id FROM visitor ORDER BY id DESC" exactly as fast as it runs "SELECT MAX(id) FROM visitor".
    Quite possibly, but that has bugger all to do with the query I was talking about.



    cf:


    1. select id from visitor where id in (select top i from visitor order by id desc)


    2. select max(id) from visitor


    3. select top 1 id from visitor order by desc



      You claim that (2) and (3) are the same in MSSQL (probably is), while implying that (2) is the same as (1) (which is what I was comparing which, while some DBMS may optimise (1) down to (2) or (3), they certainly aren't required to.)


  • @PJH said:

    @Jaime said:
    @PJH said:
    If not, then while it's not exactly the most speedy of queries(understatement)
    I don't know about any other DBMS, but MSSQL runs "SELECT TOP 1 id FROM visitor ORDER BY id DESC" exactly as fast as it runs "SELECT MAX(id) FROM visitor".
    Quite possibly, but that has bugger all to do with the query I was talking about.

    cf:

    1) select id from visitor where id in (select top i from visitor order by id desc)

    2) select max(id) from visitor

    3) select top 1 id from visitor order by desc

    You claim that (2) and (3) are the same in MSSQL (probably is), while implying that (2) is the same as (1) (which is what I was comparing which, while some DBMS may optimise (1) down to (2) or (3), they certainly aren't required to.)

    I tweaked (1) to have table names and columns names from one of my large tables and ran it.  It produced a whopping 4 logical reads.  The first two reads were a leaf level and a non-leaf level index page for the inner query (3).  It takes two reads whether written as (2) or (3).  The other two reads were for the outer query.  These two pages will always be the same two as the inner query, so will always be in cache.  Any DBMS that performs the entire inner query and discards all but the first row needs to be taken out and shot.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.