Representative ... Query



  • Although I provide support and upkeep for an unwieldymass of legacy ASP Classic code, most of it isn't TDWTF material. Recently, though, another suite of these apps got added to my support "empire" and ... well, yeah. I may have my work cut out for me on these. One in particular is some sort of middle-layer data processing system. I'm not entirely clear on its purpose to be honest, but it is apparently very important. It's also very slow. Over the past couple years, there have been hundreds of trouble tickets entered for this system's performance. Maybe thousands. All of which are still open, but flagged Sev 10 (Informational Only), in accordance with a dictum handed down by my predecessor's predecessor's predecessor. But now? Now things have changed, and efficiency is the buzzword.

    This bit of code, assembling a monster of a SQL query, is not the only thing wrong with this code, but it sure doesn't help. And, no, it's not anonymized. I'm pretty sure these table and field names aren't revealing any secrets...

    Query = "SELECT M.MID, MT, S, SF, SA,"
    Query = Query & "SUM(CASE WHEN (ST = 2 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 11 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 12 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END), "
    Query = Query & "FROM dbo.MM AS M LEFT JOIN dbo.MSL AS S ON M.MID = S.MID "
    Query = Query & "WHERE ((RT = 0 AND R = " & AU(0,0) & ") "
    Query = Query & "OR (RT = 2 AND R IN (" & AU(4,0) & "," & AU(7,0) & "," & AU(8,0) & "))) "
    Query = Query & "GROUP BY M.MID, MT, S, SF, SA "
    Query = Query & "HAVING SUM(CASE WHEN (ST = 3 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END) = 0 "
    Query = Query & "ORDER BY SA DESC, M.MID DESC"

    There are on the order of a half million rows in dbo.MM, and over 3 million in MSL. To its credit, this query does actually process and return data, eventually. Obviously, this whole thing is going to need rewritten, just as soon as I figure out ... what it's doing in the first place.



  • @Serpentes said:

    Query = Query & "HAVING SUM(CASE WHEN (ST = 3 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END) = 0 "

    This is demented. WHO COULD WRITE THIS?



  • I wish I knew who to blame for this monstrosity. The whole thing is over 17,000 lines of code, and that representative query is not the only abuse of SQL. Indeed, I think this entire project may have been some cruel joke regarding uses of CASE.

    Here, for example, is ... well, let's just call this a novel sorting order (and, no, those magic character codes aren't documented). At least this returns in a sort of sane amount of time, unlike that first one...

    Query = "SELECT G1.GID, G1.GN, G1.GCC "
    Query = Query & "FROM dbo.EG AS G1 LEFT JOIN dbo.EG AS G2 ON G1.GPI = G2.GID "
    Query = Query & "WHERE (G1.GID = " & AU(4,0) & "OR G1.GPI = " & AU(4,0) & " OR G2.GPI = " & AU(4,0) & " "
    Query = Query & "AND (G1.GCC <> 'X') "
    Query = Query & "ORDER BY CASE G1.GPI WHEN 1 THEN G1.GID ELSE G1.GPI END, CASE G1.GCC WHEN 'P' THEN 0 ELSE 1 END, G1.GN"

    The frequent use of AU, which in other contexts means "astronomical unit", the distance from the earth to the sun, makes me suspect this is the work of particularly bored and sadistic aliens. Surely, no human could have done this, right? Right?



  • The horrible column/table names and string concatenated SQL are bad.  But, I'm not sure how fast that query can ever be.  It's asking for "parents without children that have ST=3 and SB = {something}".  I don't think that query can ever be satisfied without denormalization or scanning the entire child table.  The where clause might help narrow the search, but based on the fact that it's slow now, it's not likely.  An index on RT might help.



  • In situtations like this I usually recommend arson.

     



  • SUM(CASE WHEN some_condition THEN 1 ELSE 0 END)

    I did this a lot in the past. According to what I have heard, this is called "crosstab" or "pivot table". The idea is to turn rows into columns and columns into rows. However, I would never put a SUM CASE in a HAVING clause.

    I know that the performance of this type of query is really bad, so I usually create a cron job that runs after midnight and dumps the results of this query into another table, so the application can retrieve the data in the needed format with a simple seq scan.

    P.S.: I know my English is really bad - I'm ashamed for posting this comment. Sorry for that.



  • I've seen SUM CASE used as a kludgy pivot table before, but this use seems way weirder to me, not that I've really determined what this monolithic code does. It's still very much in use, but it's black box middleware and no one seems to be able to tell me what function it serves.

    But I've got to come up with a better solution that an overnight cron job. I may not know what this does, but I have found (in ancient yet unclosed trouble tickets) how often it's supposed to run. In the original design specifications, such as they likely were, this was supposed to be run by each of 30 or 40 end-users, once every 15 minutes or so.

    Needless to say, that's not happening.



  • @Serpentes said:

    Query = "SELECT M.MID, MT, S, SF, SA,"
    Query = Query & "SUM(CASE WHEN (ST = 2 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 11 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 12 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END), "
    Query = Query & "FROM dbo.MM AS M LEFT JOIN dbo.MSL AS S ON M.MID = S.MID "
    Query = Query & "WHERE ((RT = 0 AND R = " & AU(0,0) & ") "
    Query = Query & "OR (RT = 2 AND R IN (" & AU(4,0) & "," & AU(7,0) & "," & AU(8,0) & "))) "
    Query = Query & "GROUP BY M.MID, MT, S, SF, SA "
    Query = Query & "HAVING SUM(CASE WHEN (ST = 3 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END) = 0 "
    Query = Query & "ORDER BY SA DESC, M.MID DESC"
    Get each cluster of values in the M ID, MT, S, SF, and SA columns, as well as counts of the records in each cluster where SB is -- assuming AU means collect information from some sort of spreadsheet -- the value of the upper-left cell and ST is 2, 11, or 12 respectively, provided (RT is zero and R has the value of the upper-left cell) or (RT is 2 and R is the value of the fifth, eighth, or ninth cells), and finally after all that exclude any groupings that have any records where ST is 3 and SB is the value of the upper-left cell. Oh, and sort by largest value of SA and largest MID after that.

    And no, I have no idea what that means. At all.



  •  @TwelveBaud said:

    @Serpentes said:
    Query = "SELECT M.MID, MT, S, SF, SA,"
    Query = Query & "SUM(CASE WHEN (ST = 2 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 11 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END),"
    Query = Query & "SUM(CASE WHEN (ST = 12 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END), "
    Query = Query & "FROM dbo.MM AS M LEFT JOIN dbo.MSL AS S ON M.MID = S.MID "
    Query = Query & "WHERE ((RT = 0 AND R = " & AU(0,0) & ") "
    Query = Query & "OR (RT = 2 AND R IN (" & AU(4,0) & "," & AU(7,0) & "," & AU(8,0) & "))) "
    Query = Query & "GROUP BY M.MID, MT, S, SF, SA "
    Query = Query & "HAVING SUM(CASE WHEN (ST = 3 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END) = 0 "
    Query = Query & "ORDER BY SA DESC, M.MID DESC"
    Get each cluster of values in the M ID, MT, S, SF, and SA columns, as well as counts of the records in each cluster where SB is -- assuming AU means collect information from some sort of spreadsheet -- the value of the upper-left cell and ST is 2, 11, or 12 respectively, provided (RT is zero and R has the value of the upper-left cell) or (RT is 2 and R is the value of the fifth, eighth, or ninth cells), and finally after all that exclude any groupings that have any records where ST is 3 and SB is the value of the upper-left cell. Oh, and sort by largest value of SA and largest MID after that.

    And no, I have no idea what that means. At all.

    I wish this was collecting data from some spreadsheet, because then I could find out where that spreadsheet is from and beat someone there. But, no, AU is an array resulting from another incomprehensible piece of concatenated SQL. That one at least avoids HAVING SUM CASE.  Or any CASE, actually. But that doesn't really make it any more approachable. And only the first row of AU is ever referenced anywhere, so either that query is designed to only return one row -- in which case, why is it dropped to an array with GetRows()? -- or there are input conditions that make the wheels fall off this whole project.

    I know which one I suspect. I've spent all day with this codebase and I'm still no closer to understanding what any of it does. The rest of the code meets the same high standard as this SQL; with the notable exception of "Query", I don't think any variable name is more than 3 letters long ... orcomprehensible. I'd accuse this of being machine-generated, but any machine malicious enough to generate this would have already rendered the Terminator franchise nonfictional.

    I used to laugh at obfuscated code contests. They're suddenly not funny anymore.



  • Which table are R and RT in? ST and SB?

    I suspect that the answers are MM and MSL respectively. In which case you might get better performance by restructuring the query to do separate sub-selects and then combining.

    Something like
    SELECT ...
      FROM
      (SELECT MID, MT from dbo.MM
         WHERE ((RT = 0 AND R = AU(0,0))
                OR  (RT = 2 AND R IN (AU(4,0),AU(7,0),AU(8,0)))
         AS M)
      LEFT JOIN 
      (SELECT MID, S, SF , SA from dsb.MSL
        GROUP BY MID, S, SF, SL
        HAVING SUM(...)
        AS S)
      ON M.MID = S.MID
     ORDER BY SA DESC, MID DESC
    
    (This also assumes that MT comes from MM and the other columns from MSL).

    The query optimizer might be doing this anyway.



  • @blakeyrat said:

    @Serpentes said:
    Query = Query & "HAVING SUM(CASE WHEN (ST = 3 AND SB = " & AU(0,0) & ") THEN 1 ELSE 0 END) = 0 "

    This is demented. WHO COULD WRITE THIS?

    I know many people that would write this if they only knew about HAVING clauses, but in this case, it was someone who didn't know any other way to express that he didn't want ST = 3 AND SB = " & AU(0,0) & ", because that's what its' effectively excluding. A subquery would indeed be easier to read and easier to tune.



  • @Fjp said:

    Which table are R and RT in? ST and SB?

    I suspect that the answers are MM and MSL respectively. In which case you might get better performance by restructuring the query to do separate sub-selects and then combining.

    Something like
    SELECT ...
      FROM
      (SELECT MID, MT from dbo.MM
         WHERE ((RT = 0 AND R = AU(0,0))
                OR  (RT = 2 AND R IN (AU(4,0),AU(7,0),AU(8,0)))
         AS M)
      LEFT JOIN 
      (SELECT MID, S, SF , SA from dsb.MSL
        GROUP BY MID, S, SF, SL
        HAVING SUM(...)
        AS S)
      ON M.MID = S.MID
     ORDER BY SA DESC, MID DESC
    
    (This also assumes that MT comes from MM and the other columns from MSL).

    The query optimizer might be doing this anyway.

    Typically, on MS SQL, I have seen doing it that way as a way to really slow down the query.  I move them into the select statement as case statements just like this in order to speed up the query.  Of course, I look at my execution plans to make sure it is speeding it up.

     The only things I can think of to optimize this particular query and gurantee the same results is indexing, or try adding/moving some of the where clause into the join statement.  That can help sometimes depending on your data.



  • So, for the curious ... ST and SB are in MSL.  Everything else is in MM.

    This entire system is some sort of bastardized message triggering engine. It loads data from an external source, munges it repeatedly, and then creates triggers that are picked up by another program. However, somewhere along the way, the "target" program switched from a third party utility to something written in-house by the same guy who wrote this query, with predictably train-wreck changes as a result. For example, this HAVING SUM CASE bit is in here because some triggers are now intended to be viewed by multiple people.  R and RT combine to identify individual employees, or groups of employees, depending on some magic numbers. ST = 3 is a magic number that (I think) represents that the trigger has been cleared or deleted or whatever, so should never re-appear ... but it's handled per-viewer because no matter how many people see a message, there's just the one copy of it.

    I don't think this is even the worst WTF in this system, and I've only scratched the surface. Counting the in-house "target" program, there's probably 50,000 lines of code involved here.


Log in to reply