Computed columns vs. Joins: CPU loses.


  • Garbage Person

    So I'm sitting here, trying to figure out some hella performance issues.

    Apparently SQL Server 2005 is GOD DAMNED RETARDED when it comes to joining a view which contains a function output against a table and does a naive join with no predicates regardless of what you tell it to do. End result, BILLIONS OF ROWS!

     

    However, it actually WORKS in the simplified case, because the query planner figures it out. 

     

    But as soon as you join those two together, the world fucking explodes and it does a naive join. I get the feeling it's going to be a long weekend as I recode everything in the world to make that column static.

     

    Works fine:

    SELECT rootParent , tbl2.categoryID FROM v_ContestParents AS tbl INNER JOIN tbl_Contests AS tbl2 on  dbo.func_GetRootParent(tbl.categoryID) = tbl2.categoryID

    Produces an utterly insane pile of rows that it absolutely should not:

    SELECT tbl.rootParent , tbl2.categoryID FROM v_ContestParents AS tbl INNER JOIN tbl_Contests AS tbl2 on tbl.rootParent = tbl2.categoryID

     

    v_ContestParents is defined as:

    CREATE VIEW [dbo].[v_contestParents]
    AS
    SELECT     child.categoryID, child.displayID, child.displayDescr, parent1.categoryID AS parent1CategoryID, parent1.displayID AS parent1DisplayID,
                          parent1.displayDescr AS parent1DisplayDescr, parent2.categoryID AS parent2CategoryID, parent2.displayID AS parent2DisplayID,
                          parent2.displayDescr AS parent2DisplayDescr, parent3.categoryID AS parent3CategoryID, parent3.displayID AS parent3DisplayID,
                          parent3.displayDescr AS parent3DisplayDescr, parent4.categoryID AS parent4CategoryID, parent4.displayID AS parent4DisplayID,
                          parent4.displayDescr AS parent4DisplayDescr, parent5.categoryID AS parent5CategoryID, parent5.displayID AS parent5DisplayID,
                          parent5.displayDescr AS parent5DisplayDescr, parent6.categoryID AS parent6CategoryID, parent6.displayID AS parent6DisplayID,
                          parent6.displayDescr AS parent6DisplayDescr, child.metaclass, parent1.parentID AS parent1ID, parent2.parentID AS parent2ID, parent5.parentID AS parent5ID,
                          parent4.parentID AS parent4ID, parent3.parentID AS parent3ID, parent6.parentID AS parent6ID, dbo.func_GetRootParent(child.categoryID) AS rootParent
    FROM         dbo.tbl_Contests AS parent5 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent6 ON parent5.parentID = parent6.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent4 ON parent5.categoryID = parent4.parentID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent2 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent3 ON parent2.parentID = parent3.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS child LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent1 ON child.parentID = parent1.categoryID ON parent2.categoryID = parent1.parentID ON parent4.categoryID = parent3.parentID

    GO

     

    I've gone ahead and highlighted the relevant parts. WHY DOES THIS NOT WORK AS IT SHOULD? At least I have a possible lead for a bodge in that I've discovered that explicitly calling the function works better than using the column name (indeed it reduced a never-ending query to 3 seconds)...

     

    Edit: Dammit, this should haave gone in sidebar since I actually figured it out. Stupid delete timer expired, though. Mods? Alex?



  • @Weng said:

    So I'm sitting here, trying to figure out some hella performance
    issues.



    as soon as you convert that bloody mess into a nested set, you will feel much better



  • @Weng said:

    So I'm sitting here, trying to figure out some hella performance issues.

    Apparently SQL Server 2005 is GOD DAMNED RETARDED when it comes to joining a view which contains a function output against a table and does a naive join with no predicates regardless of what you tell it to do. End result, BILLIONS OF ROWS!

     

    However, it actually WORKS in the simplified case, because the query planner figures it out. 

     

    But as soon as you join those two together, the world fucking explodes and it does a naive join. I get the feeling it's going to be a long weekend as I recode everything in the world to make that column static.

     

    Works fine:

    SELECT rootParent , tbl2.categoryID FROM v_ContestParents AS tbl INNER JOIN tbl_Contests AS tbl2 on  dbo.func_GetRootParent(tbl.categoryID) = tbl2.categoryID

    Produces an utterly insane pile of rows that it absolutely should not:

    SELECT tbl.rootParent , tbl2.categoryID FROM v_ContestParents AS tbl INNER JOIN tbl_Contests AS tbl2 on tbl.rootParent = tbl2.categoryID

     

    v_ContestParents is defined as:

    CREATE VIEW [dbo].[v_contestParents]
    AS
    SELECT     child.categoryID, child.displayID, child.displayDescr, parent1.categoryID AS parent1CategoryID, parent1.displayID AS parent1DisplayID,
                          parent1.displayDescr AS parent1DisplayDescr, parent2.categoryID AS parent2CategoryID, parent2.displayID AS parent2DisplayID,
                          parent2.displayDescr AS parent2DisplayDescr, parent3.categoryID AS parent3CategoryID, parent3.displayID AS parent3DisplayID,
                          parent3.displayDescr AS parent3DisplayDescr, parent4.categoryID AS parent4CategoryID, parent4.displayID AS parent4DisplayID,
                          parent4.displayDescr AS parent4DisplayDescr, parent5.categoryID AS parent5CategoryID, parent5.displayID AS parent5DisplayID,
                          parent5.displayDescr AS parent5DisplayDescr, parent6.categoryID AS parent6CategoryID, parent6.displayID AS parent6DisplayID,
                          parent6.displayDescr AS parent6DisplayDescr, child.metaclass, parent1.parentID AS parent1ID, parent2.parentID AS parent2ID, parent5.parentID AS parent5ID,
                          parent4.parentID AS parent4ID, parent3.parentID AS parent3ID, parent6.parentID AS parent6ID, dbo.func_GetRootParent(child.categoryID) AS rootParent
    FROM         dbo.tbl_Contests AS parent5 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent6 ON parent5.parentID = parent6.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent4 ON parent5.categoryID = parent4.parentID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent2 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent3 ON parent2.parentID = parent3.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS child LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent1 ON child.parentID = parent1.categoryID ON parent2.categoryID = parent1.parentID ON parent4.categoryID = parent3.parentID

    GO

     

    I've gone ahead and highlighted the relevant parts. WHY DOES THIS NOT WORK AS IT SHOULD? At least I have a possible lead for a bodge in that I've discovered that explicitly calling the function works better than using the column name (indeed it reduced a never-ending query to 3 seconds)...

     

    Edit: Dammit, this should haave gone in sidebar since I actually figured it out. Stupid delete timer expired, though. Mods? Alex?

     

     FTFY.



  • parent1 to parent6?  I really hope this is anonymized or else the schema is TRWTF.

     

    Query planning is a bitch, though.  I'm kind of confused by what you are trying to do, though.  Is there no where clause on the query?  I imagine joining two tables on a computed column is going to be expensive unless there's something limiting the number of rows returned by one of the tables.



  • I've joined on computed UDF columns plenty of times and I've never seen it ignore the predicates.  Whatever the problem is, it's not with SQL Server.

    I'd start by rewriting that view as something that isn't a complete pile of crap.  The whole design is a mess - that's completely the wrong way to manage a hierarchy - but ignoring that aspect for the moment, it's almost indecipherable what's going on.  I'd like to know how func_GetRootParent behaves with NULLS, and how big that view gets with all those OUTER JOINs.  It's going to be incredibly messy because what you think is the "child" might actually be included several times because it's someone else's "parent" (or "parent2" or "parent3").  And joining it with another instance of tbl_Contests, or worse, joining two instances of this view together, could very well explode.  I can't even imagine what the optimizer has to go through to sort this all out.  The only thing that really surprises me is that the output of your first query isn't already a mess.

    Look at the plan output, maybe it'll tell you something more.

    Edit: I'm just wondering, are you sure that it's returning billions of rows, or are you just assuming that because of how long it's taking?  Your view isn't indexed or even schema-bound and I'm betting that the function isn't either, so there's no way for the optimizer to inline it or really have any idea what it's supposed to do.


  • Garbage Person

    @Aaron said:

     

    Edit: I'm just wondering, are you sure that it's returning billions of rows, or are you just assuming that because of how long it's taking?  Your view isn't indexed or even schema-bound and I'm betting that the function isn't either, so there's no way for the optimizer to inline it or really have any idea what it's supposed to do.

    It's actually returning billions of rows. If you let it run partially and then look at the partial output, it is literally spitting out a cross join (and very, very slowly I might add - about 6 seconds per row) rather than what I actually asked for.

     

     And yes, I know this entire setup is TRWTF - v2.0 solves the problem altogether by nuking that heirarchy with fire.



  • @Weng said:

    nuking that heirarchy with fire.
    I am intrigued by this concept of "nuclear fire."  Do you have a brochure, or perhaps a newsletter I could subscribe to?


  • Garbage Person

    @Aaron said:

    how big that view gets with all those OUTER JOINs
    Exactly as many rows as are in tbl_Contest since that view just joins tbl_Contest against itself 6 times (for printing the damned thing on damned reports) to unwind that retarded heirarchy. func_GetRootParent just unwinds it the whole way for actual meaningful computation. It copes with null inputs by... Returning null. WTF else would I have it do?



  • @bstorer said:

    @Weng said:

    nuking that heirarchy with fire.
    I am intrigued by this concept of "nuclear fire."  Do you have a brochure, or perhaps a newsletter I could subscribe to?

     

    I also need nuclear fire. Please email me teh codez. ub3rl3333333t@aol.com



Log in to reply