Am I missing something?


  • Garbage Person

    I have a database table structured as so:

    tbl_Categories:
    categoryID - int (PK)
    parentID - int (FK: tbl_Categories.categoryID)
    halfAMillionOtherIrrelevantColumns - varchar(50)

    Obviously a category in this table has a parent (possibly null) referring to the same table. The table can thus be thought of as holding a series of trees, rooted at rows with null parents.

     This monstrosity of a design works FAIRLY well, but a current major limit in the application that uses it is that it can only reliably be nested a certain number of layers deep because in instances where we want to quickly see the entire lineage back to the root, we consult a view with the following (unedited) definition:

    CREATE VIEW [dbo].[v_contestParents]
    AS
    SELECT     child.categoryID, child.displayID, child.displayDescr, parent1.categoryID AS parent1CategoryID, parent1.displayID AS parent1DisplayID,
                          parent1.displayDescr AS parent1DisplayDescr, parent2.categoryID AS parent2CategoryID, parent2.displayID AS parent2DisplayID,
                          parent2.displayDescr AS parent2DisplayDescr, parent3.categoryID AS parent3CategoryID, parent3.displayID AS parent3DisplayID,
                          parent3.displayDescr AS parent3DisplayDescr, parent4.categoryID AS parent4CategoryID, parent4.displayID AS parent4DisplayID,
                          parent4.displayDescr AS parent4DisplayDescr, parent5.categoryID AS parent5CategoryID, parent5.displayID AS parent5DisplayID,
                          parent5.displayDescr AS parent5DisplayDescr, parent6.categoryID AS parent6CategoryID, parent6.displayID AS parent6DisplayID,
                          parent6.displayDescr AS parent6DisplayDescr, child.metaclass, parent1.parentID AS parent1ID, parent2.parentID AS parent2ID,
                          parent5.parentID AS parent5ID, parent4.parentID AS parent4ID, parent3.parentID AS parent3ID, parent6.parentID AS parent6ID
    FROM         dbo.tbl_Contests AS parent5 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent6 ON parent5.parentID = parent6.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent4 ON parent5.categoryID = parent4.parentID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS parent2 LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent3 ON parent2.parentID = parent3.categoryID RIGHT OUTER JOIN
                          dbo.tbl_Contests AS child LEFT OUTER JOIN
                          dbo.tbl_Contests AS parent1 ON child.parentID = parent1.categoryID ON parent2.categoryID = parent1.parentID ON
                          parent4.categoryID = parent3.parentID

     

    (actually now that I look at it I have no idea why the hell that works... Frighteningly, it MAY NOT and I need to examine this in more detail later). This view is designed to unwind the table and show us the entire lineage from any given node back to the root - but it's ugly, limited, difficult to build UIs to represent, and joins the same table against itself a half dozen times. For various reasons, this view is joined against by numerous other views and it gets really ridiculous.Furthermore, this view is NEVER called from time-insensitive parts of the application where we can wait for it to do its processing - but it's called an average of 25 times every time a user form is filled out, and TENS OF THOUSANDS of times during some reporting operations. It's not slow because of indexing, but it's a timebomb disaster waiting to happen.

     We've considered writing a stored procedure that traverses up the parentID column, but it seems that would make things significantly harder to work with.

     

    Also, a related problem: On the same table we have a need to identify all the leaf nodes on a given tree. We already have a stored procedure written that takes a categoryID and pulls all its direct children - the obvious way to go about it is to call  that recursively - but the problem is, this table gets BIG. A report running on top of this proposal would result in that stored procedure being called one time for each and every node in the tree - a number that exceeds 100k on our cutesy little test system. What amounts to 100,000 select statements in a short period of time isn't that bad, but when you consider that it then has to do the more complicated business of actually taking the categoryIDs of those leaves (potentially up to n-1 of them, for a very shallow tree - but more likely somewhere in the 50-75% range) that becomes 100,000 select statements followed on by 75,000 joins against 3+ views which themselves are joins against 3+ tables. While we can't really do much to cut down the actual reporting work involving the complicated join-madness (aside from cleaning up the above noted business, which would cut out a lot of that overhead), we'd like to try to optimize that initial leaf-node finding.



  • convert it to nested set



  • Nested set seems like the best option, but if you also have a lot of mutations on the data t will probebly slow problems in that department.

    I don't really have time now to really think about it, but my spidey sense tells me that I would look into abstracting the "structure" information from the data. So having a seperate table which describes the structure. My first thought on that was to have the full set of parent nodes per node, but that would become absolutely horrifically massive, so is probebly not an option.

    Stored proc to collect parent nodes might still be the best way performance wise,  even though you say that it makes it harder to work with. (which i don't really understand why)

     


     



  • Check out recursive CTE ( WITH clause).  I can't say if it wil solve all your problems but it will give you options on how to solve these kinds of queries.



  • If you are on SQL 2005/2008, then CTEs (Common table expressions) are your friend. The following will build the hierarchy of a parent-child relationship and the second query will get the parents of an arbitrary node.

    DECLARE @Hierarchy table
        (
          Code NVARCHAR(3),
          ParentCode NVARCHAR(3),
          Name NVARCHAR(100)
        )
    DECLARE @Code varchar(10)
    SET @Code = 'SPZ'
    INSERT INTO
        @Hierarchy
        SELECT
            'SFD',
            NULL,
            'Softdrink'
        UNION ALL
        SELECT
            'CCL',
            'SFD',
            'Coca Cola'
        UNION ALL
        SELECT
            'FNT',
            'SFD',
            'Fanta'
        UNION ALL
        SELECT
            'SPR',
            'SFD',
            'Sprite'
        UNION ALL
        SELECT
            'ACL',
            NULL,
            'Alcohol'
        UNION ALL
        SELECT
            'TQL',
            'ACL',
            'Tequila'
        UNION ALL
        SELECT
            'VDK',
            'ACL',
            'Vodka'
        UNION ALL
        SELECT
            'SMN',
            'VDK',
            'Smirnoff'
        UNION ALL
        SELECT
            'SPZ',
            'SPR',
            'Sprite Zero'
    ;

    WITH HierarchyCTE
              AS (
                   SELECT
                    Code,
                    ParentCode,
                    Name,
                    1 AS Depth,
                    CAST(Name AS nvarchar(1000)) AS FullPath
                   FROM
                    @Hierarchy
                   WHERE
                    ParentCode IS NULL
                   UNION ALL
                   SELECT
                    children.Code,
                    children.ParentCode,
                    children.Name,
                    parents.Depth + 1 AS Depth,
                    CAST(parents.FullPath + N'\' + children.Name as nvarchar(1000)) AS FullPath
                   FROM
                    HierarchyCTE AS parents
                    INNER JOIN @Hierarchy AS children ON children.ParentCode = parents.Code
                 )
        SELECT
            Code,
            ParentCode,
            Name,
            Depth,
            FullPath
        FROM
            HierarchyCTE;

    WITH    ParentsCTE
              AS (
                   SELECT
                    Code,
                    ParentCode,
                    Name,
                    1 AS itemDepth
                   FROM
                    @Hierarchy
                   WHERE
                    Code = @Code
                   UNION ALL
                   SELECT
                    parents.Code,
                    parents.ParentCode,
                    parents.Name,
                    children.itemDepth + 1 AS itemDepth
                   FROM
                    @Hierarchy AS parents
                    INNER JOIN ParentsCTE AS children ON children.ParentCode = parents.Code
                 )
        SELECT
            Code,
            ParentCode,
            Name
        FROM
            ParentsCTE
        where
            Code <> @Code
            -- OR to get just the root node add
            -- AND ParentCode IS NULL
     



  • @stratos said:

    Stored proc to collect parent nodes might still be the best way performance wise,  even though you say that it makes it harder to work with. (which i don't really understand why)
     

     +1.

    Views are definitely not suited for this kind of thing.

    The adjacency model is theoretically enough, and it works. Query the page tree using an SP with recursion. Performance is high in our CMS, where it is used for the page tree, although the page table doesn't get as big as Weng's numbers, so YMMV. The alternatives are (Materialized) Path or Nested Sets, but while these are structurally or mathematically "more interesting" to the academic programmer, they're not always the best choice performance-wise:

    [nested sets have] performance limitations on insert, update, and delete. This arises because the nested set model encodes the hierarchy based on what the set contains: when a node is inserted or deleted, all of the sets containing the node must be updated as well. In other words, the encoding depends on the current state of the database. This behavior makes the nested set model less desirable for large, dynamic hierarchies.

    So. Take yer pick.

    I'd go with a recursive SP, because it's likely to be quick enough, simple to implement, very maintainable. You don't have to delve into the math and write your own fricking DB API just to get a few useless category names lined up all pretty.



  •  I'm a little confused on what you are trying to do, are you just trying to get the root category of a category?   Here are my suggestions, istead of a recursive SP, use a recursive function, then for the code have an SP which calls that function.  

    If you want to get better performance just tweak your schema a bit.  Add a column called RootCategoryID or whatever and just store the root there.  Sure, it isn't normalized, but if you only allow accessthrough stored procs, it should be easy to make sure the column stays accurate.


  • ♿ (Parody)

    @dhromed said:

    The alternatives are (Materialized) Path or Nested Sets, but while these are structurally or mathematically "more interesting" to the academic programmer, they're not always the best choice performance-wise:

    [nested sets have] performance limitations on insert, update, and delete. This arises because the nested set model encodes the hierarchy based on what the set contains: when a node is inserted or deleted, all of the sets containing the node must be updated as well. In other words, the encoding depends on the current state of the database. This behavior makes the nested set model less desirable for large, dynamic hierarchies.

    I absolutely love nested sets, and not yet had any performance problems... you have to have a lot of data (tens of thousands of rows) and high frequency updates (multiple times a second) to make a difference. There are a few performance mods you can do, too: put the hierarchy in its own table, space out the numbers (gaps of 10) and "reindex" as a batch job, and queue changes (batch update ever few minutes).

    But........... the biggest issue I've had with nested sets (and, by far, the most problematic) is that few developers want to take the time to understand it. THey will whine, bitch, and eventually absuse or bypass it because it's too complciated for them. Depending on whether the heirarchy usage needs to be for the Lowest Common Denominator (buried with in a framework, probably not... consumed by a lot of other modules, maybe), it could be a poor choice just on that factor alone.

     



  • @Alex Papadimoulis said:

    you have to have a lot of data (tens of thousands of rows) and high frequency updates (multiple times a second) to make a difference.
     

    Somehow I keep forgetting that performance problems with a DB require that the amount of transactions be truly ridiculous (regardless of the cause of that).

    Or a crap server.

     


  • Garbage Person

    @dhromed said:

    Or a crap server.
    I highly expect this system to run into this problem. We had to backport from SQL Server 2008 (which we used throughout the dev process) to 2005 because the outsourced hosting company's servers "just absolutely could not run 2008" - which to me suggests they aren't even x64 (they DID stop selling 32bit SQL Server, right? Or is that just wishful thinking?)

     

    And all that, of course, in turn suggests that we aren't getting the dedicated box we're paying for. But good luck explaining that to the people that pay the bills.



  • @Weng said:

    @dhromed said:

    Or a crap server.
    I highly expect this system to run into this problem. We had to backport from SQL Server 2008 (which we used throughout the dev process) to 2005 because the outsourced hosting company's servers "just absolutely could not run 2008" - which to me suggests they aren't even x64 (they DID stop selling 32bit SQL Server, right? Or is that just wishful thinking?)
     

    SQL Server 2008 requires Windows Server 2003, Windows Server 2008, or Windows Server 2008 R2.   Your hosting company is probably running Windows 2000.


  • Garbage Person

    @tster said:

    Your hosting company is probably running Windows 2000.
    A decade-old legacy OS with under 1 year left on extended support on a "brand new dedicated box" we just bought not two months ago. BRILLANT!

     


  • Garbage Person

     Took a look at CTEs and it looks like a good place to start. Now I just need people to stop waffling about features that needed to be decided upon and crystallized THREE FARKING WEEKS AGO so I can get to the shit that's absofuckinglutely necessary in 5 weeks, and then maybe I can take a look at fixing the WTF-in-progress (I was at a meeting, looking at a piece of data - and it occured to me that it was ALREADY nested 7 layers deep - beyond the capacity of the idiotic view listed above. Fortunately it'll only cause cosmetic problems (unless someone COMPLETELY ignored my comments on that view - which is a distinct possibility because I labeled it "DO NOT USE THIS VIEW, IT IS FOR QUICK DEBUGGING REFERENCE ONLY" in big red ink on the whiteboard - and yet here we are)



  • I would definitely use a nested set model, probably with a tree ID (to be specified in every sql operation), so that you know you are always dealing with the right 'set'. tree_ID might well be FK to 'tree' table, which contains tree_ID and description etc.

    tree_ID - int      }
    categoryID - int }  PK
    left - int
    right - int
    halfAMillionOtherIrrelevantColumns - varchar(50)
    with a unique nonclustered index on tree_ID,left,right  (this might need more thinking, for optimisation of inserts etc)

    I would set up stored procs to insert and delete entries, and also to modify non-key columns. Views become much easier and non-nested. Finding the ultimate parent of an entry is trivial. Finding leaf level entries for a tree_ID is trivial. 

    Don't think you should be worried about response times when inserting or deleting rows - things basically boil down to statements like
    update tbl_Category set right = right-2 where tree_ID = X and right > Y    and if you've got the indexes right, that'll be fast.

     


Log in to reply