Handling of complex data design



  • <meta content="text/html; charset=utf-8" http-equiv="CONTENT-TYPE" /><title></title><meta content="OpenOffice.org 2.0 (Linux)" name="GENERATOR" /><meta content="Stephen Rauschmayer" name="AUTHOR" /><meta content="20061028;23151700" name="CREATED" /><meta content="16010101;0" name="CHANGED" />

    <style type="text/css">
    <!--
    	@page { size: 8.5in 11in; margin: 0.79in }
    	P { margin-bottom: 0.08in }
    -->
    </style>
    

    I need to create a data model for an application that I am writing, but I am extremely unskilled in this area, and could use a little guidance.  I have searched google for answers previously, but I don't know enough to make an effective search.

     
    The problem is that I have a set of items each of which has multiple dependencies.  The problem is the system needs an expandable number of both items and dependencies.  I tried to follow the normalization guides producing three tables:


    Item:                               Dependency:                DependencyGroup        
    Field           Type            Field         Type          Field             Type
    ItemID          int               DepID       int              GrpDepID     int
    GrpDepID    int                                                    DepID           int

    So dependancies are broken into groups, each item has a group. 

    Now the problem occurs because given a list of dependencies, I want to return all items which have a dependency group that contain a subset of the given dependencies.  So if I have a dependency set [1,2,3] then an item with a group [1,2] would match, but and item with a group [1,4] would not.

    It seems that I want to run a query that uses information in the rows to generate my results.  I could return all the results that contain any of the dependencies and then using my software to reduce the result set to what is proper.

    That seemed to me to remove part of the usefulness of the database, so I made a query.  Its complex, so I do not think it is the most efficient solution.

    SELECT ItemID FROM Item WHERE GrpDepID NOT IN (SELECT DISTINCT GrpDepID FROM DependancyGroup WHERE DepID NOT IN (1,2,3))

     
    I appreciate advice.  Is this the best way to organize the data?  Is there a better way to store or retrieve it with respect to this query? 

    Thank you for your help.



  • Your data model seems to be sound; your query IMO has room for improvement.

    SELECT ItemID FROM Item WHERE NOT EXISTS ( SELECT 'X' FROM DependencyGroup WHERE GrpDepID=Item.GrpDepID AND DepID NOT IN (1,2,3))

    But don't trust me blindly, try both versions and see which is faster; query optimizers are always good for surprises.
     



  • Your structure is ok. One problem you have is that "NOT IN " processing is traditionally slow and tends to avoid using indexes, and can be confusing maybe. If you match the number of dependencies in the group against the number in the group AND in your list, then they should tie. Try this: select GrpDepID from DependencyGroup
    group by GrpDepID
    having  count(*)= sum(case when DepID in (1,2,3) then 1 else 0 end )

    changing (1,2,3) as appropriate( you could make it a sub-select statement ). Any good?

    One other thought for another approach: if you had a small number of Dependencies ( < 30 ), then you could map them to 1,2,4,8,16,32,64 etc. You could then store your Group of Dependencies as a single integer  e.g. 1+2+4  = 7( actually strictly this is a bitwise OR operation) , or 2+4+64=70. Then if you are looking for Groups that are a subset of 1+8+32, you just do: select where DependencyGroupIntNumber & 41 = DependencyGroupIntNumber ( bitwise AND ). This is a bit radical(!) but cuts out a whole table. But it does only work if you have <30 or so Dependencies, as an integer will store 2 to the power of 32. Regards. 

     


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.