SQL subset matching shenanigans



  • So, I have some Widgets, each of which consists of one or more Components. Each Component can be used by more than one kind of Widget; however, no Widget can use two of the same Component, so the results are a simple many-to-many mapping table of (Widget, Component) tuples, with an extra index on Component to speed up reverse lookups.

    Given this structure, I pose this problem:

    Given a set W of Widgets, partitioned into two mutually-exclusive sets X and Y by some criterion, I need to figure out all the Widgets in set X whose components cannot be used to make Widgets in set Y. In other words, if I take all the Components used to make the Widgets in the chosen subset of set X, I should not be able to make any Widget in set Y using those Components.

    Is there a way to do this easily in standard SQL-92? You are allowed to assume that no more than 3 Components are needed to make any Widget, but if you can come up with a solution that doesn't limit the number of Components in a Widget, that's even better, of course ;)


  • Discourse touched me in a no-no place

    @tarunik said:

    no Widget can use two of the same Component

    @tarunik said:

    You are allowed to assume that no more than 3 Components are needed to make any Widget

    I probably know the answer to this, given where it's being asked, but is there any particular reason for the actual existence of the many-many table given these particular criteria, rather than the Widget table having a set of (sparse) columns indicating the presence (or absence) of said components?



  • @tarunik said:

    Is there a way to do this easily in standard SQL-92?

    What, no love for SQL:1999, SQL:2003, or SQL:2011?



  • I could probably do this in pgsql, but I have no idea what SQL-92 entails. It'd be a waste of time to even try.


  • :belt_onion:

    it's just an old ansi standard.
    basically, don't use when clauses, which are new, and don't do any crazy postres specific shit and you're cool.
    or if you're an oracle guy, don't do (+) for outer joins because then we will hate you.

    edit: oh, and the select sum(column_x) over (partition by column_y) from tbl stuff might not work in sql-92? Can't recall, don't care enough to look.

    in this case i think the key thing is that sql-92 means you can't use database proprietary multiple row -> single row column list converters.


  • :belt_onion:

    So assuming that your widgets and components combine to create a table like so:
    table widget_component?
    widget_id
    component_id
    widget_name
    component_name

    [code]
    select widget_id,component_id
    from widget_component
    where widget_name in (set X)
    and component_id not in
    (select component_id
    from widget_component
    where widget_id in (select widget_id from widget_component where widget_name in (set Y) group by widget_Id having count(*) = 1)
    group by component_id)
    [/code]
    I think that works for the single component. of course you could take and manually jam it to work for the cases of 2 and 3 since that's your limit... but I'd agree it'd be nice to have something more elegant



  • @PJH said:

    I probably know the answer to this, given where it's being asked, but is there any particular reason for the actual existence of the many-many table given these particular criteria, rather than the Widget table having a set of (sparse) columns indicating the presence (or absence) of said components?

    From a normalization standpoint -- a many-many table beats sparse columns, IMO.

    @darkmatter said:

    it's just an old ansi standard.basically, don't use when clauses, which are new, and don't do any crazy postres specific shit and you're cool.or if you're an oracle guy, don't do (+) for outer joins because then we will hate you.

    Yep. No Postgres specific stuff, use the ANSI syntax when you need to outer join, and don't use when...and yeah, no database specific pivoting support. (I'm using SQLite to be precise.)

    Also, I'm natural keying widgets and components here. (Yes, really. It's simpler in my usecase.)


  • BINNED

    @darkmatter said:

    or if you're an oracle guy, don't do (+) for outer joins because then we will hate you.

    Nothing personal ...


  • :belt_onion:

    @darkmatter said:

    I think that works for the single component.

    Assuming I understand the problem that you're posing correctly. My translation was that you want only widgets in X that do not have a component subset that matches to an existing widget in Y's complete component list?



  • @darkmatter said:

    Assuming I understand the problem that you're posing correctly. My translation was that you want only widgets in X that do not have a component subset that matches to an existing widget in Y's complete component list?

    Let me try rephrasing that:

    So, let us assume I have a subset R of the partition X of Widgets. For all the Components that are used by the Widgets in R (in aggregate, not by a specific Widget), I should not find any subset of those Components that lets me make a Widget in partition Y in order for R to be the result set I am after.


  • :belt_onion:

    got it - i was close but not quite the same.... i'll try again. I think your actual need is easier to solve than what i was doing because it's one less partitioning of the data.


  • :belt_onion:

    how about this steaming pile of crap, because I don't know an ansi sql-92 way to do the listing. basically, it gets the 1, 2, or 3 combinations of components in Y, checks then against the full list from X, and then excludes all widgets in X that have any one of the components that match to any component that came from the full widget components matches that it found.

    [code]
    table widget_component:
    widget_id
    component_id
    widget_name
    component_name
    [/code]
    [code]
    select widget_id
    from widget_component X
    where widget_name in (set X)
    and not exists
    (select 1
    from
    (select one,two,three
    from
    (select a.component_id one, b.component_id two, c.component_id three
    from widget_component a
    left join widget_component b on a.widget_id = b.widget_id
    left join widget_component c on a.widget_id = c.widget_id
    where a.widget_name in (set Y)
    group by a.component_id, b.component_id, c.component_id
    ) Y
    where (Y.one in (select distinct component_id from widget_component where widget_name in (set X))
    and two is null
    and three is null)
    or (Y.one in (select distinct component_id from widget_component where widget_name in (set X))
    and Y.two in (select distinct component_id from widget_component where widget_name in (set X))
    and three is null)
    or (Y.one in (select distinct component_id from widget_component where widget_name in (set X))
    and Y.two in (select distinct component_id from widget_component where widget_name in (set X))
    and Y.three in (select distinct component_id from widget_component where widget_name in (set X)))
    ) cantusethese
    where cantusethese.one = X.component_id
    or cantusethese.two = X.component_id
    or cantusethese.three = X.component_id
    )
    [/code]



  • I'll give it a shot when I get home where the data is ;) (I'll have to clean it up though as set X is defined by a query over the data and there are no ID columns in the table -- as I said before, I'm using natural keys here because this is a data-analysis problem much more than anything resembling an online database.)



  • You know if you just draw that as a Venn diagram, you're 95% of the way to having a finished query.


  • :belt_onion:

    I wish good luck to anyone attempting to draw a Venn that represents the scenario above, and even more luck translating that to sql.

    Representative Scenario as I translated it:
    Set Y (widget: [components]):
    Widget 1: [a b c]
    Widget 2: [d e]
    Widget 3: [f]
    Widget 4: [p r]

    And Set X:
    Widget 1: [d q]
    Widget 2: [a e]
    Widget 3: [f c d]
    Widget 4: [q z f]
    Widget 5: [r q]

    becomes the pool of
    [a c d f q r z]

    Which can create
    [a b c], [d e], and [f] from Set Y

    Exclude Widgets in Set X that contain one from [a b c d e f]
    and it should return Widget 5 because it has none of the above in it.

    TBH I'm not sure how you create a Venn for that kind of conditional nesting of subsets. What I listed above though is good enough to build SQL from.


  • :belt_onion:

    Oops, had to edit a minute after I posted to correct the reason for the final outcome.
    i take it the Like means I have properly translated the problem at least.



  • I admit I don't understand the problem, and I don't see any table definitions from tarunik I can shove in MS SQL and doodle around with.



  • Translating @darkmatter's representative scenario into SQL:
    [code]
    CREATE TABLE Widgets (widget TEXT, component TEXT, PRIMARY KEY (widget, component));

    CREATE INDEX widget_components ON Widgets (component);

    INSERT INTO Widgets (widget, component) VALUES
    ('Y1', 'a'), ('Y1', 'b'), ('Y1', 'c'), ('Y2', 'd'), ('Y2', 'e'), ('Y3', 'f'), ('Y4', 'p'),
    ('Y4', 'r'), ('X1', d'), ('X1', 'q'), ('X2', 'a'), ('X2', 'e'), ('X3', 'f'), ('X3', 'c'),
    ('X3', 'd'), ('X4', 'q'), ('X4', 'z'), ('X4', 'f'), ('X5', 'r'), ('X5', 'q');
    [/code]



  • So there's only one table? Where does the many-many join come in?


  • :belt_onion:

    You don't need it, that table above is the result of the many-many join of his tables.
    I assumed he could create a view (or subquery) that mimics the single widget_component (or as he named it above, Widget) table, which is really all that's needed to demo the solution.

    Or as everyone here calls them, that's the xref table that holds the many-many relation between the Widget_List and the Component_List tables.


  • :belt_onion:

    we need some database diagramming going on!

    Component_List ---<< Widgets >>--- Widget_List

    damn you dischorse for jacking my < and > all up the first try.
    and it's a lot easier to diagram in a tool than in ascii; i feel like i screwed up the arrows' sides.



  • @blakeyrat said:

    So there's only one table? Where does the many-many join come in?

    It's a many-to-many relation -- the tables you're looking for are the "side" tables storing the auxiliary data about Widgets and Components, yet they are not needed for the purpose of this. In other words, this problem doesn't care about Widget_List or Component_List at all -- they can simply be joined in afterwards.



  • P.S. if it helps, I'm looking for the maximal solution to this problem -- things like the empty set, or a single widget, are trivial solutions and rather uninteresting as a result ;)



  • As I understand it though, this problem is underdetermined. Say you have x, x' and x'' consisting of components a, b and c respectively and y which consists of a+b+c. What is R?

    Istm of problem requires a brute force solution that SQL is not well suited for.


Log in to reply