MYSQL group products by compatibilty



  • I have two tables, one is for kits that can contain various sub-products, and another is a list of sub-products that can't be added together in the same kit.

    In simplified form:

    CREATE TABLE `kits` (
        `subProdID` INT(11) NOT NULL DEFAULT '0',
        `kitID` VARCHAR(50) NULL DEFAULT NULL,
        `kitName` VARCHAR(512) NULL DEFAULT NULL,
        PRIMARY KEY (`subProdID`)
    );
    
    CREATE TABLE `subProd_incompatible` (
        `IncompID` INT(11) NOT NULL DEFAULT '0',
    	`subProdID` INT(11) NULL DEFAULT NULL,
        `subProdIncompID` INT(11) NULL DEFAULT NULL,
    	PRIMARY KEY (`VersaIncompID`)
    )
    

    In subProd_incompatible, subProdID and subProdIncompID all exist as subProdIDs in kits.

    Right now I use procedural code outside the database to produce a list of kits that all contain products that do not conflict according to the subProd_incompatible table, but I'd like to do it in SQL if at all possible.

    What I'd like is to

    SELECT `kitID`, `kitName`, GROUP_CONCAT(`subProdID`)
    FROM ("a subquery") AS Q
    GROUP BY `GroupingCriterion`
    

    "a subquery" should return the columns of kits, along with a generated GroupingCriterion.



  • Is the incompatibility table symmetrical? If A is incompatible with B, does it have both subProdID = A, subProdIncompID = B and subProdId = B, subProdIncompID = A?

    Don't have a MySQL instance handy, but you'll probably want to self-join the kits table WHERE the_other_kits_table.kit_id NOT IN (SELECT subProdIncompID FROM subProd_incompatible WHERE subProdId = the_first_kits_table.subProdID) (not sure if you can put that in the ON clause, but it shouldn't matter if MySQL optimizer if worth its salt), and group by the_first_kit_table.subProdID, concatenating the_other_kits_table.subProdID.

    Something-kinda-sorta-like-that, at least.



  • Inner queries (and NOT IN specifically) are usually pretty bad for performance. (Although easier for humans to read.)

    Better would be to left-join all the way across, then check for NULLs where appropriate.

    This is T-SQL psuedo-code, but something like:

    select *
    from kits k
     left join incompatible i on i.subProdId = k.subProdId
     left join incompatible i2 on i2.subProdinCompId = k.subProdId
    where i.incompId IS NULL and i2.incompId IS NULL
    

    Is that what you're looking for?



  • Good point with the performance hit. Rewritten:

    select k.kitid, k.kitname, group_concat(k2.subProdID)
    from kits k
    cross join kits k2
    left outer join incomp i on i.incomp1 = k.subProdID and i.incomp2 = k2.subProdID
    where i.incompID is null
    group by k.kitid, k.kitname;
    

    I think you need a cross join there, since you need to concatenate something. I assume a symmetrical incompatibility table (records for both A,B and B,A).

    Fix column names as necessary.



  • @blakeyrat said:

    Inner queries (and NOT IN specifically) are usually pretty bad for performance. (Although easier for humans to read.)

    Not in PostgreSQL. And apparently, neither in MySQL.

    A matter of preference, looks like.



  • @cartman82 said:

    Not in PostgreSQL. And apparently, neither in MySQL.

    That's a pretty interesting article/optimization. I wonder why SQL Server doesn't have such a concept as an "anti-join".



  • @blakeyrat said:

    @cartman82 said:
    Not in PostgreSQL. And apparently, neither in MySQL.

    That's a pretty interesting article/optimization. I wonder why SQL Server doesn't have such a concept as an "anti-join".

    It seems to have it.

    You don't write it out explicitly, but the query planner should catch it for you when you use a NOT IN/NOT EXISTS clause.



  • Oh well. I guess my advice is out-of-date. Live and learn.



  • @blakeyrat said:

    Oh well. I guess my advice is out-of-date

    @cartman82's article, OTOH, is flat out wrong, apparently (dated 2006, as comparet to article's 2009).

    You still might be right, though - we assume MySQL's query planner is sane and sensible, and well, it's MySQL.



  • @Maciejasjmj said:

    @cartman82's article, OTOH, is flat out wrong, apparently (dated 2006, as comparet to article's 2009).

    I don't see how your link is a rebuke of any kind. Obviously, performance experiences from one RDBMS won't always translate to others.

    In fact, this guy did the same experiment with SQL Server. Results:

    That's why the LEFT JOIN / IS NULL query takes 810 ms, or 3 times as much as the NOT EXISTS / NOT IN query.

    This is frankly surprising to me. I also thought LEFT JOIN was faster in SQL Server. I guess not.


  • ♿ (Parody)

    @cartman82 said:

    This is frankly surprising to me. I also thought LEFT JOIN was faster in SQL Server. I guess not.

    This sort of thing is so often dependent on the query and the underlying data.



  • Yeah, right, I went against Blakey's conclusion of "there's no concept of an ANTI JOIN in SQL Server".

    Still odd for the optimizer to not catch it, but hey, it works almost as fast as MySQL in all cases.


Log in to reply