MYSQL group products by compatibilty
-
I have two tables, one is for kits that can contain various sub-products, and another is a list of sub-products that can't be added together in the same kit.
In simplified form:
CREATE TABLE `kits` ( `subProdID` INT(11) NOT NULL DEFAULT '0', `kitID` VARCHAR(50) NULL DEFAULT NULL, `kitName` VARCHAR(512) NULL DEFAULT NULL, PRIMARY KEY (`subProdID`) ); CREATE TABLE `subProd_incompatible` ( `IncompID` INT(11) NOT NULL DEFAULT '0', `subProdID` INT(11) NULL DEFAULT NULL, `subProdIncompID` INT(11) NULL DEFAULT NULL, PRIMARY KEY (`VersaIncompID`) )
In
subProd_incompatible
,subProdID
andsubProdIncompID
all exist as subProdIDs inkits
.Right now I use procedural code outside the database to produce a list of kits that all contain products that do not conflict according to the
subProd_incompatible
table, but I'd like to do it in SQL if at all possible.What I'd like is to
SELECT `kitID`, `kitName`, GROUP_CONCAT(`subProdID`) FROM ("a subquery") AS Q GROUP BY `GroupingCriterion`
"a subquery" should return the columns of
kits
, along with a generatedGroupingCriterion
.
-
Is the incompatibility table symmetrical? If
A
is incompatible withB
, does it have bothsubProdID = A, subProdIncompID = B
andsubProdId = B, subProdIncompID = A
?Don't have a MySQL instance handy, but you'll probably want to self-join the
kits
tableWHERE the_other_kits_table.kit_id NOT IN (SELECT subProdIncompID FROM subProd_incompatible WHERE subProdId = the_first_kits_table.subProdID)
(not sure if you can put that in theON
clause, but it shouldn't matter if MySQL optimizer if worth its salt), and group bythe_first_kit_table.subProdID
, concatenatingthe_other_kits_table.subProdID
.Something-kinda-sorta-like-that, at least.
-
Inner queries (and NOT IN specifically) are usually pretty bad for performance. (Although easier for humans to read.)
Better would be to left-join all the way across, then check for NULLs where appropriate.
This is T-SQL psuedo-code, but something like:
select * from kits k left join incompatible i on i.subProdId = k.subProdId left join incompatible i2 on i2.subProdinCompId = k.subProdId where i.incompId IS NULL and i2.incompId IS NULL
Is that what you're looking for?
-
Good point with the performance hit. Rewritten:
select k.kitid, k.kitname, group_concat(k2.subProdID) from kits k cross join kits k2 left outer join incomp i on i.incomp1 = k.subProdID and i.incomp2 = k2.subProdID where i.incompID is null group by k.kitid, k.kitname;
I think you need a cross join there, since you need to concatenate something. I assume a symmetrical incompatibility table (records for both A,B and B,A).
Fix column names as necessary.
-
Inner queries (and NOT IN specifically) are usually pretty bad for performance. (Although easier for humans to read.)
Not in PostgreSQL. And apparently, neither in MySQL.
A matter of preference, looks like.
-
Not in PostgreSQL. And apparently, neither in MySQL.
That's a pretty interesting article/optimization. I wonder why SQL Server doesn't have such a concept as an "anti-join".
-
@cartman82 said:
Not in PostgreSQL. And apparently, neither in MySQL.
That's a pretty interesting article/optimization. I wonder why SQL Server doesn't have such a concept as an "anti-join".
You don't write it out explicitly, but the query planner should catch it for you when you use a
NOT IN/NOT EXISTS
clause.
-
Oh well. I guess my advice is out-of-date. Live and learn.
-
Oh well. I guess my advice is out-of-date
@cartman82's article, OTOH, is flat out wrong, apparently (dated 2006, as comparet to article's 2009).
You still might be right, though - we assume MySQL's query planner is sane and sensible, and well, it's MySQL.
-
@cartman82's article, OTOH, is flat out wrong, apparently (dated 2006, as comparet to article's 2009).
I don't see how your link is a rebuke of any kind. Obviously, performance experiences from one RDBMS won't always translate to others.
In fact, this guy did the same experiment with SQL Server. Results:
That's why the LEFT JOIN / IS NULL query takes 810 ms, or 3 times as much as the NOT EXISTS / NOT IN query.
This is frankly surprising to me. I also thought LEFT JOIN was faster in SQL Server. I guess not.
-
This is frankly surprising to me. I also thought LEFT JOIN was faster in SQL Server. I guess not.
This sort of thing is so often dependent on the query and the underlying data.
-
Yeah, right, I went against Blakey's conclusion of "there's no concept of an ANTI JOIN in SQL Server".
Still odd for the optimizer to not catch it, but hey, it works almost as fast as MySQL in all cases.