SQL Aggregate with List
-
I'm using SQL Server. I have a list of people and the widgets in their possession, and I need to be able to get a single row for each type of widget with the total count of widgets and a comma-separated list of all the names of the people who own at least one widget of that type. People and widgets each have unique identifiers.
So far I have:
SELECT wt.widget_type_name "Type" , ws.widget_state_name "State" , COUNT(w.widget_id) "Quantity" , STUFF((SELECT ',' + p1.person_name FROM People p1 WHERE p1.person_id = p.person_id FOR XML PATH('')), 1, 1, '') "List of People" FROM Widgets w INNER JOIN Widget_Types wt on wt.widget_type_id = w.widget_type_id INNER JOIN Widget_States ws on ws.widget_state_id = w.widget_state_id LEFT JOIN Ownership o ON w.widget_id = o.widget_id LEFT JOIN People p ON o.person_id = p.person_id GROUP BY wt.widget_type_name, ws.widget_state_name /* , p.person_id */
The problem I'm having is that SQL Server complains about
p.person_id
not being in an aggregate function or theGROUP BY
list, but if I uncomment that field, I get all the unique people-widget combinations (unless one person owns more than one widget of the same type).I thought a little bit about switching to use a CTE instead of STUFF...FOR XML PATH, but I can't quite figure out how it would work. Can anyone help me with either solution? (Or any other method, I'm not picky, as long as it works. :D )
Edit: I should also add that widgets can be unowned, but I still want to see the count of those in the query. The
"List of People"
should benull
in such cases. I don't care about people who don't have any widgets.Edit edit: I should also update the query to reflect that. :P
Editx3: Updated query to add info in response to discussion below.
Editx4: Ditto.
-
SQL's right, putting
p.person_id
inside theSTUFF
statement means it has to be part of the aggregation via. its own rules.The
FOR XML PATH
hack is the right thing to do here, but I think you're structuring the query a bit wrong. Look at your description and we have two pretty distinct things:- The count of
Widgets
, grouped by widget type. (BTW you didn't spell it out here, but I assume there's a 1:1 relationship betweenwidget_type
andwidget_id
?) - A comma-separated list of names of people who own at least one of widget, grouped by widget
I'd create the second data set elsewhere (table variable, temp table, even a subquery) and merely join it to the first. It'll be an easy join because they're both grouped by
widget_type
.
EDIT: actually, thinking about this again, in the
STUFF
you're grabbing aperson_id
by equating it to the outer query'sperson_id
, when you could instead get one from the outer query'swidget_type
. Maybe something like this:SELECT w.widget_type "Type" , COUNT(w.widget_id) "Quantity" , STUFF((SELECT DISTINCT ',' + p1.person_name FROM Widgets w1 JOIN Ownership o1 on w1.widget_id = o1.widget_id JOIN Person p1 on p1.person_id = o1.person_id WHERE w1.widget_type = w.widget_type FOR XML PATH('')), 1, 1, '') "List of People" FROM Widgets w INNER JOIN Ownership o ON w.widget_id = o.widget_id --INNER JOIN People p ON o.person_id = p.person_id (no longer needed; we get the ownership count from Ownership table GROUP BY w.widget_type /* , p.person_id */
That should make SQL happy? Untested.
EDIT: except you need a
SELECT DISTINCT
in theSTUFF
query, otherwise you'll get duplicated owner names.(Also add obligatory comment about how SQL keywords written in all-caps is both extremely ugly and completely unnecessary. Microsoft's style guide can go fuck itself.)
- The count of
-
@djls45
You can CTE out the GROUP BY to solve the problem with undesired row fan-out, I would expect that to optimize better than the scalar sub-query approach. I don't think there's a way to avoid the FOR XML PATH approach since you have multiple groupings of records you need to return.;WITH cte_WidgetCounts AS ( SELECT w.widget_id ,w.widget_type AS "Type" ,COUNT(*) AS "Quantity" FROM Widgets w WHERE EXISTS ( SELECT 1 FROM Ownership WHERE widget_id = w.widget_id ) GROUP BY w.widget_id, w.widget_type ) SELECT wc.Type ,wc.Quantity ,STUFF((SELECT ',' + p.person_name FROM People p WHERE p.person_id IN ( SELECT person_id FROM ownership o WHERE o.widget_id = wc.widget_id ) FOR XML PATH('')), 1, 1, '') AS "List of People" FROM cte_WidgetCounts wc
-
@blakeyrat said in SQL Aggregate with List:
BTW you didn't spell it out here, but I assume there's a 1:1 relationship between
widget_type
andwidget_id
?That's a many-to-one relationship. Each
widget_id
has only onewidget_type
, but there are multiple of each type.
-
@djls45 I'm confused about your schema.
For example, in this query, you're doing the
Ownership
table join based on ID, but your results listwidget_type
...Like, if I were designing a database, I'd have either:
widget_name
->widget_id
or
widget_type
->widget_type_id
I have to admit I don't quite understand what's going on here, but it seems you're mixing and matching the idea of a widget and a widget's type willy-nilly.
-
@blakeyrat Ah. To clarify, then, there is a separate
Widget_Type
table that haswidget_type_id
andwidget_name
columns, andWidget
has a FK onwidget_type_id
. Since the join toWidget_Type
is only to getwidget_name
, I figured I could simplify the query a bit by omitting that relationship.Widget
also has variouswidget_state
values that are included in the original query, and there's a number of other values from other table relationships, but those seemed to also be extraneous.
-
@djls45 Ok that makes more sense-- so your query doesn't output the FINAL data, it just outputs the type's ID and you're too lazy to join to the actual name. ;)
EDIT: But next time, at least name the foreign key after the primary it links to-- if it had been called
widget_type_id
instead ofwidget_type
this all would have been way less confusing.
-
@djls45
Hm, yeah, I misinterpreted that too, my CTE method isn't going to work either, lol.I don't suppose this is on SQL Server 2017
-
@djls45
With the caveat that this is using an EVIL SCALAR UDF and thus may not scale up very well...CREATE FUNCTION dbo.udf_GetWidgetOwners( @widget_type VARCHAR(50) ) RETURNS VARCHAR(MAX) AS BEGIN DECLARE @val VARCHAR(MAX) SELECT @val = COALESCE(@val + ',' + p.person_name, p.person_name) FROM Widgets w INNER JOIN Ownership o ON w.widget_id = o.widget_id INNER JOIN People p ON o.person_id = p.person_id WHERE w.widget_type = @widget_type RETURN @val END GO
SELECT w.widget_type AS Type ,COUNT(*) AS Quantity ,dbo.udf_GetWidgetOwners(w.widget_type) AS "List of People" FROM Widgets w GROUP BY w.widget_type
-
@izzion said in SQL Aggregate with List:
I don't suppose this is on SQL Server 2017
No, I've got 2008, 2012, and 2016, though I
mostlyalmost entirely use 2012.
-
@djls45
Yeah, figured as much. SQL 2017 introduces STRING_AGG which seems to be kind of a more flexible, built in version of the COALESCE-in-a-UDF method I outlined. The COALESCE-in-a-UDF method will work on 2008+, but you should test with a more realistic schema than what I came up with in my simple exmaple; scalar UDFs are not known for their performance or scalability.
-
@izzion said in SQL Aggregate with List:
With the caveat that this is using an EVIL SCALAR UDF and thus may not scale up very well...
There's also the fact that I can't create database objects (like UDFs) due to policy. I'm restricted to DML statements.
-
@djls45
Hm. Well, then, unless you can buy the DBA a good bottle of scotch, you're gonna be stuck with FOR XML PATH...Here's a fiddle that tests what is effectively blakeyrat's version... I changed the join in the outer query and the DISTINCT in the inner query around to more closely match your initial snippet. Comes down to whether you want to:
- count widgets that are not owned as part of their type (blakey's use of INNER JOIN in the outer query would eliminate those rows from the count)
- have duplicate names in the list of people column (add the DISTINCT to the subquery to remove the duplicates)
-
@djls45 Did you try my version of the query BTW? Did it work?
-
@blakeyrat said in SQL Aggregate with List:
@djls45 Did you try my version of the query BTW? Did it work?
Yes, I just did. There's actually several tables that I replaced with the single
Ownership
relationship table, so I had to figure out where exactly to split the queries between the outer query and the subquery inSTUFF
. But I once I got that figured out and noticed your note on usingdistinct
in the subquery, it appears to work beautifully. Thank you very much!Now to see if I can tweak @izzion's CTE solution to get it working, too. :D
-
@izzion Yeah, it looks like you were right in that a CTE doesn't really solve the issue. :(
Your fiddle doesn't seem to have anything in it.
-
@blakeyrat said in SQL Aggregate with List:
(Also add obligatory comment about how SQL keywords written in all-caps is both extremely ugly and completely unnecessary. Microsoft's style guide can go fuck itself.)
Oh, and I only use all-caps keywords for queries I show to other people (with Initial_Case for table names and all_lower for columns). In my own one-off queries, I just use all lowercase (except in a string literal if it has case-sensitive collation).
-
@djls45 said in SQL Aggregate with List:
Your fiddle doesn't seem to have anything in it.
Not just blank, but set to the wrong DBMS. (MySQL 5.something.)
-
@djls45
Bleh, it must have changed the link when I expanded the query plan or something and invalidated the URL. I don’t use sqlfiddle very often, so I guess I failed it :/
-
I used to do this stuff all the time back in the day.
But now, I'd just extract the results without users, then make another query to get all the user ids.
KISS.