SQL Aggregate with List



  • I'm using SQL Server. I have a list of people and the widgets in their possession, and I need to be able to get a single row for each type of widget with the total count of widgets and a comma-separated list of all the names of the people who own at least one widget of that type. People and widgets each have unique identifiers.

    So far I have:

    SELECT wt.widget_type_name "Type"
         , ws.widget_state_name "State"
         , COUNT(w.widget_id) "Quantity"
         , STUFF((SELECT ',' + p1.person_name
                  FROM People p1
                  WHERE p1.person_id = p.person_id
                  FOR XML PATH('')), 1, 1, '') "List of People"
    FROM Widgets w
    INNER JOIN Widget_Types wt on wt.widget_type_id = w.widget_type_id
    INNER JOIN Widget_States ws on ws.widget_state_id = w.widget_state_id
    LEFT JOIN Ownership o ON w.widget_id = o.widget_id
    LEFT JOIN People p ON o.person_id = p.person_id
    GROUP BY wt.widget_type_name, ws.widget_state_name /* , p.person_id */
    

    The problem I'm having is that SQL Server complains about p.person_id not being in an aggregate function or the GROUP BY list, but if I uncomment that field, I get all the unique people-widget combinations (unless one person owns more than one widget of the same type).

    I thought a little bit about switching to use a CTE instead of STUFF...FOR XML PATH, but I can't quite figure out how it would work. Can anyone help me with either solution? (Or any other method, I'm not picky, as long as it works. :D )

    Edit: I should also add that widgets can be unowned, but I still want to see the count of those in the query. The "List of People" should be null in such cases. I don't care about people who don't have any widgets.

    Edit edit: I should also update the query to reflect that. :P

    Editx3: Updated query to add info in response to discussion below.
    Editx4: Ditto.



  • SQL's right, putting p.person_id inside the STUFF statement means it has to be part of the aggregation via. its own rules.

    The FOR XML PATH hack is the right thing to do here, but I think you're structuring the query a bit wrong. Look at your description and we have two pretty distinct things:

    1. The count of Widgets, grouped by widget type. (BTW you didn't spell it out here, but I assume there's a 1:1 relationship between widget_type and widget_id?)
    2. A comma-separated list of names of people who own at least one of widget, grouped by widget

    I'd create the second data set elsewhere (table variable, temp table, even a subquery) and merely join it to the first. It'll be an easy join because they're both grouped by widget_type.


    EDIT: actually, thinking about this again, in the STUFF you're grabbing a person_id by equating it to the outer query's person_id, when you could instead get one from the outer query's widget_type. Maybe something like this:

    SELECT w.widget_type "Type"
         , COUNT(w.widget_id) "Quantity"
         , STUFF((SELECT DISTINCT ',' + p1.person_name
                  FROM Widgets w1
                      JOIN Ownership o1 on w1.widget_id = o1.widget_id
                      JOIN Person p1 on p1.person_id = o1.person_id
                  WHERE w1.widget_type = w.widget_type
                  FOR XML PATH('')), 1, 1, '') "List of People"
    FROM Widgets w
    INNER JOIN Ownership o ON w.widget_id = o.widget_id
    --INNER JOIN People p ON o.person_id = p.person_id (no longer needed; we get the ownership count from Ownership table
    GROUP BY w.widget_type /* , p.person_id */
    

    That should make SQL happy? Untested.

    EDIT: except you need a SELECT DISTINCT in the STUFF query, otherwise you'll get duplicated owner names.

    (Also add obligatory comment about how SQL keywords written in all-caps is both extremely ugly and completely unnecessary. Microsoft's style guide can go fuck itself.)



  • @djls45
    You can CTE out the GROUP BY to solve the problem with undesired row fan-out, I would expect that to optimize better than the scalar sub-query approach. I don't think there's a way to avoid the FOR XML PATH approach since you have multiple groupings of records you need to return.

    ;WITH cte_WidgetCounts AS (
        SELECT 
             w.widget_id
            ,w.widget_type AS "Type"
            ,COUNT(*) AS "Quantity"
        FROM Widgets w
        WHERE EXISTS ( SELECT 1 FROM Ownership WHERE widget_id = w.widget_id )
        GROUP BY w.widget_id, w.widget_type
    )
    SELECT
         wc.Type
        ,wc.Quantity
        ,STUFF((SELECT ',' + p.person_name
                FROM People p
                WHERE p.person_id IN ( SELECT person_id FROM ownership o WHERE o.widget_id = wc.widget_id )
                FOR XML PATH('')), 1, 1, '') AS "List of People"
    FROM cte_WidgetCounts wc
    


  • @blakeyrat said in SQL Aggregate with List:

    BTW you didn't spell it out here, but I assume there's a 1:1 relationship between widget_type and widget_id?

    That's a many-to-one relationship. Each widget_id has only one widget_type, but there are multiple of each type.



  • @djls45 I'm confused about your schema.

    For example, in this query, you're doing the Ownership table join based on ID, but your results list widget_type...

    Like, if I were designing a database, I'd have either:
    widget_name -> widget_id or
    widget_type -> widget_type_id

    I have to admit I don't quite understand what's going on here, but it seems you're mixing and matching the idea of a widget and a widget's type willy-nilly.



  • @blakeyrat Ah. To clarify, then, there is a separate Widget_Type table that has widget_type_id and widget_name columns, and Widget has a FK on widget_type_id. Since the join to Widget_Type is only to get widget_name, I figured I could simplify the query a bit by omitting that relationship. Widget also has various widget_state values that are included in the original query, and there's a number of other values from other table relationships, but those seemed to also be extraneous.



  • @djls45 Ok that makes more sense-- so your query doesn't output the FINAL data, it just outputs the type's ID and you're too lazy to join to the actual name. ;)

    EDIT: But next time, at least name the foreign key after the primary it links to-- if it had been called widget_type_id instead of widget_type this all would have been way less confusing.



  • @djls45
    Hm, yeah, I misinterpreted that too, my CTE method isn't going to work either, lol.

    I don't suppose this is on SQL Server 2017 👼



  • @djls45
    With the caveat that this is using an EVIL SCALAR UDF and thus may not scale up very well...

    CREATE FUNCTION dbo.udf_GetWidgetOwners( @widget_type VARCHAR(50) )
    RETURNS VARCHAR(MAX)
    AS
    BEGIN
      DECLARE @val VARCHAR(MAX)
      
      SELECT @val = COALESCE(@val + ',' + p.person_name, p.person_name) 
      FROM Widgets w
      INNER JOIN Ownership o
        ON w.widget_id = o.widget_id
      INNER JOIN People p
        ON o.person_id = p.person_id
      WHERE w.widget_type = @widget_type
      
      RETURN @val
    END
    GO
    
    SELECT
      w.widget_type AS Type
      ,COUNT(*) AS Quantity
      ,dbo.udf_GetWidgetOwners(w.widget_type) AS "List of People"
    FROM Widgets w
    GROUP BY w.widget_type
    


  • @izzion said in SQL Aggregate with List:

    I don't suppose this is on SQL Server 2017 👼

    No, I've got 2008, 2012, and 2016, though I mostly almost entirely use 2012.



  • @djls45
    Yeah, figured as much. SQL 2017 introduces STRING_AGG which seems to be kind of a more flexible, built in version of the COALESCE-in-a-UDF method I outlined. The COALESCE-in-a-UDF method will work on 2008+, but you should test with a more realistic schema than what I came up with in my simple exmaple; scalar UDFs are not known for their performance or scalability.



  • @izzion said in SQL Aggregate with List:

    With the caveat that this is using an EVIL SCALAR UDF and thus may not scale up very well...

    There's also the fact that I can't create database objects (like UDFs) due to policy. I'm restricted to DML statements.



  • @djls45
    Hm. Well, then, unless you can buy the DBA a good bottle of scotch, you're gonna be stuck with FOR XML PATH...

    Here's a fiddle that tests what is effectively blakeyrat's version... I changed the join in the outer query and the DISTINCT in the inner query around to more closely match your initial snippet. Comes down to whether you want to:

    • count widgets that are not owned as part of their type (blakey's use of INNER JOIN in the outer query would eliminate those rows from the count)
    • have duplicate names in the list of people column (add the DISTINCT to the subquery to remove the duplicates)


  • @djls45 Did you try my version of the query BTW? Did it work?



  • @blakeyrat said in SQL Aggregate with List:

    @djls45 Did you try my version of the query BTW? Did it work?

    Yes, I just did. There's actually several tables that I replaced with the single Ownership relationship table, so I had to figure out where exactly to split the queries between the outer query and the subquery in STUFF. But I once I got that figured out and noticed your note on using distinct in the subquery, it appears to work beautifully. Thank you very much! 🕺

    Now to see if I can tweak @izzion's CTE solution to get it working, too. :D



  • @izzion Yeah, it looks like you were right in that a CTE doesn't really solve the issue. :(

    Your fiddle doesn't seem to have anything in it.



  • @blakeyrat said in SQL Aggregate with List:

    (Also add obligatory comment about how SQL keywords written in all-caps is both extremely ugly and completely unnecessary. Microsoft's style guide can go fuck itself.)

    Oh, and I only use all-caps keywords for queries I show to other people (with Initial_Case for table names and all_lower for columns). In my own one-off queries, I just use all lowercase (except in a string literal if it has case-sensitive collation).



  • @djls45 said in SQL Aggregate with List:

    Your fiddle doesn't seem to have anything in it.

    Not just blank, but set to the wrong DBMS. (MySQL 5.something.)



  • @djls45
    Bleh, it must have changed the link when I expanded the query plan or something and invalidated the URL. I don’t use sqlfiddle very often, so I guess I failed it :/



  • I used to do this stuff all the time back in the day.

    But now, I'd just extract the results without users, then make another query to get all the user ids.

    KISS.


Log in to reply