SQL - Multiple counts

Karla

I'm responsible for a report the gets all of the demographic counts of our clients in a year.

I have to group the data in various ways.

There has to be better way than:

/* gender */
update d set total = (select count(ID) from #tempCust c where  c.group = d.group and gender = d.label)
from #demoCts 
where  category = 'Gender' 

update d set total = (select count(ID) from #tempCust c where  c.subgrouptype1= d.subgrouptype1 and gender = d.label)
from #demoCts 
where  category = 'Gender'

Repeat with other demographic data and subgroups.

#tempCust is a table basically , ID, Group, SubgroupType1, SubgroupType2, Gender, Hispanic, Disabled, etc
sample data:
ID,Group, SubgroupType1, SubgroupType2, Gender, Hispanic, Disabled.
1,Green,Apple,Car,Male,Yes,No
2,Yellow,Apple,Train,Female,No,No
3,Red,Kiwi,Car,Male,Yes,Yes
4,Green,Grape,Train,Female,No,Yes

#demoCts holds the totals for each demographic point for all, for groups, and for subgroups. (There is a second table for the other subgrouptype)
total,category_id,label,group,subgrouptype1
XX,Gender,Male,Green,Apple
XX,Gender,Male,Green,Kiwi
XX,Gender,Male,Green,Grape
XX,Gender,Male,Green,ALL
XX,Gender,Female,Green,Apple
XX,Gender,Female,Green,Kiwi
XX,Gender,Female,Green,Grape
XX,Gender,Female,Green,ALL

EDIT: Clarification, sample data

EDIT2: There are 20 or so categories of demographic data.

boomzilla

Have you tried:

update d set total = (select count(ID) from #tempCust c where  c.group = d.group and gender = d.Label)
from #demoCts 
where  category = 'Gender'

???

I do that sort of thing all the time in Oracle, although it would look more like update table_d d .... That is, I alias the updating table and use that alias inside the query.

Unperverted Vixen

Might be T-SQL specific.

WITH GroupedCounts AS (SELECT COUNT(*) AS Total, c.Group, c.Gender FROM #tempCust c GROUP BY c.Group, c.Gender)
UPDATE d
SET d.Total = c.Total
FROM #demoCts d
JOIN GroupedCounts c ON c.Group = d.Group and c.Subgroup = d.Gender

blakeyrat

Not sure this is what you need, but something like:


update #demoCts
set genderMale = count(*) from genderMale,
    genderFemale = count(*) from genderFemale
    (etc)
from #demoCts d
join #tempCust genderMale on 1=1 and genderMale.gender = 'male'
join #tempCust genderFemale on 1=1 and genderFemale.gender = 'female'
(etc)

I didn't run it but it should work fine.

As a note, questions like this would be a lot easier to answer if you could give us a few rows of sample data in both the temp tables, because I'm confused about what the original example is doing.

Karla

@boomzilla said in SQL - Multiple counts:

Have you tried:
update d set total = (select count(ID) from #tempCust c where  c.group = d.group and gender = d.Label)
from #demoCts 
where  category = 'Gender'
???

I do that sort of thing all the time in Oracle, although it would look more like update table_d d .... That is, I alias the updating table and use that alias inside the query.

I'm doing that. Screw up on the attempting to anon. It is the 50 repetitive statements i'm trying to avoid.

Tsaukpaetra

@karla said in SQL - Multiple counts:

There has to be better way than:

I would consider something like:

insert into #demoCtsResultInterrim values (
select group, gender, SUM(CASE WHEN c.group = d.group and gender = d.gender THEN 1 ELSE 0 END) total
FROM #tempCust c
GROUP BY group, gender
)

And I probably botched everything because I'm not reading comprehensively...

Karla

@unperverted-vixen said in SQL - Multiple counts:

Might be T-SQL specific.

WITH GroupedCounts AS (SELECT COUNT(*) AS Total, c.Group, c.Gender FROM #tempCust c GROUP BY c.Group, c.Gender)
UPDATE d
SET d.Total = c.Total
FROM #demoCts d
JOIN GroupedCounts c ON c.Group = d.Group and c.Subgroup = d.Gender

Subgroups are completely unrelated to demographic categories. I will need counts of all genders for each group and each subgroup.

blakeyrat

So it's a full combinatoric? You need a count for every possible combination of the source data?

You could put every column of the source data into a string, then count distinct strings. (Pick a string delimiter that won't conflict with the source data.)

Groaner

@karla This almost sounds like a job for WITH CUBE or WITH ROLLUP.

masonwheeler

@groaner said in SQL - Multiple counts:

WITH ROLLUP

Gribnit

@masonwheeler That's a cheerful image for rollup, I was picturing a cheap rented office being stripped to the walls and some people going for a long swim with carpets.

boomzilla

@karla said in SQL - Multiple counts:

@boomzilla said in SQL - Multiple counts:
Have you tried:
update d set total = (select count(ID) from #tempCust c where  c.group = d.group and gender = d.Label)
from #demoCts 
where  category = 'Gender'
???

I do that sort of thing all the time in Oracle, although it would look more like update table_d d .... That is, I alias the updating table and use that alias inside the query.
I'm doing that. Screw up on the attempting to anon. It is the 50 repetitive statements i'm trying to avoid.

You're not doing that in your statement. You're repeating the statements for each gender instead of letting the query do it for you. Here, I'll reformat and maybe it will be clearer:

update d set total = (
  select count(ID)
  from #tempCust c 
  where  c.group = d.group 
    and gender = d.Label)
from #demoCts 
where  category = 'Gender'

Note this bit in particular: and gender = d.Label. Which looking at it again, you must be able to do since you were already doing it with the group columns.

Karla

@blakeyrat said in SQL - Multiple counts:

So it's a full combinatoric? You need a count for every possible combination of the source data?

Ultimately, I only need overall totals but in order to validate the numbers, from other reports I need to see the groups/subgroups numbers.

You could put every column of the source data into a string, then count distinct strings. (Pick a string delimiter that won't conflict with the source data.)

Trying to wrap my head around this.

blakeyrat

@karla Well I'm frankly still unclear what exactly the requirements are (sorry maybe I'm dense), but I was thinking something like this:


select d.*, count( d.Group + '|||' + SubgroupType1 + '|||' + SubgroupType2 + '|||' + Gender + '|||' + Hispanic + '|||' + Disabled ) as 'count'
group by d.Group + '|||' + SubgroupType1 + '|||' + SubgroupType2 + '|||' + Gender + '|||' + Hispanic + '|||' + Disabled

That gives you a count of each combination of columns you listed in your string.

(Again I haven't run this myself, syntax may be wrong. My T-SQL is rusty as shit. Etc.)

Karla

@blakeyrat said in SQL - Multiple counts:

@karla Well I'm frankly still unclear what exactly the requirements are (sorry maybe I'm dense), but I was thinking something like this:
select d.*, count( distinct( d.Group + '|||' + SubgroupType1 + '|||' + SubgroupType2 + '|||' + Gender + '|||' + Hispanic + '|||' + Disabled ))
group by d.*
That gives you a count of each combination of columns you listed in your string.

(Again I haven't run this myself, syntax may be wrong.)

OK, now I see what you are saying.

I need counts for how many Males, Females, Disabled, Not Disabled, Hispanic, Not Hispanic, etc (individually).

Groups and subgroups to help validate the numbers. As we have no other report that goes across all groups.

Karla

@boomzilla said in SQL - Multiple counts:

@karla said in SQL - Multiple counts:
@boomzilla said in SQL - Multiple counts:
Have you tried:
update d set total = (select count(ID) from #tempCust c where  c.group = d.group and gender = d.Label)
from #demoCts 
where  category = 'Gender'
???

I do that sort of thing all the time in Oracle, although it would look more like update table_d d .... That is, I alias the updating table and use that alias inside the query.
I'm doing that. Screw up on the attempting to anon. It is the 50 repetitive statements i'm trying to avoid.
You're not doing that in your statement. You're repeating the statements for each gender instead of letting the query do it for you. Here, I'll reformat and maybe it will be clearer:
update d set total = (
  select count(ID)
  from #tempCust c 
  where  c.group = d.group 
    and gender = d.Label)
from #demoCts 
where  category = 'Gender'
Note this bit in particular: and gender = d.Label. Which looking at it again, you must be able to do since you were already doing it with the group columns.

I realized that and corrected. But I need like 20 different demographic counts (gender being only one).

Karla

@groaner said in SQL - Multiple counts:

WITH CUBE or WITH ROLLUP

These will help.

boomzilla

@karla said in SQL - Multiple counts:

@blakeyrat said in SQL - Multiple counts:
@karla Well I'm frankly still unclear what exactly the requirements are (sorry maybe I'm dense), but I was thinking something like this:
select d.*, count( distinct( d.Group + '|||' + SubgroupType1 + '|||' + SubgroupType2 + '|||' + Gender + '|||' + Hispanic + '|||' + Disabled ))
group by d.*
That gives you a count of each combination of columns you listed in your string.

(Again I haven't run this myself, syntax may be wrong.)
OK, now I see what you are saying.

I need counts for how many Males, Females, Disabled, Not Disabled, Hispanic, Not Hispanic, etc (individually).

Groups and subgroups to help validate the numbers. As we have no other report that goes across all groups.

Ah. I get it now. I read the code but not so much the words. Well, with this approach you only need one per category, at least. If you have a mapping of categories to columns you might be able to generate the list of queries automatically. Because I think it's the repetition that you're fighting, not having to run multiple queries which you could batch up into a single run.

blakeyrat

@karla said in SQL - Multiple counts:

I need counts for how many Males, Females, Disabled, Not Disabled, Hispanic, Not Hispanic, etc.

So you need a count of:

Group + Subgroup + Male
Group + Subgroup + Female
Group + Subgroup + Disabled
etc?

Or phrased more generally, a count of each:

Group + Subgroup + (Each option in each column other than ID, Group, Subgroup)?

In T-SQL there's a PIVOT operator that turns columns into rows. You could probably automatically generate the desired table using that functionality.

Karla

@boomzilla said in SQL - Multiple counts:

@karla said in SQL - Multiple counts:
@blakeyrat said in SQL - Multiple counts:
@karla Well I'm frankly still unclear what exactly the requirements are (sorry maybe I'm dense), but I was thinking something like this:
select d.*, count( distinct( d.Group + '|||' + SubgroupType1 + '|||' + SubgroupType2 + '|||' + Gender + '|||' + Hispanic + '|||' + Disabled ))
group by d.*
That gives you a count of each combination of columns you listed in your string.

(Again I haven't run this myself, syntax may be wrong.)
OK, now I see what you are saying.

I need counts for how many Males, Females, Disabled, Not Disabled, Hispanic, Not Hispanic, etc (individually).

Groups and subgroups to help validate the numbers. As we have no other report that goes across all groups.
Ah. I get it now. I read the code but not so much the words. Well, with this approach you only need one per category, at least.

Plus the WITH ROLLUP

If you have a mapping of categories to columns you might be able to generate the list of queries automatically. Because I think it's the repetition that you're fighting, not having to run multiple queries which you could batch up into a single run.

Yeah, because obviously that is mistake prone and this report is hard to validate and now that our BA is retiring...I need to be very careful.

Generation was what I was thinking before I posted the thread. I was hoping I was missing something else to make it a easier.

Karla

@blakeyrat said in SQL - Multiple counts:

PIVOT

Yeah, I think this may do it (or at least do much of it).

blakeyrat

I have to admit I made a scratch table for this and I'm having a lot of trouble getting it to work without using "reflection" (aka. information_schema table queries) to yank out the column names. It's trickier than it seems.

blakeyrat

@blakeyrat

Scratch table:


/****** Object:  Table [dbo].[scratch]    Script Date: 7/18/2018 2:40:10 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[scratch](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[group] [nvarchar](50) NOT NULL,
	[subgroup] [nvarchar](50) NOT NULL,
	[gender] [nvarchar](50) NOT NULL,
	[disabled] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_scratch] PRIMARY KEY CLUSTERED 
(
	[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[scratch] ON 
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (1, N'animal', N'human', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (2, N'animal', N'human', N'female', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (3, N'animal', N'dog', N'female', N'yes')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (4, N'mineral', N'graphite', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (5, N'mineral', N'diamond', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (6, N'animal', N'human', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (7, N'animal', N'dog', N'female', N'yes')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (8, N'animal', N'dog', N'female', N'yes')
GO
SET IDENTITY_INSERT [dbo].[scratch] OFF
GO

Query:



select s.[group], s.subgroup, s.colname, s.colval, count(*) as 'total'
from scratch s
unpivot
(
    colval
	for colname in ( gender, [disabled] )
) s
group by s.[group], s.subgroup, s.colname, s.colval

Output of query:


group	subgroup	colname	        colval	total
animal	dog	        disabled	yes	3
animal	dog	        gender	        female	3
animal	human	        disabled	no	3
animal	human	        gender	        female	1
animal	human	        gender	        male	2
mineral	diamond	        disabled	no	1
mineral	diamond	        gender	        male	1
mineral	graphite	disabled	no	1
mineral	graphite	gender	        male	1

Turns out PIVOT wasn't the answer, but UNPIVOT was. You still need to list the names of your columns if the query (although if you wanted to get really fancy you could grab those from a dynamic SQL statement).

Karla

@blakeyrat said in SQL - Multiple counts:

@blakeyrat

Scratch table:


/****** Object:  Table [dbo].[scratch]    Script Date: 7/18/2018 2:40:10 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[scratch](
	[id] [bigint] IDENTITY(1,1) NOT NULL,
	[group] [nvarchar](50) NOT NULL,
	[subgroup] [nvarchar](50) NOT NULL,
	[gender] [nvarchar](50) NOT NULL,
	[disabled] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_scratch] PRIMARY KEY CLUSTERED 
(
	[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[scratch] ON 
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (1, N'animal', N'human', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (2, N'animal', N'human', N'female', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (3, N'animal', N'dog', N'female', N'yes')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (4, N'mineral', N'graphite', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (5, N'mineral', N'diamond', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (6, N'animal', N'human', N'male', N'no')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (7, N'animal', N'dog', N'female', N'yes')
GO
INSERT [dbo].[scratch] ([id], [group], [subgroup], [gender], [disabled]) VALUES (8, N'animal', N'dog', N'female', N'yes')
GO
SET IDENTITY_INSERT [dbo].[scratch] OFF
GO

Query:



select s.[group], s.subgroup, s.colname, s.colval, count(*) as 'total'
from scratch s
unpivot
(
    colval
	for colname in ( gender, [disabled] )
) s
group by s.[group], s.subgroup, s.colname, s.colval

Output of query:


group	subgroup	colname	        colval	total
animal	dog	        disabled	yes	3
animal	dog	        gender	        female	3
animal	human	        disabled	no	3
animal	human	        gender	        female	1
animal	human	        gender	        male	2
mineral	diamond	        disabled	no	1
mineral	diamond	        gender	        male	1
mineral	graphite	disabled	no	1
mineral	graphite	gender	        male	1

Turns out PIVOT wasn't the answer, but UNPIVOT was. You still need to list the names of your columns if the query (although if you wanted to get really fancy you could grab those from a dynamic SQL statement).

This helps because I was having trouble with it.

I'll probably have a better idea tomorrow after having slept on it.

blakeyrat

@karla One interesting quirk about this query is it breaks group by all functionality in SQL Server. And you can't unpivot by all apparently.

If a particular combination has a total of zero, you just can't get it in the output. AFAICT. Hopefully that's not a problem for you.

Karla

@blakeyrat said in SQL - Multiple counts:

@karla One interesting quirk about this query is it breaks group by all functionality in SQL Server. And you can't unpivot by all apparently.

If a particular combination has a total of zero, you just can't get it in the output. AFAICT. Hopefully that's not a problem for you.

Never used "group by all" and since I'm doing an update rather than insert zeros shouldn't be a problem.

ben_lubar

If your database software supports COUNT(*) OVER (PARTITION BY ...), you can do this without the subqueries.

Here's the PostgreSQL documentation:

May 9

3.5. Window Functions

3.5. Window Functions # A window function performs a calculation across a set of table rows that are somehow related to …

I'm pretty sure it's also supported in Oracle and MSSQL.

lolwhat

@ben_lubar said in SQL - Multiple counts:

I'm pretty sure it's also supported in ... MSSQL.

Can confirm.

PleegWat

@ben_lubar Oracle has window functions as well. It also has unpivot which is probably a better fit here.