Selecting distinct rows in Entity Framework
-
This is a cross-post from StackOverflow, I thought I'd ask here as well in case anyone had any insight.
I'm trying to query an SQL Server view with Entity Framework and only return rows that are distinct based on multiple columns.
I've tried a solution based on this answer (
GroupBy
and thenSelect(g => g.FirstOrDefault())
, but I'm still getting duplicate rows.Table structure (this is a fairly complex view in the real database, but the final output is similar in structure to this example):
CREATE TABLE Example ( ID_A BIGINT, ID_B BIGINT, ID_C BIGINT, Type_A NVARCHAR(50), Type_B NVARCHAR(50), ID_Z BIGINT, Foo NVARCHAR(200), Bar NVARCHAR(200) )
Example data:
INSERT INTO Example (ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar) VALUES (1, 1, 1, 'TypeA1', 'TypeB1', 1, 'foo1', 'bar1'), -- This row and the next one represent the same main record (1) joined to different "Z" records (1 and 2) (1, 1, 1, 'TypeA1', 'TypeB1', 2, 'foo1', 'bar1'), (2, 1, 2, 'TypeA2', 'TypeA2', 1, 'foo2', 'bar2'), -- This row and the next two represent the same main record (2) joined to different "Z" records (1, 2 and 3) (2, 1, 2, 'TypeA2', 'TypeA2', 2, 'foo2', 'bar2'), (2, 1, 2, 'TypeA2', 'TypeA2', 3, 'foo2', 'bar2')
ID_A
,ID_B
,ID_C
,Type_A
andType_B
identify the "main" entity andID_Z
identifies the joined "Z" entity.Foo
andBar
are non-unique data columns that need to be included in the final results.For each combination of main ID/Type values, there can be multiple
ID_Z
values. I need to filter by theID_Z
values and then return the distinct main entity values (based on the ID/Type values).I've tried a query like the following:
// The `ID_Z` values to filter on var zIDs = new List<long> { 1, 2 }; var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .ToList();
But this seems to return all of the rows matching the
Z_ID
filter (resulting in duplicate "main" values) instead of only returning the first row for each set of "main" ID/Type values.If I materialise (
ToList
) the query directly afterGroupBy
, I seem to get the correct groupings; but I'd like to run this all in the DB and avoid using LINQ to Objects queries.How can I create this query?
-
@Choonster said in Selecting distinct rows in Entity Framework:
var result = context.Set<ExampleEntity>()
.Where(e => zIDs.Contains(e.ID_Z))
.GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
.Select(g => g.FirstOrDefault())
.ToList();I believe you want......
var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .Distinct() .ToList();
or
class MyComparer : IEqualityComparer<T> { // Some Code here to implement the interface } var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .Distinct(new MyComparer()) .ToList();
-
@Vixen said in Selecting distinct rows in Entity Framework:
A bunch of stuff....
Alternately now that I look at it...... you can probably get rid of the group by statement
var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .Select(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Distinct() .ToList();
That will work I think.... Havent bothered to actually set up linqpad to verify, but it looks like it'll work...... YMMV
-
@Vixen Thanks for the replies, but I don't think they're quite what I'm looking for.
@Vixen said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
var result = context.Set<ExampleEntity>()
.Where(e => zIDs.Contains(e.ID_Z))
.GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
.Select(g => g.FirstOrDefault())
.ToList();I believe you want......
var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .Distinct() .ToList();
I don't think this would work since
Distinct()
operates on all columns, but there would be multipleID_Z
values in each grouping.class MyComparer : IEqualityComparer<T> { // Some Code here to implement the interface } var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .Distinct(new MyComparer()) .ToList();
I'm pretty sure LINQ to Entities doesn't support that overload of
Distinct()
, since the query needs to be translated into SQL. I'm trying to avoid in-memory queries (LINQ to Objects).Edit: This page explicitly mentions that methods using
IEqualityComparer
aren't supported in LINQ To Entities.@Vixen said in Selecting distinct rows in Entity Framework:
@Vixen said in Selecting distinct rows in Entity Framework:
A bunch of stuff....
Alternately now that I look at it...... you can probably get rid of the group by statement
var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .Select(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Distinct() .ToList();
That will work I think.... Havent bothered to actually set up linqpad to verify, but it looks like it'll work...... YMMV
That would return a list of anonymous type instances rather than a list of
ExampleEntity
instances (which is what I'm trying to load from this query).
-
@Choonster said in Selecting distinct rows in Entity Framework:
I'm trying to avoid in-memory queries
well, then my suggestions are:
- Pull up your big girl panties and use in memory queries to filter out the distinct because this is going to be pretty damn near impossible without them
or - Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want
or - If possible change your DB structure. You have a lot of data duplication in your tables, and by desconstructing them into a more normal form will help your querying.
making the tables like this:
CREATE TABLE Example ( ID BIGINT IDENTITY(1,1), ID_A BIGINT, ID_B BIGINT, ID_C BIGINT, Type_A NVARCHAR(50), Type_B NVARCHAR(50), Foo NVARCHAR(200), Bar NVARCHAR(200) ) CREATE TABLE MAIN_TO_Z { MAIN_ID BIGINT REFERENCES Example(ID), Z_ID BIGINT REFERENCES ZTABLE(ID) } INSERT INTO Example (ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar) VALUES (1, 1, 1, 'TypeA1', 'TypeB1', 'foo1', 'bar1'), (2, 1, 2, 'TypeA2', 'TypeA2', 'foo2', 'bar2'), records (1, 2 and 3) INSERT INTO MAIN_TO_Z (MAIN_ID, Z_ID) VALUES (1, 1), -- Okay I'm taking advantage of the fact i know the orders the IDs were handed out. Chill, it's a demo (1, 2), (2, 1), (2, 2), (2, 3),
then your query becomes:
var Ids = context.Set<MAIN_TO_Z>() .Where(e => zIDs.Contains(e.Z_ID)) .Select(row => row.MAIN_ID) .Distinct() // create this external to prevent rerunning this. if using the more SQL form of linq you can inline it and the transformations will produce this when the sqlike syntax is transformed to method chains.... but this is already method chains so we have to do it ourselves var result = context.Set<ExampleEntity>() .Where(e => Ids.Contains(e.ID)) .ToList();
- Pull up your big girl panties and use in memory queries to filter out the distinct because this is going to be pretty damn near impossible without them
-
@Vixen In the real DB,
ExampleEntity
and theExample
table are actually a fairly complex view consisting of unions between multiple main entity tables joined to their related tables.There are already separate tables for the main to Z joins, but only some of the entity types included in the view have these links and each entity type that does have the link has its own table for it.
The existing view didn't include these joins to Z, I've only added them today because there was a requirement to include all records with matching Z IDs. I thought it would be simple enough to add the joins and filter out the duplicate values with a distinct operation, I didn't realise it was going to be such a hassle.
The view's query looks a bit like this:
SELECT ... FROM Entity_A INNER JOIN Entity_A_Foo ON ... INNER JOIN Entity_A_Z ON ... UNION SELECT ... FROM Entity_B INNER JOIN Entity_B_Bar ON ... INNER JOIN Entity_B_Z ON ... UNION SELECT ... FROM Entity_C INNER JOIN Entity_C_Baz ON ... -- No join to Z
I might have to write the query in SQL myself, though I haven't actually had any luck applying any of the advice for distinct queries I've found.
-
@Choonster said in Selecting distinct rows in Entity Framework:
In the real DB, ExampleEntity and the Example table are actually a fairly complex view consisting of unions between multiple main entity tables joined to their related tables.
ugh.......
i've dealt with databases like that....
I HATES THEM.
so so so so hates them.
I shall have a beer at lunch in your honor for dealing with this.
-
@Choonster
Do you have to use the existing views, or can you write/construct your own views or queries?Edit: Never mind. I just saw this:
I might have to write the query in SQL myself, though I haven't actually had any luck applying any of the advice for distinct queries I've found.
That seems like it might be your best bet.
-
Not being at my computer, is it possible to move the where portion into the firstordefault portion?
-
@Choonster said in Selecting distinct rows in Entity Framework:
I thought it would be simple enough to add the joins and filter out the duplicate values with a distinct operation
That should be that simple. If you don't include the Z IDs in the
select
clause, then theunion
should be automatically eliminating the duplicate rows. If it doesn't, then wrapping the whole query with aselect distinct
should eliminate them, too.Without Z IDs:
SELECT ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar FROM Entity_A INNER JOIN Entity_A_Foo ON ... INNER JOIN Entity_A_Z ON ... WHERE ID_Z = @filter ... UNION SELECT ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar FROM Entity_B INNER JOIN Entity_B_Bar ON ... INNER JOIN Entity_B_Z ON ... WHERE ID_Z = @filter ... UNION SELECT ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar FROM Entity_C INNER JOIN Entity_C_Baz ON ... WHERE ID_Z = @filter
or
Using wrapper:
SELECT DISTINCT ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar FROM ( SELECT ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar FROM Entity_A INNER JOIN Entity_A_Foo ON ... INNER JOIN Entity_A_Z ON ... ... UNION SELECT ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar FROM Entity_B INNER JOIN Entity_B_Bar ON ... INNER JOIN Entity_B_Z ON ... ... UNION SELECT ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar FROM Entity_C INNER JOIN Entity_C_Baz ON ... ) AS qry WHERE ID_Z = @filter
-
I'm not very experienced with EF. Could someone explain what's wrong with this?
It seems perfectly sensible to me.@Choonster said in Selecting distinct rows in Entity Framework:
var result = context.Set<ExampleEntity>() .Where(e => zIDs.Contains(e.ID_Z)) .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B }) .Select(g => g.FirstOrDefault()) .ToList();
Though I'd probably use
.Select(g => g.Key)
instead of.Select(g => g.FirstOrDefault())
if I didn't require gettingExampleEntity
s back.
-
@Vixen said in Selecting distinct rows in Entity Framework:
- Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want
This is, in general, how I would approach any problem in EF that isn't straight CRUD.
-
@Groaner said in Selecting distinct rows in Entity Framework:
@Vixen said in Selecting distinct rows in Entity Framework:
- Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want
This is, in general, how I would approach any problem in EF that isn't straight CRUD.
Which is all fun and games until your co-workers throw a party for you the first time you tune an EF query without turning into a sproc.
-
Someone on StackOverflow found the problem: The entity class had been configured with
ID_A
as the primary key, so EF assumed that it was unique and ignored theGroupBy
/Select(g => g.FirstOrDefault())
operation when generating the SQL.The solution was to add a
ROW_NUMBER()
column to the view and tell EF to use that as the primary key. Once I did that, EF turned theGroupBy
/Select(g => g.FirstOrDefault())
operation into aSELECT DISTINCT
subquery and the duplicates were correctly filtered out.
-
@Choonster said in Selecting distinct rows in Entity Framework:
Someone on StackOverflow found the problem
Link please? If they helped out, they deserve credit.
-
@Mason_Wheeler said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
Someone on StackOverflow found the problem
Link please? If they helped out, they deserve credit.
-
@Mason_Wheeler said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
Someone on StackOverflow found the problem
Link please? If they helped out, they deserve credit.
For being retarded enough to put an answer into a comment?
-
@Choonster said in Selecting distinct rows in Entity Framework:
The entity class had been configured with ID_A as the primary key, so EF assumed that it was unique
So, is there a reason why it's configured to have a primary key which isn't unique?
-
@Zecc said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
The entity class had been configured with ID_A as the primary key, so EF assumed that it was unique
So, is there a reason why it's configured to have a primary key which isn't unique?
EF requires every entity class to have a primary key, so I guess the original developer just chose the first column of the view (which was unique within each of the queries that form the view, but not unique overall). The view is only really used in a single place and the non-unique key hasn't been an issue until now (since it's a view rather than a table, there's no inserting/updating of records).
This codebase was originally developed by a Chinese outsourcing team, all but one of whom were fired several years ago (before my time).
-
@Choonster said in Selecting distinct rows in Entity Framework:
since it's a view rather than a table, there's no inserting/updating of records
-
@TheCPUWizard said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
since it's a view rather than a table, there's no inserting/updating of records
Thanks, I wasn't aware of that. Still, this view references multiple tables so there's no way to insert/update records through it.
-
@Choonster said in Selecting distinct rows in Entity Framework:
this view references multiple tables so there's no way to insert/update records through it.
A suitable TRIGGER could let it work, but that is non-trivial to define.
-
@boomzilla said in Selecting distinct rows in Entity Framework:
@Mason_Wheeler said in Selecting distinct rows in Entity Framework:
@Choonster said in Selecting distinct rows in Entity Framework:
Someone on StackOverflow found the problem
Link please? If they helped out, they deserve credit.
For being retarded enough to put an answer into a comment?
I'm going to try sniping credits (or whatever SO uses for points) by posting the answer from the comment. BRB.