Selecting distinct rows in Entity Framework

Choonster

This is a cross-post from StackOverflow, I thought I'd ask here as well in case anyone had any insight.

I'm trying to query an SQL Server view with Entity Framework and only return rows that are distinct based on multiple columns.

I've tried a solution based on this answer (GroupBy and then Select(g => g.FirstOrDefault()), but I'm still getting duplicate rows.

Table structure (this is a fairly complex view in the real database, but the final output is similar in structure to this example):

CREATE TABLE Example (
	ID_A BIGINT,
	ID_B BIGINT,
	ID_C BIGINT,

	Type_A NVARCHAR(50),
	Type_B NVARCHAR(50),

	ID_Z BIGINT,

	Foo NVARCHAR(200),
	Bar NVARCHAR(200)
)

Example data:

INSERT INTO Example (ID_A, ID_B, ID_C, Type_A, Type_B,  ID_Z,  Foo, Bar)
VALUES (1, 1, 1, 'TypeA1', 'TypeB1',  1,  'foo1', 'bar1'), -- This row and the next one represent the same main record (1) joined to different "Z" records (1 and 2)
       (1, 1, 1, 'TypeA1', 'TypeB1',  2,  'foo1', 'bar1'),
       (2, 1, 2, 'TypeA2', 'TypeA2',  1,  'foo2', 'bar2'), -- This row and the next two represent the same main record (2) joined to different "Z" records (1, 2 and 3)
       (2, 1, 2, 'TypeA2', 'TypeA2',  2,  'foo2', 'bar2'),
       (2, 1, 2, 'TypeA2', 'TypeA2',  3,  'foo2', 'bar2')

ID_A, ID_B, ID_C, Type_A and Type_B identify the "main" entity and ID_Z identifies the joined "Z" entity. Foo and Bar are non-unique data columns that need to be included in the final results.

For each combination of main ID/Type values, there can be multiple ID_Z values. I need to filter by the ID_Z values and then return the distinct main entity values (based on the ID/Type values).

I've tried a query like the following:

// The `ID_Z` values to filter on
var zIDs = new List<long> { 1, 2 };

var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .ToList();

But this seems to return all of the rows matching the Z_ID filter (resulting in duplicate "main" values) instead of only returning the first row for each set of "main" ID/Type values.

If I materialise (ToList) the query directly after GroupBy, I seem to get the correct groupings; but I'd like to run this all in the DB and avoid using LINQ to Objects queries.

How can I create this query?

Vixen

@Choonster said in Selecting distinct rows in Entity Framework:

var result = context.Set<ExampleEntity>()
.Where(e => zIDs.Contains(e.ID_Z))
.GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
.Select(g => g.FirstOrDefault())
.ToList();

I believe you want......

var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .Distinct()
       .ToList();

or

class MyComparer : IEqualityComparer<T>
{
      // Some Code here to implement the interface
}
var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .Distinct(new MyComparer())
       .ToList();

dotnet-bot

Enumerable.Distinct Method (System.Linq)

Returns distinct elements from a sequence.

Vixen

@Vixen said in Selecting distinct rows in Entity Framework:

A bunch of stuff....

Alternately now that I look at it...... you can probably get rid of the group by statement

var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .Select(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Distinct()
       .ToList();

That will work I think.... Havent bothered to actually set up linqpad to verify, but it looks like it'll work...... YMMV

Choonster

@Vixen Thanks for the replies, but I don't think they're quite what I'm looking for.

@Vixen said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

var result = context.Set<ExampleEntity>()
.Where(e => zIDs.Contains(e.ID_Z))
.GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
.Select(g => g.FirstOrDefault())
.ToList();

I believe you want......
var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .Distinct()
       .ToList();

I don't think this would work since Distinct() operates on all columns, but there would be multiple ID_Z values in each grouping.

class MyComparer : IEqualityComparer<T>
{
      // Some Code here to implement the interface
}
var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .Distinct(new MyComparer())
       .ToList();

dotnet-bot

Enumerable.Distinct Method (System.Linq)

Returns distinct elements from a sequence.

I'm pretty sure LINQ to Entities doesn't support that overload of Distinct(), since the query needs to be translated into SQL. I'm trying to avoid in-memory queries (LINQ to Objects).

Edit: This page explicitly mentions that methods using IEqualityComparer aren't supported in LINQ To Entities.

@Vixen said in Selecting distinct rows in Entity Framework:

@Vixen said in Selecting distinct rows in Entity Framework:

A bunch of stuff....

Alternately now that I look at it...... you can probably get rid of the group by statement
var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .Select(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Distinct()
       .ToList();
That will work I think.... Havent bothered to actually set up linqpad to verify, but it looks like it'll work...... YMMV

That would return a list of anonymous type instances rather than a list of ExampleEntity instances (which is what I'm trying to load from this query).

Vixen

@Choonster said in Selecting distinct rows in Entity Framework:

I'm trying to avoid in-memory queries

well, then my suggestions are:

Pull up your big girl panties and use in memory queries to filter out the distinct because this is going to be pretty damn near impossible without them
or
Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want
or
If possible change your DB structure. You have a lot of data duplication in your tables, and by desconstructing them into a more normal form will help your querying.

making the tables like this:

CREATE TABLE Example (
        ID BIGINT IDENTITY(1,1),
	ID_A BIGINT,
	ID_B BIGINT,
	ID_C BIGINT,

	Type_A NVARCHAR(50),
	Type_B NVARCHAR(50),

	Foo NVARCHAR(200),
	Bar NVARCHAR(200)
)

CREATE TABLE MAIN_TO_Z {
        MAIN_ID BIGINT REFERENCES Example(ID),
        Z_ID BIGINT REFERENCES ZTABLE(ID)
}

INSERT INTO Example (ID_A, ID_B, ID_C, Type_A, Type_B,  Foo, Bar)
VALUES (1, 1, 1, 'TypeA1', 'TypeB1',   'foo1', 'bar1'),
       (2, 1, 2, 'TypeA2', 'TypeA2',  'foo2', 'bar2'), records (1, 2 and 3)

INSERT INTO MAIN_TO_Z (MAIN_ID, Z_ID)
VALUES (1, 1), -- Okay I'm taking advantage of the fact i know the orders the IDs were handed out. Chill, it's a demo
        (1, 2),
        (2, 1),
        (2, 2),
        (2, 3),

then your query becomes:

var Ids = context.Set<MAIN_TO_Z>()
       .Where(e => zIDs.Contains(e.Z_ID))
      .Select(row => row.MAIN_ID)
      .Distinct() // create this external to prevent rerunning this. if using the more SQL form of linq you can inline it and the transformations will produce this when the sqlike syntax is transformed to method chains.... but this is already method chains so we have to do it ourselves
var result = context.Set<ExampleEntity>()
       .Where(e => Ids.Contains(e.ID))
       .ToList();

Choonster

@Vixen In the real DB, ExampleEntity and the Example table are actually a fairly complex view consisting of unions between multiple main entity tables joined to their related tables.

There are already separate tables for the main to Z joins, but only some of the entity types included in the view have these links and each entity type that does have the link has its own table for it.

The existing view didn't include these joins to Z, I've only added them today because there was a requirement to include all records with matching Z IDs. I thought it would be simple enough to add the joins and filter out the duplicate values with a distinct operation, I didn't realise it was going to be such a hassle.

The view's query looks a bit like this:

SELECT
   ...
FROM Entity_A
INNER JOIN Entity_A_Foo ON ...
INNER JOIN Entity_A_Z ON ...

UNION

SELECT
    ...
FROM Entity_B
INNER JOIN Entity_B_Bar ON ...
INNER JOIN Entity_B_Z ON ...

UNION

SELECT
    ...
FROM Entity_C
INNER JOIN Entity_C_Baz ON ...
-- No join to Z

I might have to write the query in SQL myself, though I haven't actually had any luck applying any of the advice for distinct queries I've found.

Vixen

@Choonster said in Selecting distinct rows in Entity Framework:

In the real DB, ExampleEntity and the Example table are actually a fairly complex view consisting of unions between multiple main entity tables joined to their related tables.

ugh.......

i've dealt with databases like that....

I HATES THEM.

so so so so hates them.

I shall have a beer at lunch in your honor for dealing with this.

djls45

@Choonster ~~Do you have to use the existing views, or can you write/construct your own views or queries?~~

Edit: Never mind. I just saw this:

I might have to write the query in SQL myself, though I haven't actually had any luck applying any of the advice for distinct queries I've found.

That seems like it might be your best bet.

Tsaukpaetra

Not being at my computer, is it possible to move the where portion into the firstordefault portion?

djls45

@Choonster said in Selecting distinct rows in Entity Framework:

I thought it would be simple enough to add the joins and filter out the duplicate values with a distinct operation

That should be that simple. If you don't include the Z IDs in the select clause, then the union should be automatically eliminating the duplicate rows. If it doesn't, then wrapping the whole query with a select distinct should eliminate them, too.

Without Z IDs:

SELECT
    ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar
FROM Entity_A
INNER JOIN Entity_A_Foo ON ...
INNER JOIN Entity_A_Z ON ...
WHERE ID_Z = @filter
...

UNION

SELECT 
    ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar
FROM Entity_B
INNER JOIN Entity_B_Bar ON ...    
INNER JOIN Entity_B_Z ON ...
WHERE ID_Z = @filter
...

UNION

SELECT
    ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar
FROM Entity_C
INNER JOIN Entity_C_Baz ON ...
WHERE ID_Z = @filter

or

Using wrapper:

SELECT DISTINCT
    ID_A, ID_B, ID_C, Type_A, Type_B, Foo, Bar
FROM (
    SELECT
        ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar
    FROM Entity_A
    INNER JOIN Entity_A_Foo ON ...
    INNER JOIN Entity_A_Z ON ...
    ...

    UNION

    SELECT 
        ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar
    FROM Entity_B
    INNER JOIN Entity_B_Bar ON ...    
    INNER JOIN Entity_B_Z ON ...
    ...

    UNION

    SELECT
        ID_A, ID_B, ID_C, Type_A, Type_B, ID_Z, Foo, Bar
    FROM Entity_C
    INNER JOIN Entity_C_Baz ON ...
) AS qry
WHERE ID_Z = @filter

Zecc

I'm not very experienced with EF. Could someone explain what's wrong with this?
It seems perfectly sensible to me.

@Choonster said in Selecting distinct rows in Entity Framework:

var result = context.Set<ExampleEntity>()
       .Where(e => zIDs.Contains(e.ID_Z))
       .GroupBy(e => new { e.ID_A, e.ID_B, e.ID_C, e.Type_A, e.Type_B })
       .Select(g => g.FirstOrDefault())
       .ToList();

Though I'd probably use .Select(g => g.Key) instead of .Select(g => g.FirstOrDefault()) if I didn't require getting ExampleEntitys back.

Groaner

@Vixen said in Selecting distinct rows in Entity Framework:

Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want

This is, in general, how I would approach any problem in EF that isn't straight CRUD.

izzion

@Groaner said in Selecting distinct rows in Entity Framework:

@Vixen said in Selecting distinct rows in Entity Framework:

Write a SPROC in the database and then call it via EF to make your query, allowing you to get all database kungfu on the problem without trying to bend EF into making the SQL query you want

This is, in general, how I would approach any problem in EF that isn't straight CRUD.

Which is all fun and games until your co-workers throw a party for you the first time you tune an EF query without turning into a sproc.

Choonster

Someone on StackOverflow found the problem: The entity class had been configured with ID_A as the primary key, so EF assumed that it was unique and ignored the GroupBy/Select(g => g.FirstOrDefault()) operation when generating the SQL.

The solution was to add a ROW_NUMBER() column to the view and tell EF to use that as the primary key. Once I did that, EF turned the GroupBy/Select(g => g.FirstOrDefault()) operation into a SELECT DISTINCT subquery and the duplicates were correctly filtered out.

Mason_Wheeler

@Choonster said in Selecting distinct rows in Entity Framework:

Someone on StackOverflow found the problem

Link please? If they helped out, they deserve credit.

Choonster

@Mason_Wheeler said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

Someone on StackOverflow found the problem

Link please? If they helped out, they deserve credit.

Selecting distinct rows in Entity Framework

I'm trying to query an SQL Server view with Entity Framework and only return rows that are distinct based on multiple columns. I've tried a solution based on this answer (GroupBy and then Select(g...

boomzilla

@Mason_Wheeler said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

Someone on StackOverflow found the problem

Link please? If they helped out, they deserve credit.

For being retarded enough to put an answer into a comment?

Zecc

@Choonster said in Selecting distinct rows in Entity Framework:

The entity class had been configured with ID_A as the primary key, so EF assumed that it was unique

So, is there a reason why it's configured to have a primary key which isn't unique?

Choonster

@Zecc said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

The entity class had been configured with ID_A as the primary key, so EF assumed that it was unique

So, is there a reason why it's configured to have a primary key which isn't unique?

EF requires every entity class to have a primary key, so I guess the original developer just chose the first column of the view (which was unique within each of the queries that form the view, but not unique overall). The view is only really used in a single place and the non-unique key hasn't been an issue until now (since it's a view rather than a table, there's no inserting/updating of records).

This codebase was originally developed by a Chinese outsourcing team, all but one of whom were fired several years ago (before my time).

TheCPUWizard

@Choonster said in Selecting distinct rows in Entity Framework:

since it's a view rather than a table, there's no inserting/updating of records

WilliamDAssafMSFT

Modify Data Through a View - SQL Server

Modify Data Through a View

Choonster

@TheCPUWizard said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

since it's a view rather than a table, there's no inserting/updating of records

WilliamDAssafMSFT

Modify Data Through a View - SQL Server

Modify Data Through a View

Thanks, I wasn't aware of that. Still, this view references multiple tables so there's no way to insert/update records through it.

dkf

@Choonster said in Selecting distinct rows in Entity Framework:

this view references multiple tables so there's no way to insert/update records through it.

A suitable TRIGGER could let it work, but that is non-trivial to define.

Tsaukpaetra

@boomzilla said in Selecting distinct rows in Entity Framework:

@Mason_Wheeler said in Selecting distinct rows in Entity Framework:

@Choonster said in Selecting distinct rows in Entity Framework:

Someone on StackOverflow found the problem

Link please? If they helped out, they deserve credit.

For being retarded enough to put an answer into a comment?

I'm going to try sniping credits (or whatever SO uses for points) by posting the answer from the comment. BRB.