Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).
-
Note: https://meta.wikimedia.org/wiki/Cunningham's_Law is in effect for the title of this post.
I have two objects: `SettlementRole` and `SettlementType`. There is a many-to-many relationship between the two, where each `SettlementRole` has a collection of `SettlementType`s, each with a weight specific to that pair, but each `SettlementType` may be in a number of `SettlementRole`s, each with different weights.

I'm not using an ORM for various reasons, and ORMs are not in scope here.
Question: Is there a cleaner, simpler way of doing the same process than the following? This is mostly a learning project and there are a bunch of other similar (but not identical) entity pairs/groups in the project, so I'd like to get one as clean as possible so I can follow the same process for the more complex ones.
The function to get all `SettlementRole`s looks like (C#):

```csharp
public async Task<IList<SettlementRole>> GetAllRolesAsync()
{
    using var connection = new SqliteConnection(dbName);
    connection.Open();
    var command = connection.CreateCommand();
    command.CommandText = @"
        SELECT sr.id, sr.name, sr.description,
               t.id, t.name, t.description, t.minSize, t.maxSize,
               map.weight
        FROM settlementRole sr
        JOIN settlementTypeRoleWeight map ON map.roleId = sr.id
        JOIN settlementType t ON t.id = map.typeId;";

    var rawRoles = await command.ExecuteReaderAsync();
    var map = new Dictionary<int, SettlementRole>();
    while (rawRoles.Read())
    {
        var id = rawRoles.GetInt32(0);
        var exists = map.TryGetValue(id, out var role);
        if (!exists)
        {
            role = new SettlementRole()
            {
                Id = id,
                Name = rawRoles.GetString(1),
                Description = rawRoles.GetString(2)
            };
            map.Add(id, role);
        }
        // Note: ordinals 3-8 follow the SELECT's column order; t.id at 3
        // shifts the type columns one place to the right.
        role!.AddType(new SettlementType()
        {
            Name = rawRoles.GetString(4),
            Description = rawRoles.GetString(5),
            MinSize = rawRoles.GetInt32(6),
            MaxSize = rawRoles.GetInt32(7),
        }, rawRoles.GetDouble(8));
    }
    return new List<SettlementRole>(map.Values);
}
```
Where `settlementRole.AddType(SettlementType type, double weight)` handles the internal weighting process of adding a type to the (default empty) collection.

And the schema for the three tables (including the join table) looks like:

```sql
CREATE TABLE settlementRole (
    id INTEGER NOT NULL PRIMARY KEY ASC AUTOINCREMENT,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT ""
);

CREATE TABLE settlementType (
    id INTEGER NOT NULL PRIMARY KEY ASC AUTOINCREMENT,
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    minSize INTEGER NOT NULL DEFAULT 1,
    maxSize INTEGER,
    professions TEXT NOT NULL DEFAULT ""
);

CREATE TABLE settlementTypeRoleWeight (
    roleId INTEGER NOT NULL,
    typeId INTEGER NOT NULL,
    weight INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (roleId, typeId)
);
```
And the entities themselves (right now) are simple data objects (they'll have behavior, but that's out of scope here).
-
@Benjamin-Hall meh, typing looks okay. I'd have wanted clearer assurance on disposals, but maybe `using` attaches on the enclosing block. You can retrieve the entire graph in one swat by abusing UNION ALL, and `int`-typing the map key probably means unnecessary boxery. It's not awful. Your table names are going to hurt if the engine isn't case-sensitive, and ANSI IIRC isn't. You could probably stand to be declaring more of your DB metamodel even without any JPA. But if I didn't know better I'd write something like this - it's a decent start, and the rest'll become apparent as it starts to smell.

Oh, and almost certainly you can be making the query-results unmarshalling a functional comprehension vs an iteration. And I'd strongly recommend not putting business logic in the entities directly, unless you're okay with fundamentally incomplete logic in the service layer OR the entity methods will strictly and only express real general invariants of the class. Even then it is usually less duplicative to express these invariants as interface methods, since they are often also cross-cutting over a subset of common fields.
ed. fart booby
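To make the "functional comprehension" suggestion concrete, here's a rough sketch: materialize the reader rows into records first, then let LINQ's `GroupBy` do the grouping the dictionary loop does by hand. The `Row` record and `RoleLoader` names are made up for illustration, the ordinals assume the column order of the post's SELECT, and the `SettlementRole`/`SettlementType` entities are the ones from the post.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;

// Illustrative row record; ordinals follow the post's SELECT column order
// (role id/name/description, type id/name/description/minSize/maxSize, weight).
record Row(int RoleId, string RoleName, string RoleDesc, int TypeId,
           string TypeName, string TypeDesc, int MinSize, int MaxSize, double Weight);

static class RoleLoader
{
    static IEnumerable<Row> ReadRows(DbDataReader r)
    {
        while (r.Read())
            yield return new Row(
                r.GetInt32(0), r.GetString(1), r.GetString(2),
                r.GetInt32(3), r.GetString(4), r.GetString(5),
                r.GetInt32(6), r.GetInt32(7), r.GetDouble(8));
    }

    // GroupBy replaces the hand-threaded Dictionary: one SettlementRole per
    // role id, with its weighted types attached from the group's rows.
    public static List<SettlementRole> Load(DbDataReader reader) =>
        ReadRows(reader)
            .GroupBy(row => row.RoleId)
            .Select(g =>
            {
                var first = g.First();
                var role = new SettlementRole
                {
                    Id = g.Key,
                    Name = first.RoleName,
                    Description = first.RoleDesc
                };
                foreach (var row in g)
                    role.AddType(new SettlementType
                    {
                        Name = row.TypeName,
                        Description = row.TypeDesc,
                        MinSize = row.MinSize,
                        MaxSize = row.MaxSize
                    }, row.Weight);
                return role;
            })
            .ToList();
}
```

Whether this is clearer than the loop is a taste call; the grouping does buffer all rows in memory first, which is fine at this data size.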
-
@Benjamin-Hall Seems generally fine. `GetFieldValue<T>` or its async counterpart are preferred over `GetString`, `GetInt32`, etc. The latter are only there for backwards compatibility, and they aren't implementing individual methods for any new supported datatypes.

I prefer to use `GetOrdinal` instead of hardcoding column numbers.
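For example (a sketch, not anyone's actual code): because both tables expose `id`/`name`/`description`, the SELECT needs column aliases before `GetOrdinal` has unambiguous names to resolve. The alias names below are made up for illustration.

```csharp
using System.Data.Common;
using System.Threading.Tasks;

static class OrdinalExample
{
    // Assumes a hypothetical aliased query along the lines of:
    //   SELECT sr.id AS roleId, sr.name AS roleName, sr.description AS roleDesc,
    //          t.name AS typeName, map.weight AS weight
    //   FROM ... (joins as in the original post)
    public static async Task ReadWithOrdinalsAsync(DbDataReader reader)
    {
        // Look the ordinals up once, outside the loop.
        int roleIdCol   = reader.GetOrdinal("roleId");
        int roleNameCol = reader.GetOrdinal("roleName");
        int weightCol   = reader.GetOrdinal("weight");

        while (await reader.ReadAsync())
        {
            int    id     = reader.GetFieldValue<int>(roleIdCol);
            string name   = reader.GetFieldValue<string>(roleNameCol);
            double weight = reader.GetFieldValue<double>(weightCol);
            // ... build entities as before ...
        }
    }
}
```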
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I have two objects: SettlementRole and SettlementType. There is a many to many relationship between the two, where each SettlementRole has a collection of SettlementTypes, each with a weight specific to that pair, but each SettlementType may be in a number of SettlementRoles, each with different weights.
Crucial question: is it guaranteed that there is always at least one `SettlementType` (with a weight) for every `SettlementRole`? Because your query (the `JOIN`s) just silently ignores any role without a type. If such roles are valid, you need to use `LEFT JOIN` and handle `NULL` columns.

Second thing: the shared `settlementType` rows are actually represented by several instances of `SettlementType` with equal values. This is usually fine, but it can bite you (or, rather, some poor sod one year in the future) in the behind if you start, for example, modifying those values.
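A sketch of what the `LEFT JOIN` variant would look like against the post's query and reader (fragment only, with hypothetical ordinals; the point is that type-less roles now produce a row whose type columns are `NULL` and must be skipped):

```csharp
// SQL side: LEFT JOIN instead of JOIN, so roles with no mapped types survive:
//   SELECT sr.id, sr.name, sr.description,
//          t.id, t.name, t.description, t.minSize, t.maxSize, map.weight
//   FROM settlementRole sr
//   LEFT JOIN settlementTypeRoleWeight map ON map.roleId = sr.id
//   LEFT JOIN settlementType t ON t.id = map.typeId;

// C# side, inside the read loop: ordinal 3 (t.id) is NULL when the role
// has no types, so only call AddType when a type actually came back.
if (!rawRoles.IsDBNull(3))
{
    role!.AddType(new SettlementType { /* ... as before ... */ },
                  rawRoles.GetDouble(8));
}
```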
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
GetFieldValue<T> or its async counterpart are preferred over GetString, GetInt32, etc. The latter are only there for backwards compatibility, and they aren’t implementing individual methods for any new supported datatypes.
Mind telling me where you've got this from?
I know for a fact we are using this and I'd like to know why we should stop.
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I prefer to use `GetOrdinal` instead of hardcoding column numbers.

Absolutely. This is the first thing that jumped out at me.
-
Alternatively, there's this approach:
```csharp
using var adapter = new SqlDataAdapter(command);
DataTable dataTable = new DataTable();
adapter.Fill(dataTable);
// then, e.g.:
string description = (string)dataTable.Rows[0]["Description"];
```
which may be overkill, and I'm not sure how it works with JOINs, so Cunningham's applies to this post.
-
I'd prefer running separate queries for roles, types, and mappings instead of a single joined one. But I don't have a good reasoning for why it's better.
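One concrete argument for it: with three queries, each `SettlementType` is built exactly once and the instance is shared between roles, which also sidesteps the duplicate-instance problem raised earlier in the thread. A sketch assuming the entities from the post and a `Microsoft.Data.Sqlite` connection (the method name is made up):

```csharp
using System.Collections.Generic;
using Microsoft.Data.Sqlite;

static class ThreeQueryLoader
{
    public static List<SettlementRole> LoadRoles(SqliteConnection connection)
    {
        var roles = new Dictionary<int, SettlementRole>();
        var types = new Dictionary<int, SettlementType>();

        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText = "SELECT id, name, description FROM settlementRole;";
            using var r = cmd.ExecuteReader();
            while (r.Read())
                roles[r.GetInt32(0)] = new SettlementRole
                {
                    Id = r.GetInt32(0),
                    Name = r.GetString(1),
                    Description = r.GetString(2)
                };
        }

        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText =
                "SELECT id, name, description, minSize, maxSize FROM settlementType;";
            using var r = cmd.ExecuteReader();
            while (r.Read())
                // maxSize is nullable in the schema; real code should IsDBNull-check.
                types[r.GetInt32(0)] = new SettlementType
                {
                    Name = r.GetString(1),
                    Description = r.GetString(2),
                    MinSize = r.GetInt32(3),
                    MaxSize = r.GetInt32(4)
                };
        }

        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText =
                "SELECT roleId, typeId, weight FROM settlementTypeRoleWeight;";
            using var r = cmd.ExecuteReader();
            while (r.Read())
                // Shared instance: every role referencing this typeId gets the same object.
                roles[r.GetInt32(0)].AddType(types[r.GetInt32(1)], r.GetDouble(2));
        }

        return new List<SettlementRole>(roles.Values);
    }
}
```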
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
ORMs are not in scope here.
Ah. Then otherwise it seems fine. For now.
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
ORMs are not in scope here
Shame. It's nice up there.
-
I use a slightly different approach. If you have 1:1 (or many:1) relationships, you can flatten the query to include those. Additionally, if you have a 1:many you can include that one, too. Then sort by the ids and loop through all the stuff, noting that when the ids change you need to create another object and (optionally, depending on how it all works) the 1:1 / many:1 children; otherwise you create a new 1:many object and add that.
Pseudocode:
```
results = my.query
lastId = -1
currentObject = null
listOfObjects = []
for result of results {
    if lastId != result.id {
        currentObject = new object ...
        listOfObjects.add(currentObject)
        lastId = currentObject.id
    }
    currentObject.many:many.add(new child)
}
```
-
@PleegWat said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I'd prefer running separate queries for roles, types, and mappings instead of a single joined one. But I don't have a good reasoning for why it's better.
Easier to write and maintain? Typically apps I've maintained have been broken up like that to aid reuse and dividing up domain-specific logic. I would typically write a query like the one above in those codebases only if performance became a problem or I could get away with it.
@Benjamin-Hall That is what I would expect from raw jdbc in java but ymmv with c#
And the entities themselves (right now) are simple data objects (they'll have behavior, but that's out of scope here).
No!
I also spotted the problem @Kamil-Podlesak mentioned about left joins. That may need investigation.
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
SettlementType
How many settlement types do you have? If that's a small fixed set, it should become an enumeration.
-
Responses:

@dkf Currently about 10. But it's not fixed--one of the whole points of the project is that these things (all 3 of them) are editable (add, delete, change). And the `SettlementType` is actually more complex, having a name, description, and bounds for the population. They're things like 'village', 'town', 'city', 'estate', 'thorpe', etc., used for generating settlements. The `SettlementRole`s are what the settlement does (i.e. subsistence farming, extraction, trade, etc.) and thus what sorts of things can be found there.

@DogsB @Kamil-Podlesak There is guaranteed to be at least one `SettlementRole` for every `SettlementType`, with a corresponding weight. Because otherwise the program wouldn't be able to "randomly" pick a Role when generating a settlement of the specified Type. And I'm not sure about your "second thing" at all? Are those database rows? Result object rows? If the first, I'm fairly sure not. Each `SettlementType` has a single row with a single id. There may be many rows in the mapping table, but the combination of `typeId, roleId` is unique. As for the result object from the joined queries--that's why I'm using the map/dictionary to only generate that part once.

@Unperverted-Vixen Thanks. I was actually cursing the fact that I couldn't find a way to refer to column names and had to use straight-up column numbers. Because that's super ultra fragile and easy for my brain to screw up. Especially on the more complicated ones (there's one particular one that has 6 JOIN'd tables (2x mapping/many-to-many table pairs and 2x 1:many relationships) and 20-ish columns). The SQL driver/library I'm used to in TypeScript just automagically puts it into an anonymous object where the property names are the column names. But that's because TypeScript has a very loose type system.
-
@Zecc said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I know for a fact we are using this and I'd like to know why we should stop.
When adding streaming support they never implemented async versions of them, just `GetFieldValueAsync<T>`. They also decided not to implement specific methods for the new `DateOnly`/`TimeOnly` structs. `GetFieldValue[Async]<T>` is clearly the intended method going forward.
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Zecc said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I know for a fact we are using this and I'd like to know why we should stop.
When adding streaming support they never implemented async versions of them, just `GetFieldValueAsync<T>`. They also decided not to implement specific methods for the new `DateOnly`/`TimeOnly` structs. `GetFieldValue[Async]<T>` is clearly the intended method going forward.

If you're using `ExecuteReaderAsync()` to get the `SqliteDataReader` result set, when should you also use the async versions of the `GetFieldValue<T>` methods?

Also, using the `Microsoft.Data.Sqlite` package...is there really no better way to get the last inserted row id than doing a whole separate `SELECT last_insert_rowid();` query? The (older) `System.Data.Sqlite` package (which isn't supported on .NET Core from what I can tell) has a handy-dandy method on the connection to grab that. But the newer version? Nope. Microsoft!
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
If you're using `ExecuteReaderAsync()` to get the `SqliteDataReader` result set, when should you also use the async versions of the `GetFieldValue<T>` methods?

If you are using SQL Server and have large binary columns you can enable streaming support, to let the application start processing a row before all the columns are fully received.
If you are using SQL Server and turn that on, you need to use the async methods to access data. Otherwise, if the data for a column hasn't streamed in yet, it'll throw instead of blocking.
If you leave streaming off (or are using a different database engine that doesn't support it at all), the async version shouldn't have any practical benefit. In the SQL Server case it also has a happy path, so there's probably no harm to it either - but your specific provider may vary.
Also, using the `Microsoft.Data.Sqlite` package...is there really no better way to get the last inserted row id than doing a whole separate `SELECT last_insert_rowid();` query? The (older) `System.Data.Sqlite` package (which isn't supported on .NET Core from what I can tell) has a handy-dandy method on the connection to grab that. But the newer version? Nope. Microsoft!

System.Data.Sqlite doesn't appear to be a Microsoft-provided package, but a third-party one crowding in on their namespace.
Could you write an extension method on the connection if you want convenience access to that query?
Also, System.Data.Sqlite includes both .NET Standard 2.0 and 2.1 versions, so it should work on .NET Core/6 just fine.
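For what it's worth, a minimal version of such an extension method, assuming `Microsoft.Data.Sqlite` (the method name is made up):

```csharp
using Microsoft.Data.Sqlite;

public static class SqliteConnectionExtensions
{
    // Wraps the "separate query" dance behind one call on an open connection.
    public static long LastInsertRowId(this SqliteConnection connection)
    {
        using var command = connection.CreateCommand();
        command.CommandText = "SELECT last_insert_rowid();";
        return (long)command.ExecuteScalar()!;
    }
}
```

Note that `last_insert_rowid()` is per-connection in SQLite, so this only makes sense on the same connection that did the insert.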
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
Could you write an extension method on the connection if you want convenience access to that query?
That's what I ended up doing, but it just feels wrong. It feels like there should be a way to tell the driver "ok, I'm doing an insert. Give me the last rowid back (either in an out parameter or a return value or something)." It's just such a common(?) use case.
-
@Benjamin-Hall I agree that it's common, but I can't come up with an API design for it that I'd be happy with. `out` parameters are ugly. You'd also need to deal with multiple possible return types - what if it's a long or a GUID? (Or some other new type we didn't foresee?)

Easier to just kick the can to devs who know what they need. They can then also build it into the query and use `ExecuteScalar` to avoid a second round trip.
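Sketched against the settlementRole table from the post, batching the id lookup into the same command (the parameter values and connection string are illustrative; `Microsoft.Data.Sqlite` runs batched statements in order and `ExecuteScalar` returns the first value produced, here the new rowid):

```csharp
using Microsoft.Data.Sqlite;

using var connection = new SqliteConnection("Data Source=app.db"); // path illustrative
connection.Open();

// One round trip: the INSERT runs, then the SELECT yields the new rowid.
using var command = connection.CreateCommand();
command.CommandText = @"
    INSERT INTO settlementRole (name, description) VALUES ($name, $description);
    SELECT last_insert_rowid();";
command.Parameters.AddWithValue("$name", "Trade hub");
command.Parameters.AddWithValue("$description", "A market settlement");
long newRoleId = (long)command.ExecuteScalar()!;
```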
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Benjamin-Hall I agree that it's common, but I can't come up with an API design for it that I'd be happy with. `out` parameters are ugly. You'd also need to deal with multiple possible return types - what if it's a long or a GUID? (Or some other new type we didn't foresee?) Easier to just kick the can to devs who know what they need. They can then also build it into the query and use `ExecuteScalar` to avoid a second round trip.

I've seen chatter that `ExecuteScalar` bypasses things like unique constraints (at least for Sqlite), which is...bad.
-
Some good suggestions here. The three table approach is pretty much the standard.
On a side note: You have SQL (relations) and C# (objects) and there is a connection (mapping) between them... so you ARE using an ORM, just a very specific (non-generalized) custom one. [I have had multiple professional engagements where this semantic aspect was crucial, which is why I bring it up here]
-
@TheCPUWizard said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
and there is a connection (mapping) between them... so you ARE using an ORM
In the general case the second condition here is under-restrictive - the critical point is that sufficiently similar n-tuples are materialized that the in-memory relations resemble the table relations. But in this specific case, yes, the wheel embargo has led to axled rollers.
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Benjamin-Hall I agree that it's common, but I can't come up with an API design for it that I'd be happy with.
`out` parameters are ugly. You'd also need to deal with multiple possible return types - what if it's a long or a GUID? (Or some other new type we didn't foresee?) Easier to just kick the can to devs who know what they need. They can then also build it into the query and use `ExecuteScalar` to avoid a second round trip.

I've seen chatter that `ExecuteScalar` bypasses things like unique constraints (at least for Sqlite), which is...bad.

This is not for the language nor the driver to decide, but embedded databases can do as they please.
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
@Benjamin-Hall I agree that it's common, but I can't come up with an API design for it that I'd be happy with.
`out` parameters are ugly. You'd also need to deal with multiple possible return types - what if it's a long or a GUID? (Or some other new type we didn't foresee?) Easier to just kick the can to devs who know what they need. They can then also build it into the query and use `ExecuteScalar` to avoid a second round trip.

I've seen chatter that `ExecuteScalar` bypasses things like unique constraints (at least for Sqlite), which is...bad.

Sounds like SQLite's fault, as it's the database engine that enforces this, and on real databases (and MySQL), short of disabling the constraints first, you can't just bypass them.
-
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I've seen chatter that `ExecuteScalar` bypasses things like unique constraints (at least for Sqlite), which is...bad.

There's no way to disable unique constraints on columns in SQLite, assuming that they're actually using genuine SQLite and not something that merely pretends to be it. (Why you'd do that in C# when you've got good access to native API calls, I don't know.) I'd suspect the chatter as being full of BS. Even removing the unique constraint (if directly set on a column instead of using an explicit index) is somewhat tricky.
The only constraints that you can disable are foreign keys. You need to do that in some types of migrations.
@Benjamin-Hall said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
That's what I ended up doing, but it just feels wrong. It feels like there should be a way to tell the driver "ok, I'm doing an insert. Give me the last rowid back (either in an out parameter or a return value or something)." It's just such a common(?) use case.
Can't you use a `RETURNING` clause in your `INSERT`? You can get whatever you want back as a row like that (as long as you don't mind what order you get them back, which isn't a big deal if you're inserting a single row…)

Of course, that assumes that your ORM will let you do that, mixing DML and DQL in the same query.
-
@dkf said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
Can't you use a `RETURNING` clause in your `INSERT`? You can get whatever you want back as a row like that (as long as you don't mind what order you get them back, which isn't a big deal if you're inserting a single row…)

TIL. (Though SQL Server uses `OUTPUT` as the keyword.)

I assume there's zero perf difference between that and `SELECT SCOPE_IDENTITY()`, so I'm not going to go rewrite our ORM just to use this new toy; but I'm sure there's other cases where that will be useful…
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
I assume
-
@izzion I don't have any good way to benchmark it, and since I'm including the `SELECT` in the initial query there's no extra round trip. Even if it does somehow save 1ms per insert, it's not going to be noticeable.
-
@Unperverted-Vixen said in Cunningham's Law: This is the best way of doing this (sql->C# nested object reconstitution).:
TIL. (Though SQL Server uses `OUTPUT` as the keyword.) I assume there's zero perf difference between that and `SELECT SCOPE_IDENTITY()`, so I'm not going to go rewrite our ORM just to use this new toy; but I'm sure there's other cases where that will be useful…

That sort of thing is far more useful when you need to get more than just the identity out, such as also wanting back a calculated column that was generated in the insert. When you're just getting the ID back, most database drivers basically do that for you (as long as you're inserting a single row; when there's many rows going in at once, it pays to be more explicit).
You'd expect the costs to be similar in all cases as long as the amount of data being moved is similar.