Joinphobia

boomzilla

I've taken over the code that integrates with a COTS product (blech). The (no longer on my team) guy who used to do this was mostly competent but his code is often very smelly.

One thing to note is that our application uses Oracle but the COTS uses SQL Server, which makes cross application communication somewhat problematic, plus SQLServer's not something we all have a lot of experience with, so things are weird.

In particular, the way you have different "databases" and then schemas within them. Oracle has similar concepts, but...different. The way their DB is set up, they have a variety of DBs for different things.

There's one that's somewhat dynamic in that you can have multiple of them so you can segregate some of your data. In any case, this is about one of the static DBs. First

It was decided that we need to use a configurable value to set these things that will never change. So we end up with garbage like this:

public String getFooDBName(){
    return configuredValue + ".dbo.";
}

...

em.createQuery("select * from " + getFooDBName() + "MyTable")

Which, of course, makes it super easy to read with the benefit that it's super easy to change something that will never change!

Alright, that's dumb enough, but the real kicker is this antipattern (I've removed the configuration shit for everyone's sanity):

insert into MyTable (...)
values(
  (select blah from ThatTable where thisId = 
  (select thisId from ThisTable where someOtherId = 
  (select someOtherId from SomeOtherTable where blah = :whatever))),
  false, true, 1, ...
);

Nested fucking queries!!!!! Using VALUES instead of just using the damn query!

People need to be punched for doing shit like this.

Zecc

@boomzilla said in Joinphobia:

SQLServer's not something we all have a lot of experience with, so things are weird.
In particular, the way you have different "databases" and then schemas within them.

: What the hell does Oracle have then?

@boomzilla said in Joinphobia:

It was decided that we need to use a configurable value to set these things that will never change. So we end up with garbage like this:
public String getFooDBName(){
    return configuredValue + ".dbo.";
}

...

em.createQuery("select * from " + getFooDBName() + "MyTable")

That configuredValue belongs ~~in a museum~~ in the Initial Catalog argument of the connection string. Unless you are doing queries across multiple databases, which it sounds like you are. :sucking-on-lemon: It that case it's unavoidable.

I have something similar in my codebase. Consider yourself lucky no one accidentally uses a space or other unexpected character in their database name.

Incidentally, I have NFC how to escape database names with [] or "" in their names, so I hoping really hard I'll never need to find out.

boomzilla

@Zecc said in Joinphobia:

@boomzilla said in Joinphobia:

SQLServer's not something we all have a lot of experience with, so things are weird.
In particular, the way you have different "databases" and then schemas within them.

: What the hell does Oracle have then?

You can have multiple schemas in a DB. These correlate more to SQLServer databases, I think. Does anyone really use anything other than the dbo schema in SQL Server? The rest seems like DBA management type stuff. That sort of thing in Oracle isn't in some other schema, but some of it is inaccessible via permissions.

@boomzilla said in Joinphobia:
It was decided that we need to use a configurable value to set these things that will never change. So we end up with garbage like this:
public String getFooDBName(){
    return configuredValue + ".dbo.";
}

...

em.createQuery("select * from " + getFooDBName() + "MyTable")
That configuredValue belongs ~~in a museum~~ in the Initial Catalog argument of the connection string. Unless you are doing queries across multiple databases, which it sounds like you are. :sucking-on-lemon: It that case it's unavoidable.

Yeah, we do cross DB queries, and I don't mind that so much. Just not doing it with a "configurable" value that never changes but makes code basically unreadable.

I have something similar in my codebase. Consider yourself lucky no one accidentally uses a space or other unexpected character in their database name.

I'm actually somewhat surprised that the jokers who made this shitpile of COTS didn't manage to do something like that.

robo2

@Zecc said in Joinphobia:

Incidentally, I have NFC how to escape database names with [] or "" in their names, so I hoping really hard I'll never need to find out.

According to https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers?view=sql-server-ver16#rules-for-regular-identifiers you'll never need to find out (if I read that correctly)

LaoC

I scraped this from a DB cluster's slow query log these days:

select fooentity0_.entity as col_0_0_
from entities fooentity0_ where fooentity0_.parent='0056AA8CC6362800'
or fooentity0_.parent in (
select fooentity1_.entity
from entities fooentity1_
where fooentity1_.parent='0056AA8CC6362800'
or fooentity1_.parent in (
select fooentity2_.entity
from entities fooentity2_
where fooentity2_.parent='0056AA8CC6362800'
or fooentity2_.parent in (
select fooentity3_.entity
from entities fooentity3_
where fooentity3_.parent='0056AA8CC6362800'
or fooentity3_.parent in (
select fooentity4_.entity
from entities fooentity4_
where fooentity4_.parent='0056AA8CC6362800'
or fooentity4_.parent in (
select fooentity5_.entity
from entities fooentity5_
where fooentity5_.parent='0056AA8CC6362800'
)))));

Devs say it's Hibernate's fault

boomzilla

@LaoC said in Joinphobia:

Devs say it's Hibernate's fault

It does have the look of a Hibernate query. I can kind of accept dumb stuff like that from automated code. It's when a human writes similar code that I absolutely draw the line.

dkf

@LaoC It looks like it is trying to do a transitive closure over the parent field, but without actually doing a transitive closure. Yes, it's generated code too, and so butt-ugly, but the problem with it is that what it is doing is really a bunch of self joins and the relationship keys are strings.

LaoC

@dkf said in Joinphobia:

@LaoC It looks like it is trying to do a transitive closure over the parent field, but without actually doing a transitive closure. Yes, it's generated code too, and so butt-ugly, but the problem with it is that what it is doing is really a bunch of self joins and the relationship keys are strings.

Without an index to boot Which is why it ended up in the slow query log in the first place. I suggested using BINGINTs for the entities but it turns out they just happen to look like 64bit hex strings here and can actually be random ASCII shit. Or Latin-1 according to the table, unlike the utf8 in the rest of the DB because optimization!

Also, this is MariaDB that can do recursive queries

Zecc

@boomzilla said in Joinphobia:

You can have multiple schemas in a DB. These correlate more to SQLServer databases, I think. Does anyone really use anything other than the dbo schema in SQL Server?

I think because "no one" ever uses a schema other than dbo (or public in Postgres) people (myself included) will frequently confuse databases with schemas or the other way around.

@robo2 said in Joinphobia:

According to https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-identifiers?view=sql-server-ver16#rules-for-regular-identifiers you'll never need to find out (if I read that correctly)

You missed the part where those rules apply to regular identifiers. But there are also delimited identifiers, which are the ones enclosed between quotes or brackets. As evidenced by someone having created a database with a space in its name, not all identifiers have to comply with the rules for regular identifiers (as bad as an idea that is).

izzion

@boomzilla said in Joinphobia:

Does anyone really use anything other than the dbo schema in SQL Server?

That's like asking does anyone use anything other than the sa user to log into a database server

robo2

@Zecc said in Joinphobia:

You missed the part

I didn't miss it, more like ignored it...
I've had the misfortune to work with sql server databases that had dots and spaces in their name. And with the db set to use case sensitive identifiers combined with inconsistent casing for the same things in different tables

¹If the can be overcome I may even test if ["] are valid identifiers. Don't hold your breath though...

dkf

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

izzion

@dkf said in Joinphobia:

Only in too many damn places "because making an index is expensive!!!".

You should ask those people their opinion on HMO plans

Zecc

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

You get a middle finger Instead.

Bim Zively

@boomzilla said in Joinphobia:

Does anyone really use anything other than the dbo schema in SQL Server?

I was lead on an application that had an absolute shit-ton of features, that easily divided into groups, many tables and even more stored procedures. It became much easier to find objects belonging to a particular feature (e.g. "weather") when I gave each feature its own schema name.

I hate that SQL Server calls them schema though. To me the schema (schemata) of database objects is the struction, like ID INT NOT NULL for a column.

boomzilla

@Bim-Zively yeah, this steaming pile of COTS uses separate databases for that for whatever reason. And, true, it does make finding things a bit easier when it's not completely flat.

TimeBandit

@LaoC said in Joinphobia:

BINGINT

You find your INT in BING?

Gribnit

@TimeBandit said in Joinphobia:

@LaoC said in Joinphobia:

BINGINT

You find your INT in BING?

Bing uses 73-bit integers.

dcon

@Gribnit said in Joinphobia:

@TimeBandit said in Joinphobia:

@LaoC said in Joinphobia:

BINGINT

You find your INT in BING?

Bing uses 73-bit integers.

Or 23-bit. But you'll never know which until it's too late.

Gribnit

@dcon said in Joinphobia:

@Gribnit said in Joinphobia:

@TimeBandit said in Joinphobia:

@LaoC said in Joinphobia:

BINGINT

You find your INT in BING?

Bing uses 73-bit integers.

Or 23-bit. But you'll never know which until it's too late.

I know! I'll use heuristiaaarghIvebeeneatenbyweasels.

LaoC

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

This time it was "because indexes cause problems in the cluster"

Kamil Podlesak

@Zecc said in Joinphobia:

@boomzilla said in Joinphobia:

SQLServer's not something we all have a lot of experience with, so things are weird.
In particular, the way you have different "databases" and then schemas within them.

: What the hell does Oracle have then?

Users.
Really.
It's common to interpret them as schemas, but in reality a schema APP are just tables (views, procedures, etc) owned by the user APP

Someone more knowledgable: Correct me if it's no longer the case, or if I remember it incorrectly. I try to limit my exposure to Oracle as much as possible.

Zecc

@Kamil-Podlesak I can proudly say my exposure to Oracle is limited to TDWTF.

Kamil Podlesak

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

TBH, keeping schemas up-to-date (on INSERT/UPDATE/DELETE) might be quite expensive, so "too many index" is definitely a thing. Of course, it is quite rare (basically it happens when someone just proactively create index for each and every column).

loopback0

@Kamil-Podlesak said in Joinphobia:

@Zecc said in Joinphobia:

@boomzilla said in Joinphobia:

SQLServer's not something we all have a lot of experience with, so things are weird.
In particular, the way you have different "databases" and then schemas within them.

: What the hell does Oracle have then?

Users.
Really.
It's common to interpret them as schemas, but in reality a schema APP are just tables (views, procedures, etc) owned by the user APP

Someone more knowledgable: Correct me if it's no longer the case, or if I remember it incorrectly. I try to limit my exposure to Oracle as much as possible.

Yeah a schema and a user are basically the same thing in Oracle.

Zecc

They must be paranoid, and think the users are always scheming.

BernieTheBernie

@Kamil-Podlesak said in Joinphobia:

I try to limit my exposure to Oracle as much as possible.

#meetwo!

robo2

@robo2 said in Joinphobia:

If the can be overcome I may even test if ["] are valid identifiers.

They are

dkf

@robo2

HardwareGeek

@Zecc said in Joinphobia:

They must be paranoid, and think the users are always scheming.

It's Oracle, so the users are probably scheming about how to avoid having to pay $$$$$ consultants to do things that would be trivial in any other database. Ultimately, however, Oracle will once again defeat their dastardly schemes.

Gribnit

@LaoC said in Joinphobia:

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

This time it was "because indexes cause problems in the cluster"

"What sort of problems? Who, oh who, might have a job description that bears on these vexed, vexed problems?"

Jaloopa

@boomzilla said in Joinphobia:

Does anyone really use anything other than the dbo schema in SQL Server?

We do where I work. In theory it allows for more granular permissions (give the user used by the web application permission for every stored procedure in the web app schema), but in practice lots of stuff ends up being used by multiple business areas so you end up with a mess of permissions, lots of widely used stuff in dbo anyway, code duplication to have copies in different schemas or all of the above.

A previous place I worked, instead of schemas they used prefixes on names (so schemas without the dot). I tried to introduce using a dedicated schema for a piece of work I was responsible for, but it turned out that the COTS we were integrating with completely stopped working if there was any custom schema present at install or update time. That was some fun hurried changes

Mason_Wheeler

@boomzilla said in Joinphobia:

Does anyone really use anything other than the dbo schema in SQL Server?

Yes. For example, this isn't the only way to do it, but a schema can make for a very convenient permissions boundary.

But even if you don't use them for permissions, schemas can be useful just for organizational purposes. Lots of software already treats tables as equivalent to classes, so viewing a schema as a namespace should be natural and intuitive to a C# developer.

boomzilla

@Mason_Wheeler said in Joinphobia:

But even if you don't use them for permissions, schemas can be useful just for organizational purposes.

Duh. The point was that I've never seen it.

Mason_Wheeler

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

If I had a nickel for every time I've heard "we don't want to index that because it will make INSERTs run slower"...

Steve_The_Cynic

@Mason_Wheeler said in Joinphobia:

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

If I had a nickel for every time I've heard "we don't want to index that because it will make INSERTs run slower"...

If the most common (by a factor of ten or more) is an insert, that attitude becomes slightly less stupid.

And before you say it, yes, I've worked with a system that operated on that basis. It was ... interesting.

Arantor

@Steve_The_Cynic Sure, but I think the majority of peopel who whine about indexes slowing it down... tend to not be working on that and are in the category of people who optimise fOr PeRfOrMaNcE.

izzion

@Steve_The_Cynic said in Joinphobia:

@Mason_Wheeler said in Joinphobia:

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

If I had a nickel for every time I've heard "we don't want to index that because it will make INSERTs run slower"...

If the most common (by a factor of ten or more) is an insert, that attitude becomes slightly less stupid.

And before you say it, yes, I've worked with a system that operated on that basis. It was ... interesting.

Never underestimate the value of WMRN datasets

Bulb

@Zecc said in Joinphobia:

That configuredValue belongs ~~in a museum~~ in the Initial Catalog argument of the connection string.

This. A database is the unit of provisioning you get from operations. Because often, at least for the traditional deployments, there is one beefy database server that hosts bunch of independent databases for completely unrelated applications. So the operator installing the application will create a database—with some name of their choosing—and configure the connection string including the initial catalog, and that's what the application is supposed to use.

And while in the Azure case there is much less difference between putting the databases on the same or separate servers (accounting being per-database, not per-server), you'll still often have the databases on the same server either so the operator does not have to come up with another globally unique name (each server gets a DNS record in the same domain) or because elastic pools can't span servers.

Unless you are doing queries across multiple databases, which it sounds like you are. :sucking-on-lemon: It that case it's unavoidable.

You should have a very good reason for that.

@boomzilla said in Joinphobia:

@Bim-Zively yeah, this steaming pile of COTS uses separate databases for that for whatever reason. And, true, it does make finding things a bit easier when it's not completely flat.

This is probably . It makes sense if the databases are to be shared with some other applications (like we use a separate database for power-bi to pull data from), or if there are significantly different performance requirements. Otherwise it's just … complicating everybody's life.

… and note that for both these use-cases you should allow and expect the databases to be on different servers. Which would mean you'd have two separate connection strings … and the database can still be the initial catalog. It would also mean there can't really be cross-database joins. If the data need to be joined, putting them in different databases is … usually wrong.

boomzilla

@Bulb said in Joinphobia:

@boomzilla said in Joinphobia:

@Bim-Zively yeah, this steaming pile of COTS uses separate databases for that for whatever reason. And, true, it does make finding things a bit easier when it's not completely flat.

This is probably . It makes sense if the databases are to be shared with some other applications (like we use a separate database for power-bi to pull data from), or if there are significantly different performance requirements. Otherwise it's just … complicating everybody's life.

… and note that for both these use-cases you should allow and expect the databases to be on different servers. Which would mean you'd have two separate connection strings … and the database can still be the initial catalog. It would also mean there can't really be cross-database joins. If the data need to be joined, putting them in different databases is … usually wrong.

I suspect that it makes more sense for their cloud hosting. We host our own, so it's silly, but some of them would be shared for multiple "tenants" as they call them. Maybe.

Bulb

@boomzilla Well, could be, but having a multi-tenant database and then also creating additional per-tenant databases sounds like just extra work, especially if they are not on different server.

Maybe they want to allocate different service tier to the per-tenant database depending on how much data that tenant has and how much they pay for them, but making it schemas in one database scaled to handle all of them would still be cheaper. Unless they want to guarantee performance, which would be unexpectedly advanced.

Zecc

@dkf said in Joinphobia:

@robo2

boomzilla

@Bulb said in Joinphobia:

@boomzilla Well, could be, but having a multi-tenant database and then also creating additional per-tenant databases sounds like just extra work, especially if they are not on different server.

Maybe they want to allocate different service tier to the per-tenant database depending on how much data that tenant has and how much they pay for them, but making it schemas in one database scaled to handle all of them would still be chea per. Unless they want to guarantee performance, which would be unexpectedly advanced.

All things being equal I think I'll just keep assuming that they're fucking retards. There is really no contradictory evidence.

Gribnit

@boomzilla said in Joinphobia:

All things being equal I think I'll just keep assuming that they're fucking retards. There is really no contradictory evidence.

This almost always holds, sadly.

Tsaukpaetra

@boomzilla said in Joinphobia:

they're fucking retards.

I understand it's exceptionally difficult to obtain consent, therein lies danger!

LaoC

@izzion said in Joinphobia:

@Steve_The_Cynic said in Joinphobia:

@Mason_Wheeler said in Joinphobia:

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

If I had a nickel for every time I've heard "we don't want to index that because it will make INSERTs run slower"...

If the most common (by a factor of ten or more) is an insert, that attitude becomes slightly less stupid.

And before you say it, yes, I've worked with a system that operated on that basis. It was ... interesting.

Never underestimate the value of WMRN datasets

TBF, most audit logs are just that. They are there to be kept for n years and most of the time, nobody will ever look at them again, but if anyone does, they're not gonna run individual queries but they'll want the whole dataset. I manage one of those for a client that doesn't even have a legal obligation to keep it. It's about about a terabyte of log files "just in case we might need them at some point". In the last 8+ years, nobody has ever needed anything from that dump.

Bulb

@boomzilla said in Joinphobia:

All things being equal I think I'll just keep assuming that they're fucking retards. There is really no contradictory evidence.

It's the most plausible explanation indeed.

dkf

@boomzilla said in Joinphobia:

All things being equal I think I'll just keep assuming that they're fucking retards. There is really no contradictory evidence.

This could boomzilla's second law of computing. (The first involves of course.)

kazitor

@dkf said in Joinphobia:

The first involves of course.

“Where there’s a will, there’s a ”

Steve_The_Cynic

@izzion said in Joinphobia:

@Steve_The_Cynic said in Joinphobia:

@Mason_Wheeler said in Joinphobia:

@dkf said in Joinphobia:

@LaoC said in Joinphobia:

Without an index to boot

Ooooh, where have I seen that before? Only in too many damn places "because making an index is expensive!!!".

*sigh*

If I had a nickel for every time I've heard "we don't want to index that because it will make INSERTs run slower"...

If the most common (by a factor of ten or more) is an insert, that attitude becomes slightly less stupid.

And before you say it, yes, I've worked with a system that operated on that basis. It was ... interesting.

Never underestimate the value of WMRN datasets

The thing I was specifically thinking of was a financial market activity data set that took high-frequency inserts and (relatively) low-frequency reads. Write performance was key...