SQL Server weirdness

dangeRuss

I have a TVF in SQL server that seems to be acting really funny.

If I call it like this

declare @id uniqueidentifier = '5C7AF40F-1A1B-4EBA-9C2F-F42E4CAC4DC8';
select * from myTVF(@id)

it takes about 2 minutes

if I call it like this:

select * from myTVF('5C7AF40F-1A1B-4EBA-9C2F-F42E4CAC4DC8')

it's immediate.

Any idea WTF is going on?

skotl

My guess would be that the key in either the function or the underlying table isn't a UUID?

What happens if you try:

declare @id nvarchar(64)= '5C7AF40F-1A1B-4EBA-9C2F-F42E4CAC4DC8';
select * from myTVF(@id)

izzion

@dangeRuss
Are the actual execution plans the same for both approaches? I’d expect you’re running into a parameter sniffing problem - with the variable into the TVF the plan is compiling against the “density function” for the key (optimize for unknown, effectively) and thus doing a table scan, while with the string value the TVF realizes it has an index that it can seek into and get exactly the data it needs.

dangeRuss

@skotl takes about 2 minutes

skotl

@dangeRuss said in SQL Server weirdness:

@skotl takes about 2 minutes

Darn :(

@izzion said in SQL Server weirdness:

@dangeRuss
Are the actual execution plans the same for both approaches? I’d expect you’re running into a parameter sniffing problem - with the variable into the TVF the plan is compiling against the “density function” for the key (optimize for unknown, effectively) and thus doing a table scan, while with the string value the TVF realizes it has an index that it can seek into and get exactly the data it needs.

In which case, why does my suggestion (forcing it to be a string) also take two minutes?

izzion

@skotl
Because the TVF’s parameter and the index key in question is actually UNIQUEIDENTIFIER, so the implicit conversion causes the same density function tipover.

Edit: Er, I’m a dumb. The tipover is happening because of the variable, which causes SQL to build a plan that can handle any possible value the variable gets. Whereas when you provide a direct UID into the TVF call, it optimizes specifically for that value. Which probably means implicit parameterization isn’t active for this database (or at least is kneecapped for the TVF).

dangeRuss

@izzion said in SQL Server weirdness:

@skotl
Because the TVF’s parameter and the index key in question is actually UNIQUEIDENTIFIER, so the implicit conversion causes the same density function tipover.

Edit: Er, I’m a dumb. The tipover is happening because of the variable, which causes SQL to build a plan that can handle any possible value the variable gets. Whereas when you provide a direct UID into the TVF call, it optimizes specifically for that value. Which probably means implicit parameterization isn’t active for this database (or at least is kneecapped for the TVF).

Disabling the parameter sniffing didn't help, but adding option recompile helped.

Is there a way to fix this properly, or is option recompile as good as it's going to get?

izzion

@dangeRuss
That gets into performance tuning for the TVF itself and seeing the two different plans it’s generating. If you can sufficiently anonize the TVF to post it I can at least look to see if there’s an obvious stinker, but beyond that you’re gonna need to do full performance tuning.

boomzilla

@izzion said in SQL Server weirdness:

Because the TVF’s parameter and the index key in question is actually UNIQUEIDENTIFIER, so the implicit conversion causes the same density function tipover.

Is UNIQUEIDENTIFIER an actual data type?

izzion

@boomzilla
Yes, it’s MSSQL’s guid data type. And the PK field of choice for ~~Hindustan~~ developers who like watching SQL cry.

dangeRuss

@izzion said in SQL Server weirdness:

@boomzilla
Yes, it’s MSSQL’s guid data type. And the PK field of choice for ~~Hindustan~~ developers who like watching SQL cry.

It's better than varchar I assume. And probably even better if NEWSEQUENTIALID is used instead of NEWID

izzion

@dangeRuss
Under most cases, GUID is worse than VARCHAR for a PK field (especially if NEWID is used), since it's none of the properties that a good key field should have (Sequential, Monotonic, Narrow) and also isn't a meaningful business value. If you're using an auto-generated GUID as a PK / FK field, you probably should be using an IDENTITY field (INT, or BIGINT if number of rows is required) instead

dangeRuss

@izzion INT is nice, but can't be exposed to clients as it's too easily guessable.

dangeRuss

@izzion said in SQL Server weirdness:

@dangeRuss
That gets into performance tuning for the TVF itself and seeing the two different plans it’s generating. If you can sufficiently anonize the TVF to post it I can at least look to see if there’s an obvious stinker, but beyond that you’re gonna need to do full performance tuning.

I think I found the issue. I saw elsewhere that this type of stuff can be a problem. These functions are meant to be used with a NULL parameter (run for all rows) or a non null parameter (run for a single row).

So we have something like this

coalesce( @id, t.id)  = t.id

Groaner

@izzion said in SQL Server weirdness:

@dangeRuss
Under most cases, GUID is worse than VARCHAR for a PK field (especially if NEWID is used), since it's none of the properties that a good key field should have (Sequential, Monotonic, Narrow) and also isn't a meaningful business value. If you're using an auto-generated GUID as a PK / FK field, you probably should be using an IDENTITY field (INT, or BIGINT if number of rows is required) instead

GUIDs can make replication a lot easier as then you don't have to remap IDs on insertion/update between multiple remote systems. But then again, you don't need to use the GUID as a clustering key. And while 16 bytes is wider than 8 or 4, it's also a lot narrower than some of the 700+ byte clustering keys that I've seen in the wild. Some people like their natural keys for some reason.

Otherwise I'd agree. Also, Microsoft broke IDENTITY back in SQL Server 2012 iirc in that a server restart adds 1000 to IDENTITY values such that the life expectancy of an int column is drastically reduced. I've been meaning to learn how to use sequences, which are Microsoft's recommended replacement for this issue, but have been too lazy to do so so far. They're also a lot more work than just typing thirteen-ish characters for a column definition.

Unperverted Vixen

@Groaner said in SQL Server weirdness:

Also, Microsoft broke IDENTITY back in SQL Server 2012 iirc in that a server restart adds 1000 to IDENTITY values such that the life expectancy of an int column is drastically reduced.

That's news to me. I haven't seen that on our SQL Server 2016 servers at all...

izzion

@dangeRuss
Yeah, an optional/nullable parameter like that is going to cause that sort of parameter sniffing problem.

The two right ways to fix it are:

Split the TVF into two, one that always “takes a null” and one that requires the parameter to have an appropriate value.
Inline the TVF into the queries that call it and make those queries stored procedures that do dynamic SQL to add the where filter if and only if required.

That way each of the two cases can have a reusable query plan that is appropriate for their data case (1 row vs whole table). In my experience, neither change is ever practical due to the blast radius of such a code change, so you’re probably stuck with the OPTION RECOMPILE hint.

Groaner

@Unperverted-Vixen said in SQL Server weirdness:

@Groaner said in SQL Server weirdness:

Also, Microsoft broke IDENTITY back in SQL Server 2012 iirc in that a server restart adds 1000 to IDENTITY values such that the life expectancy of an int column is drastically reduced.

That's news to me. I haven't seen that on our SQL Server 2016 servers at all...

It looks like there's now a database option to get around it.

Pinal Dave / Jan 24, 2018

SQL SERVER - Identity Jumping 1000 - IDENTITY_CACHE - SQL Authority with Pinal Dave

SQL Server 2017 has a wonderful feature which fixes a very old issue which DBA and developers have been struggling for a while. It would be impossible to find DBA or Developer who has not faced an issue when their identity column value has jumped or incremented by 1000. It is a very common issue...

But it has been an issue for a while.

Identity column value suddenly jumps to 1001 in sql server

I am using Sql server 2012(Denali). I wonder why all identity column values start from 1001 and so on. At the beginning Identity column starts from 1,2 and so on and adding identity smoothly, but

izzion

@Groaner said in SQL Server weirdness:

Also, Microsoft broke IDENTITY back in SQL Server 2012 iirc in that a server restart adds 1000 to IDENTITY values such that the life expectancy of an int column is drastically reduced.

That’s only totally true in a two-way replication or Availability Group case. For most basic stand-alone database cases, a controlled shutdown won’t result in IDENTITY gaps. Uncontrolled shutdowns probably will, even with a non-replicated database, though I agree they shouldn’t — if the database isn’t replicated, there’s no partner that could/should have used key values so why are we “reserving” the chunk in case of a dirty shutdown.

Unperverted Vixen

@izzion said in SQL Server weirdness:

That’s only totally true in a two-way replication or Availability Group case. For most basic stand-alone database cases, a controlled shutdown won’t result in IDENTITY gaps.

Okay, that makes sense. Looking at one of our databases, there is one gap of 1000 in our main logging table that correlates with an unexpected SQL server restart. The other gaps are all much smaller, and are likely due to failed transactions that were rolled back (i.e. our monthly reboots aren't consuming these).

Although even if they were, consuming 1000 IDENTITY values on a monthly basis for 100 years would still only waste 0.05% of the available values in an INT. So I don't know if I'd call the lifespan "drastically reduced". Just... don't use SMALLINT. :)

Uncontrolled shutdowns probably will, even with a non-replicated database, though I agree they shouldn’t — if the database isn’t replicated, there’s no partner that could/should have used key values so why are we “reserving” the chunk in case of a dirty shutdown.

The name of the way to control it in SQL Server 2017 may be telling - it's the IDENTITY_CACHE configuration option. They're probably doing it in other scenarios for performance reasons.

bobjanova

@dangeRuss said in SQL Server weirdness:

@izzion INT is nice, but can't be exposed to clients as it's too easily guessable.

You shouldn't be using non-guessability of object IDs for access control anyway. If I'm at site/page?id=234 and I think "oh, that looks interesting ..." and try site/page?id-233, even if that's a valid ID I should get a 404 or 403 response.

Lots of security people seem to think this is important though so we have some code to obfuscate ID values to send to/from the client.

izzion

@bobjanova said in SQL Server weirdness:

@dangeRuss said in SQL Server weirdness:

@izzion INT is nice, but can't be exposed to clients as it's too easily guessable.

You shouldn't be using non-guessability of object IDs for access control anyway. If I'm at site/page?id=234 and I think "oh, that looks interesting ..." and try site/page?id-233, even if that's a valid ID I should get a 404 or 403 response.

Lots of security people seem to think this is important though so we have some code to obfuscate ID values to send to/from the client.

Also, -- I meant to mention this as well but was out for lunch and

All a GUID is is a 128-bit INT with fancy letters in it. Granted, 16 trillion trillion possibilities is going to take a bit longer to crawl than 4 billion. But if you're designing a system that needs its object identifiers to be non-guessable for security, you've designed one of those insecure systems that make people wish software engineering was licensed in a way that you could be made liable for your system.

lolwhat

@izzion said in SQL Server weirdness:

~~Hindustan~~ developers who like watching SQL cry

Oh, I know of at least one company with a code base that has UNIQUEIDENTIFIERs as PKs on some million-plus-row, heavily-read-and-written tables. And the vast majority of the devs who have touched that code base are Americans.

dangeRuss

@izzion There's a difference between using access control and not allowing scripting easily.

For example, you may be allowed to see any product on my site, but I don't want you to be able to easily crawl through my entire catalog.

izzion

@dangeRuss
Personally, I'd rather make discovering the full list as painless as possible, so that people don't eventually feel compelled to bang on my system to get the data they need (or worse, decide to go elsewhere). But as long as you're not relying on GUIDs for security, I don't have any beef with you. I just might be laughing and counting potential future consulting $$$ where I think you can't see me

boomzilla

@izzion said in SQL Server weirdness:

@dangeRuss
Under most cases, GUID is worse than VARCHAR for a PK field (especially if NEWID is used), since it's none of the properties that a good key field should have (Sequential, Monotonic, Narrow) and also isn't a meaningful business value. If you're using an auto-generated GUID as a PK / FK field, you probably should be using an IDENTITY field (INT, or BIGINT if number of rows is required) instead

"Not a meaningful business value" is usually my #1 criterion in picking a PK, actually (natural keys should be considered harmful as a general rule). But I don't SQL Server so stuff like "IDENTITY field" is just more cricket articles to me.

izzion

@boomzilla
IDENTITY - autoincrementing field (generally integer based though I believe that’s not a hard restriction) that is system managed at row creation time and requires you to set a “yes I know I’m being an idiot” flag if you want to manually set or modify it.

Karla

@izzion said in SQL Server weirdness:

@boomzilla
Yes, it’s MSSQL’s guid data type. And the PK field of choice for ~~Hindustan~~ developers who like watching SQL cry.

Don't forget MS Dynamix CRM, so many guids.

Karla

@Groaner said in SQL Server weirdness:

SQL Server 2012 iirc in that a server restart adds 1000 to IDENTITY values

Hmmm, TIL.

izzion

@boomzilla said in SQL Server weirdness:

"Not a meaningful business value" is usually my #1 criterion in picking a PK, actually (natural keys should be considered harmful as a general rule).

Also, here I'd throw out a big that depends based on what your natural key looks like and how your data gets accessed and modified. While most natural keys don't meet the Sequential/Monotonic criteria for a clustering (Primary) key, they are almost always the primary ( natural) criteria for filtering and joining the data set. So you probably are going to wind up adding non-clustered indexes on the natural key(s) anyway... so if they're narrow enough and either you write much less frequently than you read or the writes are in a pattern that's kind of Sequential/Monotonic anyway (e.g. data warehousing style data but not big enough to merit an actual data warehouse), then why spend the overhead of an auto-increment (or shudders GUID) field just to say you're not using a natural key?

boomzilla

@izzion said in SQL Server weirdness:

@boomzilla said in SQL Server weirdness:

"Not a meaningful business value" is usually my #1 criterion in picking a PK, actually (natural keys should be considered harmful as a general rule).

Also, here I'd throw out a big that depends based on what your natural key looks like and how your data gets accessed and modified. While most natural keys don't meet the Sequential/Monotonic criteria for a clustering (Primary) key,

Blah blah blah...that sounds like more SQL Server nonsense.

they are almost always the primary ( natural) criteria for filtering and joining the data set.

Filtering, perhaps. But what do you do when it changes? How do you handle duplicates? Not worth it.

So you probably are going to wind up adding non-clustered indexes on the natural key(s) anyway... so if they're narrow enough and either you write much less frequently than you read or the writes are in a pattern that's kind of Sequential/Monotonic anyway (e.g. data warehousing style data but not big enough to merit an actual data warehouse), then why spend the overhead of an auto-increment (or shudders GUID) field just to say you're not using a natural key?

We use GUIDs as varchars everywhere. They're fine. I can't imagine why I'd care about sequential / monotonic keys.

izzion

@boomzilla said in SQL Server weirdness:

I can't imagine why I'd care about sequential / monotonic keys.

With the caveat that I have no idea how Oracle works (beyond inserting $$$$$$$$$$ to get 1 byte of data ), the reason why SQL Server gurus say to ensure your Primary (clustering) key is monotonic is that SQL stores the data in order of the primary key, so if new rows are getting inserted "in the middle" of your clustering key's order, SQL has to split the 8k pages into two in order to fit the new row in where it belongs, leading to more disk space consumption, I/O, and memory churn when working with that table. Whereas sequential keys just keep adding onto the end of the current page, no gaps in the middle.

@boomzilla said in SQL Server weirdness:

Filtering, perhaps. But what do you do when it changes? How do you handle duplicates? Not worth it.

And that's why not all natural keys are good primary keys.

I think the best examples I have are natural keys in data that isn't really a user table, such as a quasi-data warehouse table that's storing summary data out of the rest of the system. For example, if you have a table of KPI summary data that's organized by Firm ID and Date, that natural key makes a pretty good clustering key; it's sequential, monotonic, not duplicated, and the keys don't change (which I guess I did leave out of my original PK criteria whoops). But yeah, you obviously wouldn't make a PK on your users' given names or home cities (though e-mail/username might be feasible, aside from the considerations of width and lack of monotonicity).

boomzilla

@izzion said in SQL Server weirdness:

I think the best examples I have are natural keys in data that isn't really a user table, such as a quasi-data warehouse table that's storing summary data out of the rest of the system.

Data warehouse type stuff is different.

But yeah, you obviously wouldn't make a PK on your users' given names or home cities (though e-mail/username might be feasible, aside from the considerations of width and lack of monotonicity).

The problem with something like that is changing them when someone gets a new email address. Usernames based on names (common in organizations) change all the time when people get married or divorced. Index them for filtering, by all means. Just don't key off of them because it will only lead to tears.

With the caveat that I have no idea how Oracle works (beyond inserting $$$$$$$$$$ to get 1 byte of data ), the reason why SQL Server gurus say to ensure your Primary (clustering) key is monotonic is that SQL stores the data in order of the primary key, so if new rows are getting inserted "in the middle" of your clustering key's order, SQL has to split the 8k pages into two in order to fit the new row in where it belongs, leading to more disk space consumption, I/O, and memory churn when working with that table. Whereas sequential keys just keep adding onto the end of the current page, no gaps in the middle.

Yeah, I don't know the details and I'm happy about that. My DBA never said not to do this sort of thing, so I'd guess that there's no such implementation detail in Oracle.

Tsaukpaetra

@boomzilla said in SQL Server weirdness:

We use GUIDs as varchars everywhere. They're fine. I can't imagine why I'd care about sequential / monotonic keys.

Dev who made Hypatia's database initially made them VARCHAR(255), despite not ever using that many. Future proofing?

loopback0

@boomzilla said in SQL Server weirdness:

With the caveat that I have no idea how Oracle works (beyond inserting $$$$$$$$$$ to get 1 byte of data ), the reason why SQL Server gurus say to ensure your Primary (clustering) key is monotonic is that SQL stores the data in order of the primary key, so if new rows are getting inserted "in the middle" of your clustering key's order, SQL has to split the 8k pages into two in order to fit the new row in where it belongs, leading to more disk space consumption, I/O, and memory churn when working with that table. Whereas sequential keys just keep adding onto the end of the current page, no gaps in the middle.

Yeah, I don't know the details and I'm happy about that. My DBA never said not to do this sort of thing, so I'd guess that there's no such implementation detail in Oracle.

Yeah I don't think it's a thing in Oracle.

Kamil Podlesak

@loopback0 said in SQL Server weirdness:

@boomzilla said in SQL Server weirdness:

With the caveat that I have no idea how Oracle works (beyond inserting $$$$$$$$$$ to get 1 byte of data ), the reason why SQL Server gurus say to ensure your Primary (clustering) key is monotonic is that SQL stores the data in order of the primary key, so if new rows are getting inserted "in the middle" of your clustering key's order, SQL has to split the 8k pages into two in order to fit the new row in where it belongs, leading to more disk space consumption, I/O, and memory churn when working with that table. Whereas sequential keys just keep adding onto the end of the current page, no gaps in the middle.

Yeah, I don't know the details and I'm happy about that. My DBA never said not to do this sort of thing, so I'd guess that there's no such implementation detail in Oracle.

Yeah I don't think it's a thing in Oracle.

There is a thing called "clustered index" in DB2 and it works in a roughly similar way, but there is absolutely no connection between it and Primary Key (unless you explicitly create them that way). There is, of course, much more to be tuned (page size, PCTFREE etc), but it's not that hard. Especially compared to Oracle world, where the standard approach is "don't try to understand it, just find some WIZARD and prepare big pile of $$$".

Actually, even quick google search tells me that SQL Server also has "clustered index" just like DB2, it just defaults to the PK (but you can do CREATE CLUSTERED INDEX to define different one). So actively discouraging random or natural primary keys just because of clustering is minor Of course, there are other arguments for using sequences, but default setting of low-level data storage?

PleegWat

@boomzilla said in SQL Server weirdness:

Yeah, I don't know the details and I'm happy about that. My DBA never said not to do this sort of thing, so I'd guess that there's no such implementation detail in Oracle.

Yeah, I'm pretty sure oracle doesn't do that. Tables are heap-organised, and while I think there are some differences between primary keys and 'normal' unique indexes, they're not related to the organisation of data on disk.

I think you can also have index-organised tables, but those are the exception rather than the rule.

boomzilla

@Kamil-Podlesak said in SQL Server weirdness:

So actively discouraging random or natural primary keys just because of clustering is minor Of course, there are other arguments for using sequences, but default setting of low-level data storage?

Yeah, it seems like a premature micro-optimization, unless maybe you're really writing a lot of data. But something like that is going to be an event or order table where there really isn't a natural key anyways.

Karla

@boomzilla said in SQL Server weirdness:

Usernames based on names (common in organizations) change all the time when people get married or divorced.

Not where I work.

Zenith

I just wanted to second the option recompile bit. I had this same issue a few years back and that was what made the execution time halfway bearable.