SQL Server Optimization Help
-
I've effectively become the SQL Optimization expert.
When I get the detail records, I improved the time from 45 seconds to 1 second.
I am now stuck on the summary counts. Some stuff happens before this, but I narrowed the problem down to here.
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber = (select dbo.calcNumber(id))
    from table1
    where a = 'abc' and b = 'xyz'
),
tmpCalcInRange as
(
    select id, x, y, z, calcSomeNumber
    from tmpOverallSelect
    where calcSomeNumber < 25
)
select count(*) from tmpOverallSelect -- 1 sec -- 2416 records
select count(*) from tmpCalcInRange -- 30 secs -- 52 records
Why is there such a huge difference depending on which CTE I select from?
-
@Karla if you're just selecting a count(*) from the overall select, the whole calcNumber thing is unnecessary, so it's probably optimized away. In the other CTE you're filtering based on it, so it does have to be calculated.
What are the query plans? I wonder if SQL Server is smart enough to first filter based on the equality comparisons, and only then apply the calculation.
-
@Maciejasjmj said in SQL Server Optimization Help:
@Karla if you're just selecting a count(*) from the overall select, the whole calcNumber thing is unnecessary, so it's probably optimized away. In the other CTE you're filtering based on it, so it does have to be calculated.
What are the query plans? I wonder if SQL Server is smart enough to first filter based on the equality comparisons, and only then apply the calculation.
I'm doing way more than count. That was just to isolate the problem.
I'll see if I can spot the difference in the query plans. The problem is the actual queries are way more complicated than this.
-
I don't see any difference in the execution plans.
-
I see the difference now but execution plan says cost is 0%.
-
@Karla said in SQL Server Optimization Help:
I've effectively become the SQL Optimization expert.
The industry term is "DBA by default". You were the last one standing when the music stopped. May God have mercy on your soul.
-
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
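A sketch of the temp-table variant, borrowing the placeholder table and column names from the earlier post (not real schema):

```sql
-- Materialize the expensive select once, so the scalar function runs
-- exactly one time per qualifying row:
select id, x, y, z, calcSomeNumber = dbo.calcNumber(id)
into #tmpOverall
from table1
where a = 'abc' and b = 'xyz'

-- Both counts now read the stored values instead of re-evaluating the function:
select count(*) from #tmpOverall
select count(*) from #tmpOverall where calcSomeNumber < 25

drop table #tmpOverall
```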
-
@Polygeekery said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
I've effectively become the SQL Optimization expert.
The industry term is "DBA by default". You were the last one standing when the music turned off. May god have mercy on your soul.
LOL we actually have a DBA. I used to ask him about performance but I have ultimately learned more through google.
-
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table has a couple hundred thousand rows.
The first CTE in the chain brings that down to a few thousand.
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Can you give me an example of how to do that?
Also, the function returns a decimal if that has any bearing.
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Can you give me an example of how to do that?
Would be happy to when I get home. Not doing this on mobile. :(
Also, the function returns a decimal if that has any bearing.
Shouldn't, but it gives me some idea that there are some calculations going on in the background, which may or may not be expensive. But I've found many a bottleneck that was caused by a scalar function against a large result set. It's also possible that its evaluation is getting delayed to the second CTE, which would explain why the first CTE is fine by itself.
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
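Roughly, the idea looks like this; the function body is imagined here as a simple SUM over a child table, and childTable/someValue/parentId are made-up names:

```sql
select t.id, t.x, t.y, t.z, tmp.calcSomeNumber
from table1 t
cross apply
(
    -- The function's guts, expanded inline, rather than a call to dbo.calcNumber.
    -- With no scalar function call left, the optimizer can fold this into a
    -- set-based join instead of invoking code row by row.
    select calcSomeNumber = sum(c.someValue)
    from childTable c
    where c.parentId = t.id
) tmp
where t.a = 'abc' and t.b = 'xyz'
```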
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Can you give me an example of how to do that?
Would be happy to when I get home. Not doing this on mobile. :(
Understood.
Also, the function returns a decimal if that has any bearing.
Shouldn't, but it gives me some idea that there are some calculations going on in the background, which may or may not be expensive. But I've found many a bottleneck that was caused by a scalar function against a large result set. It's also possible that its evaluation is getting delayed to the second CTE, which would explain why the first CTE is fine by itself.
OK that at least makes sense then.
In the past I've had success moving scalar functions out of the where clause into the select, which is why this one seems so weird.
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table-valued function that I will join, and I think that will be even faster.
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table value function that I will join and I think that will be even faster.
That's usually the best way to attack a poorly-performing scalar function.
One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (
returns table as return select...
) as they are basically parameterized views and the most performant.
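For illustration, the two shapes side by side (the function names and the childTable/someValue columns are hypothetical):

```sql
-- Inline TVF: a single RETURN SELECT. The optimizer expands it like a
-- parameterized view, so it can be joined and estimated properly.
create function dbo.calcNumberInline (@id int)
returns table
as
return
    select calcSomeNumber = sum(c.someValue)
    from childTable c
    where c.parentId = @id
go

-- Multi-statement TVF: declares a table variable and fills it procedurally.
-- Opaque to the optimizer; usually much slower against large row counts.
create function dbo.calcNumberMulti (@id int)
returns @result table (calcSomeNumber decimal(9, 2))
as
begin
    insert into @result
    select sum(c.someValue)
    from childTable c
    where c.parentId = @id
    return
end
go
```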
-
@Karla
The biggest reason that scalar functions are often a bogeyman is that the optimizer can't mash them down into a set-based operation, so their cost goes up linearly (or worse, if you wind up memory constrained and have to spill to TempDB) as more rows are affected. And, to compound the problem, the optimizer can't estimate effectively for scalar functions either, so it's very likely that you'll wind up memory constrained in large operations, because the optimizer's guess will be too low. (And in small operations, you'll get too much memory and potentially impact other operations on the server that have to wait for memory or work with less.)
How grungy is the calculation? Can it just be inlined within the SELECT statement directly?
-
@izzion said in SQL Server Optimization Help:
@Karla
The biggest reason that scalar functions are often a bogeyman is that the optimizer can't mash them down into a set-based operation, so their cost goes up linearly (or worse, if you wind up memory constrained and have to spill to TempDB) as more rows are affected. And, to compound the problem, the optimizer can't estimate effectively for scalar functions either, so it's very likely that you'll wind up memory constrained in large operations, because the optimizer's guess will be too low. (And in small operations, you'll get too much memory and potentially impact other operations on the server that have to wait for memory or work with less.)
How grungy is the calculation? Can it just be inlined within the SELECT statement directly?
It is grungy...it is calculating a percentage of attendance based on several different dates. I think I am getting to the point I have to let it go for tonight. Right now, I know my logic is wrong in calculating it as a table value function. It is much easier to do it per record.
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table value function that I will join and I think that will be even faster.
That's usually the best way to attack a poorly-performing scalar function.
One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (
returns table as return select...
) as they are basically parameterized views and the most performant.
Thanks, I will keep that in mind.
-
@Karla
Is the realistic data set small enough (for certain values of the word "small") that it would be more reasonable to do the calculation and filter in the application tier?
Or can it be distilled into something that can be a computed value on the underlying table? How often does the result of the calculation change? Would it even be cacheable in the database tier?
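If the calculation depended only on the row's own columns, the computed-value idea could look something like this (purely illustrative; the attended_count/scheduled_count columns are assumptions):

```sql
-- A persisted computed column is stored with the row and can even be indexed,
-- but it can't reference other tables or nondeterministic values like the
-- current date.
alter table table1
add calcSomeNumber as
    (attended_count * 100.0 / nullif(scheduled_count, 0)) persisted
```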
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table value function that I will join and I think that will be even faster.
That's usually the best way to attack a poorly-performing scalar function.
One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (
returns table as return select...
) as they are basically parameterized views and the most performant.
I just looked that up and, based on the complicated calcs, I am not sure inline is possible.
-
Thank you everyone!!
At least now I understand that if I don't select from the last CTE in the chain, it doesn't run. While I guess that should be obvious (based on the definition of CTEs), it wasn't intuitive to me.
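That behavior can be seen with a deliberately expensive CTE that is never referenced (a sketch using the placeholder names from this thread):

```sql
; with expensiveCte as
(
    select id, calcSomeNumber = dbo.calcNumber(id) -- costly scalar function
    from table1
),
cheapCte as
(
    select id from table1
)
select count(*) from cheapCte
-- expensiveCte is never referenced by the final select, so it is never
-- evaluated; a CTE is just a named query expression, not a materialized result.
```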
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table value function that I will join and I think that will be even faster.
That's usually the best way to attack a poorly-performing scalar function.
One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (
returns table as return select...
) as they are basically parameterized views and the most performant.
I just looked that up and, based on the complicated calcs, I am not sure inline is possible.
Everything is possible when you let the mystic power of the Dragonflame burn in your heart.
The result may be a little ugly, though.
-
@izzion said in SQL Server Optimization Help:
@Karla
Is the realistic data set small enough (for certain values of the word "small") that it would be more reasonable to do the calculation and filter in the application tier?
If we are only talking about the final where clause, then yes.
I was hoping not to touch the front-end
Or can it be distilled into something that can be a computed value on the underlying table?
No, both the numerator and denominator are calculated by joining multiple tables and the current date is used.
How often does the result of the calculation change,
Periodically throughout the day.
would it even be cacheable in the database tier?
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
How much I/O are those operations pulling? Which tables are the big ones?
I find aggregation on complex CTE's can be slower for larger tables than using an intermediate temp table in some cases.
Base table have a couple hundred thousand.
First, CTE in the chain brings that down to a few thousand.
Just noticed something else. You have a scalar function in your first select. Those aren't known for being fast to begin with. Are you able to convert it to an
APPLY
clause or a join easily, or hoist it to your outer CTE?
Is this what you mean by APPLY?
; with tmpOverallSelect as
(
    select id, x, y, z, calcSomeNumber
    from table1 a
    cross apply
    (
        select calcSomeNumber = dbo.calcNumber(a.id)
    ) tmp
    where a = 'abc' and b = 'xyz'
)
If so, that didn't help.
Sort of. I meant expanding the guts of that function into the
apply
. Since the calls remain, we don't see a change.
I started looking at the function and it was doing some stupid joins. Removing them reduced it to 8 seconds.
But I had already started turning the function into a table value function that I will join and I think that will be even faster.
That's usually the best way to attack a poorly-performing scalar function.
One thing I will caution about is that there are multiple types of table functions and you'll want to prefer inline functions (
returns table as return select...
) as they are basically parameterized views and the most performant.
I just looked that up and, based on the complicated calcs, I am not sure inline is possible.
Everything is possible when you let the mystic power of the Dragonflame burn in your heart.
The result may be a little ugly, though.
OK, I see that I can use CTEs inside an inline table-valued function.
I will try that tomorrow.
-
@Groaner If you can use CTEs in an inline TVF, what is the benefit of a multi-statement TVF in most cases?
-
@Karla said in SQL Server Optimization Help:
@Groaner If you can use CTEs in inline TVF, what is the benefit to multi-statement TVF in most cases?
Mostly for imperative programmers who still think in terms of "I need to roll up by X, then join Y, then take top 50 Z" and haven't yet realized that they can do all that in a single statement.
Granted, there are probably a few cases where having intermediate variables/tables would be helpful, but I can't think of any examples off the top of my head.
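One sketch of such a case: genuinely iterative logic, where each step depends on the previous one, is hard to express as a single RETURN SELECT (the function name and logic here are invented for illustration):

```sql
-- Multi-statement TVF: emits the Collatz sequence starting from @n.
-- The loop state carries over between iterations, which is awkward to
-- express in a single inline SELECT (it would need a recursive CTE).
create function dbo.collatzSteps (@n int)
returns @steps table (step int, value int)
as
begin
    declare @i int = 0
    while @n > 1 and @i < 1000 -- cap iterations as a safety net
    begin
        insert into @steps values (@i, @n)
        set @n = case when @n % 2 = 0 then @n / 2 else 3 * @n + 1 end
        set @i += 1
    end
    insert into @steps values (@i, @n)
    return
end
```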
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner If you can use CTEs in inline TVF, what is the benefit to multi-statement TVF in most cases?
Mostly for imperative programmers who still think in terms of "I need to roll up by X, then join Y, then take top 50 Z" and haven't yet realized that they can do all that in a single statement.
Granted, there are probably a few cases where having intermediate variables/tables would be helpful, but I can't think of any examples off the top of my head.
Thanks!
I will keep that in mind.
Basically, when I was writing the TVFs I was just modelling off of an existing one, not realizing there was a different way.
I will update tomorrow.
-
Okay, making good on my promise:
create table orders
(
    order_id int identity(1,1),
    order_address varchar(1000),
    order_date datetimeoffset,
    constraint pk_orders primary key (order_id)
)

create table items
(
    item_id int identity(1,1),
    name varchar(50),
    price money,
    constraint pk_items primary key (item_id)
)

create table order_items
(
    order_item_id int identity(1,1),
    order_id int not null,
    item_id int not null,
    price money not null,
    constraint pk_order_items primary key (order_item_id),
    constraint uq_order_items unique (order_id, item_id),
    constraint fk_order_items_orders foreign key (order_id) references orders,
    constraint fk_order_items_items foreign key (item_id) references items
)

create table order_discounts
(
    order_discount_id int identity(1,1),
    order_id int not null,
    percent_discount float,
    straight_discount money,
    constraint pk_order_discounts primary key (order_discount_id),
    constraint fk_order_discounts_orders foreign key (order_id) references orders
)
go

create view numbers as
with cte0 as (select 1 as n union all select 1), -- 2
cte1 as (select a.n from cte0 a, cte0 b), -- 4
cte2 as (select a.n from cte1 a, cte1 b), -- 16
cte3 as (select a.n from cte2 a, cte2 b), -- 256
cte4 as (select a.n from cte3 a, cte3 b), -- 65536
cte5 as (select a.n from cte4 a, cte4 b) -- 4294967296
select n from cte5
go

insert into orders
select top 10000 newid(), sysdatetimeoffset()
from numbers

insert into items
select top 200 newid(), ABS(CHECKSUM(NewId())) % 100 -- pseudorandom data
from numbers

insert into order_items
select order_id, item_id, price
from orders
cross join items

insert into order_discounts
select order_id, 0.01 * (ABS(CHECKSUM(NewId())) % 50), ABS(CHECKSUM(NewId())) % 10 -- pseudorandom data
from orders
go

-- I've seen too many scalar functions that look like this.
create function total_order_price
(
    @order_id int
)
returns money
as
begin
    declare @price money

    select @price = sum(price)
    from order_items
    where order_id = @order_id

    -- Only allow a single discount for now. This makes the example simpler, but
    -- if you really wanted to have cascading discounts, you'd need a recursive
    -- CTE to accumulate them, and I don't feel like writing one of those right
    -- now given that I have to pack for a trip tonight.
    select top 1 @price = (@price * (1.0 - percent_discount)) - straight_discount
    from order_discounts
    where order_id = @order_id
    order by order_discount_id -- Added for determinism and to ward off undefined behavior

    return @price
end
go

set statistics io on
set statistics time on

-- Naïve query
select *, dbo.total_order_price(order_id) as total_price
from orders

-- Better query
select o.*, base_order_prices.price * (1.0 - coalesce(od.percent_discount, 0)) - coalesce(od.straight_discount, 0) as total_price
from orders o
left outer join order_discounts od on
    (od.order_id = o.order_id)
outer apply
(
    select sum(price) as price
    from order_items oi
    where oi.order_id = o.order_id
) as base_order_prices

set statistics io off
set statistics time off
So here we have a sample data set of 10,000 orders and 2 million total order items. Running both queries on my machine yields the following stats:
(10000 row(s) affected)
Table 'orders'. Scan count 1, logical reads 81, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 21719 ms, elapsed time = 21733 ms.

SQL Server parse and compile time:
CPU time = 5 ms, elapsed time = 5 ms.

(10000 row(s) affected)
Table 'Worktable'. Scan count 10000, logical reads 6025711, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_items'. Scan count 1, logical reads 7197, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'orders'. Scan count 1, logical reads 81, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'order_discounts'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 4172 ms, elapsed time = 4379 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
I find it odd that the function calls don't show any table hits, but as you can see, the time results are dramatically different. I also find the "physical reads 0" to be interesting, but I attribute that to this rig having 32GB of RAM.
And yes, I ran these multiple times to rule out caching.
Edit: That database took up a whole 900 MB (on a drive that has 2.41 TB free). Good thing it can be expressed more compactly, like in the script above.
-
@Groaner said in SQL Server Optimization Help:
Okay, making good on my promise:
create table orders
(
    order_id int identity(1,1),
    order_address varchar(1000),
    order_date datetimeoffset,
    constraint pk_orders primary key (order_id)
)

create table items
(
    item_id int identity(1,1),
    name varchar(50),
    price money,
    constraint pk_items primary key (item_id)
)

create table order_items
(
    order_item_id int identity(1,1),
    order_id int not null,
    item_id int not null,
    price money not null,
    constraint pk_order_items primary key (order_item_id),
    constraint uq_order_items unique (order_id, item_id),
    constraint fk_order_items_orders foreign key (order_id) references orders,
    constraint fk_order_items_items foreign key (item_id) references items
)

create table order_discounts
(
    order_discount_id int identity(1,1),
    order_id int not null,
    percent_discount float,
    straight_discount money,
    constraint pk_order_discounts primary key (order_discount_id),
    constraint fk_order_discounts_orders foreign key (order_id) references orders
)
go

create view numbers as
with cte0 as (select 1 as n union all select 1), -- 2
cte1 as (select a.n from cte0 a, cte0 b), -- 4
cte2 as (select a.n from cte1 a, cte1 b), -- 16
cte3 as (select a.n from cte2 a, cte2 b), -- 256
cte4 as (select a.n from cte3 a, cte3 b), -- 65536
cte5 as (select a.n from cte4 a, cte4 b) -- 4294967296
select n from cte5
go
CTEs in views. That hasn't occurred to me.
insert into orders
select top 10000 newid(), sysdatetimeoffset()
from numbers

insert into items
select top 200 newid(), ABS(CHECKSUM(NewId())) % 100 -- pseudorandom data
from numbers

insert into order_items
select order_id, item_id, price
from orders
cross join items

insert into order_discounts
select order_id, 0.01 * (ABS(CHECKSUM(NewId())) % 50), ABS(CHECKSUM(NewId())) % 10 -- pseudorandom data
from orders
go

-- I've seen too many scalar functions that look like this.
create function total_order_price
(
@order_id int
)
returns money
as
begin
declare @price money

select @price = sum(price)
from order_items
where order_id = @order_id

-- Only allow a single discount for now. This makes the example simpler, but if you really wanted to have cascading discounts, you'd need a recursive CTE to accumulate them, and I don't feel like writing one of those right now given that I have to pack for a trip tonight.
select top 1 @price = (@price * (1.0 - percent_discount)) - straight_discount
from order_discounts
where order_id = @order_id
order by order_discount_id -- Added for determinism and to ward off undefined behavior

return @price
end
go
Yeah, that's what a lot of ours do.
set statistics io on
set statistics time on

-- Naïve query
select *, dbo.total_order_price(order_id) as total_price
from orders

-- Better query
select o.*, base_order_prices.price * (1.0 - coalesce(od.percent_discount, 0)) - coalesce(od.straight_discount,0) as total_price
from orders o
left outer join order_discounts od on
(od.order_id = o.order_id)
outer apply
(
select sum(price) as price
from order_items oi
where oi.order_id = o.order_id
) as base_order_pricesHow is the outer apply different from a join?
I'm reading this:
And I just don't get the difference.
Though maybe tomorrow morning I will.
-
@Karla said in SQL Server Optimization Help:
CTEs in views. That hasn't occurred to me.
Quite useful. I'm only using them here to quickly expand a numbers table, but there are plenty more applications.
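For instance, a numbers table will generate a contiguous date range without a loop. A quick sketch against the `numbers` view above (note the view just emits the constant 1, so you need `row_number()` to turn it into a sequence; the start date here is made up):

```sql
-- Every day of a year, no WHILE loop required
select dateadd(day, seq.n - 1, cast('2024-01-01' as date)) as d
from
(
    select top 365 row_number() over (order by (select null)) as n
    from numbers
) seq
```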
Yeah, that's what a lot of ours do.
It's a pretty common antipattern among a lot of imperative-minded developers getting used to SQL (DBA blogs call it the "poor man's join").
How is the outer apply different from a join?
I'm reading this:
And I just don't get the difference.
Though maybe tomorrow morning I will.
For the most part, there isn't. In the query plan, it will boil down to a join. However, APPLY lets you do a few things you can't do with a join to a derived table:

select f.*, baz.quux
from foo f
outer apply
(
    -- Notice how we're referencing outside columns as if we were in a subquery. Can't do this with a derived table.
    select sum(b.price) * f.multiplier as quux
    from bar b
    where b.foo_id = f.foo_id
) baz
Of course, the intended purpose of APPLY is supplying a column value to a table function:

select f.*, t.*
from foo f
outer apply foo_totals(f.foo_id) t
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
CTEs in views. That hasn't occurred to me.
Quite useful. I'm only using them here to quickly expand a numbers table, but there are plenty more applications.
Yeah, that's what a lot of ours do.
It's a pretty common antipattern among a lot of imperative-minded developers getting used to SQL (DBA blogs call it the "poor man's join").
How is the outer apply different from a join?
I'm reading this:
And I just don't get the difference.
Though maybe tomorrow morning I will.
For the most part, there isn't. In the query plan, it will boil down to a join. However, APPLY lets you do a few things you can't do with a join to a derived table:

select f.*, baz.quux
from foo f
outer apply
(
    -- Notice how we're referencing outside columns as if we were in a subquery. Can't do this with a derived table.
    select sum(b.price) * f.multiplier as quux
    from bar b
    where b.foo_id = f.foo_id
) baz
Isn't that the same as a correlated subquery?
-
@Karla said in SQL Server Optimization Help:
Isn't that the same as a correlated subquery?
Similar, but not quite. The APPLY clause becomes a table in its own right, so you can select multiple columns and the result can have many rows. With a correlated subquery:

select f.*, (select sum(b.baz) from bar b where b.foo_id = f.foo_id)
from foo f
...you are limited to a single column and a single row.
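To make that concrete, here's a sketch of pulling several aggregates out of one APPLY (same hypothetical `foo`/`bar` tables as above). Doing this with correlated subqueries would mean three separate subqueries, one per column:

```sql
select f.*, t.total, t.cheapest, t.line_count
from foo f
outer apply
(
    select sum(b.price) as total,
           min(b.price) as cheapest,
           count(*)     as line_count
    from bar b
    where b.foo_id = f.foo_id
) t
```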
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
Isn't that the same as a correlated subquery?
Similar, but not quite. The APPLY clause becomes a table in its own right, so you can select multiple columns and the result can have many rows. With a correlated subquery:

select f.*, (select sum(b.baz) from bar b where b.foo_id = f.foo_id)
from foo f
...you are limited to a single column and a single row.
My idea of a correlated subquery is in the where clause.
-
@Karla said in SQL Server Optimization Help:
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
Isn't that the same as a correlated subquery?
Similar, but not quite. The APPLY clause becomes a table in its own right, so you can select multiple columns and the result can have many rows. With a correlated subquery:

select f.*, (select sum(b.baz) from bar b where b.foo_id = f.foo_id)
from foo f
...you are limited to a single column and a single row.
My idea of a correlated subquery is in the where clause.
Fair enough.
-
I couldn't seem to make it significantly faster.
I have decided to let it go. I made enough improvement.
User reported this morning the summary took 30 MINUTES!!!
I can live with 11 seconds.
-
@Karla Better than 150 times faster? A couple of orders of magnitude? That'll do.
-
@dkf said in SQL Server Optimization Help:
@Karla Better than 150 times faster? A couple of orders of magnitude? That'll do.
When I was still trying to make it better yesterday, I didn't realize it was that bad.
-
Now, another slow stored procedure.
This one is over 700 lines long and probably has a couple hundred sub-queries. It is painful to look at.
My boss is asking whether to assign it to the original developer.
-
@Karla said in SQL Server Optimization Help:
Now, another slow stored procedure.
This one is over 700 lines long and probably has a couple hundred sub-queries. It is painful to look at.
My boss is asking whether to assign it to the original developer.
When one proc I maintained topped 1500 lines, it was time to split it into multiple procs. I think I made a thread about that proc a year or two ago....
-
@Groaner said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
Now, another slow stored procedure.
This one is over 700 lines long and probably has a couple hundred sub-queries. It is painful to look at.
My boss is asking whether to assign it to the original developer.
When one proc I maintained topped 1500 lines, it was time to split it into multiple procs. I think I made a thread about that proc a year or two ago....
I've done that too.
This one...this one is only 2 statements!!!!!!
-
@Groaner This sproc is so bad...
select a1 = (myfunc1(id))
     , a2 = (myfunc2(id))
     , a3 = (myfunc3(id))
     , a4 = (myfunc4(id))
     , a5 = (myfunc5(id))
     , a6 = (select description1 from lookup1 l1 where t1.lookup1id = l1.id)
--snip 80 more columns like the above
from t1
inner join t2 on t1.id = t2.id
left join t3 on t1.id = t3.id and t3.version = (select max(version) from t3a where t3.id = t3a.id)
left join t4 on t1.id = t4.id
left join t5 on t1.id = t5.id
left join t6 on t1.id = t6.id
left join t7 on t1.id = t7.id
left join t8 on t1.id = t8.id
left join t9 on t1.id = t9.id
left join t10 on t1.id = t10.id
left join t11 on t1.id = t11.id
--snip more
where groupid in (select groupid from groups)
and catid in (select catid from categories)
and (Case When @typeid = 1 and (t4.startdate between @date1 and @date2) Then @typeid
          When @typeid = 2 and (t10.enddate Is NOT NULL) and (t10.enddate between @date1 and @date2) Then @typeid
          When @typeid = 3 and t4.startdate <= @date2 and (t10.enddate >= @date1 OR t10.enddate IS NULL) Then @typeid
     End) > 0
union
select 'a1 - this is the display column names--WTF???'
     , 'a2 more long text '
---snip
     , 'a86 more long text '
-
@Karla Oh, and I hate SQL that is in title case.
-
@Karla said in SQL Server Optimization Help:
@Groaner This sproc is so bad...
[snip]
union
select 'a1 - this is the display column names--WTF???'
     , 'a2 more long text '
---snip
     , 'a86 more long text '
This is just to ensure that the column names will be included in the output, because the original author apparently didn't know about the "Copy with headers" option in SSMS or its keyboard shortcut Ctrl+Shift+C.
Now, how they could guarantee that this row would always be at the top......
Manual editing of the output?
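Worth noting: plain UNION gives no ordering guarantee whatsoever, so the header row landing on top is pure luck. If someone really had to emit a header row from SQL, the usual trick is an explicit sort key, roughly like this sketch (column and table names made up):

```sql
select a1
from
(
    select 0 as sort_key, 'a1 - display name' as a1
    union all
    select 1, cast(some_column as varchar(100))
    from some_table
) x
order by sort_key
```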
-
@djls45 said in SQL Server Optimization Help:
@Karla said in SQL Server Optimization Help:
@Groaner This sproc is so bad...
[snip]
union
select 'a1 - this is the display column names--WTF???'
     , 'a2 more long text '
---snip
     , 'a86 more long text '
This is just to ensure that the column names will be included in the output, because the original author apparently didn't know about the "Copy with headers" option in SSMS or its keyboard shortcut Ctrl+Shift+C.
Now, how they could guarantee that this row would always be at the top......
Manual editing of the output?

No, the front-end turns it into Excel.
-
Fuck it...I can't resist, I am already rewriting it.
Now, I still am doing some crap because I don't feel like changing the front-end so I need to reproduce the result set exactly.
-
For a small data range (about 1000 rows) I improved the sp from almost 5 minutes to 11 seconds.
An extremely large data range (about 25K rows) took about 2 minutes.
There is still a lot of mess in there, but it is more readable and much faster.