When is a table too big in SQL Server?
A few of my coworkers were having trouble with an online testing
application recently. It seems a call to a certain stored procedure (SP)
was timing out before the SP returned. After some investigation (I should
mention here that my coworkers had this project dumped in their laps
when another group in the company forgot that their developers needed
funding and had to let them go), it was discovered that this SP,
which is just a simple "select blah1, blah2, blah3 from blahTable where
blahPK = 'blahValue'", was selecting from a table containing nearly 44
million rows. When testing the SP from SQL Query Analyzer,
it would sometimes return acceptably quickly, and other times it would
take over 3 minutes.
The possible reasons for this behaviour that I have heard from various people around the office are:
1. The table is simply too big
2. The table is also inserted into as frequently as it is selected from, which causes some sort of indexing screwiness
3. The primary key, being the combination of two varchar columns,
one of which is 20 characters, might not have been the most efficient
thing to index on.
I don't have all that much database experience, and I surely don't have
much experience in dealing with large tables. I don't even know
what size a table needs to be to be considered too large. So, I'm
wondering if 44 million rows is just too much for SQL Server to deal
with (I'm thinking this isn't the case), and what else could be
contributing to this poor performance. I'm not working on this
project; my curiosity is purely academic.
Many claim that SQL Server is suitable for VLDB (very large database) needs, and I think it's used on multi-terabyte databases in a few systems. But I've never used SQL Server for a VLDB, so I can't speak to it personally.
I'm going to assume that running the SELECT query directly consistently performs acceptably.
You are likely dealing with a problem called "parameter sniffing." This is the downside of having an advanced query engine: if you pass 'dude' as the value of the @blah parameter to the procedure, the optimizer will actually use that value ('dude') when building the query plan. Normally, this is a good thing. In some cases, it causes horrible problems.
I've worked around this problem like this:
CREATE PROC DoSomething (@Blah INT) AS
-- Copy the parameter into a local variable; the optimizer can't
-- "sniff" a local variable's value when it compiles the plan
DECLARE @Blah_param INT
SET @Blah_param = @Blah
SELECT * FROM X WHERE asdf = @Blah_param
That will get rid of parameter sniffing altogether. If you don't want to do that (say, 95% of your queries use 'dude' while 5% use 'sup', and the 'sup' queries take a really long time), then just execute WITH RECOMPILE. It'll be a bit slower, but it will still use parameter sniffing and give you the benefits of that.
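To illustrate the WITH RECOMPILE approach, here is a minimal sketch (the procedure, table, and column names are made up for illustration):

```sql
-- Recompile the plan on every execution, so each call is optimized
-- for the actual parameter value passed in. The cost is paying the
-- compile time on every call instead of reusing a cached plan.
CREATE PROC DoSomethingRecompiled (@Blah INT)
WITH RECOMPILE
AS
SELECT * FROM X WHERE asdf = @Blah
GO

-- Alternatively, force a recompile for just one call site and let
-- all other callers keep using the cached plan:
EXEC DoSomething @Blah = 42 WITH RECOMPILE
```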
If there is an insert running at the same time as a select, you could have some locking problems.
You can try putting the NOLOCK table hint on the select statement:
SELECT foo FROM bar WITH (NOLOCK) WHERE pk = someid
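As an alternative to adding a hint to each individual query, the same effect can be had for a whole batch by changing the transaction isolation level. This is a sketch only; note that both NOLOCK and READ UNCOMMITTED permit dirty reads:

```sql
-- Equivalent to NOLOCK, but applies to every statement in the
-- session until the isolation level is changed back
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT foo FROM bar WHERE pk = someid
-- Restore the default level afterwards
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
```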
Hope this helps
44 million rows is nothing really [depending on the hardware] so it's
not too big. [one of the best-performing tables in the system i'm
working on now is ~200M rows]
I would guess it's a combination of one or more of these [in this order, not knowing the system]
- insert(s) blocking the selects -- there are locking hints that
can be used to get around this. Given the inconsistency of the bad
performance and the high insert/select frequency, I'd say this is the
most likely culprit. I'd be careful of going overboard in using locking
hints, however, since many result in "dirty" reads - i.e., reading data
that isn't committed.
- indexes that need rebuilding or statistics that need updating --
schedule a job to do this on the table [might have to experiment to see
if weekly or daily is best].
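A sketch of what such a maintenance job might run, in SQL 2000-era T-SQL (the table name is the placeholder from the question):

```sql
-- Rebuild all indexes on the table; an empty index name tells
-- DBCC DBREINDEX to rebuild every index, here with fill factor 90
DBCC DBREINDEX ('blahTable', '', 90)

-- Refresh the optimizer's statistics using a full scan of the table
-- rather than the default sample
UPDATE STATISTICS blahTable WITH FULLSCAN
```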
- bad choice of clustered index [which could also lead to the others being a problem] -- re-examine the clustered index. [note: the primary key does not have to be clustered - that's just the default. All tables should have a primary key constraint and a clustered index - sometimes they are the same]
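For example, here is a sketch (the column names are invented) of declaring the wide two-varchar-column primary key as nonclustered and clustering on a narrow, ever-increasing column instead, so that inserts append to the end of the table rather than splitting pages in the middle:

```sql
CREATE TABLE blahTable (
    blahKey1   VARCHAR(20) NOT NULL,
    blahKey2   VARCHAR(10) NOT NULL,
    insertedAt DATETIME    NOT NULL DEFAULT GETDATE(),
    -- the PK still enforces uniqueness, but as a nonclustered index
    CONSTRAINT PK_blahTable PRIMARY KEY NONCLUSTERED (blahKey1, blahKey2)
)

-- Cluster on the narrow, monotonically increasing column instead
CREATE CLUSTERED INDEX IX_blahTable_insertedAt ON blahTable (insertedAt)
```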