When is a table too big in SQL Server?

  • A few of my coworkers were having trouble with an online testing
    application recently.  Seems a call to a certain SP was timing out
    before the SP returned.  After some investigation (I should
    mention here that my coworkers had this project dumped in their laps
    when another group in the company forgot that their developers needed
    funding and thus had to let them go) it was discovered that this SP,
    which is just a simple "select blah1, blah2, blah3 from blahTable where
    blahPK = 'blahValue'", was selecting from a table containing nearly 44
    million rows.  Testing out the SP from SQL Query Analyzer,
    sometimes it would return acceptably quickly, other times it would take
    over 3 minutes.

    The possible reasons for this behaviour that I have heard from various people around the office are:

    1.  The table is simply too big

    2.  The table is also inserted into as frequently as it is selected from, which causes some sort of indexing screwiness

    3.  The primary key, being the combination of two varchar columns,
    one of which is 20 characters, might not have been the most efficient
    thing to index on.

    I don't have all that much database experience, and I surely don't have
    much experience in dealing with large tables.  I don't even know
    what size a table needs to be to be considered too large.  So, I'm
    wondering if 44 million rows is just too much for SQL Server to deal
    with (I'm thinking this isn't the case), and what else could be
    contributing this poor performance.  I'm not working on this
    project; my curiosity is purely academic.

  • Many claim that SQL Server is suitable for VLDB needs; and I think it's used on multi-terrabyte databased in a few systems. But I've never used SQL as a VLDB, so I can't speak to it personally.

    I'm going to assume taht running the SELECT query directly consistantly performs acceptably.

    You are likely dealing with a problem of "parameter sniffing." This is the downside of having an advanced query engine: if you pass 'dude' as the value of the @blah paramater to the procedure, then the optimizer will actually use the value ('dude') when building the query plan. Normally, this is a good thing. In some cases, it causes horrible problems.

    I've worked around this problem like this:
     CREATE PROC DoSomething (@Blah INT) AS
           DECLARE @Blah_param INT 
           SET @Blah_param = @Blah
           SELECT * FROM X WHERE asdf = @Blah_param

    That will get rid of parameter sniffing altogether. If you don't want to do that (say, 95% of your queries use 'dude' while 5% use 'sup', and the 'sup' queries take real long), then just execute WITH RECOMPILE. It'll be a bit slowerbut it'll will use parameter sniffing and give yo uthe benefits of that.

  • If there is an insert at the same time as a select you could have some locking problems.

    You can try to put the NOLOCK table hint on the select statement.

    SELECT foo FROM bar (nolock) where pk = someid

    Hope this helps

  • 44 million rows is nothing really [depending on the hardware] so it's
    not too big. [one of the best-performing tables in the system i'm
    working on now is ~200M rows]

    I would guess it's a combination of one or more of these [in this order, not knowing the system]

    1. insert(s) blocking the selects -- there are locking hints that can be used to get around this. Given the inconsistency of the bad performance and the high insert/select frequency, I'd say this is the most likely culprit. I'd be careful of going overboard in using locking hints, however, since many result in "dirty" reads - i.e., reading data that isn't committed.
    2. indexes that need rebuilding or statistics that need updating -- schedule a job to do this on the table [might have to experiment to see if weekly or daily is best.
    3. bad choice of clustered index [which could also lead to the others being a problem] - re-examine the clustered index. [note: primary key does not have to be clustered - that's just the default. All tables should have a primary key constraint and a clustered index - sometimes they are same]

Log in to reply

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.