Change of Clustering Key

  • Once upon a time, in a SQL Server database, there was the following table:

    create table dbo.WidgetItems
    ID int identity(1,1),
    WidgetID int not null,
    ItemManufacturerCode varchar(50) not null,
    FrobinatorClassification varchar(50) not null,
    Foo int,
    Bar varchar(200),
    BazDate datetime,
    constraint pk_WidgetItems primary key (WidgetID, ItemManufacturerCode, FrobinatorClassification),
    constraint uq_WidgetItems unique (ID) -- this part was added later

    Obviously, this table (and most of the rest of the database) was designed by people who love natural keys, so before you start complaining about replacing ItemManufacturerCode with ItemManufacturerCodeID or something, that ain't gonna happen and is irrelevant to the issue at hand.

    The issue at hand? There are some slow-running processes that involve joins on WidgetItems.ID. Not surprisingly, switching the clustering key to that single column makes joins on it much faster (in addition to making all other indices on the table smaller). The speedup is pretty significant (cuts execution time to roughly 1/10 in my test environment). The thing is, the rest of the queries against this table are tuned for having the clustering key on the natural key columns, so I don't know whether having a massive speedup here is worth risking making a lot of important reports much slower. However, I did have a couple ideas:

    Option #1: Make clustering key (ID) and place a unique constraint on the natural key (WidgetID, ItemManufacturerCode, FrobinatorClassification).
    Pros: Smaller indexes, much faster joins on ID, textbook-correct approach
    Cons: Unknown impact on the rest of system performance. However, this could be somewhat mitigated by replacing the unique constraint with a covering unique index to reduce key lookups until Microsoft allows us to add included columns to unique constraints.

    Option #2: Keep clustering key as (WidgetID, ItemManufacturerCode, FrobinatorClassification), place covering unique index on (ID) for all columns used by the slow process (which is almost all the columns on that table).
    Pros: Original clustering key is intact, faster joins on ID (but not quite as fast as in Option #1, especially if you're updating the table)
    Cons: Bigger indices due to bigger clustering key, covering index becomes about as big as the table itself, CUD operations against the table become slower.

    The system in question is much closer to an 8/5 system than a 24/7 system, so the five or ten minutes of downtime in off hours to reindex the table either way is pretty much a non-issue.

    What do you guys think?

Log in to reply

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.