Partitioning



  • I've frequently mentioned that here at WTF Inc, we have a massive, exponentially growing database of customer data. It's presently over 70 terabytes and growing at more than 4 terabytes per month. Many of our tables have almost 10 billion rows in them. There's no way to age data into offline storage as the delete-where statements simply take too long to run (then Oracle complains that its internal stats/snapshots are out of date and the delete fails).

    For years, I've been after them to partition the data so that we could simply drop the n+1st partition of each table after n months. Of course, because the geniuses who designed our database also have certain tables that can never be aged out, those tables need to be partitioned by two different levels of user number (primary and sub-partition). Of course, those tables reference data in the other tables that need to be partitioned by date. Thus, if you drop a partition (by date), the FK constraints on the user-id partitioned tables break.

    What to do?

    They really only like two choices: a) relax the FK constraint in the xref tables so that partitions in the tables partitioned by date can be dropped, or b) partition everything by some arbitrary key that has no meaning in life (just a sequence range).

    Nobody likes the idea of relaxing a constraint as it completely defeats the purpose. Guess what they decided to do.

    So now they want me to partition everything by a single sequence number range. Then, to claim that they satisfied the must-be-able-to-age-out-data requirement, they will only age-out data when we lose a client; otherwise, the data stays forever.

    So much for constraining increasing disk requirements.

    Of course, here at WTF Inc, you just know that there's more.

    Given the directive above to partition by an arbitrary sequence, we need to change about 100 tables. There is way too much data in most of those tables to simply replace them with a partitioned equivalent table, as it would take longer to copy the data (Oracle won't just add partition definitions to an unpartitioned table; it effectively copies the data to a new table) from the unpartitioned table to the partitioned table than the available window. As such, the only practical option is to map a view over the unpartitioned and partitioned tables, arbitrarily assigning the partition sequence number of zero to the data in the unpartitioned table, so you can do something like this:

      Create view x as
    Select 0 PartitionId, tu.col1, ... from table_unpartitioned tu
    Union all
    Select tp.PartitionId, tp.col1, ... from table_partitioned tp;

    ...and then just pass the partition id as zero or <value> as part of the where clause.

    Problem: another team controls 33 of those tables (even though they're in our schema), and they have no directive to make all these changes, so they don't want to spend any time doing it.

    Management's solution? Just check out a branch and make the changes you can. Then keep it on the side until the other team gets around to doing it; they'll get to it before year end.

    ...because teams always set aside directed work for work not on their to-do list.

    Also, assuming they get to it in the next ten months, the code in the branch will be ten months out of date. In and of itself, a horror of a merge. But this is WTF Inc; other members of my team are charged with restructuring our application, guaranteeing I won't even be able to merge my code with theirs.

    When I groused up the tree, they decided that perhaps they need to give it some more thought; Put the effort on the side for a while!

    How long is a while?

    We'll get back to you.

    But didn't you give us a directive that this absolutely had to be done as a high priority task?

    We need to give it some more thought.

    Sigh.



  • Considered Harmful

    @snoofle said:

    We need to give it some more thought.

    This may be the most sane sentence ever uttered at WTF Inc.

    Of course, given their track record, I'm sure they'll use the time to think up something even more absurd.



  • @snoofle said:

    they will only age-out data when we lose a client; otherwise, the data stays forever.
     

    Given the high-quality DBAs and their shenanigans, at your gig, that may actually be a workable solution.



  • @joe.edwards said:

    @snoofle said:
    We need to give it some more thought.

    This may be the most sane sentence ever uttered at WTF Inc.

     

    Unfortuntely it usually comes after a decision, not before it.

     



  • @snoofle said:

    We need to give it some more thought.
    Because up until now, you've obviously given it a great deal of whatever the opposite of thought is.


Log in to reply