We have about 50 DB tables, most of which have 300-800 million rows. About 10 have > 1 billion rows. Five have > 3 billion rows. We are adding about 4TB/month. Several new aspects of our business promise to increase the rate at which we generate data that must be stored. Naturally, our users are concerned about the never-ending increase in loaded storage costs.
The SAs, DBAs, and my boss have finally bombarded our (extremely well educated, yet computer illiterate) users with enough demands that they HAVE to let us partition the tables, so that data beyond a certain age can be aged out and periodically moved to cheaper offline storage within a reasonable time frame.
We presented two options:
1. Modify the existing tables to add an unused "partitionId" column; modify the application to start generating partition IDs for all new data, and insert that data into both sets of tables (partitioned and unpartitioned). In the background, generate partition IDs for all existing data. When that's done, copy the unpartitioned data, keyed by partition ID, into the partitioned tables. Finally, drop the old tables to reclaim the space. Time required: several people for one year to write all the necessary scripts, run them on chunks of data small enough to keep the transaction size manageable, and avoid burying the network, incinerating the logs, and so forth.
2. Create new (empty) partitioned tables. Change the code to insert only into the partitioned tables. Create a view to let the users look up data that's in either table. Time required: 2-3 months to write and test scripts, change existing stored procs to use the new table names, and verify there are no ripple effects into other systems.
Yes, there are other variants, but these were the options that minimized special case handling.
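The core of method 2 is the compatibility view: readers keep querying one name while writes go only to the new table. A minimal sketch of that idea, using SQLite in place of our actual platform; the table and column names (orders_old, orders_part) are hypothetical, and on a real RDBMS the new table would be declared with actual range partitions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The legacy, unpartitioned table: its historical rows are left untouched.
cur.execute("CREATE TABLE orders_old (id INTEGER, created TEXT, amount REAL)")
cur.execute("INSERT INTO orders_old VALUES (1, '2009-01-15', 10.0)")

# The new table that receives all inserts going forward; in a real system
# this is where the partitioning scheme would be declared.
cur.execute("CREATE TABLE orders_part (id INTEGER, created TEXT, amount REAL)")
cur.execute("INSERT INTO orders_part VALUES (2, '2010-06-01', 25.0)")

# A view gives readers a single name that covers rows in either table.
cur.execute("""
    CREATE VIEW orders AS
        SELECT id, created, amount FROM orders_old
        UNION ALL
        SELECT id, created, amount FROM orders_part
""")

rows = cur.execute("SELECT id FROM orders ORDER BY id").fetchall()
print(rows)  # rows from both the old and the new table are visible
```

Once enough time has passed, old data can be archived straight out of orders_old, and the view eventually collapses to a plain select on the partitioned table.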
Method 1 carries considerable cost: duplicating all the affected tables for a year. It also carries considerable risk (our data is not well sanitized, so an orphaned row could trash a large update, with massive rollbacks, etc.). And it will take a very long time to do all the grinding.
Method 2 carries virtually no cost: no additional storage is required, since the monthly allocation simply goes to the new tables instead of the old ones. There is virtually no risk, as any bad data in the old tables will continue to sit there, unaffected. The actual cutover is reduced to about an hour to run the rename scripts and install a new version of our software. All the DBAs, SAs, developers, and development managers agree method 2 is the way to go.
Our users decide they want method 1. Why? After a lot of back and forth: because we want to!
We're all banging our heads against our desks.