@TGV said:
@snoofle said:
And do you have any idea how long it takes to restore a 70+TB database from tape?
But name and address data should be stored once, and I don't think such a table could be 70TB. So, that means that you've got all these names and addresses duplicated all over the place. Now, that might be good for performance (although I fail to see how), but that also means you can rebuild those columns from the central name/address table, which can be restored much faster. Right? Or do you only have full db restore?
Well, let's think about this. An average set of customer information contains: first and last names, address, city, state, postal code, phone number, date of birth, and some odds and ends.
Let's assume that an average first name is 7 wchars, and an average last name is 6 wchars (yes, Unicode, so 2 bytes each). Date of birth can be represented with a 4 byte int, as can postal code. I'll assume addresses average around 30 wchars, including the name of the city. Any phone number can be represented in any standard form with 20 wchars. For good measure, I'll add 30 bytes for miscellaneous data.
14 (First Name)
+12 (Last Name)
+4 (DoB)
+4 (Postal Code)
+60 (Address)
+40 (Phone Number)
+30 (Odds and ends)
=164 bytes, so call it 165!
So now that we have a conservative estimate of 165 bytes per customer entry, let's see how many fit into a 70TB database (counting 1TB as 2^40 bytes).
466,459,478,450 entries.
Now obviously, this database serves many clients. If the company that snoofle works for serves 1000 different clients, that would work out to 466,459,478 entries per client, or roughly one and a half times the population of the United States. This number becomes more reasonable the more clients you add or the more liberal you make the estimates.
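For anyone who wants to check the arithmetic, here's a rough sketch of the estimate above. The assumptions are mine: a wchar is 2 bytes, "TB" means 2^40 bytes (that's what reproduces the entry count), and all the names in the code are made up for illustration. Note that the fields as listed add up to 164 bytes; the count uses the rounded 165-byte figure.

```python
# Field sizes in bytes, as listed above (assumption: one wchar = 2 bytes).
FIELD_BYTES = {
    "first_name": 7 * 2,
    "last_name": 6 * 2,
    "date_of_birth": 4,
    "postal_code": 4,
    "address": 30 * 2,
    "phone_number": 20 * 2,
    "odds_and_ends": 30,
}

TIB = 2 ** 40  # assumption: "TB" here means 2**40 bytes


def entries_that_fit(db_bytes: int, record_bytes: int) -> int:
    """Number of whole records of record_bytes that fit in db_bytes."""
    return db_bytes // record_bytes


record_sum = sum(FIELD_BYTES.values())  # exact sum of the listed fields
record_bytes = 165                      # rounded per-entry figure used above
total = entries_that_fit(70 * TIB, record_bytes)
per_client = total // 1000              # assuming 1000 clients, as above

print(f"sum of fields:      {record_sum} bytes")
print(f"entries in 70TB:    {total:,}")
print(f"entries per client: {per_client:,}")
```

Change `record_bytes` or the client count to see how quickly the per-client number drops to something plausible.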
And before you go on about merging similar sections of tables together: that brings with it the risk of a glitch sending Client A's data to Client B in corner cases.
There's also a very real chance of this database holding much more than just address data. It could just be a general data cloud that clients have access to. There's no reason why 70TB sounds unreasonable to me.