Rookie Offshore DBA

snoofle

The place I work for was bought out by a large established conglomerate that, in the name of cost-cutting, has been hiring numerous folks (including DBAs) from, shall we say.... offshore.

I am doing a project that requires a *lot* of DB scratch space. Before I start changing tables, I want to be able to prove that what I'm about to do will actually work.

It also requires a lot of temp space. It was a fight to convince the DBAs to give us more temp space. As such, instead of:

create table ScratchCopy as select * from table x;

...we could write a loop to copy 10K rows at a time, commit, and iterate until the entire table has been copied. Why give more temp space when this will do, regardless of how much longer it takes? (Yes, I know there are limits, and bulk export/import, but this wasn't really in that range).
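For anyone curious what the 10K-at-a-time workaround looks like, here is a minimal sketch. It assumes Python with SQLite standing in for the real database (the thread never says which DBMS or language was involved), and `copy_in_batches` is a hypothetical helper name. It copies by rowid range and commits after every batch, so no single transaction covers the whole table.

```python
import sqlite3

def copy_in_batches(conn, source, target, batch_size=10_000):
    """Copy source into target one rowid range at a time, committing per batch.

    Assumes positive integer rowids and trusted table names (they are
    interpolated straight into the SQL).
    """
    # Create an empty table with the same columns as the source.
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE 0")
    max_rowid = conn.execute(
        f"SELECT COALESCE(MAX(rowid), 0) FROM {source}"
    ).fetchone()[0]
    batches = 0
    low = 0
    while low < max_rowid:
        high = low + batch_size
        # Each statement touches at most batch_size rows.
        conn.execute(
            f"INSERT INTO {target} SELECT * FROM {source} "
            "WHERE rowid > ? AND rowid <= ?",
            (low, high),
        )
        conn.commit()
        low = high
        batches += 1
    return batches

# Tiny demo: 25 rows copied 10 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO x (payload) VALUES (?)", [(f"row {i}",) for i in range(25)]
)
conn.commit()
batches = copy_in_batches(conn, "x", "ScratchCopy", batch_size=10)
copied = conn.execute("SELECT COUNT(*) FROM ScratchCopy").fetchone()[0]
print(batches, copied)
```

The trade-off is the one described above: many small transactions instead of one big statement, which is slower overall but keeps each step's working set small.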

Ok, whatever.

The next step was to add a column to my scratch copy of the table:

alter table ScratchCopy add (NewColumn ...);

Oops, out of temp space.

Rookie offshore DBA: do the same thing you did when copying the table.

Erm, how the fuck do you only alter some of the rows in a table to have an extra column?

I wound up creating a new table, with the new column, and looping to copy the rows, 10K at a time.
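A rough sketch of that workaround, again assuming Python with SQLite as a stand-in. The table `ScratchCopyNew`, the columns `id` and `payload`, and the `TEXT` type for `NewColumn` are all placeholders, since the real definitions are not given here. This version pages through the source with a key and a `LIMIT`, so every batch holds at most the requested number of rows.

```python
import sqlite3

def copy_adding_column(conn, batch_size=10_000):
    """Rebuild ScratchCopy as ScratchCopyNew with one extra column, in batches."""
    # Hypothetical schema: the real column definition is not known.
    conn.execute(
        "CREATE TABLE ScratchCopyNew ("
        "id INTEGER PRIMARY KEY, payload TEXT, NewColumn TEXT)"
    )
    last_id = 0  # assumes ids are positive integers
    while True:
        # Copy the next batch of rows after the last id already copied.
        conn.execute(
            "INSERT INTO ScratchCopyNew (id, payload, NewColumn) "
            "SELECT id, payload, NULL FROM ScratchCopy "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )
        conn.commit()
        newest = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM ScratchCopyNew"
        ).fetchone()[0]
        if newest == last_id:
            break  # nothing new was copied, so the source is exhausted
        last_id = newest
    return conn.execute("SELECT COUNT(*) FROM ScratchCopyNew").fetchone()[0]

# Tiny demo: 25 source rows, copied 10 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ScratchCopy (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO ScratchCopy (payload) VALUES (?)",
    [(f"row {i}",) for i in range(25)],
)
conn.commit()
rows_copied = copy_adding_column(conn, batch_size=10)
columns = [row[1] for row in conn.execute("PRAGMA table_info(ScratchCopyNew)")]
print(rows_copied, columns)
```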

Sigh.

DemonWasp

TRWTF is that anyone runs out of space in an age where hard disks cost less than $0.10 per GB.

I've seen this problem at all the companies I work for too, and it's because apparently we like buying 200GB disks. Not fancy SSDs or 10k / 15k RPM server disks, either. Plain, vanilla, 7200RPM SATAI drives. Apparently we'd rather spend hundreds of dollars figuring out which archived crap we can delete (this costs programmer time...) than buy a new hard disk and install it.

snoofle

@DemonWasp said:

TRWTF is that anyone runs out of space in an age where hard disks cost less than $0.10 per GB.
I've seen this problem at all the companies I work for too, and it's because apparently we like buying 200GB disks. Not fancy SSDs or 10k / 15k RPM server disks, either. Plain, vanilla, 7200RPM SATAI drives. Apparently we'd rather spend hundreds of dollars figuring out which archived crap we can delete (this costs programmer time...) than buy a new hard disk and install it.

We work for a huge conglomerate, so it's not a matter of buying ten 1TB disks for $1000 - the disks have to be the bulletproof HA SAN variety, which ARE expensive, but still nowhere near as expensive as having me sit around for a week doing a copy that should take 2 hours.

See, I've seen multi-day delays for a huge development + QA team because our build server ran out of space on its single 200GB drive. Previous company also frequently hit delays because they only had around 1TB (across a handful of servers) for a team of 10 developers to play around in...and when you're installing 4 x 10GB databases per machine just to run a full set of tests, that space can go away very quickly.

I can appreciate that maybe the hardware is expensive, and maybe it's tricky to set up right, but when it comes to developer time you should be greasing the proverbial wheels with whatever you need: hard disks, oil, blood, whatever.

The_Assimilator

Bah, you guys don't know how easy you have it.

Back in 2001, a certain company invested a massive amount of capital to create an all-encompassing software package to run their business. The hardware they chose to use for said software was (at that time) state-of-the-art Sun boxes. Long story short, the software flopped (due to them hiring cowboy coders and architecture astronauts) and almost took the company down with it. As a consequence of this, there wasn't very much goodwill, or budget, towards the IT department for a very long time.

Fast-forward to 2006, when I joined the software company that had been brought in to save the failing system. Due to the aforementioned animosity, we were stuck with the servers from 2001 with no potential for upgrades/replacements anywhere in the near future. These dual-core Sun boxes with their 120GB of SCSI storage had been the shit in 2001, but half a decade later the database was well over 100GB in size and the hardware was now just shit. Although the production server had been fortunate enough to receive a 20GB x7 RAID5 array, the remaining 4 servers (1 for training, 3 for dev) were all stuck with 20GB x6 RAID0. So if any of those 4 decided to eat a disk (which happened at least once a month), the whole array - and hence the whole database server - was down and out until (a) a replacement 20GB disk (extremely rare and expensive in 2006, and it took weeks to ship) arrived, and (b) a technician deemed us important enough to come and get the server back up. Which entailed an entire week repartitioning the disks and reinstalling and reconfiguring the OS and the DBMS. (BTW, the DBMS was Sybase and un-coincidentally, the support techs were also from Sybase. Guess who got a fat support contract back in 2001.)

By the time I joined, the training server had been stripped of most of its disks (the other machines being deemed more important) - when I left in 2009 it was completely out of commission, the chassis rotting in a basement somewhere, all of its useful hardware used for spares. The database continued to grow and eventually we started archiving data (off the live DB!) to tape, then deleting it. At least 4 times while I was there, the live DB ran out of disk space and brought the entire business to a halt until Sybase deemed us important enough to send a tech around to fix the issue.

But that's not the worst, oh no. The dev servers, now they were the worst. Because there were only 3 of them and 15 devs, there was an average of 5 devs per server. When 1 person was running an extremely intensive job on a server, no-one else could run anything. And again due to disk issues, generally only 2 of these servers were operational at any one time.

It was like a fucking Lotto. You would kick off a vital test run on a server and come back the next morning to find that not only had the server crashed and nuked your testing, there was nowhere else for you to (re-)test because the other dev servers were being used for equally important tasks. A SELECT that should have taken 5 minutes to run could take 5 hours if someone else ran something slightly intensive at the same time. We devs had to organise rosters amongst ourselves to determine who would be using what server when, what would happen in case of failure, etc. I probably spent more than 50% of my work time at that place waiting for queries to execute, and I was doing far more non-SQL work than the rest of the team!

Thankfully, the whole vile edifice was finally tossed out the window in 2009, when the company threw Sun and Sybase out the window and moved to standard x64 and MSSQL. Un-coincidentally, in the almost 2 years since then, their business has grown faster than it had in the previous 8 years since 2001. Goes to show what can happen when devs are allowed to be productive.

Nexzus

@The_Assimilator said:

Thankfully, the whole vile edifice was finally tossed out the window in 2009, when the company threw Sun and Sybase out the window and moved to standard x64 and MSSQL. Un-coincidentally, in the almost 2 years since then, their business has grown faster than it had in the previous 8 years since 2001. Goes to show what can happen when devs are allowed to be productive.

I can imagine that the same people who were so against upgrading during those 8 years are now taking credit for the recent success. ("Leveraged cutting edge technology to streamline business paradigms")

havokk

Is TRWTF our inability, as an industry, to write effective cost-benefit analysis documents?

This problem should not be an issue. Either the CBA is accepted and more resources are bought, because, y'know, the benefits outweigh the costs; or the resources are not bought and managers accept that tasks will take a long time and that they have no right to complain about the slowness, because, y'know, they didn't accept the CBA.

Of course, this assumes a level of competence in those who control the purchasing.

For example, "I probably spent more than 50% of my work time at that place waiting for queries to execute, and I was doing far more non-SQL work than the rest of the team!". A CBA might be:
This is wasting X hours of programmer time at a cost of $Y¹ to the company over the course of a year. Cost of a new server is $Z, which is less than $Y. Your call.

¹ I don't know what your country is like but in my country the rough rule of thumb is that an employee costs twice their salary. Someone on $13 an hour is roughly costing the company $26 every hour.
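To make the X, Y and Z concrete, here is a small sketch in Python. The function names are hypothetical, and the 1,000 wasted hours and the $5,000 server price are made-up illustration values; only the $13 hourly wage and the "twice their salary" multiplier come from the footnote above.

```python
def annual_wait_cost(hours_wasted, hourly_wage, overhead_multiplier=2.0):
    """Yearly cost ($Y) of wasted time, using the 'costs twice their salary' rule."""
    return hours_wasted * hourly_wage * overhead_multiplier

def server_is_cheaper(hours_wasted, hourly_wage, server_cost, overhead_multiplier=2.0):
    """True when the server ($Z) costs less than the wasted time ($Y)."""
    return server_cost < annual_wait_cost(hours_wasted, hourly_wage, overhead_multiplier)

# Illustration only: 1,000 hours and $5,000 are invented numbers.
y = annual_wait_cost(hours_wasted=1000, hourly_wage=13)
buy_it = server_is_cheaper(hours_wasted=1000, hourly_wage=13, server_cost=5000)
print(y, buy_it)
```

With those invented inputs, $Y comes to $26,000 a year against a $5,000 server, which is the sort of one-line comparison a CBA needs to put in front of whoever signs off on purchases.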

boomzilla

@snoofle said:

We work for a huge conglomerate, so it's not a matter of buying ten 1TB disks for $1000 - the disks have to be the bulletproof HA SAN variety, which ARE expensive, but still nowhere near as expensive as having me sit around for a week doing a copy that should take 2 hours.

Is all that really necessary for a development environment, though? Or is this another developmestuction setup?

snoofle

@boomzilla said:

@snoofle said:
We work for a huge conglomerate, so it's not a matter of buying ten 1TB disks for $1000 - the disks have to be the bulletproof HA SAN variety, which ARE expensive, but still nowhere near as expensive as having me sit around for a week doing a copy that should take 2 hours.
Is all that really necessary for a development environment, though? Or is this another developmestuction setup?

We have dev, qa, pre-prod, prod and dr, but only prod has a full db. Everything else has about 100-200GB of space. That's great for quick and dirty tests, but at some point, you need to scale up to see how long it will take to process the whole data set. I suggested that if the hardware were comparable (or even some known percentage comparable) with prod, then we could do scaled tests. But our non-prod hardware is about 5% of the prod hardware, and the load factors are so varied that you can't do an apples-to-apples comparison.

Basically, it's: get it to compile and work for ten rows of data, then deploy and pray it will work correctly on 3 billion rows and finish at some reasonable point. Yeah right.

Jaime

@DemonWasp said:

TRWTF is that anyone runs out of space in an age where hard disks cost less than $0.10 per GB. I've seen this problem at all the companies I work for too, and it's because apparently we like buying 200GB disks. Not fancy SSDs or 10k / 15k RPM server disks, either. Plain, vanilla, 7200RPM SATAI drives. Apparently we'd rather spend hundreds of dollars figuring out which archived crap we can delete (this costs programmer time...) than buy a new hard disk and install it.

Last project I did, the "storage team" charged the project $2,450 in labor, $3,403 in parts for 200GB of SAN storage. That works out to about $30 per GB.
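The arithmetic behind that per-GB figure, as a quick Python sketch. The dollar amounts are the ones quoted above, the $0.10 per GB is the commodity-disk price mentioned earlier in the thread, and `cost_per_gb` is a hypothetical helper name. Comparing SAN storage with bare commodity disks is not apples-to-apples, so treat the ratio as a rough indication only.

```python
def cost_per_gb(labor, parts, gigabytes):
    """Total charge divided by capacity, in dollars per GB."""
    return (labor + parts) / gigabytes

san_rate = cost_per_gb(labor=2450, parts=3403, gigabytes=200)
commodity_rate = 0.10  # dollars per GB, the plain-disk figure quoted earlier
markup = san_rate / commodity_rate

print(f"SAN: ${san_rate:.2f}/GB, roughly {markup:.0f}x the commodity price")
```

The total of $5,853 over 200GB lands a little over $29 per GB, so "about $30" is a fair rounding.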

blakeyrat

@Jaime said:

Last project I did, the "storage team" charged the project $2,450 in labor, $3,403 in parts for 200GB of SAN storage. That works out to about $30 per GB.

I know I bang on this rock a lot, but Amazon Cloud Services would do that for like $50/month, if that.

Jaime

@blakeyrat said:

@Jaime said:
Last project I did, the "storage team" charged the project $2,450 in labor, $3,403 in parts for 200GB of SAN storage. That works out to about $30 per GB.
I know I bang on this rock a lot, but Amazon Cloud Services would do that for like $50/month, if that.

Yup. I have a bigger problem with the labor cost than the parts. The parts are expensive because we have a state-of-the-art fancy schmancy SAN that automatically assigns the right type of storage to the right workload. We bought the fancy hardware to reduce labor costs. Yet, $2,450 is more than it would cost me to get the vendor to fly in and do it themselves, including travel expenses. Unfortunately, the storage team has the final say on the storage aspect of all projects. If we proposed Amazon, they would simply say "that's not the corporate standard".

But, the point stands. If I were snoofle's boss, I would tell him to suck it up because storage is more expensive than his time.

locallunatic

@blakeyrat said:

@Jaime said:
Last project I did, the "storage team" charged the project $2,450 in labor, $3,403 in parts for 200GB of SAN storage. That works out to about $30 per GB.

I know I bang on this rock a lot, but Amazon Cloud Services would do that for like $50/month, if that.

Unfortunately (well, really it's one of those good in theory things) not all corporate projects can use services like that for storage due to legal constraints on how data is stored. I'm at least lucky enough that while falling into that category I don't have Jaime's ridiculous "cover my team's bonuses" pricing on storage.

blakeyrat

@locallunatic said:

@blakeyrat said:
@Jaime said:
Last project I did, the "storage team" charged the project $2,450 in labor, $3,403 in parts for 200GB of SAN storage. That works out to about $30 per GB.

I know I bang on this rock a lot, but Amazon Cloud Services would do that for like $50/month, if that.
Unfortunately (well, really it's one of those good in theory things) not all corporate projects can use services like that for storage due to legal constraints on how data is stored. I'm at least lucky enough that while falling into that category I don't have Jaime's ridiculous "cover my team's bonuses" pricing on storage.

IIRC, you can use cloud services for HIPAA data if you use 256-bit encryption or better when shuttling it back and forth. It's been a long time since I worked in medical though-- the IBM-heavy environment didn't agree with me.

I agree it makes things more complicated, and the $50 estimate is a low-ball, but at the very least you could go to these robber barons and say, "hey Amazon can do it at a fifth your price. EXPLAIN" and put some heat on their asses. It might not help, but you'll feel better.

locallunatic

@blakeyrat said:

IIRC, you can use cloud services for HIPAA data if you use 256-bit encryption or better when shuttling it back and forth. It's been a long time since I worked in medical though-- the IBM-heavy environment didn't agree with me.

Haven't done this year's refresher yet, but part of the problem with a setup like that is that there is more than just the encryption standards you need to adhere to. And while you can be certified too if they are all properly certified, it is still risky to stake your whole business on a supplier maintaining their cert so you can have yours. But I'm not a lawyer or anything, just a dev with relatively tight abstraction layers so I don't need to directly worry about this.

snoofle

@Jaime said:

.... If I were snoofle's boss, I would tell him to suck it up because storage is more expensive than his time.

Actually, I make a pretty good buck, and there are 5 of us here making about the same money. In the time this has been dragging on, our billing alone exceeded what would have been the cost of the storage hardware spread across dev, pre-prod and qa. And we're all here less than three months.

That's just big corporation mentality.

Jaime

@snoofle said:

@Jaime said:

.... If I were snoofle's boss, I would tell him to suck it up because storage is more expensive than his time.

Actually, I make a pretty good buck, and there are 5 of us here making about the same money. In the time this has been dragging on, our billing alone exceeded what would have been the cost of the storage hardware spread across dev, pre-prod and qa. And we're all here less than three months.

That's just big corporation mentality.

Would you have one event where the savings would outweigh the cost, or does that cost build up over time to be more than the storage cost? I usually get painted into a corner when I have long term costs, but I am doing a lot of small projects. It's hard to find a budget to buy it from.

BTW, we constantly give the corporate teams a hard time over these things. Usually, we propose an alternative way to meet our needs that's 10% of the corporate cost. In order to go against the standard, we have to get an "Architecture Review Board" exception. Most of the time, the team in question simply gives us the corporate service for the alternative cost because they have a hard time defending their pricing to the ARB.

Here is a real example of the fun I have... We want to migrate about 200 users from one Active Directory domain to the corporate domain. We previously did about 30 of them one by one as they needed to be addressed (they needed VPN access, and other stuff like that). The corporate Active Directory team quoted us about $25,000 to migrate the other 170 of them. We promptly said "never mind" and had all 170 people submit help desk tickets for VPN access.

SQLDave

@The_Assimilator said:

<snip>
So if any of those 4 decided to eat a disk (which happened at least once a month), the whole array - and hence the whole database server - was down and out until (a) a replacement 20GB disk (extremely rare and expensive in 2006, and it took weeks to ship) arrived, and (b) a technician deemed us important enough to come and get the server back up. Which entailed an entire week repartitioning the disks and reinstalling and reconfiguring the OS and the DBMS.
<snip>

So you had monthly crashes that took a month, or more, to fix?

Dorus

@SQLDave said:

So you had monthly crashes that took a month, or more, to fix?

He also wrote they had 3 servers, and on average 2 were running. Looks like those numbers add up.

The_Assimilator

@Dorus said:

@SQLDave said:
So you had monthly crashes that took a month, or more, to fix?
He also wrote they had 3 servers, and on average 2 were running. Looks like those numbers add up.

Yep, on average we had 2 of 3 servers up. When we had all 3 at a time, it was almost possible to work normally!