Power Stripped



  • Quote from the government hosting service that handles websites and e-mail for half the state:

    WHAT HAPPENED? Monday morning during scheduled maintenance to replace a faulty power strip, we experienced a catastrophic data loss when the faulty part tripped a breaker, sending everything into a hard fault.
    The faulty power strip was a heavy duty device that monitors the amperage draw of anything plugged into it. Replacement of the strip involved shutting down anything plugged into the strip, unplugging and replacing, then powering up the equipment. Somewhere during the power down, power up process, a breaker in the circuit panel blew, taking the HP SAN down hard. When we finally got the SAN controller back on line, we could not see any of the configured array of storage drives. Immediately, we opened a case with HP Support who confirmed our worst fears, the metadata file on the SAN was corrupted and the only recourse was to reformat and recover from backups. We chose against the recover from backup path because our backup software could not completely recover VMWare servers, only the data on the servers. Our only recourse for quick service restoration was to have our data professionally recovered. We drove the drives up to Ontrack in Minnesota Tuesday night and waited all day Wednesday for them to analyze the data and report back. As of now (Friday) the data recovery is underway.

     



  • Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    And I can only assume "power strip" is someone trying to make the email more accessible and they were really talking about a server-grade UPS.



  • @blakeyrat said:

    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    @trainbrain27 said:

    The faulty power strip was a heavy duty device that monitors the amperage draw of anything plugged into it. Replacement of the strip involved shutting down anything plugged into the strip, unplugging and replacing, then powering up the equipment. Somewhere during the power down, power up process, a breaker in the circuit panel blew, taking the HP SAN down hard. 



  • @DaveK said:

    @blakeyrat said:

    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    @trainbrain27 said:

    The faulty power strip was a heavy duty device that monitors the amperage draw of anything plugged into it. Replacement of the strip involved shutting down anything plugged into the strip, unplugging and replacing, then powering up the equipment. Somewhere during the power down, power up process, a breaker in the circuit panel blew, taking the HP SAN down hard. 

    breaker = breaker



  • @DaveK said:

    @blakeyrat said:
    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    I don't understand how what you wrote is a response to what I wrote.

    Yes, a circuit panel is not a power strip. Your point, please?



  • @DaveK said:

    "Circuit panel" != "power strip".
     

    I'm trying to see how that is relevant. A circuit panel doesn't deliver power; a power strip does.

    Blakey's point was that power to the SAN should have redundancy so that loss of power from one source won't mean an outage - irrespective of the cause of that power loss (circuit panel, etc).

    I'm also surprised that a hard power-off of a SAN would corrupt data so easily, or rather leave the SAN in a non-recoverable state. Modern filesystems have journalling to recover in the event of a power outage - I'd have thought a SAN was far more resilient.



  • We're also affected.  Down here the rumor is that several SAN drives had failed over the past year, but there was no budget to replace them, so it only took one more.

    I wonder how the budget is for data recovery and driving across the country.



  • @Ill Stew said:

    I wonder how the budget is for data recovery and driving across the country.
    A single drive costs somewhere between 200 and 1000€ (depending on size and type). Data recovery from a 4-drive RAID5 array with two failed drives starts at about 5000€ (in my particular case, the drives were actually "free", as they were still in warranty, but the persons responsible did not check them regularly, and no backups were being done).
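
    For anyone wondering why a second failed drive turns a RAID5 array into a data-recovery job, here is a rough sketch of single-parity reconstruction. It is a simplification under stated assumptions: the block contents and the helper names (xor_blocks, rebuild) are made up for illustration, and a real RAID5 controller rotates parity across the drives rather than keeping it on one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe on a 4-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]


def rebuild(stripe, lost):
    """Rebuild the single block whose index is in `lost` from the survivors."""
    if len(lost) != 1:
        # One parity block gives one equation, so only one unknown can be solved.
        raise ValueError("single parity can rebuild exactly one lost block")
    survivors = [block for i, block in enumerate(stripe) if i not in lost]
    return xor_blocks(survivors)


# Losing any one drive is recoverable: XOR of the survivors gives the lost block.
recovered = rebuild(stripe, {1})
```

    With one drive gone, the XOR of the remaining three blocks yields the missing one. With two gone there is a single parity equation and two unknowns, which is why rebuild() refuses and why the controller cannot bring the array back on its own.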



  • @ender said:

    and no backups were being done
     

    Get said (ir)responsible person to pick up plenty of sharp objects with no first-aid kit to hand, then drive them to hospital in a car lacking bumpers.



  • @Cassidy said:

    Get said (ir)responsible person to pick up plenty of sharp objects with no first-aid kit to hand, then drive them to hospital in a car lacking bumpers.
    That particular server is in a hospital (and due to internal squabbles was confined to a really local area network - didn't have contact with any other network, or the Internet, so it couldn't send out alerts either).



  • @blakeyrat said:

    And I can only assume "power strip" is someone trying to make the email more accessible and they were really talking about a server-grade UPS.

    If the data loss was caused by a circuit breaker in the panel blowing, there obviously was no UPS involved. Anywhere.

