Power Stripped



  • Quote from the government hosting service that handles websites and e-mail for half the state:

    WHAT HAPPENED? Monday morning during scheduled maintenance to replace a
    faulty power strip, we experienced a catastrophic data loss when the
    faulty part tripped a breaker, sending everything into a hard fault.

    The faulty power strip was a heavy duty device that monitors the
    amperage draw of anything plugged into it. Replacement of the strip
    involved shutting down anything plugged into the strip, unplugging and
    replacing, then powering up the equipment. Somewhere during the power
    down, power up process, a breaker in the circuit panel blew, taking the
    HP SAN down hard. When we finally got the SAN controller back on line,
    we could not see any of the configured array of storage drives.
    Immediately, we opened a case with HP Support, who confirmed our worst
    fears: the metadata file on the SAN was corrupted and the only recourse
    was to reformat and recover from backups. We chose against the
    recover-from-backup path because our backup software could not completely
    recover VMware servers, only the data on the servers. Our only recourse
    for quick service restoration was to have our data professionally
    recovered. We drove the drives up to Ontrack in Minnesota Tuesday night
    and waited all day Wednesday for them to analyze the data and report
    back. As of now (Friday) the data recovery is underway.

     



  • Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    And I can only assume "power strip" is someone trying to make the email more accessible and they were really talking about a server-grade UPS.



  • @blakeyrat said:

    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    @trainbrain27 said:

    The faulty power strip was a heavy duty device that monitors the amperage draw of anything plugged into it. Replacement of the strip involved shutting down anything plugged into the strip, unplugging and replacing, then powering up the equipment. Somewhere during the power down, power up process, a breaker in the circuit panel blew, taking the HP SAN down hard. 



  • @DaveK said:

    @blakeyrat said:

    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    @trainbrain27 said:

    The faulty power strip was a heavy duty device that monitors the amperage draw of anything plugged into it. Replacement of the strip involved shutting down anything plugged into the strip, unplugging and replacing, then powering up the equipment. Somewhere during the power down, power up process, a breaker in the circuit panel blew, taking the HP SAN down hard. 

    breaker = breaker



  • @DaveK said:

    @blakeyrat said:
    Awesome.

    So either their SAN runs on a server with a single PSU, or the dual PSUs were plugged into the same breaker.

    "Circuit panel" != "power strip".

    I don't understand how what you wrote is a response to what I wrote.

    Yes, a circuit panel is not a power strip. Your point, please?



  • @DaveK said:

    "Circuit panel" != "power strip".
     

    I'm trying to see how that is relevant. A circuit panel doesn't deliver power; a power strip does.

    Blakey's point was that power to the SAN should have redundancy so that loss of power from one source won't mean an outage - irrespective of the cause of that power loss (circuit panel, etc).

    I'm also surprised that a hard power-off of a SAN would corrupt data so easily, or rather leave the SAN in a non-recoverable state. Modern filesystems have journalling to recover in the event of a power outage - I'd have thought a SAN was far more resilient.



  • We're also affected.  Down here the rumor is that several SAN drives had failed over the past year, but there was no budget to replace them, so it only took one more.

    I wonder how the budget is for data recovery and driving across the country.



  • @Ill Stew said:

    I wonder how the budget is for data recovery and driving across the country.
    A single drive costs somewhere between 200 and 1000€ (depending on size and type). Data recovery from a 4-drive RAID5 array with two failed drives starts at about 5000€ (in my particular case, the drives were actually "free", as they were still in warranty, but the persons responsible did not check them regularly, and no backups were being done).



  • @ender said:

    and no backups were being done
     

    Get said (ir)responsible person to pick up plenty of sharp objects with no first-aid kit to hand, then drive them to hospital in a car lacking bumpers.



  • @Cassidy said:

    Get said (ir)responsible person to pick up plenty of sharp objects with no first-aid kit to hand, then drive them to hospital in a car lacking bumpers.
    That particular server is in a hospital (and due to internal squabbles was confined to a really local area network - didn't have contact with any other network, or the Internet, so it couldn't send out alerts either).



  • @blakeyrat said:

    And I can only assume "power strip" is someone trying to make the email more accessible and they were really talking about a server-grade UPS.

    If the data loss was caused by a circuit breaker in the panel blowing, there obviously was no UPS involved. Anywhere.

