Persuasive Maintenance of Benelux


  • mod

    Continuing the discussion from status status status status SNAKE!!!!!!!!!!!!!!!!!!!!!!!!!:

    @abarker said:

    I think this deserves to be a SideBar now ..

    The day started out good enough. Tasks were getting completed. People were getting their work done. @abarker was performing some final pre-release testing for a set of system updates scheduled for that night.

    Pretty good start to a Monday, thought @abarker.

    Then came the first email from the production system.

    A transport error has occurred when receiving results from the server.

    And then another.

    Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

    It was 11:00 in the morning, and all hell had broken loose.

    The emails started coming regularly. First one production system was affected, then two, then all of them. Investigation revealed the problem: the CIO had scheduled a third-party contractor to perform maintenance on the VM hosts. During business hours. The maintenance had started at 10:00 and was to continue until 2:00.

    @abarker met with the network team to put together a plan. Since the VM hosts were in an off-site datacenter and the maintenance was being handled by a third-party technician, they could come to only one conclusion: send out an enterprise-wide email about what was happening and hunker down until the storm passed.

    To dodge the massive wave of responses, the email was sent from the help desk's address and linked to an already open ticket. Since the ticket was not assigned to anyone, nobody was bothered by the flood of replies. The bunker was secure. Now it was noon, and there were only two more hours to go.

    Two o'clock rolled around, and the IT staff poked their heads out. Over 300 error emails had been received. Everyone in the company had been unable to do any real work. @abarker had not been able to complete his testing. Just one problem: it wasn't over yet. So the team took cover once again.

    An hour later, they heard from the technician:

    Good news! I just finished with the first host box! The next two should go much faster!

    He's an hour past schedule, and only 1/3 done, and that's good news‽ thought @abarker. We're belgium-ed.

    The IT team just trudged through the rest of the day, doing their best to avoid everyone else, hoping the nightmare would end. Just before leaving the office for the day, @abarker checked his email: 590 error emails, and counting.

    The next morning, @abarker had 743 error emails. There was also a new message from the technician, time-stamped at 8:00 PM the previous evening:

    I finished all the host boxes. There are a few more details I need to address in the morning, but everything should be working now.

    That's great, but I'm still getting error messages.

    Host 1 is unresponsive.

    So @abarker checked with James, the only network guy in that morning. They discovered that the VMs had not been put back on the correct hosts. They hadn't even been properly distributed across the hosts. When the two looked at the resource demands, this is what they saw:

    • Host 1: ~ 50% of all VM resources
    • Host 2: ~ 35% of all VM resources
    • Host 3: ~ 15% of all VM resources

    "Well, no wonder host 1 isn't responding," said James. "It's massively over-utilized."

    They immediately got in contact with the technician.

    "Oh, that shouldn't matter. I've got the hypervisor set up to load balance the hosts. You should be fine."

    "But doesn't that just balance the network traffic?" asked James.

    "Uh, yeah."

    "So shouldn't we be redistributing the VMs to balance the resource load?"

    "Huh?"

    At this point, James and @abarker decided to give up on the technician and take things to the CIO. That's when they discovered something really interesting: the technician had performed unauthorized work. He was only supposed to upgrade the BIOS on the host boxes. The hypervisor upgrades he performed were never requested.

    Now @abarker and his IT team are all left wondering: how long until this gets fixed?

    Edit: Added the morning count of error emails.


  • SockDev

    Truly, the technician is a total Belgium



  • Doubt it. Unless @abarker 's off site location is across the ocean. But that would be the real WTF.


  • mod

    @Luhmann said:

    Doubt it. Unless @abarker 's off site location is across the ocean. But that would be the real WTF.

    Nah, just a few miles from our corporate office here in Phoenix. It's mainly off site for a better internet connection, so it can serve our other sites better. But that's a different WTF.


  • SockDev

    Should I 🚩 for whoosh? 😜


  • mod

    @RaceProUK said:

    Should I 🚩 for whoosh? 😜

    Do I really need the additional stress right now? 😫 (how is that tired?)


  • SockDev

    @abarker said:

    Do I really need the additional stress right now? 😫 (how is that tired?)

    Do you respond to flags? 😜

    I wasn't talking about flagging you anyway; I was talking about flagging @Luhmann 😉


  • mod

    @RaceProUK said:

    Do you respond to flags? 😜

    I wasn't talking about flagging you anyway; I was talking about flagging @Luhmann 😉

    Oh, misunderstood. Many apologies.

    BTW, let me know when you are ready to change avatars. I now have an assortment of hats to choose[1] from!

    [1] To be clear, I get to choose, not so much you. 😉


  • SockDev

    Candidate for front page.


  • BINNED

    @Arantor said:

    Candidate for front page.

    Needs more Hanzo.


  • SockDev

    Change @abarker to Hanzo, job done, have a half day and celebrate with a pint?


  • Grade A Premium Asshole

    Sooooo, blacklisting words does not work on topic titles? I think we need to test this with all the English expletives over on meta.d. You know...just to verify...and see where the bugs are. Yeah...that's the reason.


  • Grade A Premium Asshole

    @Arantor said:

    have a half day and celebrate with a pint?

    You don't know him at all... 😉


  • SockDev

    I did not specify who should have the half day or the pint for that matter.


  • Grade A Premium Asshole

    I don't think we should reward the tech who caused all of this... 😛


  • SockDev

    @Polygeekery said:

    I don't think we should reward the tech who caused all of this... 😛

    Nor was I implying that this would be the case 😉


  • Grade A Premium Asshole

    So, you are saying that you need a half day and a pint?


  • BINNED

    @Polygeekery said:

    So, you are saying that you need a half day and a pint?

    I'll take a day and half a pint. I'm humble that way.


  • Grade A Premium Asshole

    @Onyx said:

    half a pint

    That's blasphemy.

    (This is coming from a man who asks his friends, "Want to go to the pub and have a beer or twelve?")


  • mod

    @Arantor said:

    celebrate with a pint?

    Of A&W? Sounds good.

