On-Call escalation


  • Garbage Person

    Like many software development organizations, we have an on-call rotation. A call to the on-call phone is triggered under very specific circumstances:

    1) Someone opens a sev-1 ticket with the helpdesk (sev-1 for our systems being defined as 'we will miss a shipment SLA if this is not resolved tonight')
    2) It is after 4PM Eastern and before 8AM Eastern, or any time on a Saturday, Sunday or holiday.
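
    For the programmers in the audience, the two criteria boil down to roughly the following. This is only a sketch: the function name, the holiday set and the choice to treat exactly 4PM as "after hours" are assumptions for illustration, not our actual helpdesk configuration.

```python
from datetime import date, datetime, time

# Hypothetical holiday calendar; the real list is not given in this post.
HOLIDAYS = {date(2024, 12, 25)}


def should_page_on_call(severity: int, opened_at: datetime) -> bool:
    """Return True if a ticket should ring the on-call phone.

    opened_at is assumed to already be in Eastern local time.
    """
    # Criterion 1: only sev-1 tickets qualify.
    if severity != 1:
        return False
    # Criterion 2, second half: any time on a Saturday, Sunday or holiday.
    if opened_at.weekday() >= 5 or opened_at.date() in HOLIDAYS:
        return True
    # Criterion 2, first half: after 4PM or before 8AM on a working day.
    t = opened_at.time()
    return t >= time(16, 0) or t < time(8, 0)
```

    Note that nothing in there checks whether the last truck has already left, which is the whole problem.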

    Simple criteria, but most sev-1 tickets are opened erroneously after the last shipping truck left the building in a lame attempt to deflect blame from shitty production management onto some vague, ill-defined 'systems problem', but I digress.

    Anyway, the responsibility of the person on call is to answer the phone and determine if it's actually a sev-1 issue. If it is, they consult the list of subject matter experts, find the one for that client, and wake that poor bastard up. If it's not, they just log into the helldesk software and downgrade the ticket.

    It's the only practical way to handle things, since we have hundreds of highly divergent applications. Expecting everybody to be able to work on any of them is out of the question, and expecting everybody to be on standby at all times to have their evenings ruined by the fake sev-1s is even more out of the question.

     

    Anyway, the escalation process goes like this:

    If after 30 minutes the Sev1 ticket is not being worked, or if the on-call phone is not answered on the first attempt (e.g. someone is taking a dump), the group manager is called. He has access to all the same documents and can perform the proper tasks.

    If he doesn't answer or the ticket is not being worked within 30 minutes, HIS boss is called. His boss is a senior director. His only recourse is to call the group manager and nag.

    If the call is not being worked within 30 minutes of that, the director's boss is called. The director's boss is Senior VP of IT. His only recourse is to call the director and nag (or MAYBE he can dig up the group manager's phone number).

    If the call is not being worked within 30 minutes of that, the Senior VP's boss is called. For those keeping score, we are now 2 hours into a single missed SLA worth literally dozens of dollars, and the CIO is being dragged out of bed by the helpdesk.

    If HE doesn't get things in motion within 15 minutes, the next escalation point is the President. 2 hours and fifteen minutes to wake up the guy in charge of a Fortune 500 company.

    Theoretically, there are more escalation levels, spaced out every 15 minutes beyond that, but the org chart ends there.
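
    If you want to check my arithmetic, the chain above can be sketched like this. The names and the data layout are made up for illustration, and it only models the "ticket not being worked" timer, not the shortcut where an unanswered on-call phone goes straight to the group manager.

```python
# (role, minutes of inactivity at the previous level before this role is called)
ESCALATION_CHAIN = [
    ("group manager", 30),
    ("senior director", 30),
    ("Senior VP of IT", 30),
    ("CIO", 30),
    ("President", 15),
]


def escalation_timeline(chain=ESCALATION_CHAIN):
    """Return (minutes since the first call, role) for each escalation level."""
    elapsed = 0
    timeline = []
    for role, wait in chain:
        elapsed += wait
        timeline.append((elapsed, role))
    return timeline


def who_has_been_called(minutes_unworked, chain=ESCALATION_CHAIN):
    """List every role the helpdesk has phoned after this many unworked minutes."""
    return [role for at, role in escalation_timeline(chain) if at <= minutes_unworked]


for minutes, role in escalation_timeline():
    print(f"{minutes:>3} min: call the {role}")
```

    The cumulative times come out to 30, 60, 90, 120 and 135 minutes, which is where the "2 hours" for the CIO and "2 hours and fifteen minutes" for the President come from.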

     

    I bring this up because of a 'hilarious' incident. The phone number for the on-call phone was changed, but one of the helpdesk jockeys had a printout of the reference sheet instead of using the online version. So he called the wrong number. And escalated. But the manager was on vacation. And escalated. And the director was on vacation. And escalated. And the SVP didn't know who to get in touch with. And the CIO most certainly didn't. And, well, let's not talk about what happened next.

     

    The punchline: The ticket was opened after the last 8PM delivery truck had left.



  •  Awesome! I'd kind of like to know what happened next? (how did the CEO react?)



  • @Weng said:

    Simple criteria, but most sev-1 tickets are opened erroneously after the last shipping truck left the building in a lame attempt to deflect blame from shitty production management onto some vague, ill-defined 'systems problem', but I digress.
    Yeah well... three guesses where my nickname comes from.

    The specific incident is when I was working at BT European Operations in Amsterdam (a company I can definitely recommend avoiding at all costs), and our "colleagues" in Brussels discovered, at four o'clock on a Friday afternoon, that they'd forgotten to ask for DNS for a new customer, so they opened a Severity 1 ticket with us to fix it. Which we in turn would have to request from London, all of this within the 90 minutes that were allocated for a Severity 1 issue.

    Needless to say, one of the first things I did was bump it to a Severity 3.

     



  • So at which point is Uncle Enzo forced to drop all he's doing and board his helicopter?



  • Nice story. Absurd rules, and people sticking to them ruthlessly... you'd almost say there was a moral in there somewhere.



  • At least they didn't wake up the CEO because it would have been 10ish by the time he got the call.



  • @Severity One said:

    @Weng said:

    Simple criteria, but most sev-1 tickets are opened erroneously after the last shipping truck left the building in a lame attempt to deflect blame from shitty production management onto some vague, ill-defined 'systems problem', but I digress.
    Yeah well... three guesses where my nickname comes from.

    The specific incident is when I was working at BT European Operations in Amsterdam (a company I can definitely recommend avoiding at all costs), and our "colleagues" in Brussels discovered, at four o'clock on a Friday afternoon, that they'd forgotten to ask for DNS for a new customer, so they opened a Severity 1 ticket with us to fix it. Which we in turn would have to request from London, all of this within the 90 minutes that were allocated for a Severity 1 issue.

    Needless to say, one of the first things I did was bump it to a Severity 3.

     

     

     

     Unfortunately, a lot of the calls we receive off-hours are either from salesmen who are pretty clueless and/or asking us to fix things we have no control over ("The hotel's wireless won't let me connect to the internet"), or from VIPs asking us to help them troubleshoot their personal printers or home wireless networks. My "favorite" - a sales manager called in from the west coast at 9 PM (making it 12 AM here) to ask the emergency on-call (who happened to be me that week) to have the service desk call him in the morning to fix his profile, as he couldn't log in to his laptop. He specifically stated that he was tired and didn't want to be disturbed with a call back. He skipped past the menu option that would have let him leave a non-priority message for the service desk, so that he could leave this high-priority message with the emergency on-call.

     Of course, I put in a ticket for the service desk to call him as soon as they got in.  When I checked the ticket later in the day, I noted that the service desk personnel had called him at 7:15 AM.  That would be 4:15 AM on the west coast.  Justice was served.



  • Aren't you the one who confused a high-priority-though-not-immediate task with a low priority one? Plus, you failed to do what he specifically asked.



  •  @Zecc said:

    Aren't you the one who confused a high-priority-though-not-immediate task with a low priority one? Plus, you failed to do what he specifically asked.

    If you're responding to me, I would disagree.  The emergency on-call is stated in company policy and in the voicemail prompts to be for issues directly affecting our business-critical distribution and manufacturing processes (i.e. Priority 1 tickets).  Priority 2 tickets (such as his) and lower are supposed to leave a message for the service desk so they can handle it during business hours.  He felt that he was important enough to leave a Priority 1 message that he knew would wake someone up so they could enter a ticket for him; had he just left a message in the regular queue, it would have achieved the same end result but without a "Look at me!  I'm important!" moment.

    Actually, I followed his directions to a T.  I did not call him back when I received the message, I put in the ticket, and I informed the service desk to call him in the morning.  If he desired a later call, he should have factored in the time when making his request.  He wakes me up needlessly?  I have no issues waking him up early...

     



  • LOL Jayman. Totally deserved!

    What is really odd is that he didn't simply set an alarm to wake himself up, and then deal with it the following day. What a monumental twat.

