In a previous job I was both desktop support and server support for a small online college that was part of a bigger college system. Each brick and mortar campus had their own IT infrastructure - switches, servers, and some of the larger ones had SANs and reliable backup systems. At the hub of these campuses was central IT. They provided shared services such as web hosting for each of the colleges' sites, email, Active Directory, licensing servers, etc. Each campus had its own servers for local things like file serving, printing, DHCP, etc.
Our office, having a small staff of 10 people but supporting an enrollment of around 2000 students and growing, was moving from the building that central IT was located in. Before that we were using their servers and such for our needs. With the move, however, we needed to secure a box for file serving, DHCP, printing, etc. So I got one, set it up, and things were going along swimmingly.
Long story short, our campus moved buildings a few times due in part to a failed merger with another division. Meanwhile, I went to work at a brick and mortar campus until they could find a new IT person. In all of this shuffle, the server I had procured was removed and was replaced with a virtual server in a data center that I had no physical or virtual access to.
I came back to the online campus just as they were preparing to move to a building that they had secured a long-term lease with. From day one I fought to have our old server brought back for the same purposes it had served in the past: DHCP, printing, file serving, etc. This time, however, the central IT manager would not have it. "You can continue to use our services," he said, brow furrowed. "You'll be on fiber at the new location and we have this wonderful new data center that can do everything: DHCP, file serving, print serving, the whole nine yards."
I argued back. "What about redundancy? What if the data center goes down for some reason? We'll be completely offline, which is bad enough, but things are a little worse when you're an online campus."
"Nonsense!" the IT manager said. "The data center doesn't ever go down." This statement was absolutely false because it had gone down a few times while I had been at the brick and mortar campus. But this guy was the latest in a seemingly unending parade of IT managers so I cut him some slack.
Unfortunately, my supervisor ended up agreeing with the IT manager. We all came to the mutual decision though that we should have a Service Level Agreement (SLA) in place to cover our critical needs.
A few weeks later the IT manager came back with an example SLA mostly filled out. It still had the website branding from the site he had plucked it from. It was missing some essential services that we needed covered and had some services, like WordPress, that we didn't even use.
A few days later he came back with a more polished SLA. The website name had been blotted out with some white-out (apparently he didn't know how to use page setup in his browser to eliminate that header but hey, whatever). More importantly though, we were able to get our services listed on there with three levels of support: Critical, which covered services that we had to have or we'd be dead in the water; Medium, which covered some services we could go a few days without; and Low, which covered everything else. The key part of these levels was the response times: Critical had a response time of an hour 24/7 with regular communications at 30 minutes if the issue wasn't resolved. Medium and Low had more relaxed times.
It was 9:04 AM on a clear, sunny Tuesday when the data center went down. The first indication that something horrible was wrong was when our shares became disconnected and we couldn't access files. A few moments later the printers were not responding. Then Internet connectivity started dwindling because the one-hour lease times on the DHCP server were expiring (I know, I know - one hour was their "standard" lease time and no amount of arguing would make them budge. Seeing that many of their departments were still using mandatory static IPs because DHCP is "inherently insecure," I wasn't going to push the issue). So now here we were, an online campus with no Internet, no file serving, no printers, no email, and no word about what was happening and when it would be fixed.
I got the Internet limping along by assigning static IPs. I created temporary connections to our networked printers to print via IP, something I should have done to begin with I guess since the office was so small. But we still had no file server access. Per our SLA we were due a phone call or some sort of communication within a few minutes of the response time.
Nothing.
So I called. "We're busy! We're working on it. Don't bother us."
"Well, what's the problem?"
"We're looking into it."
An hour goes by. I call again. Same response.
Lunch comes and goes. Still no call. I call again. "We're busy! Stop bothering us!"
"Look," I say. "We're dead in the water here without file access. Can you give me any sort of ETA or let me know what's going on?"
"No. We're busy. We're looking into it." The line went dead.
Eventually the data center comes back online around 4 PM. I go home, stopping at a liquor store to grab a six pack of cold ones to relax after a shitty day.
The next day I go to see the IT manager. He wasn't expecting me and I could tell that I was the last person he wanted to see. But I cornered him in his office and demanded to know why our SLA was ignored. "We both agreed in good faith that you would regularly call me to update me on what's happening!" I said calmly but exasperatedly. "Instead we spent all day being down and I still don't know why."
"A router went down and we had some complications replacing it," he said.
"Well, that's nice, but again, why wasn't I notified and told this? Why didn't anyone call me and keep me updated?"
The IT manager leaned back and folded his hands together. "It's just a piece of paper," he said. "I wrote it up and signed it because I thought it would make you guys feel better. Besides, what happened yesterday will never happen again. The data center usually doesn't ever go down."