Representative ticket

Weng

I got an automated ticket that essentially said "Hey bozo, your really important fileserver's uptime just reset to zero". This is naturally not acceptable behavior for a production server, so I checked the event logs. Nothing except "the previous shutdown was unexpected". Off to the server team for a deeper dive, then. I asked for a detailed root cause analysis because this server is vitally important and has given us fits recently.

Closure notes: "Server is back up. Appears to have crashed and restarted."

Never mind that there's no BSOD dump and no applications running on the server except the backup agent.

chubertdev

It's an anomaly if it happens once. It's a coincidence if it happens twice. If it happens a third time, then you have a problem.

Weng

This server routinely does stupid shit.

This time, memory usage allegedly bounced to 90 percent suddenly and VMware shot it. (Why would you configure that? What world does that make sense in!? Is there memory pressure on the host? Because if so, that's way overprovisioned and wrong.)

90 percent memory utilization on a file server with no software but the backup agent. Gee, I wonder what the problem might be!

Eldelshell

If I've learnt anything here, it's that either someone is tripping over the power cable or the cleaning crew is dusting the blinking buttons.

Weng

Could be. We outsourced the datacenter to a bunch of assclowns.

Zemm

@chubertdev said:

It's happenstance if it happens once. It's a coincidence if it happens twice. If it happens a third time, then you have enemy action.

FTFY

dkf

http://en.wikipedia.org/wiki/Goldfinger_(novel)