Why are we resuscitating this zombie machine?


  • Notification Spam Recipient

    Full story from the Status Thread:

    Status: Color me surprised. This old WinXP box I was sure was dead managed to dredge itself from the pits of Tartarus and lives again!

    Backstory: This is one of those consumer desktop machines that was found worthy of enslavement, er, honorably serving as a "server".
    The poor hard drive is showing 48 KB of sectors as Bad in the filesystem (prior guy who was "managing" this box: "Don't worry, it's corrected them, I don't show anything in the diagnostics"). So it's unsurprising that it takes upwards of five minutes to load Word or Excel (though apparently Access is mostly OK because it loads via scheduler every hour, and is probably kept in memory cache or something). I suspected that there was likely more corruption, but he never saw eye-to-eye with me, and I couldn't do anything more than encourage that our Business Critical Processes be moved off this PC (preferably to something more modern than Access 97). He is no longer with the company.
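For scale, a rough conversion of that 48 KB figure. This assumes 512-byte logical sectors and the default 4 KB NTFS cluster size; neither is stated in the post.

```python
# Rough scale of "48 KB in bad sectors". Both unit sizes are assumptions:
# 512-byte logical sectors and the default 4 KB NTFS cluster size.
BAD_BYTES = 48 * 1024      # chkdsk reports bad sectors in KB
SECTOR_BYTES = 512         # assumed logical sector size
CLUSTER_BYTES = 4096       # assumed NTFS cluster size

def units(total_bytes: int, unit_bytes: int) -> int:
    """Whole units needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // unit_bytes)

sectors = units(BAD_BYTES, SECTOR_BYTES)
clusters = units(BAD_BYTES, CLUSTER_BYTES)
print(f"{sectors} sectors, {clusters} clusters")  # 96 sectors, 12 clusters
```

NTFS records bad spots per cluster, so under these assumptions the figure amounts to about a dozen clusters the filesystem has fenced off.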

    We had a massive migration start on April 1 (yes, this was indeed not a joke), so many systems would intentionally be down.
    Since this dumb "server" does really Bad Things when the real servers are down, it was decided to simply shut down the box for that day and bring it back up on Monday. Knowing that a full power down would likely ensure the disk would never spin up again (remember: this thing has been on 24/7 for nearly 5 years straight, excepting a few accidents), I suggested a warm reboot instead, just to check that it would indeed come up before fully powering it down.

    This was a mistake, perpetrated Thursday, at 3PM by @Tsaukpaetra.

    So, what happens when you have a semi-corrupted filesystem and try to boot it? If you guessed that it doesn't boot at all, you would be wrong.
    Enough of the FS was intact for it to load ntldr and start the process of loading all the little DLLs and crap. However, one of them (can't tell which; bootlog wasn't working for some reason) was part of the corruption, and somehow managed to execute code that locked up the CPU badly enough to trigger a hard reset (by the BIOS, I'm assuming). This showed up as the Windows XP logo appearing, with the animated dots parading across the screen for about half a second before the BIOS screen reappeared.

    Enough of the system was working that it detected it failed to start, and offered Safe Mode, which (surprise surprise) failed in the same manner. So was all lost? Was this machine finally dead? It sure seemed so. Luckily, they had @Tsaukpaetra, with nothing to do (the Project Managers over this great migration had intentionally left out our group when assigning out duties) and nothing to lose!

    Now, normally, our systems are fairly locked down: Can't mount arbitrary USB Storage, can't burn disks, Network is all ICEd up (and Internet is double proxied!). And of course, they recently disabled the PXE network boot servers (because apparently it's Hard to get the network to isolate unauthenticated machines in a manner that still allows basic networking protocols inside the walled garden), so using the company's internal recovery tools was out of the question.

    So what does @Tsaukpaetra do? Never give up!
    Using resources and tools gathered over the years, a mystery DVD-RW disk (that appears to hold some Access 97 backup databases from 2001), and a running Windows 7 machine, @Tsaukpaetra was able to create a minimal Windows PE 2.0 ISO, burn it to the disk, and boot the borked machine off of it, despite the challenges set up by InfoSec to keep the Average User from doing such things.

    Luckily enough, the filesystem was still recognizable enough in the PE environment that it didn't look too broken; most of the damage was in the Application Data and Temp folders (at a glance, anyway). So, with a chained command, a disk check was initiated and @Tsaukpaetra went home for the day (compiling the data, searching all the desks for a writable disk, and getting the thing to boot at all meant it was now 20 minutes past quitting time). With luck, the system would be cheerfully humming at the Domain Logon screen when he came back on Monday (for he had taken Friday off to attend a family friend's wedding).

    chkdsk c: /r && chkdsk c: /r && wpeutil reboot
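A minimal sketch of how cmd.exe evaluates a chain like this. Each command after `&&` runs only if the previous one exited with code 0, whereas a plain `&` would run the next command unconditionally. chkdsk documents exit code 0 as "no errors found" and 1 as "errors found and fixed". The exit codes in the example run below are hypothetical, not what actually happened that Thursday.

```python
# Toy model of cmd.exe's "a && b && c": each command runs only if the
# previous one exited with code 0. The exit codes below are hypothetical;
# chkdsk documents 0 = no errors found, 1 = errors found and fixed.

def run_and_chain(steps):
    """steps: (command, exit_code) pairs. Returns the commands that ran."""
    ran = []
    for command, exit_code in steps:
        ran.append(command)
        if exit_code != 0:
            break  # "&&" skips everything after a nonzero exit code
    return ran

# Hypothetical run: the first pass finds and fixes errors (exit code 1).
steps = [
    ("chkdsk c: /r", 1),
    ("chkdsk c: /r", 0),
    ("wpeutil reboot", 0),
]
print(run_and_chain(steps))  # ['chkdsk c: /r']
```

In this model, a first pass that repairs something stops the chain before the second pass and the reboot.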

    Time goes by quickly and we find ourselves in the office on Monday. The borked machine was completely forgotten until a fellow coworker leaned back and asked, "Hey, @Tsaukpaetra, did we re-enable comments uploading since Friday?" at which time the events of Thursday came back with worrying force.

    So, in a half-panic hop, @Tsaukpaetra turns to the borked machine and jigs the mouse, wondering and hoping if it was working. To his surprise, it was! Still the same sluggish machine, as if nothing had happened, but @Tsaukpaetra knew better than to believe in miracles.

    Sure enough, a quick chkdsk c: /i /c scan revealed at least six more corrupted (i.e. now missing) file entries and 40-ish security attributes needing replacement, and that was with steps skipped!

    For now, the zombie machine will be allowed to continue rotting until the somewhat-healthy bits can be translated and put onto the Real Production Server.


  • Winner of the 2016 Presidential Election Banned

    This is horrifying.


  • Notification Spam Recipient

    @Fox said in Why are we resuscitating this zombie machine?:

    This is horrifying.

    Innit? I wasn't sure whether to just put it in General, but the Sidebar seemed more appropriate.


  • Winner of the 2016 Presidential Election Banned

    @Tsaukpaetra I almost think this needs to be frontpage, actually... But management being ridiculously stingy is nothing new for TDWTF, I guess.


  • Notification Spam Recipient

    @Fox said in Why are we resuscitating this zombie machine?:

    management being ridiculously stingy

    For once, it's not management being stingy. The root cause of us not being off of this box is actually micromanagement and :tin_foil_hat: levels of "I need to make sure your new process does things exactly like the old process, give me the codez!". Everyone wants to be off of it, but we've barely gotten Requirements approval at all over the past four years.

    That's one of the reasons an FTE was stuck "babysitting" (his words) this box to ensure it ran to completion each day. One of the WTFs was his lampshading of the hard disk's failure.

    I mean, I can literally watch the bit rot, and that's scary.


  • Garbage Person

    @Tsaukpaetra Wait, 6 borked file entries and 40 borked security attributes is bad?

    Every time I run chkdsk on my work machine, it comes back worse than that.


  • Notification Spam Recipient

    @Weng said in Why are we resuscitating this zombie machine?:

    6 borked file entries and 40 borked security attributes is bad?

    I see your mind automatically censored the part where it was reporting BAD SECTORS at the filesystem level. ;)

    @Weng said in Why are we resuscitating this zombie machine?:

    Every time I run chkdsk on my work machine, it comes back worse than that.

    😱 Does your machine experience regular brownouts, unexpected power loss, or other electrical instability issues?

    Keep in mind that this is directly after the previous chkdsk (run twice in a row!).

    If that's happening to you, I wonder how your system boots as well...


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Everyone wants to be off of it, but we've barely gotten Requirements approval at all over the past four years.

    Make sure you've got the important data and then kill it. Simples.


  • Notification Spam Recipient

    @loopback0 said in Why are we resuscitating this zombie machine?:

    important data

    snorts haughtily Well if we were following corporate policy, it would all be on the network drives by default as a mitigation factor.

    Then again, if this machine were following corporate policy, it wouldn't exist, as XP was exiled from the company five months ago...


  • Garbage Person

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Does your machine experience regular brownouts, unexpected power loss, or other electrical instability issues?

    It lives on a docking station (therefore the battery is fucked), hibernate is fucked because of the shitheap encryption software and I literally never shut it down - standby only. So if I don't work on a weekend, it usually runs out of juice, comes out of standby and tries to hibernate, which fails to work.

    IT doesn't believe a shitty battery is a break-fix problem until runtime is under 1hr, but you can buy a new battery out of your own budget! If you can get approval from the CIO to spend any of the money in your budget.

    Only the finest equipment for our developers.


  • area_pol

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    @Tsaukpaetra was able to create a minimal Windows PE 2.0 ISO, burn it to the disk, and boot the borked machine off of it, despite the challenges set up by InfoSec to keep the Average User from doing such things.

    Why did you need to go through all this trouble?
    Why not take the disk out, connect it to a modern computer, copy all valuable partitions to a new disk (or if you don't have a new disk, perform those repairing operations you did), then put the new/repaired disk inside the old machine?


  • Notification Spam Recipient

    @Weng said in Why are we resuscitating this zombie machine?:

    It lives on a docking station (therefore the battery is fucked)

    I try to subvert that on my machine by disconnecting it from the mains until it starts yelling at me. Hopefully that's enough. I usually get 4 hours out of it though, so it's all good.

    @Weng said in Why are we resuscitating this zombie machine?:

    until runtime is under 1hr,

    Heh, much of the toss-away portion of my fleet of laptops can last maybe half an hour? Enough to get you from one room to the next and plug in again, so it's :mu: in (some of) the users' eyes.

    @Weng said in Why are we resuscitating this zombie machine?:

    Only the finest equipment for our developers.

    You should hit up @DoctorJones, he knows the kind of equipment I regularly work with. ;)



  • 👨🏼 Apprentice! Haven't you been toying with Linux and the old fileserver hardware?
    👦🏾 (proud) Yes I managed to get a RAID working by removing some crashed disks 😄
    👨🏼 We need new network storage because the fileserver is full
    👦🏾 I don't really know why the disks crashed and anyway the disks don't have much capacity
    👨🏼 Just put these in (proffers salvaged IDE disks)
    👦🏾 I think there is just one IDE port for the CD-ROM
    👨🏼 So?
    👦🏾 We could add two at maximum and the IDE bus is probably not prepared for...
    👨🏼 Good!
    👦🏾 I don't think it would be very fast and
    👨🏼 No worries!
    👦🏾 where do we run backups?
    👨🏼 I've got copies of everything on my desktop


  • Garbage Person

    @Tsaukpaetra I mean, if we had working power plugs in our fucking conference rooms 1hr wouldn't be bad.


  • Notification Spam Recipient

    @Adynathos said in Why are we resuscitating this zombie machine?:

    Why not take the disk out, connect it to a modern computer, copy all valuable partitions to a new disk (or if you don't have a new disk, perform those repairing operations you did), then put the new/repaired disk inside the old machine?

    InfoSec gets extremely unhappy when people open their computer cases ever since a bunch of (then really expensive) 128 GB SSDs were stolen from our machines and swapped with blank spinners (usually 60 GB). They monitor that flag in the BIOS somehow, and send out nastygrams if it looks like you've tampered with your machine in any way.

    Apparently, though, booting from a disk does not throw this flag. :D

    Not to mention we (apparently) have a disk clone from a few months ago. Don't ask where it is or who can restore it, though...

    @gleemonk said in Why are we resuscitating this zombie machine?:

    I've got copies of everything on my desktop

    :sadface: That was actually one of the backup strategies employed in this department. Literally.
    If NDA wasn't a thing, I'd show you the zombie's desktop, which proves this.

    @Weng said in Why are we resuscitating this zombie machine?:

    I mean, if we had working power plugs in our fucking conference rooms 1hr wouldn't be bad.

    We have power strips connected to 50' extension cables leading into the wall outside the conference room for this exact reason. I'll try to get a pic if you need proof. ;)


  • area_pol

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    InfoSec gets extremely unhappy

    If they are preventing you from doing your job properly, they are TRWTF.

    Knowing that a full power down would likely ensure the disk would never spin up again (remember: this thing has been on 24/7 for nearly 5 years straight, excepting a few accidents), I suggested to warm reboot instead, just to check it would indeed come up before fully powering it down.

    Also, you could just let it die instead to save you the trouble :)


  • Notification Spam Recipient

    @Adynathos said in Why are we resuscitating this zombie machine?:

    doing your job

    Nope, actually, fixing computers isn't my job here, as humorous as that might be.
    Technically, I'm supposed to be a report developer, but I've wiggled my way into App dev and a few other areas.
    I also "accidentally" have local admin privileges (which is what enabled me to override the Burn CD lockout thing).

    @Adynathos said in Why are we resuscitating this zombie machine?:

    Also, you could just let it die instead to save you the trouble 😀

    FFS man, I have a reputation to uphold! You don't keep the title of "Awesome" by just letting things die without attempting!

    Trust me, when everything important is moved off, I'm personally installing a few select viruses on this thing. Actually, I kinda wonder if I could do it while it was running, I think the Antivirus has been borked for a while on it...
    Maybe I could even install 10 on it, just for the kicks once it's done! (@DoctorJones, back me up here!) I'll do it too!


  • Trolleybus Mechanic

    Seriously?

    Create a VM from it, stick it on a stable, backed up server and done.

    If InfoSec or whoever the shit they are complain, tell them too fucking bad. Then give them the dead machine to "protect".


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Everyone wants to be off of it, but we've barely gotten Requirements approval at all over the past four years.

    At the risk of :hanzo:, what's to stop someone from provisioning a new machine with 8.1 or 10 and just copying all the software and processes to it?



  • @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    They monitor that flag in the BIOS somehow, and send out nastygrams if it looks like you've tampered with your machine in any way.
    Apparently, though, booting from a disk does not throw this flag.

    Or that old XP machine is just so old that it's not even on their radar screen.


  • Notification Spam Recipient

    @anotherusername said in Why are we resuscitating this zombie machine?:

    on their radar screen.

    Oh, it's there for sure. They complain every six months (during exception reviews).

    @Lorne-Kates said in Why are we resuscitating this zombie machine?:

    Create a VM from it, stick it on a stable, backed up server and done.

    IT tried it. It failed. Spectacularly. Apparently the dude that babysat it hacked it up too much to simply transfer to a VM without issue.

    @Lorne-Kates said in Why are we resuscitating this zombie machine?:

    If InfoSec or whoever the shit they are complain, tell them too fucking bad. Then give them the dead machine to "protect".

    You misunderstand. Nobody wants this machine to exist. InfoSec wants it gone. We want it gone. The customer wants it gone.

    The only reason it's being kept alive is that we haven't built the process to replace it, and the ones in charge of approvals (so we can start developing a replacement) aren't lifting a finger to let us do it. It's really a mixed message, don't you think?

    @FrostCat said in Why are we resuscitating this zombie machine?:

    At the risk of :hanzo:, what's to stop someone from provisioning a new machine with 8.1 or 10 and just copying all the software and processes to it?

    It was tried (not by me). Apparently this failed due to DLL compatibility issues, though the one who reported this back is the same one who frankensteined the zombie to where it is now, so grain of salt. The point is: we don't want to keep using this Access 97 database, Windows XP or not (though to be fair, I didn't tell y'all about the software running on the zombie box). It's a big hit on audits and a pain point all around.



  • @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    For once, it's not management being stingy. The root cause of us not being off of this box is actually micromanagement and :tin_foil_hat: levels of "I need to make sure your new process does things exactly like the old process, give me the codez!". Everyone wants to be off of it, but we've barely gotten Requirements approval at all over the past four years.

    That's one of the reasons an FTE was stuck "babysitting" (his words) this box to ensure it ran to completion each day. One of the WTFs was his lampshading of the hard disk's failure.

    I mean, I can literally watch the bit rot, and that's scary.

    I think using an old PC as a server is okay, but knowing it has a bad hard disk and not replacing it before actually using it for anything serious? I'd call that a "disaster waiting to happen".


  • Notification Spam Recipient

    @cheong said in Why are we resuscitating this zombie machine?:

    I think using an old PC as a server is okay, but knowing it has a bad hard disk and not replacing it before actually using it for anything serious? I'd call that a "disaster waiting to happen".

    Well obviously when it was first created (a half decade ago) it didn't have a HD issue, and it's been used for "serious" stuff for all its life (Or close to it. I'm pretty certain this was once the workstation of a guy only known as "nalpush").
    The problem is, it should never have happened in the first place (we have dedicated servers for this sh*t). The issue is, some smart guy was able to hack something together in Access, and added on and added on recursively until he left, and someone else added on and changed and edited, and only after years has it been noticed what an ugly hack it was.... And we've been recovering from that mistake for years.


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    The only reason it's being kept alive is that we haven't built the process to replace it, and the ones in charge of approvals (so we can start developing a replacement) aren't lifting a finger to let us do it. It's really a mixed message, don't you think?

    Quietly back it up (for your benefit, nobody else's) then reboot it repeatedly until it dies utterly. Then you can go to approvals with “we need to get a replacement for this system now because it doesn't boot at all” which is something that they can grasp. You might want to neglect to mention that you encouraged the failure mode, but it sounds like you're damn close to it anyway; once the disk starts getting things wrong every time through a boot it's time to get out of Dodge as disks never really recover.

    Modern disks do a lot of the sector remapping internally anyway. When problems start showing up, that's an indication that you've already burnt through the safety margin (unless you monitor the SMART logs properly).
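A minimal sketch of the "monitor the SMART logs" idea. It reads the raw counters for the conventional remap-related attributes (5, 197, 198) out of the attribute table that smartmontools' `smartctl -A` prints for ATA drives. The sample table is made up for illustration, and attribute IDs and their meanings can vary by vendor.

```python
# Minimal sketch: flag a drive whose SMART raw counters show remapped or
# pending sectors. SAMPLE is a made-up excerpt in the column layout of
# `smartctl -A` for ATA drives; the watched IDs are the conventional ones.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       38012
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""

def raw_counters(table: str) -> dict:
    """Map each watched attribute ID to its raw value (10th column)."""
    counters = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or malformed line
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            counters[attr_id] = int(fields[9])
    return counters

def warnings(table: str) -> list:
    """Human-readable warnings for every watched counter above zero."""
    return [f"{WATCHED[i]} = {v}"
            for i, v in sorted(raw_counters(table).items()) if v > 0]

for w in warnings(SAMPLE):
    print("WARNING:", w)
```

A nonzero Reallocated_Sector_Ct shows the drive has already dipped into its spares, which is the early warning that goes unseen if nobody reads these logs.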


  • Notification Spam Recipient

    @dkf said in Why are we resuscitating this zombie machine?:

    Modern disks do a lot of the sector remapping internally anyway.

    Exactly. The fact that the OS noticed it at all is a Bad Thing.

    @dkf said in Why are we resuscitating this zombie machine?:

    Then you can go to approvals with “we need to get a replacement for this system now because it doesn't boot at all” which is something that they can grasp.

    Did that. All that did was get us a second PC (with Windows 7, of course) that is (again, not tested by me) mostly insufficient to run the software.


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    @dkf said in Why are we resuscitating this zombie machine?:

    Then you can go to approvals with “we need to get a replacement for this system now because it doesn't boot at all” which is something that they can grasp.

    Did that. All that did was get us a second PC (with Windows 7, of course) that is (again, not tested by me) mostly insufficient to run the software.

    Well, install things on there (despite the inadequacy) and let others do the worrying. If that breaks things, you can just say to the person who complains that “Manager XYZ approved this system for this purpose therefore it must be enough and you should change your expectations”. The important thing is that it's then XYZ's problem as to why things aren't working.

    Sometimes the only way things get fixed is if the 💩 hits the fan.


  • Notification Spam Recipient

    @dkf said in Why are we resuscitating this zombie machine?:

    you should change your expectations

    Yeah... I'm not that stable in my position. Much as I'd love to say "It's broke, and until you do such and such you can't work", I don't savor being right over having a job.

    We're working quietly in an underground fashion, ready to spring up when they're in despair to deliver the new version they need (and not necessarily what they say they want).





  • @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    They monitor that flag in the BIOS somehow, and send out nastygrams if it looks like you've tampered with your machine in any way.

    Just send back a reply asking them if they've got a 27B/6.
    https://www.youtube.com/watch?v=CGeT5cutXgU



  • @dkf said in Why are we resuscitating this zombie machine?:

    When problems start showing up, that's an indication that you've already burnt through the safety margin

    Not necessarily. I've seen Windows installations brought completely to their knees by drives with only two or three sectors showing as "pending reallocation" in the SMART logs, which is where they will stay until Windows declares them unrecoverable and sticks them in its own Bad Clusters file. Windows never rewrites bad sectors, and a rewrite is the only time a drive will ever remap them internally.

    Sometimes, especially for bad sectors caused by powering down during write, the drive doesn't even need to reallocate them; simply rewriting them properly is enough to make them come good.
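A toy model (not any real drive's firmware) of the behaviour described above. An unreadable sector is only marked pending, and it is resolved only when the host writes to it: either rewritten in place or remapped to a spare. The `weak_media` set is an assumption of the model. It stands in for whatever the drive concludes when it retries the write.

```python
# Toy model of pending-sector handling; not real drive firmware.
from dataclasses import dataclass, field

@dataclass
class ToyDrive:
    weak_media: set                                  # sectors with genuinely bad media (model assumption)
    pending: set = field(default_factory=set)        # unreadable, awaiting a write
    remapped: set = field(default_factory=set)       # moved to spare sectors

    def read_failed(self, lba: int) -> None:
        """An unreadable sector is only marked pending, not remapped."""
        if lba not in self.remapped:
            self.pending.add(lba)

    def write(self, lba: int) -> str:
        """A host write is the drive's only chance to resolve a pending sector."""
        if lba not in self.pending:
            return "normal write"
        self.pending.discard(lba)
        if lba in self.weak_media:
            self.remapped.add(lba)
            return "remapped to spare"
        return "rewritten in place"   # e.g. a torn write from a power cut

drive = ToyDrive(weak_media={200})
drive.read_failed(100)    # torn write after a power cut
drive.read_failed(200)    # real media defect
print(drive.write(100))   # rewritten in place
print(drive.write(200))   # remapped to spare
```

If the host never writes to a pending sector, for example because the filesystem has already fenced it off as a bad cluster, it stays pending indefinitely in this model. That matches the situation described in the post.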


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    You misunderstand. Nobody wants this machine to exist. InfoSec wants it gone. We want it gone. The customer wants it gone.

    Make sure the data's backed up and KILL IT. Properly. No resurrection.
    Then, suddenly, new machine!



  • Reminds me of the time I managed to revive the machine of my then-girlfriend's dad, who had wanted to free some space on his hard disk by deleting some files and decided to start with all those annoying files in the C:\ folder... Among other magic tricks, it involved guessing the identity of some undeleted files by their size in bytes.



  • @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Then again, if this machine were following corporate policy, it wouldn't exist, as XP was exiled from the company five months ago...

    1. ONLY 5 months ago?

    2. That's a perfect excuse to kill it now. Take your opportunity as it comes.


  • Notification Spam Recipient

    @blakeyrat said in Why are we resuscitating this zombie machine?:

    ONLY 5 months ago?

    We're still getting rid of Access 97. How does this surprise? ;)

    @blakeyrat said in Why are we resuscitating this zombie machine?:

    That's a perfect excuse to kill it now. Take your opportunity as it comes.

    Busy. There will be a party though. Make sure you prepare yourself for the party by assuming the Party Escort Position.


  • 🚽 Regular

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    We're still getting rid of Access 97. How does this surprise? ;)

    We still have a business-critical Windows 95 machine. Managed to kill the OS/2 Warp one this year though.



  • @Cursorkeys Last year I helped a local plastics place replace the dead machine that used to drive their CNC routing table. Also a Windows 95 box, the job was tricky because of the undocumented proprietary ISA plugin card driving the cable to the routing table - when was the last time you saw a mobo with one of those slots on it? The design software they'd used for years and had no desire to stop using was also not compatible with any NT-based version of Windows.

    Luckily, that card turned out to be nothing more complicated than a 20 mA current loop on the cable side and standard PC COM port UARTs on the ISA side, and the software ran OK in Windows 98SE.
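A sketch of what "UART on one side, 20 mA current loop on the other" means at the bit level. The 8N1 framing and the convention of mark = 20 mA and space = 0 mA are common defaults assumed here. They are not documented details of that particular card.

```python
# Bit-level sketch of a UART frame carried on a 20 mA current loop.
# Assumptions (not from the card's docs): 8N1 framing, and the common
# convention mark (logic 1, also the idle state) = 20 mA, space (0) = 0 mA.
MARK_MA = 20
SPACE_MA = 0

def frame_8n1(byte: int) -> list:
    """UART frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]

def loop_currents(byte: int) -> list:
    """Map each bit of the frame to the loop current in milliamps."""
    return [MARK_MA if bit else SPACE_MA for bit in frame_8n1(byte)]

print(loop_currents(ord("A")))  # [0, 20, 0, 0, 0, 0, 0, 20, 0, 20]
```

The UART only produces the bit sequence. Turning it into loop current (or, in the replacement setup, RS232 voltages and then loop current) is the converter's job, which is why off-the-shelf adapters can stand in for the old card.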

    What they have now is a laptop with VirtualBox installed on it, running a Windows 98SE VM with their design software installed and using Windows file sharing to get at the design files kept on the host. The laptop has a USB to RS232 converter driving an RS232 to current loop converter to talk to the table; the USB driver makes that look like COM3 to the laptop, and VirtualBox presents that as simulated COM port UART hardware to Windows 98, so the design suite is happy with it.

    This setup is pretty future-proof. They should be able to continue using their shitty old software to drive their CNC table regardless of what happens to hardware or Windows (hell, they don't even need a Windows-based host any more, VirtualBox is cross-platform) until the table itself falls apart, which it won't do for decades; they don't build 'em like that any more.


  • BINNED

    @flabdablet said in Why are we resuscitating this zombie machine?:

    a mobo with one of those slots on it?

    2008

    They really wanted to re-use those analog Dialogic cards. 8 lines per card. 12 cards per server. So we delivered.
    Only to redo it again with 4x Diva 4PRI cards in an entry level server in 2010.


  • Notification Spam Recipient

    Update: recently the Disk Activity light has remained constantly lit, despite the system supposedly being idle. Luckily enough of the system appears to be in cache because it's still (almost, but not really) responsive.

    I'm definitely not going to attempt rebooting this thing again, it will most certainly not come back up....


  • Notification Spam Recipient

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Update: recently the Disk Activity light has remained constantly lit, despite the system supposedly being idle. Luckily enough of the system appears to be in cache because it's still (almost, but not really) responsive.

    I'm definitely not going to attempt rebooting this thing again, it will most certainly not come back up....

    Managed to finish a disk scan even faster this time! Previous was > 3700 hours (but for raisins the screenshot didn't save properly)

    [Screenshot: HD Tune error scan of the ST320LT007-9ZV142 drive]



  • I worked at a few major gaming companies, and at one of them there was a server running an easy-install LAMP distro that was 10 years out of date. It was redirecting a lot of traffic and dealing with affiliate link attribution (basically, you need to pay these people correctly, otherwise they won't help you make cash).

    It took me 2 fucking years to get that shit moved over to AWS.


  • Notification Spam Recipient

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    Update: recently the Disk Activity light has remained constantly lit, despite the system supposedly being idle. Luckily enough of the system appears to be in cache because it's still (almost, but not really) responsive.

    I'm definitely not going to attempt rebooting this thing again, it will most certainly not come back up....

    Managed to finish a disk scan even faster this time! Previous was > 3700 hours (but for raisins the screenshot didn't save properly)

    [Screenshot: HD Tune error scan of the ST320LT007-9ZV142 drive]

    So, we moved locations over the weekend, which necessitated of course powering down things (because it's Hard to keep things turned on when moving them from a stationary wall to another wall five miles down the road).

    The poor thing didn't come back up.

    And we're getting questions on why a certain business important process (that was the only thing still running on that machine) isn't running anymore.

    My response?

    [screenshot]

    Note that the Online presence was my supervisor. Everyone else is apparently not using the company chat software.


  • Discourse touched me in a no-no place

    @Tsaukpaetra said in Why are we resuscitating this zombie machine?:

    the company chat software.

    Lync, er, Skype for Business is really awful. Best avoided. Go with that majority…