Salvaging an unresponsive Windows server


  • And then the murders began.

    @Tsaukpaetra said in Help Bites:

    Edit: Oh, you meant your current server which you don't have the password to...

    I have the password to my existing Windows Server 2016 install; I just have no way to interact with it. It refuses RDP and PowerShell remoting connections, and at the console (keyboard/mouse) I just get a black screen with a mouse cursor and no way to sign in.

    I haven't tried using RSAT or MMC remotely yet. 🤔 Let me get this machine back on the network and try those.

    Wait, does that mean your file shares are also not working? 😕

    File shares are working, or were as of a few hours ago. Pretty sure AD and DNS both were too but didn't explicitly test those.

    There are no restore points, so I can't roll back the OS. I can't upgrade or reinstall the OS without starting the install from within the OS.

    And I have no usable backups because I'm an idiot. (I first encountered this issue after buying a hard drive to do said backups, then realized I couldn't get in to set them up. I might have an old backup lying around, but if I do it's 2 years old, and it would probably be easier to just start over even if it means rebuilding the domain.)


  • Notification Spam Recipient

    So first things first, based on the information so far I surmise the following:

    • You are able to see the BIOS, but once it starts booting the screen turns black and you have no apparent output?
    • Connecting to the network shows it still offers shares, but not RDP.
    • This isn't a server board with out-of-band management (i.e. IPMI).

    If the above is correct, I assume therefore:

    • As a domain controller, it isn't convinced it's on the domain network, so the firewall is blocking inbound RDP and likely most other management ports.
    • Your monitor is not able to display the set resolution, but otherwise the OS is fine?
    • We might still be able to join a new domain controller and use it to push policies back into the broken one.

    First things first: Do you have a BIOS splash still, and does your monitor display "Out of Range" or similar once Windows starts?


  • And then the murders began.

    @Unperverted-Vixen File shares are still working. The server is listening on ports 53 & 389, so some semblance of DNS and AD are running.
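    That kind of port check can be repeated from any other machine on the LAN. Below is a minimal Python sketch (standard library only) that tries a TCP connection to a handful of ports a Windows DC would normally answer on. The port list is an assumption for illustration, and a "closed" result can't tell a stopped service apart from a firewall silently dropping the connection.

```python
import socket

# Ports a Windows Server DC typically answers on (illustrative list; adjust as needed).
DC_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB (file shares)",
    3389: "RDP",
    5985: "WinRM (PowerShell remoting, HTTP)",
}


def probe_ports(host, ports, timeout=2.0):
    """Return {port: True/False} for whether a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and attempts a full TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out (filtered), unresolvable or unreachable: all count as closed.
            results[port] = False
    return results
```

    For example, `probe_ports("SERVERNAME", DC_PORTS)` (where `SERVERNAME` is a placeholder for the old server's name or IP) would presumably show 53, 389 and 445 open and 3389/5985 closed, given the symptoms described in this thread.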

    Can connect via MMC to some but not all items.

    • Disk Management hangs on "Connecting to Virtual Disk Service"
    • Services works.
      • The RDP service isn't running; starting it causes it to fail immediately.
    • Not able to connect to Event Viewer.
    • Not able to connect to Windows Firewall.
    • Not able to stop the Windows Firewall service.

  • And then the murders began.

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    So first things first, based on the information so far I surmise the following:

    • You are able to see the BIOS, but once it starts booting the screen turns black and you have no apparent output?

    I do see the BIOS. I also see the Windows logo during the boot process. But once it gets to what would normally be the login screen, I get a black screen + a mouse cursor.

    • This isn't a server board with out-of-band management (i.e. IPMI).

    Probably not.

    First things first: Do you have a BIOS splash still, and does your monitor display "Out of Range" or similar once Windows starts?

    Yes to the BIOS, no to the out of range (I do see the mouse cursor so Windows is rendering stuff. Just... not anything useful.)


  • Notification Spam Recipient

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    But once it gets to what would normally be the login screen, I get a black screen + a mouse cursor.

    Hmm, so you're not getting the "Enter user password" login screen? So Winlogon is not working. 🤔

    Are you able to boot into Safe Mode? If you can get into the recovery console you might be able to do a

    DISM /image:C:\ /restorehealth
    

    or possibly booting from an install media and doing the same.



  • If you can get into the BIOS, what is stopping you from booting to a Windows installer on a USB stick or similar? Or some other bootable OS on such a drive, which would let you make a backup of the data on the machine at the very least.


  • I survived the hour long Uno hand

    At the risk of asking the stupid question, have you tried rebooting it by holding the power button until it turns off? A firewall error for remoting like that suggests to me it may have come up on the Public firewall profile (the firewall service started before the domain was fully up) and it might just need a kick.

    Doesn't really answer the whole "no local K/M working" thing, but at least it covers the low-hanging fruit first :mlp_shrug:


  • And then the murders began.

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    Are you able to boot into Safe Mode?

    Yes but I get the same symptoms.

    If you can get into the recovery console you might be able to do a

    DISM /image:C:\ /restorehealth
    

    or possibly booting from an install media and doing the same.

    I can get into the recovery console (both from disk & install media), so I'll give that a try.

    I already tried the "Reset my PC" from the recovery console, but it refused to work. I've never tried that on Server before, so I don't know if that's a general Server thing, specific to this being a DC, or more signs that this machine is toast.

    @Gurth said in Salvaging an unresponsive Windows server:

    If you can get into the BIOS, what is stopping you from booting to a Windows installer on a USB stick or similar?

    I already did so - I just couldn't find anything useful to do in there. It won't let me repair the OS install, or upgrade to 2019. I could blow away the OS and start over, but as this is my one and only domain controller that's an option of last resort.

    Or some other bootable OS on such a drive, which would let you make a backup of the data on the machine at the very least.

    I took a backup of the OS drive already. I considered doing that to get the data off, but while I could see my Storage Spaces from the recovery console (which surprised me) I couldn't see my external drive. (Said drive also didn't have a letter when I first hooked it up to another machine to verify the OS drive backup image, so that might be fixed now that I gave it one on another box.) I can get data off over the network, though.

    But without being able to sign in, I can't (easily) pull drives from the storage space to add to the pool on my new NAS as I get data migrated; I'll have to get everything moved over in one shot, possibly meaning I need to buy an extra drive or two.

    (I wish I could remember for 100% certain which shares were mirrored and which weren't!)

    @izzion said in Salvaging an unresponsive Windows server:

    At the risk of asking the stupid question, have you tried rebooting it by holding the power button until it turns off? A firewall error for remoting like that suggests to me it may have come up on the Public firewall profile (the firewall service started before the domain was fully up) and it might just need a kick.

    I've tried remoting across two power cycles. I'll try again if DISM doesn't yield anything interesting.


  • I survived the hour long Uno hand

    @Unperverted-Vixen
    Did you recently change routers? I've seen a case where changing the MAC address of the gateway device caused an SBS server to wig out and constantly come up with the Public profile (albeit that was an older version than 2016, but...).


  • Notification Spam Recipient

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    I just couldn't find anything useful to do in there.

    Yeah you're in a state where GUI quick-fix tools can't help unless you want to blow your install away.

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    this is my one and only domain controller

    Yeah, that's why Microsoft recommends you have at least two. I have three; one is on an Amazon EC2 instance connected in via VPN. Worth the $5/mo IMO.

    @izzion said in Salvaging an unresponsive Windows server:

    A firewall error for remoting like that suggests to me it may have come up on the Public firewall profile (the firewall service started before the domain was fully up) and it might just need a kick.

    This. I once had a scheduled task to kick itself two minutes after boot because it never could figure out where it was.

    The next step if the DISM doesn't help is spinning up a temporary domain controller, trying to join it as a secondary domain controller, and adding a group policy to always allow remote desktop regardless of what network type is detected.


  • And then the murders began.

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    If you can get into the recovery console you might be able to do a

    DISM /image:C:\ /restorehealth
    

    or possibly booting from an install media and doing the same.

    I can get into the recovery console (both from disk & install media), so I'll give that a try.

    Had to use the letter for my OS drive in this environment (it's F instead of C for some reason), and you missed a parameter:

    DISM /image:F:\ /cleanup-image /restorehealth

    Alas, it no worky:

    Error: 0x800f081f
    
    The source files could not be found.
    Use the "Source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.
    

    Tried pointing it at the install.wim file on Windows Server 2016 install media, but got the same error. Unfortunately the recovery console doesn't understand Ethernet. :sadface:
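    One thing that may be worth ruling out before giving up on DISM: a bare path to install.wim doesn't say which image inside it to use. The `WIM:<path>:<index>` form of `/Source` names one image explicitly, and `DISM /Get-WimInfo /WimFile:<path>` lists the indexes on the media. That said, 0x800f081f can also simply mean the media's build doesn't match the patched component store on the server, in which case no index will help. A small Python sketch that assembles the command line; the D: drive for the install media, the index, and the scratch folder are all assumptions:

```python
def dism_restorehealth_args(image_root, wim_path, wim_index, scratch_dir=None):
    """Build the argument list for an offline DISM /RestoreHealth run that
    uses one specific image inside install.wim as the repair source."""
    args = [
        "DISM",
        f"/Image:{image_root}",  # root of the offline Windows install
        "/Cleanup-Image",
        "/RestoreHealth",
        # WIM:<path>:<index> points DISM at a single image inside the WIM file.
        f"/Source:WIM:{wim_path}:{wim_index}",
        # Don't fall back to Windows Update (no network in the recovery console anyway).
        "/LimitAccess",
    ]
    if scratch_dir:
        # The recovery environment's scratch space is small; use a folder on a real disk.
        args.append(f"/ScratchDir:{scratch_dir}")
    return args
```

    For instance, `" ".join(dism_restorehealth_args("F:\\", "D:\\sources\\install.wim", 2, "F:\\Scratch"))` produces `DISM /Image:F:\ /Cleanup-Image /RestoreHealth /Source:WIM:D:\sources\install.wim:2 /LimitAccess /ScratchDir:F:\Scratch`, which can be typed at the recovery console prompt. Index 2 is only an example, and the scratch folder probably needs to exist beforehand.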

    On the bright side: looking through the drives I only see the mirrored storage space and it looks like it has all the shares, so maybe I made everything mirrored, which solves my "how to get drives for the NAS" problem...


  • And then the murders began.

    @izzion said in Salvaging an unresponsive Windows server:

    Did you recently change routers? I've seen a case where changing the MAC address of the gateway device caused an SBS server to wig out and constantly come up with the Public profile (albeit that was an older version than 2016, but...).

    No network changes.

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    Yeah, that's why Microsoft recommends you have at least two. I have three; one is on an Amazon EC2 instance connected in via VPN. Worth the $5/mo IMO.

    Lesson learned. (Although with Essentials that can get tricky - I think it only allows multiple for a short period to support migrations...)

    The next step if the DISM doesn't help is spinning up a temporary domain controller, trying to join it as a secondary domain controller, and adding a group policy to always allow remote desktop regardless of what network type is detected.

    Building a new DC is on my to-do list, though that's going to have to wait for its host to arrive...


  • Grade A Premium Asshole

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    Lesson learned. (Although with Essentials that can get tricky - I think it only allows multiple for a short period to support migrations...)

    30 days, IIRC.


  • Grade A Premium Asshole

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    Building a new DC is on my to-do list, though that's going to have to wait for its host to arrive...

    Do you have any other machines that might be able to host a VM for a DC? Even 4GB of extra RAM and 60GB of hard drive space should suffice.

    Can you get a remote command line to work? Can you try bouncing the NLA service and see if the firewall profile changes?



  • @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    @Gurth said in Salvaging an unresponsive Windows server:

    If you can get into the BIOS, what is stopping you from booting to a Windows installer on a USB stick or similar?

    I already did so - I just couldn't find anything useful to do in there. It won't let me repair the OS install, or upgrade to 2019. I could blow away the OS and start over, but as this is my one and only domain controller that's an option of last resort.

    You can’t just tell the installer, “Yes, I know there’s a Windows on here already, install yourself over it but leave everything else alone?” That used to be an option — but then, the last time I installed Windows other than on a VM, it was XP on machines that wouldn’t handle anything more modern.


  • And then the murders began.

    @Gurth said in Salvaging an unresponsive Windows server:

    You can’t just tell the installer, “Yes, I know there’s a Windows on here already, install yourself over it but leave everything else alone?” That used to be an option — but then, the last time I installed Windows other than on a VM, it was XP on machines that wouldn’t handle anything more modern.

    That went away in Vista, I'm told.


  • Notification Spam Recipient

    @Gurth said in Salvaging an unresponsive Windows server:

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    @Gurth said in Salvaging an unresponsive Windows server:

    If you can get into the BIOS, what is stopping you from booting to a Windows installer on a USB stick or similar?

    I already did so - I just couldn't find anything useful to do in there. It won't let me repair the OS install, or upgrade to 2019. I could blow away the OS and start over, but as this is my one and only domain controller that's an option of last resort.

    You can’t just tell the installer, “Yes, I know there’s a Windows on here already, install yourself over it but leave everything else alone?” That used to be an option — but then, the last time I installed Windows other than on a VM, it was XP on machines that wouldn’t handle anything more modern.

    Yeah, you can, but only if you're booted into the target system. For raisins.



  • Here's an idea. Is it possible to use recovery console to enable the hidden administrator account and set it to auto-login?


  • Notification Spam Recipient

    @SirTwist said in Salvaging an unresponsive Windows server:

    Here's an idea. Is it possible to use recovery console to enable the hidden administrator account and set it to auto-login?

    The main problem is that Winlogon isn't working; it's not even getting to the point of asking for a username or password.



  • @Tsaukpaetra Yes, which is why you set it to auto, and maybe then it works because it doesn't need to show loginui. I'm not saying it will work or is even possible, it's a "try variations on this in google" suggestion.


  • And then the murders began.

    @SirTwist Interesting idea, but teh googles aren't showing me a way to do that.



  • @Unperverted-Vixen "enable administrator account from recovery console" e.g. https://www.winhelponline.com/blog/enable-built-in-administrator-windows-10-recovery-options-advanced/ for enabling, then you need to edit the registry, "enable automatic login windows" e.g. https://support.microsoft.com/en-us/help/324737/how-to-turn-on-automatic-logon-in-windows. You might need to make a .reg file on another computer and import from the command line.
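    For the offline case, the usual route from the recovery console is to load the installed system's SOFTWARE hive under a temporary key, import a .reg file, and unload it again: `reg load HKLM\OFFLINE F:\Windows\System32\config\SOFTWARE`, then `reg import autologon.reg`, then `reg unload HKLM\OFFLINE`. The Python sketch below, meant to run on another computer, writes such a .reg file using the Winlogon values from the Microsoft article above (`AutoAdminLogon`, `DefaultUserName`, `DefaultPassword`, `DefaultDomainName`). The `OFFLINE` mount name and the F: drive letter are assumptions, and none of this has been tried against a domain controller.

```python
def reg_string(value):
    """Quote a string for a REG_SZ line in a .reg file."""
    # Escape backslashes first, then double quotes, so the quote escapes aren't doubled.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def autologon_reg(user, password, domain, mount_key="OFFLINE"):
    """Build .reg text enabling AutoAdminLogon in an offline SOFTWARE hive
    loaded under HKLM\\<mount_key> (a hypothetical temporary key name)."""
    # Paths inside the SOFTWARE hive start *below* SOFTWARE itself.
    key = ("HKEY_LOCAL_MACHINE\\" + mount_key
           + "\\Microsoft\\Windows NT\\CurrentVersion\\Winlogon")
    values = {
        "AutoAdminLogon": "1",
        "DefaultUserName": user,
        "DefaultPassword": password,  # stored in plain text: delete it afterwards
        "DefaultDomainName": domain,  # NetBIOS name of the domain
    }
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key + "]"]
    lines += [reg_string(name) + "=" + reg_string(val) for name, val in values.items()]
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path, text):
    """Write the .reg text as UTF-16 with a BOM, the encoding regedit uses for exports."""
    # newline="" keeps the explicit \r\n line endings untouched.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
```

    Usage would be something like `write_reg_file("autologon.reg", autologon_reg("Administrator", "your-password", "CONTOSO"))`, where `CONTOSO` stands in for the domain's NetBIOS name. The password ends up in plain text in the registry, so remove `DefaultPassword` once you're back in.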


  • Notification Spam Recipient

    @SirTwist In my opinion, since they still know their password it should be enough just to do the AutoAdminLogon thing and skip the "Enable the built-in administrator account" part.


  • :belt_onion:

    @Tsaukpaetra Plus the built-in administrator account is one-and-the-same with [DOMAIN]\Administrator, I believe, on a DC


  • And then the murders began.

    I've got my new domain controller built, and everything seems to be good (other than the fact that my other machines are all still defaulting to the old one!).

    Without access to the server, it seems like I'm going to have to delete the old server from AD (easy enough) and then do some AD metadata cleanup (which is new to me and scary-sounding).

    Is the cleanup really as scary as it sounds?

    If it is, is there any other way I can demote the DC without running something on the DC?

    EDIT: Actually, I think the GUI is enough:

    When you use Remote Server Administration Tools (RSAT) or the Active Directory Users and Computers console (Dsa.msc) that is included with Windows Server to delete a domain controller computer account from the Domain Controllers organizational unit (OU), the cleanup of server metadata is performed automatically. Before Windows Server 2008, you had to perform a separate metadata cleanup procedure.

    We'll find out tomorrow.


  • Notification Spam Recipient

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    Is the cleanup really as scary as it sounds?

    Not as much as it used to be. So long as the FSMO (or whatever) roles get auto-transferred you're golden.

    And the only downside if that fails is some metadata just needs to be adjusted in an obscure console.

    Edit: still think you should try to add the "allow remote desktop in the Private network firewall group" policy and see if that works.


  • And then the murders began.

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    Not as much as it used to be. So long as the FSMO (or whatever) roles get auto-transferred you're golden.

    I already manually transferred the FSMO roles.

    Edit: still think you should try to add the "allow remote desktop in the Private network firewall group" policy and see if that works.

    The Remote Desktop service fails to start, though.


  • Notification Spam Recipient

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    Edit: still think you should try to add the "allow remote desktop in the Private network firewall group" policy and see if that works.

    The Remote Desktop service fails to start, though.

    Ah, right, and Event Log is also inexplicably broken...

    I wonder if you can manually copy the event logs (from \\SERV-AVMEDIA-03\c$\Windows\System32\winevt\Logs) and see what they say?

    In theory they're probably locked (since, you know, they're in use), but it might be enlightening.
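    Copying the logs off the admin share can be scripted from any machine that can reach it. A minimal Python sketch that copies a couple of .evtx files and reports which ones couldn't be read; the set of file names and the local destination folder are assumptions, and the source is the share path above:

```python
import shutil
from pathlib import Path


def copy_event_logs(src_dir, dest_dir, names=("System.evtx", "Application.evtx")):
    """Copy the named .evtx files from src_dir to dest_dir.
    Returns (copied, failed) lists of file names."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, failed = [], []
    for name in names:
        try:
            # copy2 also preserves timestamps, handy when comparing log snapshots.
            shutil.copy2(src / name, dest / name)
            copied.append(name)
        except OSError:
            # Missing file, access denied, or a sharing violation if the file is locked.
            failed.append(name)
    return copied, failed
```

    For example, `copy_event_logs(r"\\SERV-AVMEDIA-03\c$\Windows\System32\winevt\Logs", r"C:\temp\old-dc-logs")`, after which the copies can be opened in Event Viewer on another machine via Action > Open Saved Log.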


  • And then the murders began.

    @Tsaukpaetra said in Salvaging an unresponsive Windows server:

    Ah, right, and Event Log is also inexplicably broken...

    Not accessible remotely. Seems to still be working, though, because...

    I wonder if you can manually copy the event logs (from \\SERV-AVMEDIA-03\c$\Windows\System32\winevt\Logs) and see what they say?

    In theory they're probably locked (since, you know, they're in use), but it might be enlightening.

    I can actually copy those locally. Lots of error messages in the system log, though nothing that seems actionable. This is the closest I can find to root cause:

    The LoadUserProfile call failed with the following error:
    The configuration registry database is corrupt.


  • Notification Spam Recipient

    @Unperverted-Vixen said in Salvaging an unresponsive Windows server:

    The LoadUserProfile call failed with the following error:

    Big oof. That's annoying enough to recover from that I usually just shove the profile folder aside and start with a fresh one, copying back docs and whatever.


  • And then the murders began.

    Found out about the magic of dcdiag. The new server was failing the advertising check, which turned out to be due to missing SYSVOL/NETLOGON shares. Looks like all the important checks pass now (e.g. the DFS replication check flags errors from the last 24 hours, but I know why: this server is < 24 hours old and DFS replication wasn't turned on at first, so I can ignore that one).

    Shut off my old server and now my machines are using the new one for login & GPOs.

    Now that things are stable, I need to a) take a backup, and b) consider decommissioning the domain entirely. :mlp_wut:

    (With the Essentials role deprecated, it's obviously not the way forward. Other tools - OneDrive, Microsoft account login to Windows, and my new "dumb" NAS - can do most of what I used it for. I lose automatic Folder Redirection/Offline Files for my Music folder, and a single admin account across all my machines. And I think I can still do the former manually - with one account across two machines it won't be too hard to manage.)


  • I survived the hour long Uno hand

    @Unperverted-Vixen
    What would you be losing from the Essentials role that a standard domain wouldn't give you? Just a cost thing, or is there a feature you're missing out on that's urgent? Because you should be able to get most of the core domain features, such as shared credentials and folder redirection, with a standard domain setup.


  • And then the murders began.

    @izzion Client computer backups, mainly. It also had a simpler GUI to manage stuff - e.g. I didn't need to dive into Group Policy objects to set up Folder Redirection.

    It's not that the standard version can't do everything that I need - rather, it's completely overkill for my needs, and other tools may be easier to manage.

