EC2 Rescue

When you run your servers on a managed platform, you don’t have access to some of the core underlying infrastructure that runs your systems. The hypervisor is a prime example of this. If you’re an old school sysadmin, having access to the boot console to either configure a new disk or skip a system check was something you took for granted. In a managed cloud environment like AWS, this simply isn’t an option.

What you can do is look at the Recovery Console Screen from within the AWS console. This gives you a point-in-time view of where the boot process currently is. You can check this out with the following:

  • Select the instance, then choose Actions, Instance Settings, Get Instance Screenshot
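
The same screenshot can also be fetched from a script. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. It relies on the EC2 GetConsoleScreenshot API, which returns the image as base64-encoded data. The function name, instance ID and output filename are placeholders chosen for illustration.

```python
import base64


def save_console_screenshot(ec2_client, instance_id, path):
    """Fetch the console screenshot for an instance and write it to `path`.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Returns the number of bytes written.
    """
    response = ec2_client.get_console_screenshot(InstanceId=instance_id, WakeUp=True)
    # The API returns the image as a base64-encoded string.
    image_bytes = base64.b64decode(response["ImageData"])
    with open(path, "wb") as handle:
        handle.write(image_bytes)
    return len(image_bytes)


# Example usage (placeholder instance ID, requires boto3 and credentials):
#   import boto3
#   save_console_screenshot(boto3.client("ec2"), "i-0123456789abcdef0", "screenshot.jpg")
```

As with the console option, this only shows the screen. It does not let you interact with it.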

Unfortunately, if the boot process gets stuck for whatever reason, you’re in a bit of a pickle. You have no way of interacting with the console.

For example, on a Windows system built from a customized AMI, the operating system may boot into the Recovery Console and wait there for input that you cannot provide.

Note: By default, the policy configuration for AWS-provided public Windows AMIs is set to ignoreallfailures.

Inside AWS, there isn’t an option to click Continue.

We can get around this issue, however, by following these steps:

  1. Stop the unreachable instance.
  2. Create a snapshot of the root volume, or an AMI from the instance, as a backup. The root volume is attached to the instance, normally as /dev/sda1.
  3. Detach the root volume from the unreachable instance and attach it to another instance in the same Availability Zone as a secondary volume. For more information, see Detaching an Amazon EBS Volume from an Instance.

Warning:

If your temporary instance is based on the same AMI that the original instance is based on, you must complete additional steps or you won’t be able to boot the original instance after you restore its root volume because of a disk signature collision. Alternatively, select a different AMI for the temporary instance. For example, if the original instance uses an AMI for Windows Server 2008 R2, launch the temporary instance using an AMI for Windows Server 2012. If you must create a temporary instance based on the same AMI, see Step 6 in Remote Desktop can’t connect to the remote computer to avoid a disk signature collision.

  4. Log in to the temporary instance and run the following command from a command prompt to change the bootstatuspolicy configuration to ignoreallfailures, replacing Drive Letter with the drive letter assigned to the attached volume:

bcdedit /store Drive Letter:\boot\bcd /set {default} bootstatuspolicy ignoreallfailures

  5. Reattach the volume to the unreachable instance and start the instance again.
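
The volume shuffle in these steps can also be scripted. The following is a rough sketch rather than a definitive implementation. It assumes a boto3 EC2 client is passed in, that the temporary instance is already running in the same Availability Zone, and that xvdf is free as a device name on it. The function names and the default device name are my own choices for illustration.

```python
def move_root_volume(ec2, broken_id, helper_id, device="xvdf"):
    """Stop the broken instance, snapshot its root volume, and attach
    that volume to the helper instance as a secondary volume."""
    ec2.stop_instances(InstanceIds=[broken_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[broken_id])

    # Look up which EBS volume backs the root device (normally /dev/sda1).
    instance = ec2.describe_instances(InstanceIds=[broken_id])["Reservations"][0]["Instances"][0]
    root_device = instance["RootDeviceName"]
    volume_id = next(
        mapping["Ebs"]["VolumeId"]
        for mapping in instance["BlockDeviceMappings"]
        if mapping["DeviceName"] == root_device
    )

    # Take a backup before touching the volume.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id, Description=f"Backup of {volume_id} before rescue"
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    ec2.detach_volume(VolumeId=volume_id, InstanceId=broken_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=helper_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])

    return {
        "volume_id": volume_id,
        "root_device": root_device,
        "snapshot_id": snapshot["SnapshotId"],
    }


def bcdedit_command(drive_letter):
    """Build the bcdedit command for the offline volume mounted at `drive_letter`."""
    return (
        rf"bcdedit /store {drive_letter}:\boot\bcd "
        "/set {default} bootstatuspolicy ignoreallfailures"
    )
```

The returned root_device is worth keeping: when you reattach the volume to the original instance, it needs to go back on that same device name so the instance boots from it. bcdedit_command("D") produces the command shown earlier with D as the drive letter. It still has to be run on the temporary instance itself.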

Another option is to use the official EC2Rescue Tool.

EC2Rescue for EC2 Windows is a convenient, straightforward, GUI-based troubleshooting tool that can be run on your Amazon EC2 Windows Server instances to troubleshoot operating system-level issues and collect advanced logs and configuration files for further analysis. EC2Rescue simplifies and expedites the troubleshooting of EC2 Windows instances.

The following are a few common issues that are addressed by EC2Rescue:

  • Instance connectivity issues due to:
    • Firewall configuration
    • RDP service configuration
    • Network interface configuration
  • Operating system (OS) boot issues due to:
    • Blue screen or stop error
    • Boot loop
    • Corrupted registry
  • Any issues that might require advanced log analysis and troubleshooting

To work on an unreachable instance, EC2Rescue requires that the root volume of the affected instance be attached to another healthy EC2 Windows instance. The healthy instance must meet the following requirements:

  • Same Availability Zone as the affected instance
  • Windows Server 2008 R2 or later
  • .NET Framework 3.5 SP1 or later installed
  • Accessible over a Remote Desktop Protocol (RDP) connection

Note: EC2Rescue can only be run on Windows Server 2008 R2 or later, but it can also analyse the offline volumes of Windows Server 2008 or later.
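
Some of these requirements can be checked before you move any volumes. The following is a minimal sketch, assuming a boto3 EC2 client is passed in. The function name and the messages it returns are illustrative only. It checks the Availability Zone, platform and running state, which is all the EC2 API exposes here. The Windows Server version, .NET Framework version and RDP access still need to be confirmed on the instance itself.

```python
def check_healthy_instance(ec2, broken_id, healthy_id):
    """Return a list of problems that would stop the healthy instance
    from hosting the root volume of the broken instance."""
    reservations = ec2.describe_instances(InstanceIds=[broken_id, healthy_id])["Reservations"]
    instances = {
        instance["InstanceId"]: instance
        for reservation in reservations
        for instance in reservation["Instances"]
    }
    broken, healthy = instances[broken_id], instances[healthy_id]

    problems = []
    # EBS volumes can only be attached to instances in the same Availability Zone.
    if broken["Placement"]["AvailabilityZone"] != healthy["Placement"]["AvailabilityZone"]:
        problems.append("instances are in different Availability Zones")
    # The Platform field is only present for Windows instances.
    if healthy.get("Platform", "").lower() != "windows":
        problems.append("healthy instance is not a Windows instance")
    if healthy["State"]["Name"] != "running":
        problems.append("healthy instance is not running")
    return problems
```

An empty list means none of the checked requirements failed.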

Steps:

  1. Attach the crashed root disk to the healthy Windows instance
  2. Download EC2Rescue from https://s3.amazonaws.com/ec2rescue/windows/EC2Rescue_latest.zip, extract the archive and launch the tool
  3. Click the Offline instance option
  4. Select the attached disk and click Next
  5. If the right disk is selected, click Yes
  6. Click OK
  7. Click the Diagnose and Rescue option
  8. Click Next
  9. Select the items to be fixed
  10. Click Rescue
  11. Click OK
  12. Click Next
  13. Reattach the volume to the unreachable instance and start the instance again
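
The download in step 2 can be scripted on the healthy instance. The following is a small sketch using only the Python standard library, assuming Python is available on that instance. The function names and paths are placeholders. The URL is the one given in step 2.

```python
import urllib.request
import zipfile

EC2RESCUE_URL = "https://s3.amazonaws.com/ec2rescue/windows/EC2Rescue_latest.zip"


def download_file(url, zip_path):
    """Download `url` to `zip_path` and return the path."""
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_archive(zip_path, target_dir):
    """Extract the archive into `target_dir` and return the names of its members."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example usage (placeholder paths):
#   zip_path = download_file(EC2RESCUE_URL, r"C:\Temp\EC2Rescue_latest.zip")
#   extract_archive(zip_path, r"C:\Temp\EC2Rescue")
```

The remaining steps are done in the EC2Rescue GUI as described above.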

Disclaimer

All data and information provided on this site is for informational purposes only. The Cloudten Blog makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information on this site & will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.