VMSS Restart Unexpectedly Causing an Unplanned Outage

Dean Ferley 0 Reputation points
2025-06-30T18:07:20.83+00:00

Hi, we have two Virtual Machine Scale Sets that restarted unexpectedly yesterday evening, causing an unplanned outage. What should we try in order to diagnose the issue, prevent it from happening again, and mitigate the outage when it does happen?

Thanks!

Azure Virtual Machine Scale Sets
Azure compute resources that are used to create and manage groups of heterogeneous load-balanced virtual machines.

  1. Abiola Akinbade 29,490 Reputation points Volunteer Moderator
    2025-06-30T19:32:31.56+00:00

    Hello Dean Ferley,

    Thanks for your question. And yeah, you certainly don't want that happening again.

    I would recommend starting with the built-in diagnostics in the Azure portal:

    1. Navigate to the impacted VM in the Azure portal.
    2. Select Diagnose and solve problems > Common problems > VM restarted or stopped unexpectedly.
    3. On the "VM restarted or stopped unexpectedly" page, open the "Tell us more about the problem you are experiencing" drop-down menu and select "My resource has been stopped unexpectedly".
    4. The diagnostics then run on the impacted VM. When they complete, check the reboot root cause analysis (RCA) in the diagnostics result.

    The RCA may point to causes such as kernel panics, disk read errors, or guest OS faults.

    This procedure is documented here: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/unexpected-vm-reboot-root-cause-analysis?source=recommendations

    You should also review the Azure Activity Log for both scale sets around the time of the restart and check for any scaling events that might have triggered it.
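    If you export the Activity Log entries as JSON (for example with `az monitor activity-log list`), a short script can narrow them down to Compute operations close to the incident. This is a minimal sketch, assuming each entry has `eventTimestamp`, `operationName.value`, and `resourceId` fields; the set of operation names, the two-hour window, and the sample events are illustrative assumptions, not an exhaustive list of what can restart an instance.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed examples of Compute operation names worth flagging near a restart.
# Compare against the operation names that appear in your own export.
SUSPECT_OPERATIONS = {
    "microsoft.compute/virtualmachinescalesets/write",  # e.g. capacity/model changes
    "microsoft.compute/virtualmachinescalesets/restart/action",
    "microsoft.compute/virtualmachinescalesets/virtualmachines/restart/action",
    "microsoft.compute/virtualmachinescalesets/reimage/action",
}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2025-06-29T21:14:03.1234567Z'."""
    value = value.replace("Z", "+00:00")
    # Normalize fractional seconds to exactly 6 digits so fromisoformat accepts them.
    value = re.sub(r"\.(\d+)", lambda m: "." + (m.group(1) + "000000")[:6], value)
    return datetime.fromisoformat(value)


def suspicious_events(events, incident, window=timedelta(hours=2)):
    """Return (timestamp, operation, resourceId) for flagged events near the incident, oldest first."""
    hits = []
    for ev in events:
        ts = parse_ts(ev["eventTimestamp"])
        op = ev["operationName"]["value"].lower()  # compare case-insensitively
        if abs(ts - incident) <= window and op in SUSPECT_OPERATIONS:
            hits.append((ts, op, ev.get("resourceId", "")))
    return sorted(hits)


# Hypothetical events in the assumed export shape.
sample = [
    {"eventTimestamp": "2025-06-29T21:10:00.1234567Z",
     "operationName": {"value": "Microsoft.Compute/virtualMachineScaleSets/write"},
     "resourceId": "vmss-a"},
    {"eventTimestamp": "2025-06-29T08:00:00Z",  # outside the window
     "operationName": {"value": "Microsoft.Compute/virtualMachineScaleSets/write"},
     "resourceId": "vmss-a"},
    {"eventTimestamp": "2025-06-29T21:20:00Z",  # not a flagged operation
     "operationName": {"value": "Microsoft.Insights/diagnosticSettings/write"},
     "resourceId": "vmss-b"},
]
incident = datetime(2025, 6, 29, 21, 15, tzinfo=timezone.utc)

for ts, op, rid in suspicious_events(sample, incident):
    print(ts.isoformat(), op, rid)
```

    Widening the window or adding operation names is a trade-off between missing the trigger and drowning in unrelated entries.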

    To prevent it from happening again, ideally you should:

    Distribute your instances across Availability Zones (AZs) to reduce single points of failure.
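    To see how exposed a scale set is to a single-zone failure, you can count its instances per zone. This is a sketch, assuming instance records shaped like the JSON from `az vmss list-instances`, where each instance carries a `zones` list; instances with no zone are grouped into one `none` bucket and treated as having no zone protection, which is a conservative assumption. The helper names and sample data are hypothetical.

```python
from collections import Counter


def zone_counts(instances):
    """Count instances per zone; instances without a zone go in a 'none' bucket."""
    counts = Counter()
    for inst in instances:
        zones = inst.get("zones") or ["none"]
        counts[zones[0]] += 1  # a single VM instance sits in at most one zone
    return counts


def worst_case_surviving_fraction(instances):
    """Fraction of instances left if the most-populated bucket is lost entirely.

    The 'none' bucket is treated like a single zone (conservative assumption:
    non-zonal instances get no zone-level protection).
    """
    counts = zone_counts(instances)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - max(counts.values())) / total


# Hypothetical instance lists, trimmed to the fields the check reads.
single_zone = [{"name": f"vmss-a_{i}", "zones": ["1"]} for i in range(4)]
spread = [{"name": f"vmss-b_{i}", "zones": [str(i % 3 + 1)]} for i in range(6)]

for label, instances in (("single_zone", single_zone), ("spread", spread)):
    print(label, dict(zone_counts(instances)), round(worst_case_surviving_fraction(instances), 2))
```

    A scale set pinned to one zone keeps nothing if that zone fails, while an even spread over three zones keeps about two thirds of its capacity.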

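    Also, are you using Spot VMs? Spot instances can be evicted whenever Azure needs the capacity back, which would make them stop unexpectedly.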
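    One way to check this for each scale set is to inspect its model, for example the JSON returned by `az vmss show`. This sketch assumes the priority is exposed at `virtualMachineProfile.priority` (`Regular`, `Spot`, or the older `Low`) with `evictionPolicy` alongside it; the function name and the sample models are hypothetical.

```python
# Priorities treated as evictable. "Low" is the older low-priority tier
# (assumption: evictable in the same way as Spot).
EVICTABLE_PRIORITIES = {"spot", "low"}


def eviction_risk(vmss: dict):
    """Return (is_evictable, eviction_policy) for a scale-set model dict.

    Assumes the shape virtualMachineProfile.priority / evictionPolicy.
    A missing priority is read as Regular.
    """
    profile = vmss.get("virtualMachineProfile") or {}
    priority = (profile.get("priority") or "Regular").lower()
    if priority not in EVICTABLE_PRIORITIES:
        return False, None
    return True, profile.get("evictionPolicy")


# Hypothetical scale-set models, trimmed to the fields the check reads.
regular_vmss = {"name": "vmss-a", "virtualMachineProfile": {"priority": "Regular"}}
spot_vmss = {
    "name": "vmss-b",
    "virtualMachineProfile": {"priority": "Spot", "evictionPolicy": "Deallocate"},
}

for model in (regular_vmss, spot_vmss):
    evictable, policy = eviction_risk(model)
    print(model["name"], "evictable" if evictable else "not evictable", policy or "")
```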

    I recommend taking a look at:

    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/understand-vm-reboot

    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/restart-stop/instances-not-repaired

    You can mark it 'Accept Answer' and 'Upvote' if this helped you

    Regards,

    Abiola
