VMSS Restart Unexpectedly Causing an Unplanned Outage

Dean Ferley 0 Reputation points
2025-06-30T18:07:20.83+00:00

Hi, we have two Virtual Machine Scale Sets that restarted unexpectedly yesterday evening, causing an unplanned outage. What should we try in order to diagnose the issue, prevent it from happening again, and mitigate the outage if it does happen?

Thanks!

Azure Virtual Machine Scale Sets
Azure compute resources that are used to create and manage groups of heterogeneous load-balanced virtual machines.

2 answers

  1. Anusree Nashetty 4,785 Reputation points Microsoft External Staff Moderator
    2025-07-01T22:36:48.38+00:00

    Hi Dean Ferley,

    If the Azure portal diagnostics and Resource Health show no restart events and the Activity Log doesn’t indicate any action, the most likely causes are a reboot initiated by the guest OS, platform maintenance that was not surfaced in the portal, or an automatic VMSS action such as a reimage or eviction triggered by health probe failures or the upgrade policy.

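    For Windows VMs, open Event Viewer → Windows Logs → System and look for shutdown and restart events around the time of the outage (for example, Event IDs 41, 1074, and 6008).
    For Linux VMs, connect to the instance over SSH and check the boot history and system logs (for example, with last -x or journalctl --list-boots).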
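
    If you export the System log entries, a small script can pull out the reboot-related ones. This is only a rough sketch: the list-of-dicts input shape and the sample timestamps are made up for illustration, and the event ID table covers only a few well-known IDs.

```python
# Event IDs from the Windows System log that commonly accompany a reboot.
# Not exhaustive; extend as needed.
REBOOT_EVENT_IDS = {
    41: "Kernel-Power: system restarted without a clean shutdown",
    1074: "User32: a process or user initiated the shutdown/restart",
    6006: "EventLog: event log service stopped (clean shutdown)",
    6008: "EventLog: previous shutdown was unexpected",
}


def summarize_reboot_events(events):
    """Return (time, event_id, meaning) for reboot-related events, oldest first.

    `events` is a hypothetical list of dicts with 'time' (ISO 8601 string)
    and 'id' (int) keys, e.g. built from an Event Viewer export.
    """
    hits = [
        (e["time"], e["id"], REBOOT_EVENT_IDS[e["id"]])
        for e in events
        if e["id"] in REBOOT_EVENT_IDS
    ]
    # ISO 8601 strings in the same format sort chronologically.
    return sorted(hits)


def looks_unplanned(events):
    """True if any event indicates an unclean shutdown (IDs 41 or 6008)."""
    return any(e["id"] in (41, 6008) for e in events)


# Made-up sample data, for illustration only.
sample = [
    {"time": "2025-06-29T22:15:10", "id": 6008},
    {"time": "2025-06-29T22:14:58", "id": 41},
    {"time": "2025-06-29T21:00:00", "id": 7036},  # unrelated service event
]
for time, event_id, meaning in summarize_reboot_events(sample):
    print(time, event_id, meaning)
```

    An ID 1074 entry usually names the process and user that requested the restart, which helps separate a guest-initiated reboot from a platform one.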

    Health probe failures (from Load Balancer or Application Gateway): Go to Load Balancer → Health Probes and check the probe configuration. If automatic instance repairs are enabled and a VM fails the probe, VMSS can replace it automatically.

    If the scale set has automatic OS image upgrades enabled, Azure may have rolled the instances to a new image without any action on your part. See https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-upgrade
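
    To check these settings quickly, you could inspect the scale set definition as JSON. The sketch below assumes the dict has the shape of the output of az vmss show (properties such as upgradePolicy at the top level); if you read the raw ARM resource instead, the same fields sit under "properties". The function name is hypothetical.

```python
def restart_risk_factors(vmss):
    """List settings that can restart or replace instances without
    someone explicitly restarting them.

    `vmss` is assumed to be the parsed JSON of a scale set definition.
    """
    findings = []

    upgrade = vmss.get("upgradePolicy") or {}
    auto_os = upgrade.get("automaticOSUpgradePolicy") or {}
    if auto_os.get("enableAutomaticOSUpgrade"):
        findings.append("automatic OS image upgrade is enabled")

    # In Automatic or Rolling mode, instances are updated when the
    # scale set model changes, which can restart them.
    if upgrade.get("mode") in ("Automatic", "Rolling"):
        findings.append(f"upgrade policy mode is {upgrade['mode']}")

    repairs = vmss.get("automaticRepairsPolicy") or {}
    if repairs.get("enabled"):
        findings.append("automatic instance repairs are enabled")

    profile = vmss.get("virtualMachineProfile") or {}
    if profile.get("priority") == "Spot":
        findings.append("instances are Spot VMs and can be evicted")

    return findings


# Made-up example definition, for illustration only.
example = {
    "upgradePolicy": {
        "mode": "Rolling",
        "automaticOSUpgradePolicy": {"enableAutomaticOSUpgrade": True},
    },
    "automaticRepairsPolicy": {"enabled": True},
    "virtualMachineProfile": {"priority": "Regular"},
}
for finding in restart_risk_factors(example):
    print("-", finding)
```

    An empty result does not rule out a platform event; it only means none of these particular settings were turned on.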


  2. Abiola Akinbade 29,570 Reputation points Volunteer Moderator
    2025-06-30T19:32:31.56+00:00

    Hello Dean Ferley,

    Thanks for your question. And yeah, you certainly don't want that happening again.

    I would recommend starting with the diagnostics in the portal:

    1. Navigate to the impacted VM in the Azure portal.
    2. Select Diagnose and solve problems > Common problems > VM restarted or stopped unexpectedly.
    3. On the VM restarted or stopped unexpectedly page, select My resource has been stopped unexpectedly from the Tell us more about the problem you are experiencing drop-down menu.
    4. Once you select My resource has been stopped unexpectedly, the diagnostics run on the impacted VM. After the diagnostics are completed, you can check the reboot RCA information from the diagnostics result.

    The results may point to kernel panics, disk read errors, or guest OS faults.

    These steps are documented here: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/unexpected-vm-reboot-root-cause-analysis?source=recommendations

    You should also review the Azure Activity Log for both scale sets around the time of the restart and check for any scaling events that might have triggered it.
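
    If you export the Activity Log as JSON, something like the sketch below can narrow it down to operations near the incident. It assumes each entry has the eventTimestamp, operationName.value, and caller fields that az monitor activity-log list returns, and that timestamps are in UTC. The set of "disruptive" operation segments is my own illustrative pick, not an official list.

```python
from datetime import datetime, timedelta

# Path segments of operation names that usually mean an instance was
# restarted, replaced, or removed. Illustrative, not exhaustive.
DISRUPTIVE_SEGMENTS = {
    "restart", "reimage", "reimageall", "redeploy",
    "deallocate", "poweroff", "delete", "scaleup", "scaledown",
}


def parse_utc(timestamp):
    """Parse 'YYYY-MM-DDTHH:MM:SS...' and ignore fractions and the zone suffix.

    Assumes the timestamp is already in UTC.
    """
    return datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")


def events_near(entries, incident_time, window_minutes=30):
    """Return (time, operation, caller) for disruptive operations that
    happened within `window_minutes` of `incident_time`, oldest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for entry in entries:
        when = parse_utc(entry["eventTimestamp"])
        operation = entry["operationName"]["value"]
        segments = set(operation.lower().split("/"))
        if abs(when - incident_time) <= window and segments & DISRUPTIVE_SEGMENTS:
            hits.append((when, operation, entry.get("caller")))
    return sorted(hits, key=lambda hit: hit[0])


# Made-up sample entries, for illustration only.
sample = [
    {"eventTimestamp": "2025-06-29T22:14:03.1234567Z",
     "operationName": {"value": "Microsoft.Compute/virtualMachineScaleSets/restart/action"},
     "caller": "ops@example.com"},
    {"eventTimestamp": "2025-06-29T22:20:00Z",
     "operationName": {"value": "Microsoft.Compute/virtualMachineScaleSets/read"},
     "caller": "monitor@example.com"},
]
for when, operation, caller in events_near(sample, datetime(2025, 6, 29, 22, 15)):
    print(when.isoformat(), operation, caller)
```

    The caller field is worth a look: it shows whether a person, an automation account, or an Azure service started the operation.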

    To prevent it from happening again, ideally you should:

    Distribute your instances across availability zones (AZs) to reduce single points of failure.

    Are you using Spot VMs? Spot instances can be evicted when Azure needs the capacity back.

    I also recommend taking a look at:

    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/understand-vm-reboot

    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/restart-stop/instances-not-repaired

    You can mark it 'Accept Answer' and 'Upvote' if this helped you.

    Regards,

    Abiola

