VMSS Restart Unexpectedly Causing an Unplanned Outage

Question

Dean Ferley 0 Reputation points

Hi, we have two Virtual Machine Scale Sets (VMSS) that restarted unexpectedly yesterday evening, causing an unplanned outage. What should we try in order to diagnose the issue, prevent it from happening again, and mitigate the outage when it does happen?

Thanks!

Anusree Nashetty 4,710 Reputation points Microsoft External Staff Moderator

2025-07-01T22:36:48.38+00:00

Hi Dean Ferley,

If the Azure portal diagnostics and Resource Health show no restart events, and the Activity Log doesn't indicate any action, the most likely causes are a reboot initiated by the guest OS, platform maintenance that was not surfaced in the portal, or an automatic VMSS action such as reimaging or replacing an instance after a health probe failure or under the upgrade policy.

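For Windows VMs, open Event Viewer → Windows Logs → System and look for shutdown and restart events around the time of the outage.
For Linux VMs, connect to the instance via SSH and check the system logs and reboot history for the same period.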
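
As a rough guide to reading the Windows System log, here is a minimal Python sketch that maps commonly cited restart-related event IDs to their usual meaning and makes a first guess at the kind of reboot. The function name `classify_reboot` and the three labels it returns are illustrative assumptions, not an official classification. On Linux, `last -x reboot shutdown` and `journalctl --list-boots` are common starting points for the same question.

```python
# Commonly cited Windows System log event IDs related to restarts.
REBOOT_EVENT_IDS = {
    41: "Kernel-Power: the system rebooted without shutting down cleanly",
    1074: "User32: a user or process requested the shutdown or restart",
    6005: "EventLog: the Event Log service started (system boot)",
    6006: "EventLog: the Event Log service stopped (clean shutdown)",
    6008: "EventLog: the previous shutdown was unexpected",
}


def classify_reboot(event_ids):
    """Make a first guess at the reboot type from the event IDs seen
    around the outage. The labels are illustrative, not official."""
    ids = set(event_ids)
    if ids & {41, 6008}:
        # No clean shutdown was recorded: crash, power loss, or host-side event.
        return "unexpected"
    if 1074 in ids:
        # Something inside the guest asked for the restart (user, updater, agent).
        return "guest-initiated"
    return "unknown"


def describe(event_ids):
    """Return the known descriptions for the given event IDs, in order."""
    return [REBOOT_EVENT_IDS[i] for i in event_ids if i in REBOOT_EVENT_IDS]
```

For example, `classify_reboot([6006, 1074, 6005])` returns `"guest-initiated"`, which would suggest looking at the process named in the 1074 event rather than at the platform.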

Health probe failures (from Load Balancer or Application Gateway): go to Load Balancer → Health Probes and check the probe configuration. If a VM fails the probe, VMSS can replace it automatically (for example, when automatic instance repairs are enabled).

If the VMSS is configured for automatic OS image upgrades, Azure may have rolled instances to a new image without any action on your part. See https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-automatic-upgrade
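
To check which automatic actions a scale set is allowed to take, one option is to export its model as JSON (for instance with `az vmss show`) and inspect the upgrade and repair policies. The sketch below assumes the property names used by the VMSS REST API (`upgradePolicy.mode`, `automaticOSUpgradePolicy.enableAutomaticOSUpgrade`, `automaticRepairsPolicy.enabled`). Key casing can differ between tools, so the lookups are case-insensitive; the helper names are hypothetical.

```python
def _get_ci(mapping, key):
    """Case-insensitive dict lookup; returns None when the key is absent."""
    for name, value in (mapping or {}).items():
        if name.lower() == key.lower():
            return value
    return None


def automatic_actions(vmss):
    """List the settings that let Azure change instances without user action.

    `vmss` is the scale set model as a dict, either with a top-level
    "properties" key (REST shape) or already flattened (an assumption
    about CLI-style output).
    """
    props = vmss.get("properties", vmss)
    upgrade = _get_ci(props, "upgradePolicy") or {}
    os_upgrade = _get_ci(upgrade, "automaticOSUpgradePolicy") or {}
    repairs = _get_ci(props, "automaticRepairsPolicy") or {}

    findings = []
    mode = _get_ci(upgrade, "mode")
    if mode in ("Automatic", "Rolling"):
        findings.append(f"upgrade policy mode is {mode}")
    if _get_ci(os_upgrade, "enableAutomaticOSUpgrade"):
        findings.append("automatic OS image upgrade is enabled")
    if _get_ci(repairs, "enabled"):
        findings.append("automatic instance repairs are enabled")
    return findings
```

An empty list means none of these three settings is switched on in the model you passed in, which makes a guest-side or platform-side cause more likely.
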
Anusree Nashetty 4,710 Reputation points Microsoft External Staff Moderator

2025-07-03T18:55:56.6866667+00:00

Hi Dean Ferley,

Did you get a chance to see my response? If you have any further questions, let me know. If the information was helpful, please click Upvote.

1 answer

Answer 1

Hello Dean Ferley,

Thanks for your question. And yeah, you certainly don't want that happening again.

I would recommend starting with the diagnostics in the portal:

Navigate to the impacted VM in the Azure portal.
Select Diagnose and solve problems > Common problems > VM restarted or stopped unexpectedly.
On the VM restarted or stopped unexpectedly page, select "My resource has been stopped unexpectedly" from the "Tell us more about the problem you are experiencing" drop-down menu.
The diagnostics then run on the impacted VM. Once they complete, you can review the reboot root cause analysis (RCA) in the diagnostics result.

The results may point to kernel panics, disk read errors, or guest OS faults.

These steps are documented here: https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/unexpected-vm-reboot-root-cause-analysis?source=recommendations

You should also review the Azure Activity Log for both scale sets around the time of the restart and check for any scaling events that might have triggered it.
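
If you export the Activity Log as JSON (for example with `az monitor activity-log list`), a small filter can narrow it down to the entries close to the outage. This is only a sketch: it assumes each entry carries `eventTimestamp`, `operationName.value`, and `caller` fields, treats all timestamps as UTC, and uses a hypothetical function name and a default 30-minute window.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(text):
    """Parse an ISO 8601 timestamp, ignoring fractional seconds.

    Assumes the value is in UTC (a trailing "Z" or "+00:00")."""
    head = text.rstrip("Z").split("+")[0].split(".")[0]
    return datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def events_near(entries, incident, window_minutes=30):
    """Return (time, operation, caller) tuples for entries logged within
    `window_minutes` of `incident`, oldest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for entry in entries:
        ts = parse_ts(entry["eventTimestamp"])
        if abs(ts - incident) <= window:
            operation = entry.get("operationName", {}).get("value", "unknown")
            hits.append((ts, operation, entry.get("caller")))
    # Sort on the timestamp only, since caller may be None.
    return sorted(hits, key=lambda hit: hit[0])
```

Pass `incident` as a timezone-aware `datetime` for the approximate time of the restart. Operations on the scale set from a caller you don't recognize, or scale and restart actions in that window, are the first things to look at.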

To prevent it from happening again, ideally you should:

Distribute your instances across availability zones to reduce single points of failure.

Are you using Spot VMs? Spot instances can be evicted when Azure needs the capacity back.
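
Both points can be checked from the scale set model. The sketch below assumes the REST-style shape of that model, with a top-level `zones` list and `virtualMachineProfile.priority` / `evictionPolicy` under `properties`; the function name and the wording of the notes are hypothetical.

```python
def resilience_notes(vmss):
    """Flag two settings that make unplanned instance loss more likely.

    `vmss` is the scale set model as a dict, with or without a
    top-level "properties" key."""
    props = vmss.get("properties", vmss)
    profile = props.get("virtualMachineProfile") or {}
    notes = []

    zones = vmss.get("zones") or []
    if len(zones) < 2:
        notes.append("instances are not spread across multiple availability zones")

    if profile.get("priority") == "Spot":
        policy = profile.get("evictionPolicy", "unspecified")
        notes.append(f"Spot priority: instances can be evicted (eviction policy: {policy})")
    return notes
```

An empty result only means these two risks are absent. It says nothing about guest OS faults or upgrade policies, which need to be checked separately.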

I recommend taking a look at:

https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/understand-vm-reboot

https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machine-scale-sets/restart-stop/instances-not-repaired

You can mark it 'Accept Answer' and 'Upvote' if this helped you.

Regards,

Abiola

Dean Ferley 0 Reputation points

2025-06-30T19:57:55.01+00:00

Thanks, I've gone through the Azure portal diagnostics several times, but they were unable to find anything.

I went to Help/Resource Health but there's no data there. It says "No resource health data right now."

Any ideas on what to try next?
