
vSphere 6.7 – Update Manager – ESXi Upgrade Enhancements (No hardware reboots)

So vSphere 6.7 has been announced, but what does this mean for Update Manager?

Single reboot on upgrades

Historically (prior to 6.7), when upgrading a host, Update Manager would trigger one reboot to prepare the host and then the installer would trigger another reboot. These reboots add time to maintenance tasks and increase the risk to VM availability. This is especially true when you only have N+1 hosts in your environment, since you have no spare hosts for failover during maintenance.

As of 6.7, VMware has changed this process so that only the installer triggers a reboot, straight into the new version of ESXi. VUM simply needs to be running version 6.7 (now embedded in the vCenter Server Appliance).

Example

If VUM is at version 6.7, then upgrading ESXi from 6.5 to 6.7 (with VUM) will trigger only a single reboot.
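The rule above can be written down as a tiny sketch. This is only an illustration of the behaviour described in this post, not anything VUM exposes: the function names `parse_version` and `full_reboots_for_upgrade` are made up for the example.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '6.7' or '6.5.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def full_reboots_for_upgrade(vum_version: str) -> int:
    """Reboots expected for a major ESXi upgrade through VUM.

    Hypothetical helper that encodes the rule from this post:
    VUM older than 6.7 -> prepare reboot + installer reboot (2),
    VUM 6.7 or newer   -> installer reboot only (1).
    """
    if parse_version(vum_version) >= (6, 7):
        return 1
    return 2


if __name__ == "__main__":
    for vum in ("6.5", "6.7"):
        print(f"VUM {vum}: {full_reboots_for_upgrade(vum)} reboot(s)")
```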

 

Quick Boot

No hardware reboots when upgrading from 6.7 to a higher version.

This is achieved by initiating an OS-level restart after upgrading the software, instead of a full hardware / BIOS / firmware initialization.

Hardware needs to be whitelisted for Quick Boot. At this time the Dell R630 and R640 and the HP DL360 and DL380 (G9 and G10) are supported.

This saves significant time on host upgrades!

Example

Upgrading ESXi from 6.7 to 6.7.x will require no hardware-level reboot if the hardware is on the whitelist and Quick Boot is selected in VUM remediation. Visit the official documentation for full prerequisites.
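As a rough sketch, the three conditions in this example can be expressed as a simple check. The `Host` class, its fields and the `restart_type` function are hypothetical names invented for illustration; they are not part of any VMware API, and the real prerequisites in the official documentation go beyond what is modelled here.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of a host, for illustration only."""
    name: str
    esxi_version: str            # version the host runs before the upgrade
    on_quick_boot_whitelist: bool


def restart_type(host: Host, quick_boot_selected: bool) -> str:
    """Which kind of restart this post leads us to expect after an upgrade.

    Assumes only the three conditions named in the post: the host already
    runs 6.7 or later, the hardware is whitelisted, and Quick Boot is
    selected in the VUM remediation settings.
    """
    source = tuple(int(part) for part in host.esxi_version.split("."))
    if source >= (6, 7) and host.on_quick_boot_whitelist and quick_boot_selected:
        return "quick boot"
    return "full hardware reboot"


if __name__ == "__main__":
    whitelisted = Host("esx01", "6.7", on_quick_boot_whitelist=True)
    older = Host("esx02", "6.5", on_quick_boot_whitelist=True)
    print(whitelisted.name, restart_type(whitelisted, quick_boot_selected=True))
    print(older.name, restart_type(older, quick_boot_selected=True))
```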

 

Why is this important?

We’ve shown that the changes from 6.5 to 6.7 have removed one of the two reboots. At the same time, upgrading from 6.7 to 6.7.x can be conducted with no hardware-level reboots (on supported hardware).

Essentially, the risk of another host failing during maintenance is reduced because the maintenance window is shorter. This is especially important with HCI solutions that include vSAN. On the vSAN front, if you suffer a failure while a host is in maintenance mode (with "Ensure accessibility"), then objects with a policy of FTT=1 will also fail. Reducing the time a host spends in maintenance mode will, therefore, reduce your exposure to this kind of issue. Of course, you should do a full evacuation or use FTT=2 to mitigate these risks further.
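The FTT arithmetic behind that last point can be sketched in a few lines. This is a deliberately simplified model, not how vSAN calculates anything: it assumes every host in maintenance mode holds one copy of the object, and that a full evacuation has enough spare capacity to rebuild that copy elsewhere. The function name `failures_still_tolerated` is made up for the example.

```python
def failures_still_tolerated(ftt: int, hosts_in_maintenance: int,
                             full_evacuation: bool = False) -> int:
    """Further host failures an object can survive during maintenance.

    Simplified assumption: without a full evacuation, each host in
    maintenance mode uses up one of the failures the policy tolerates.
    With a full evacuation the data is rebuilt elsewhere first, so the
    policy's tolerance is kept.
    """
    if full_evacuation:
        return ftt
    return max(ftt - hosts_in_maintenance, 0)


if __name__ == "__main__":
    print("FTT=1, one host in maintenance:", failures_still_tolerated(1, 1))
    print("FTT=2, one host in maintenance:", failures_still_tolerated(2, 1))
    print("FTT=1, full evacuation:", failures_still_tolerated(1, 1, full_evacuation=True))
```

With FTT=1 and one host in maintenance the result is 0, which is the exposure described above; FTT=2 or a full evacuation leaves room for one more failure.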