vMotion is arguably still one of the most innovative and advanced compute features in the datacenter. For me, vMotion was the reason I pursued a career in infrastructure solutions from an early age. Witnessing a server move from one physical system to another with zero interruption to the VM is still something special today.
What more could VMware do to improve vMotion? Well, today vSphere 7.0 has been announced, and with it come some significant changes to a feature we have started to take for granted!
Large Workload vMotion Improvements
vMotion of VMs with large amounts of memory (“Monster” VMs) has been a problem in the past, usually because the workload performs poorly during the vMotion process, typically as a result of long stun times.
Traditionally, page tracers were used to keep track of memory pages that change while the VM is being migrated. Unfortunately, installing the page tracer stuns all vCPUs on the VM.
Installing page tracers on every vCPU therefore has a big performance impact on the VM during the vMotion process.
In vSphere 7.0, this has changed. Instead of being installed on all vCPUs, the page tracer is now installed on only one vCPU. This is a more efficient way to perform page tracing, that is, to keep track of memory page changes on the VM during the vMotion event.
Memory Copy Optimizations
A major part of vMotion is to copy memory changes to the new destination of the VM. Eventually, we get to a point when most of the memory changes have been copied over and the rate of change is low enough to perform a final switchover of the VM to the destination.
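To make the convergence idea concrete, here is a minimal Python sketch of an iterative pre-copy loop. It is a toy model, not VMware's implementation: the function name, the parameters and the assumption of constant copy and dirty rates are all hypothetical and chosen only for illustration.

```python
def precopy_rounds(total_pages, dirty_rate, copy_rate,
                   switchover_threshold, max_rounds=50):
    """Toy model of iterative pre-copy (all names are illustrative).

    total_pages          -- pages of VM memory to migrate
    dirty_rate           -- pages the guest dirties per second (assumed constant)
    copy_rate            -- pages the network can send per second (assumed constant)
    switchover_threshold -- switch over once this few pages remain

    Returns the number of pages still to send after each round.
    """
    remaining = total_pages  # the first round sends every page
    history = []
    for _ in range(max_rounds):
        # While `remaining` pages are being sent, the guest keeps running
        # and dirties more pages. Integer math avoids rounding surprises.
        remaining = min(total_pages, remaining * dirty_rate // copy_rate)
        history.append(remaining)
        if remaining <= switchover_threshold:
            break  # few enough pages left for the final switchover
    return history


if __name__ == "__main__":
    print(precopy_rounds(1_000_000, 10_000, 100_000, 1_000))
```

In this model each round leaves fewer pages behind only while the copy rate exceeds the dirty rate. If the guest dirties memory as fast as it can be copied, the loop never reaches the threshold and stops at `max_rounds`.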
Part of this final switchover requires vMotion to copy all remaining memory page changes to the destination. This works well for small and medium-sized VMs, but for larger ones (VMs with multiple terabytes of memory) it can take 1 or 2 seconds to complete. The VM is stunned for that time, which has a noticeable impact on performance, especially for databases.
In the past, the entire memory bitmap was transferred to the destination during this switchover phase. Now, with vSphere 7.0, only a compacted memory bitmap needs to be transferred. This results in a sub-second stun time, even for the largest workloads.
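The announcement does not describe how the bitmap is compacted, so the following Python sketch only illustrates the general idea with a stand-in technique: a sparse encoding that sends just the non-zero 64-bit words of a one-bit-per-page bitmap. Every name here (`make_bitmap`, `compact`, `expand`, `WORD_BITS`) is hypothetical, and the one-bit-per-page layout is an assumption.

```python
WORD_BITS = 64  # assumed word size for the illustrative bitmap


def make_bitmap(num_pages, dirty_pages):
    """Build a full bitmap (list of 64-bit words), one bit per page."""
    words = [0] * ((num_pages + WORD_BITS - 1) // WORD_BITS)
    for page in dirty_pages:
        words[page // WORD_BITS] |= 1 << (page % WORD_BITS)
    return words


def compact(words):
    """Keep only the non-zero words, keyed by their position."""
    return {index: word for index, word in enumerate(words) if word}


def expand(compacted, num_words):
    """Rebuild the full bitmap on the destination side."""
    words = [0] * num_words
    for index, word in compacted.items():
        words[index] = word
    return words


if __name__ == "__main__":
    full = make_bitmap(1 << 20, [3, 64, 65, 1_000_000])
    small = compact(full)
    print(len(full), "words in the full bitmap,", len(small), "after compaction")
    assert expand(small, len(full)) == full
```

Near switchover most pages are already clean, so most words in the bitmap are zero and a sparse form is far smaller than the full bitmap. That is the intuition behind sending less data during the stun, whatever encoding vSphere actually uses.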
EVC – New Baselines
It is no surprise that a major release of vSphere brings support for new CPUs.
With vSphere 7.0, EVC modes for the Intel Cascade Lake generation and the AMD Zen 2 / EPYC Rome generation have been made available.
When is vSphere 7.0 Available?
According to the official VMware press release, the product should be available for download on May 1st 2020.
Keep an eye on the official vSphere page for more details.
There are big changes to vMotion in vSphere 7.0. The changes are mostly focused on improving vMotion for larger VMs. However, they will also improve vMotion for smaller workloads (perhaps just not as much) and pave the way for more improvements in the future.
VMware is likely to publish a deep-dive blog post about all of this, in case you need more detail on the process. I’ll link that here once it becomes available.