The problem with point in time snapshots and vSphere Replication

vSphere Replication is a nice, simple solution for replicating Virtual Machines from one site to another. It helps protect your business against the most common threats to the data center, and it is included in vSphere Standard and above at no extra cost.

Some vSphere Replication users will notice an option to enable Point in Time Recovery (PIT) on a per-Virtual Machine basis. With this option enabled, the replicated Virtual Machines retain snapshots taken at regular intervals.






Why would you use PIT snapshots if the VMs are already replicated?

Suppose your production VMs are hit by a CryptoLocker infection, and that infection has already been replicated to the DR VMs via vSphere Replication. If you have PITs enabled on those infected Virtual Machines, you can invoke DR on those VMs and revert to a snapshot taken before the infection occurred.


Why is this a problem?

Consider the following PIT recovery configuration: Maintain 5 snapshots per day for 3 days.

This means that each of your DR VMs will have 15 snapshots (5 per day × 3 days). When you boot a VM, all the snapshots are scanned before the VM powers on, and this can take some time. In a recent test with this PIT recovery configuration enabled, approximately 10 VMs took 1 hour in total to boot. The VMs were powered on at the DR site as part of an SRM recovery plan, so no “human time” was involved in booting them. An hour could be seen as excessive and may not meet your RTO.
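The arithmetic above is easy to check for other retention settings. The following is a minimal Python sketch, not anything vSphere Replication or SRM provides: the function names are made up for illustration, and the per-VM average is only a rough figure, since the test measured the total time and not how the VMs were sequenced.

```python
def retained_snapshots(per_day: int, days: int) -> int:
    """Total PIT snapshots kept on each replicated VM."""
    if per_day < 0 or days < 0:
        raise ValueError("per_day and days must be non-negative")
    return per_day * days


def average_minutes_per_vm(total_minutes: float, vm_count: int) -> float:
    """Rough average boot time per VM, derived from a measured total."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_minutes / vm_count


# Configuration from this post: 5 snapshots per day, kept for 3 days.
snapshots = retained_snapshots(per_day=5, days=3)

# Test result from this post: roughly 10 VMs, 1 hour in total.
average = average_minutes_per_vm(total_minutes=60, vm_count=10)

print(f"{snapshots} snapshots per VM, about {average:.0f} minutes per VM")
```

Changing `per_day` and `days` shows how quickly the snapshot count per VM grows with a more generous retention policy.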



The fix is simple: reduce the number of PIT recovery snapshots, or remove them completely.

Testing showed that removing the PIT recovery snapshots altogether brought the total SRM recovery plan test down from 1 hour to less than 5 minutes.
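To judge whether either result is acceptable, compare it with your own RTO. Here is a small Python sketch of that comparison. The 30-minute RTO is a hypothetical value chosen only for illustration, and the 5-minute figure is an upper bound, since the test finished in less than that.

```python
def meets_rto(recovery_minutes: float, rto_minutes: float) -> bool:
    """Return True when the measured recovery time fits within the RTO."""
    return recovery_minutes <= rto_minutes


RTO_MINUTES = 30  # hypothetical target; substitute your own RTO

# Measured SRM recovery plan test times from this post, in minutes.
results = {
    "15 PIT snapshots per VM": 60,
    "no PIT snapshots": 5,  # upper bound; the test took less than this
}

for label, minutes in results.items():
    status = "meets" if meets_rto(minutes, RTO_MINUTES) else "misses"
    print(f"{label}: {minutes} min, {status} a {RTO_MINUTES}-minute RTO")
```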
