8.2 vSphere Fault Tolerance
vSphere Fault Tolerance (vSphere FT) provides continuous availability for applications in the event of a server failure by creating a live shadow instance of a virtual machine that is in lockstep with the primary instance. By allowing instantaneous failover between the two instances in the event of hardware failure, vSphere FT eliminates even the smallest chance of data loss or service disruption. vSphere FT automatically triggers:
Seamless stateful failover when the protected virtual machines fail to respond, providing zero downtime, zero data loss, and continuous service availability.
The creation of a new secondary virtual machine after failover, to provide continuous protection of the application.
In releases prior to vSphere 6.0, virtual machines protected by vSphere FT were restricted to a single vCPU and were subject to a range of other limitations that inhibited adoption, including a requirement that all of the virtual machine's VMDKs be configured as eager-zeroed thick.
These limitations made vSphere FT impractical for the majority of virtual machines. However, through the development of a completely new fast checkpointing technology, vSphere FT now supports protection of virtual machines with up to 4 vCPUs and 64 GB of memory. This means that the vast majority of mission-critical tenant workloads can now be protected regardless of application or guest operating system.
In addition, VMware vSphere Storage APIs - Data Protection can now be used with virtual machines protected by vSphere FT. With vSphere 6.0, administrators can therefore use VMware snapshot-based tools to back up these virtual machines, which enables easier backup administration, enhanced data protection, and reduced risk.
There has also been a significant change in how vSphere FT handles storage. vSphere FT now creates a complete copy of the entire virtual machine, resulting in total protection for virtual machine storage, in addition to compute and memory. It also enables the files of the primary and secondary virtual machines to be stored on shared as well as local storage. The result is increased protection, reduced risk, and improved flexibility.
Other improvements have been made to vSphere FT virtual disk support and host compatibility requirements. Previous vSphere releases required a very specific virtual disk type—eager-zeroed thick. They also had very limiting host compatibility requirements. vSphere FT now supports all virtual disk formats—eager-zeroed thick, thick, and thin, and host compatibility for vSphere FT is now the same as for vSphere vMotion.
Symmetric Multiprocessing Fault Tolerance (SMP-FT) provides zero data loss and downtime with the ability to recover a virtual machine instantly and continue working in the case of a hardware failure.
In addition, SMP-FT:
Supports up to 4 vCPUs
Supports up to 64 GB memory
Supports vMotion for both the primary and secondary virtual machines
Creates a secondary copy of the VM's files and disks
Does not support user-created snapshots
Supports snapshot-based backup solutions
Creates a full copy of a VM for redundancy
Uses an XvMotion operation to create an initial copy of the VM
With SMP-FT, the primary and secondary are continuously updated to stay in sync
If the primary host crashes, the VM can be resumed on the secondary host
A 10-Gbps NIC is recommended for the SMP-FT network
Support for multiple 10-Gbps NICs on the vSphere FT network is not yet available
As before, Storage vMotion is not possible with vSphere FT running multiple vCPUs
Virtual machines in vCloud Director, vSAN, vSphere Virtual Volumes, and vSphere Replication are not supported on SMP-FT machines
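The restrictions in the list above can be turned into a simple pre-check. The following is a minimal sketch, not a vSphere API: the `VmSpec` class, its field names, and the `smp_ft_blockers` function are hypothetical names introduced for illustration, and only the limits stated in this section (4 vCPUs, 64 GB of memory, no user-created snapshots, no vSAN or vSphere Virtual Volumes) are encoded.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the SMP-FT list above.
MAX_FT_VCPUS = 4
MAX_FT_MEMORY_GB = 64


@dataclass(frozen=True)
class VmSpec:
    """Hypothetical description of a candidate VM (not a vSphere API type)."""
    name: str
    vcpus: int
    memory_gb: int
    has_user_snapshots: bool = False
    on_vsan_or_vvols: bool = False


def smp_ft_blockers(vm: VmSpec) -> List[str]:
    """Return the reasons a VM cannot be protected by SMP-FT (empty if none)."""
    blockers = []
    if vm.vcpus > MAX_FT_VCPUS:
        blockers.append(
            f"{vm.vcpus} vCPUs exceeds the SMP-FT limit of {MAX_FT_VCPUS}")
    if vm.memory_gb > MAX_FT_MEMORY_GB:
        blockers.append(
            f"{vm.memory_gb} GB memory exceeds the SMP-FT limit of {MAX_FT_MEMORY_GB} GB")
    if vm.has_user_snapshots:
        blockers.append("user-created snapshots are not supported")
    if vm.on_vsan_or_vvols:
        blockers.append("vSAN and vSphere Virtual Volumes are not supported")
    return blockers
```

For example, `smp_ft_blockers(VmSpec("db01", vcpus=4, memory_gb=64))` returns an empty list, while a VM with 8 vCPUs and 128 GB of memory yields two blockers. Returning the reasons rather than a single Boolean makes the result usable in a planning report.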
The new technology used by SMP-FT is called fast checkpointing. It is essentially a heavily modified version of XvMotion that runs continuously and takes checkpoints multiple times per second.
Figure 28. SMP Fault Tolerance – Two Complete Virtual Machines
 
Although SMP-FT seems similar to the previously available Uniprocessor FT (UP-FT), it is, in fact, a new technology available only with vSphere 6.0. Uniprocessor virtual machines can use either the legacy Record-Replay FT or the new SMP-FT technology, and UP-FT virtual machines can run alongside SMP-FT virtual machines without issue. The following tables examine use cases, business benefits, design requirements, and capabilities for incorporating UP-FT and SMP-FT based services as part of a VMware Cloud Provider Program platform.
 
Table 18. Symmetric Multiprocessing Fault Tolerance Design Options
Use Cases
Any workload that has up to 4 vCPUs and 64 GB of memory and that is not latency-sensitive (examples of latency-sensitive workloads are VoIP and high-frequency trading). Using vSphere FT adds overhead to the virtual machine and application. The amount depends on a number of factors, such as the application, the number of vCPUs, the number of vSphere FT protected virtual machines on a host, the host processor type, and so on.

Business Benefits
Protect mission-critical, high-performance applications regardless of operating system.
Continuous availability. Zero downtime and zero data loss for infrastructure failures.
Fully automated response.
SMP-FT greatly expands the use cases for vSphere FT to approximately 90 percent of workloads.

Design Requirements
vSphere FT logging (the traffic between the hosts where the primary and secondary virtual machines are running) is very bandwidth intensive and should use a dedicated 10-Gbps network interface on each host. A dedicated interface is not required, but it is highly recommended, because protecting a virtual machine with vSphere FT always increases its network bandwidth use. If vSphere FT does not get the bandwidth it needs, the protected virtual machine runs slower.
There is a limit of either 8 vCPUs or 4 vSphere FT protected VMs per host, whichever limit is reached first. For example:
2 VMs with 4 vCPUs each (total 8 vCPUs)
4 VMs with 2 vCPUs each (total 8 vCPUs)
4 VMs with 1 vCPU each (total 4 vCPUs)
In addition, vSphere FT now creates a second copy of the VMDKs associated with a protected virtual machine. This means that storage is now redundant (the previous version kept a single copy on shared storage, so it was not). However, this also means that storage requirements are doubled for every protected virtual machine.
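The per-host limits and the doubled storage requirement described above lend themselves to a quick capacity check. The sketch below is illustrative only: the function names `fits_on_host` and `ft_storage_gb` are hypothetical, and the code simply counts whatever list of FT-protected VMs it is given. It does not model whether primaries and secondaries count separately toward the limit, which this section does not specify.

```python
from typing import Sequence

# Per-host limits stated in the design requirements above.
MAX_FT_VMS_PER_HOST = 4
MAX_FT_VCPUS_PER_HOST = 8


def fits_on_host(ft_vm_vcpus: Sequence[int]) -> bool:
    """Check a host's FT-protected VMs against both limits.

    ft_vm_vcpus holds the vCPU count of each FT-protected VM on the host.
    The placement is valid only if neither limit is exceeded.
    """
    return (len(ft_vm_vcpus) <= MAX_FT_VMS_PER_HOST
            and sum(ft_vm_vcpus) <= MAX_FT_VCPUS_PER_HOST)


def ft_storage_gb(vmdk_sizes_gb: Sequence[float]) -> float:
    """Storage needed for a protected VM: FT keeps a second copy of each VMDK."""
    return 2 * sum(vmdk_sizes_gb)
```

With these definitions, the three example configurations listed above (`[4, 4]`, `[2, 2, 2, 2]`, and `[1, 1, 1, 1]`) all pass `fits_on_host`, whereas `[4, 4, 1]` fails on the vCPU limit and `[1, 1, 1, 1, 1]` fails on the VM count. A protected VM with VMDKs of 100 GB and 50 GB needs `ft_storage_gb([100, 50])`, that is, 300 GB.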
 
 
Table 19. Fault Tolerance Capabilities by vSphere Version
| Feature | vSphere FT (vSphere 5.5) | vSphere FT (vSphere 6.0) |
|---|---|---|
| vCPUs | 1 | 4 |
| Virtual Disks | Eager-zeroed thick | Any |
| Hot Configure FT | No | Yes |
| H/W Virtualization | No | Yes |
| Backup (Snapshot) | No | Yes |
| Paravirtual Devices | No | Yes |
| Storage Redundancy | No | Yes |
| vSAN / vSphere Virtual Volumes | No | No |
| High Availability | Yes | Yes |
| vSphere DRS | Partial (Initial Placement) | Partial (Initial Placement) |
| VMware vSphere Distributed Power Management™ | Yes | Yes |
| VMware Site Recovery Manager™ | Yes | Yes |
| vSphere Distributed Switch | Yes | Yes |
| Storage DRS | No | No |
| vCloud Director for Service Providers | No | No |
| vSphere Replication | No | No |
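
When comparing versions during design work, it can help to hold the capability matrix as data and query it. The following sketch transcribes Table 19 into a Python list. The names `FT_CAPABILITIES` and `changed_features` are hypothetical, and the values are copied from the table rather than read from any VMware API.

```python
from typing import List, Tuple

# (feature, vSphere 5.5, vSphere 6.0), transcribed from Table 19.
FT_CAPABILITIES: List[Tuple[str, str, str]] = [
    ("vCPUs", "1", "4"),
    ("Virtual Disks", "Eager-zeroed thick", "Any"),
    ("Hot Configure FT", "No", "Yes"),
    ("H/W Virtualization", "No", "Yes"),
    ("Backup (Snapshot)", "No", "Yes"),
    ("Paravirtual Devices", "No", "Yes"),
    ("Storage Redundancy", "No", "Yes"),
    ("vSAN / vSphere Virtual Volumes", "No", "No"),
    ("High Availability", "Yes", "Yes"),
    ("vSphere DRS", "Partial (Initial Placement)", "Partial (Initial Placement)"),
    ("vSphere Distributed Power Management", "Yes", "Yes"),
    ("Site Recovery Manager", "Yes", "Yes"),
    ("vSphere Distributed Switch", "Yes", "Yes"),
    ("Storage DRS", "No", "No"),
    ("vCloud Director for Service Providers", "No", "No"),
    ("vSphere Replication", "No", "No"),
]


def changed_features(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Return the features whose value differs between the two versions."""
    return [feature for feature, v55, v60 in rows if v55 != v60]
```

Calling `changed_features(FT_CAPABILITIES)` returns the seven rows from vCPUs through Storage Redundancy, which are the areas where vSphere 6.0 differs from vSphere 5.5 in this table.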