8.1.1 Determining the Number of Host Failures to Tolerate
As previously discussed, enabling vSphere HA on a cluster reserves some host resources for a vSphere HA failure event, and therefore reduces the available capacity for running virtual machines. The vCenter Server reserves sufficient unused resources in the vSphere HA cluster to support the failover capacity specified by the chosen admission control policy.
For example, the following figure shows an eight-node cluster in which you would typically reserve the equivalent of a single host's resources as failover capacity, allowing the cluster to tolerate a single server failure without impacting the performance of the virtual machines once they are restarted on the remaining hosts.
Figure 25. Calculating the Number of Failures to Tolerate
 
It is the vSphere HA admission control policy that enforces this availability constraint, and preserves host failover capacity so that the vSphere HA cluster can support as many host failures as specified.
The admission control policy allows you to configure reserved capacity in any one of three ways:
Host Failures Cluster Tolerates (default)
Percentage of Cluster Resources Reserved
Specify a Failover Host
With the default Host Failures Cluster Tolerates policy, vSphere HA performs admission control by calculating available slot sizes. In brief, a slot is a logical representation of memory and CPU resources. By default, it is sized to satisfy the requirements of any powered-on virtual machine in the cluster (that is, it is derived from the largest CPU and memory reservations), but it can, and often should, be modified using advanced vSphere HA settings.
The Fault Domain Manager (FDM) determines how many slots each host in the cluster can hold, and calculates the Current Failover Capacity of the cluster. This determines the number of hosts that can fail in the cluster and still leave enough slots available to satisfy the virtual machines that will need to be powered on in the event of a server failure.
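As a rough illustration of the slot mechanics described above, the following Python sketch computes a slot size from the largest reservations, counts slots per host, and checks whether enough slots remain after the largest hosts fail. The data structures and values are hypothetical, and the real FDM logic includes additional factors (such as default minimum slot sizes) not modeled here.

```python
# A minimal sketch of slot-based admission control, assuming hypothetical
# host/VM dictionaries. This is not the FDM implementation, only the shape
# of the calculation.

def slot_size(vms):
    """Slot = largest CPU reservation and largest memory reservation
    (plus overhead) among all powered-on VMs."""
    cpu_mhz = max(vm["cpu_reservation_mhz"] for vm in vms)
    mem_mb = max(vm["mem_reservation_mb"] + vm["mem_overhead_mb"] for vm in vms)
    return cpu_mhz, mem_mb

def can_tolerate(hosts, vms, host_failures_to_tolerate):
    """Current failover capacity check: do enough slots remain after the
    hosts holding the most slots are assumed to fail?"""
    cpu_slot, mem_slot = slot_size(vms)
    # Slots per host are limited by whichever resource runs out first.
    slots_per_host = sorted(
        (min(h["cpu_mhz"] // cpu_slot, h["mem_mb"] // mem_slot) for h in hosts),
        reverse=True,
    )
    # Conservatively remove the largest hosts, then check what is left.
    surviving_slots = sum(slots_per_host[host_failures_to_tolerate:])
    return surviving_slots >= len(vms)  # one slot per powered-on VM

# Example: eight identical hosts, forty identical VMs, tolerate one failure.
hosts = [{"cpu_mhz": 48_000, "mem_mb": 196_608}] * 8
vms = [{"cpu_reservation_mhz": 1_000,
        "mem_reservation_mb": 4_096,
        "mem_overhead_mb": 200}] * 40
print(can_tolerate(hosts, vms, 1))  # True: the power-on would be admitted
```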
With the Percentage of Cluster Resources Reserved admission control policy, vSphere HA does not use slot sizes. Instead, its calculations ensure that a specified percentage of the cluster's aggregate resources is reserved for failover.
The FDM carries out its calculations by determining the total resource requirements for all powered-on virtual machines in the cluster. It then calculates the total host resources available to virtual machines, and finally derives the current CPU failover capacity and current memory failover capacity for the cluster. If either value falls below the percentage specified as the configured failover capacity, admission control blocks the operation. For further information about the calculations behind these first two options, refer to the VMware documentation.
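A similarly hedged sketch of the percentage-based check follows. The field names are illustrative and reuse the hypothetical host/VM dictionaries from the previous sketch; the real calculation also accounts for resource pool structure.

```python
# Illustrative percentage-based admission control check, not the FDM's
# internal logic.

def within_reserved_percentage(hosts, vms, reserved_pct):
    """Return True while current CPU and memory failover capacity both stay
    at or above the configured reservation percentage."""
    total_cpu = sum(h["cpu_mhz"] for h in hosts)
    total_mem = sum(h["mem_mb"] for h in hosts)
    used_cpu = sum(vm["cpu_reservation_mhz"] for vm in vms)
    used_mem = sum(vm["mem_reservation_mb"] + vm["mem_overhead_mb"] for vm in vms)

    cpu_failover_capacity = 100 * (total_cpu - used_cpu) / total_cpu
    mem_failover_capacity = 100 * (total_mem - used_mem) / total_mem

    # Admission control blocks the operation if either resource would drop
    # below the reserved percentage.
    return (cpu_failover_capacity >= reserved_pct
            and mem_failover_capacity >= reserved_pct)
```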
Finally, with the Specify a Failover Host admission control policy, you can configure vSphere HA to designate a specific host as the failover host. With this policy, when a host fails, vSphere HA attempts to restart its virtual machines on the designated failover host, which under normal operating conditions remains unused. If vSphere HA cannot restart all of the failed server's virtual machines there, for example, because the failover host has insufficient resources, it attempts to restart the remaining virtual machines on other hosts in the cluster.
Specify a Failover Host is not the most commonly used policy, because it leaves one or more hosts deliberately unutilized, but it is sometimes seen where customers are required to demonstrate to auditors that sufficient failover capacity exists. Note also that when this policy is used, the standby host must have the resources required to replace any host within the cluster.
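The restart order just described can be summarized in a short sketch. The Host class and its methods are hypothetical stand-ins written for illustration, not vSphere API objects.

```python
# Hypothetical model of Specify a Failover Host restart placement: the
# designated standby host is tried first, other cluster hosts only as a
# fallback when it lacks resources.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    free_mem_mb: int
    restarted_vms: list = field(default_factory=list)

    def can_power_on(self, vm_mem_mb: int) -> bool:
        return self.free_mem_mb >= vm_mem_mb

    def power_on(self, vm_name: str, vm_mem_mb: int) -> None:
        self.free_mem_mb -= vm_mem_mb
        self.restarted_vms.append(vm_name)

def restart_failed_vms(failed_vms, failover_host, other_hosts):
    """failed_vms: list of (name, memory reservation in MB) tuples."""
    for name, mem in failed_vms:
        # The designated failover host is always the first candidate.
        candidates = [failover_host] + other_hosts
        target = next((h for h in candidates if h.can_power_on(mem)), None)
        if target is not None:
            target.power_on(name, mem)  # otherwise the VM stays powered off
```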
Table 15. Admission Control Policy Use Cases

| Policy | Recommended Use Cases |
| --- | --- |
| Host Failures Cluster Tolerates admission control policy | When virtual machines have similar CPU/memory reservations and similar memory overheads. |
| Percentage of Cluster Resources Reserved admission control policy | When virtual machines have highly variable CPU and memory reservations. |
| Specify a Failover Host admission control policy | To accommodate organizational policies that dictate the use of a passive failover host, most typically seen with virtualized business-critical applications. |
 
When it comes to a VMware Cloud Provider making the right design decision regarding which policy to adopt, consider the following points:
It is crucial to avoid resource fragmentation, which can occur when there are enough resources in aggregate for a virtual machine to be failed over, but the resources are spread across multiple hosts, and are therefore unusable. The Host Failures Cluster Tolerates policy manages resource fragmentation by defining the slot as the maximum virtual machine reservation. The Percentage of Cluster Resources policy does not address this problem, and so it might not be appropriate in a cluster where one or two large virtual machines reside with a number of smaller virtual machines. When the policy configured is Specify a Failover Host, resources are not fragmented because the host is reserved for failover, assuming the failover host has sufficient resources to restart all of the failed host’s virtual machines.
Consider the heterogeneity of the cluster. Service provider cloud platform clusters are typically heterogeneous in terms of the virtual machine resources required. In a heterogeneous cluster, the Host Failures Cluster Tolerates policy can be too conservative, because it only considers the largest virtual machine reservations when defining slot size, and assumes the largest host will fail when computing the current failover capacity. As mentioned previously, the slot size can, and often should, be modified by manually defining specific CPU and memory slot sizes. However, in a dynamic cloud environment this can be operationally difficult to calculate.
Percentage-based and dedicated failover host admission control policies are not affected by cluster heterogeneity. However, because the Specify a Failover Host policy requires one to four dedicated standby nodes that sit idle during normal operations, the percentage-based policy is typically recommended for service provider use. It also offers finer-grained control, allowing you to designate up to 50 percent of cluster resources for failover rather than reserving whole hosts. For these reasons, the recommended approach for VMware Cloud Providers is to employ the Percentage of Cluster Resources Reserved policy for admission control.
When calculating the percentage reserved for the admission control policy, the service provider must consider the cluster size and the SLAs provided to consumers for service uptime. This typically takes into account both unplanned outages, caused by hardware failure or human error, and planned maintenance. The larger the cluster, the larger the requirement for spare capacity, because the likelihood of a hardware failure somewhere within a single cluster increases with the number of components. The following table provides guidance for the percentage-based admission control policy for the most commonly configured building block cluster sizes, such as 8, 16, or 24 nodes (a simple formula for deriving these figures is sketched after the table). For business-critical production systems, VMware recommends that service providers reserve a minimum of 1:8 to 1:10 of resource capacity for admission control. For a 24-node compute cluster, this means three nodes of reserved capacity, enough to tolerate multiple host failures, or a host failure during a maintenance period when one or more hosts are already unavailable (for instance, during an orchestrated patching cycle).
Table 16. Percentage of Failed Resources to Tolerate (Percentage-Based Admission Control Policy)

Number of nodes in vSphere cluster, 1 to 16:

| Availability Level | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N+1 | N/A | N/A | 33% | 25% | 20% | 18% | 15% | 13% | 11% | 10% | 9% | 8% | 8% | 7% | 7% | 6% |
| N+2 | N/A | N/A | N/A | 50% | 40% | 33% | 29% | 26% | 23% | 20% | 18% | 17% | 15% | 14% | 13% | 13% |
| N+3 | N/A | N/A | N/A | 75% | 60% | 50% | 43% | 38% | 33% | 30% | 27% | 25% | 23% | 21% | 20% | 19% |
| N+4 | N/A | N/A | N/A | N/A | 80% | 66% | 56% | 50% | 46% | 40% | 36% | 34% | 30% | 28% | 26% | 25% |

Number of nodes in vSphere cluster, 17 to 32:

| Availability Level | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| N+1 | 6% | 6% | 5% | 5% | 5% | 5% | 4% | 4% | 4% | 4% | 4% | 4% | 3% | 3% | 3% | 3% |
| N+2 | 12% | 11% | 11% | 10% | 10% | 9% | 9% | 8% | 8% | 8% | 7% | 7% | 7% | 7% | 6% | 6% |
| N+3 | 18% | 17% | 16% | 15% | 14% | 14% | 13% | 13% | 12% | 12% | 11% | 11% | 10% | 10% | 10% | 9% |
| N+4 | 24% | 22% | 22% | 20% | 20% | 18% | 18% | 16% | 16% | 16% | 14% | 14% | 14% | 14% | 12% | 12% |
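Most entries in the table above follow a first-order formula: reserve roughly f hosts' worth of an n-node cluster, rounded up to a whole percentage. The sketch below assumes that formula; a few table entries are rounded more conservatively, so treat the table as the authoritative guidance.

```python
import math

def reserved_percentage(nodes: int, failures_to_tolerate: int) -> int:
    """Approximate reserved capacity for an N+f cluster of n nodes,
    as a whole percentage: ceil(100 * f / n)."""
    if nodes <= failures_to_tolerate:
        raise ValueError("cluster cannot tolerate that many host failures")
    return math.ceil(100 * failures_to_tolerate / nodes)

# The 24-node building block from the text, sized for N+3 (roughly the
# recommended 1:8 reserved resource capacity):
print(reserved_percentage(24, 3))  # 13 -> reserve 13% of cluster resources
```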
 
vSphere 6.0 supports cluster sizes of up to 64 nodes. While this doubles the number of nodes supported in a cluster, the enhancement brings new considerations when designing vSphere HA for such a large failure domain:
While designing for very large clusters brings benefits, such as higher consolidation ratios, it can also have a negative impact if you do not have enterprise-class or correctly sized storage in your infrastructure. Remember that if a datastore is presented to a 32-node or 64-node cluster, and the virtual machines on that datastore are spread across the cluster, there is a possibility of contention for SCSI locking. If the storage employed supports vSphere Storage APIs – Array Integration, this is partly mitigated by atomic test and set (ATS). However, verify that the design takes this possibility into account and that storage performance is not impacted.
With 64 nodes in a cluster, such a large failure domain should allow you to increase your consolidation ratio by reducing the number of hosts reserved for admission control. However, you must ensure the design allows for sufficient host failures to guarantee that SLAs can be met in a multiple host failure scenario, and during host maintenance and patching.
Although we are currently addressing vSphere HA, note that DRS performance depends on the DRS algorithm thread on the vCenter Server and the number of calculations required for resource scheduling. For large clusters, this calculation overhead increases accordingly. Consider this as part of the design decision process.
Understanding the vSphere HA mechanism is key to a good, resilient, and functional design. The following sections provide an overview of the different vSphere HA components and the multiple complex mechanisms that maintain virtual machine availability in the event of a host failure.