7.1 Designing vSphere Host Clusters

The vSphere cluster is typically the boundary of shareable resources. Therefore, when planning the design of clusters, consider the following key design considerations:

• Capacity planning – It may be simpler to plan for growth with a small number of large clusters. However, limits on the number of hosts per cluster, and therefore on the number of virtual machines per cluster, might warrant a scale-out approach to cluster design.

• Hardware cost – Because each cluster requires a defined amount of spare resources to accommodate failures, a large number of smaller clusters results in a higher ratio of hardware cost to virtual machines, depending on the scale of the environment.

• Security – Isolating tenants or tenant applications into dedicated clusters is one way to segment workloads and control access through role-based access control (RBAC).

• Performance – Separating tenants’ workloads or specific tenant applications into dedicated clusters provides a mechanism to verify that resources are constantly available for those consumers.

The number of hosts in a cluster affects consolidation ratios. For example, suppose you have eight compute nodes and must decide whether to deploy two 4-node clusters or one 8-node cluster. To achieve N+1, you would have to reserve one ESXi host in each of the two 4-node clusters for HA failover, which leaves six hosts for workloads. To achieve the same level of availability in the 8-node cluster, you would need to reserve only a single node, which leaves seven hosts and so provides the design with one extra ESXi host for running virtual machine workloads.
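The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `usable_hosts` and its parameters are hypothetical, and the sketch assumes equally sized clusters that each reserve whole hosts for failover.

```python
def usable_hosts(total_hosts: int, clusters: int, reserve_per_cluster: int = 1) -> int:
    """Return the hosts left for workloads after each cluster reserves failover hosts.

    Hypothetical helper for illustration; assumes equally sized clusters.
    """
    if clusters < 1 or total_hosts % clusters != 0:
        raise ValueError("this sketch assumes equally sized clusters")
    hosts_per_cluster = total_hosts // clusters
    if reserve_per_cluster >= hosts_per_cluster:
        raise ValueError("reserve must be smaller than the cluster size")
    # Every cluster sets aside its own failover capacity.
    return total_hosts - clusters * reserve_per_cluster


two_small_clusters = usable_hosts(8, clusters=2)  # two 4-node clusters, N+1 each
one_large_cluster = usable_hosts(8, clusters=1)   # one 8-node cluster, N+1
print(two_small_clusters, one_large_cluster)      # prints "6 7"
```

With the same eight hosts, the single large cluster keeps one more host available for virtual machine workloads than the two smaller clusters.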

In a production environment, plan for at least one failover host per 8 to 10 ESXi servers to achieve N+1. In a 16-node cluster, therefore, do not rely on a single failover host; aim for two. The additional node covers the risk of a dual failure as far as possible, and it also lets you carry out maintenance on one host without a single host failure affecting the customer’s environment. The vSphere HA calculations must not be overlooked, and are detailed further in Section 8, Planning for Server Failure.
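As a rough sketch of this rule of thumb, the number of failover hosts can be estimated by rounding the cluster size up to the next multiple of the chosen ratio. The function name `failover_hosts` and the default of 8 hosts per failover host are assumptions made for illustration (the guidance gives a range of 8 to 10); this is not a substitute for the vSphere HA calculations referenced above.

```python
import math


def failover_hosts(cluster_size: int, hosts_per_failover: int = 8) -> int:
    """Estimate failover hosts as one per `hosts_per_failover` ESXi servers, rounded up.

    Hypothetical helper; the default ratio of 8 is the conservative end of
    the 8-to-10 range and is an assumption of this sketch.
    """
    if cluster_size < 1 or hosts_per_failover < 1:
        raise ValueError("cluster_size and hosts_per_failover must be positive")
    return math.ceil(cluster_size / hosts_per_failover)


print(failover_hosts(8))       # prints 1
print(failover_hosts(16))      # prints 2
print(failover_hosts(16, 10))  # prints 2
```

A 16-node cluster comes out at two failover hosts at either end of the 8-to-10 range, which matches the recommendation above.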

The minimum size of a cluster is two nodes for vSphere HA to protect workloads in case one host stops functioning. However, in most use cases, a 3-node cluster is far more appropriate because you have the option of running maintenance tasks on an ESXi server without having to disable HA.

Large clusters have benefits too. They typically allow a higher consolidation ratio, but they can also have a downside if the infrastructure lacks enterprise-class or correctly sized storage. Keep in mind that if a datastore is presented to a 32-node or 64-node cluster, and the virtual machines on that datastore are spread across the cluster, there is a chance you will run into SCSI locking contention issues. Using an array that supports VMware vSphere Storage APIs – Array Integration (VAAI) helps reduce this problem through Atomic Test and Set (ATS) locking. However, if possible, consider starting small and gradually growing the cluster size to verify that your storage behavior is not impacted.

Another situation you might encounter is having separate ESXi servers for DMZ workloads or other isolated environments. While this approach might be considered “old school,” some tenants might have security or compliance requirements that call for this type of architecture, which creates a physical boundary between servers, zones, or virtual machines. Alternatively, you might be able to use separate network cards and physical network fabric to achieve the customer’s isolation goals while still running the workload on the same ESXi server, which gives you better consolidation ratios and still provides the level of security the customer requires.

When hosting tenant mission-critical applications, which are of the utmost importance and must have consistent performance at all times, you might need to place them in their own dedicated cluster. Whether you place multiple different applications in the same cluster or place only one application per cluster (the concept of an “island cluster”) depends on the critical nature and resource requirements of the application, and possibly on other factors such as isolation.

It is considered a VMware best practice not to run clusters whose hosts operate on different versions of ESXi code. However, during upgrade or patching activity there is typically a period of mixed-mode operation.
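A simple way to keep track of mixed-mode periods is to flag any cluster whose hosts report more than one ESXi version. The sketch below assumes the inventory has already been exported into a plain dictionary; the function name `mixed_version_clusters`, the cluster and host names, and the version strings are all hypothetical, and no vSphere API is called.

```python
def mixed_version_clusters(inventory: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return clusters whose hosts report more than one ESXi version.

    `inventory` maps cluster name -> {host name -> version string}.
    Hypothetical helper operating on already-exported data.
    """
    mixed = {}
    for cluster, hosts in inventory.items():
        versions = set(hosts.values())
        if len(versions) > 1:
            # Record the distinct versions so the upgrade gap is visible.
            mixed[cluster] = sorted(versions)
    return mixed


# Illustrative data only; names and versions are made up.
example_inventory = {
    "cluster-a": {"esx01": "7.0.3", "esx02": "7.0.3"},
    "cluster-b": {"esx03": "7.0.3", "esx04": "8.0.1"},
}
report = mixed_version_clusters(example_inventory)
print(report)  # prints {'cluster-b': ['7.0.3', '8.0.1']}
```

Running a check like this before and after a patching window helps confirm that the period of mixed-mode operation has actually ended.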