vSAN Two-Node Architecture Overview
As of vSAN 6.1, a two-node solution is supported, typically used for Remote Office/Branch Office (ROBO) environments. This solution architecture allows small office implementations to benefit from shared storage while minimizing cost. Prior to this release, and outside of this specific architecture, a three-node cluster remains the minimum supported configuration for vSAN enabled environments.
The two-node vSAN architecture builds on the concept of Fault Domains, first introduced in vSAN 6.0. Each of the two VMware ESXi™ hosts, located on the tenant’s premises, represents a single Fault Domain. In the vSAN architecture, the objects that make up a virtual machine are typically stored as a redundant mirror across two Fault Domains, assuming the Number of Failures to Tolerate is equal to 1. As a result, if one of the hosts goes offline, the virtual machines can continue to run, or be restarted, on the remaining node. To achieve this, a Witness is required to act as a tie-breaker that establishes a quorum and enables the surviving node in the cluster to restart the affected virtual machines.
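The following is a minimal sketch of that quorum arithmetic, assuming one vote per component and hypothetical component and Fault Domain names; actual placement and voting are handled internally by vSAN.

```python
# Minimal sketch of vSAN quorum arithmetic for a single FTT=1 object in a
# two-node cluster. Names and the one-vote-per-component rule are illustrative
# assumptions; real placement and voting are handled internally by vSAN.

# Three Fault Domains: the two on-premises hosts plus the remote Witness.
components = {
    "replica-1": "node-a",   # first mirrored copy of the object
    "replica-2": "node-b",   # second mirrored copy of the object
    "witness":   "witness",  # tie-breaker metadata on the Witness Appliance
}

def object_accessible(reachable_fault_domains):
    """An object remains accessible while more than 50% of its votes are reachable."""
    votes = sum(1 for fd in components.values() if fd in reachable_fault_domains)
    return votes > len(components) / 2

print(object_accessible({"node-a", "witness"}))  # True: node-b failed, 2 of 3 votes
print(object_accessible({"node-a", "node-b"}))   # True: witness unreachable, 2 of 3 votes
print(object_accessible({"node-a"}))             # False: quorum lost, 1 of 3 votes
```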
However, unlike a traditional vSAN enabled cluster, where the witness objects are local to the configured cluster hosts, in a two-node architecture the witness objects are located externally at a second site, on a dedicated virtual appliance specifically configured to store metadata and to provide the quorum services required in the event of a host failure. In the use cases that follow, this Witness Appliance is located in the VMware Cloud Provider’s data center.
Figure 2. Witness Object Metadata
 
By employing a dedicated virtual appliance to provide Witness services, this architecture eliminates the need to deploy a third vSphere host at the end customer’s site, which reduces overall costs without sacrificing the availability benefits of shared storage. The Witness Appliance is a specially modified nested ESXi host, specifically designed to store only witness objects and cluster metadata. The Witness Appliance does not contribute to the compute and storage capacity of the solution, and cannot be used to host virtual machines. The use of a Witness Appliance in a vSAN enabled configuration is supported by VMware only for this type of two-node architecture and for vSAN Stretched Cluster designs.
In the following use cases, the Witness Appliance is deployed as an OVA into the service provider’s data center. Because most deployments of this kind host only a small number of virtual machines at the end customer’s site, the “Tiny” configuration of two vCPUs and 8 GB of assigned memory is typically more than sufficient, supporting up to 750 witness components.
The nested ESXi vSAN Witness Appliance is automatically deployed with both flash and mechanical disks embedded, where one of the appliance’s VMDKs is tagged as a flash device during provisioning. No manual configuration is required by the service provider’s vSAN administrator. In addition, there is no requirement for a physical flash device in the vSphere host that is hosting the Witness. All of the appliance’s virtual disks can be thin provisioned, if required.
 
To store the required metadata, the Witness Appliance needs 16 MB of storage capacity per Witness component, with one Witness component created per object. Object types include:
Virtual disk objects
Snapshot delta VMDK objects
Virtual machine namespace object
Virtual machine swap object
With this in mind, the number of components stored on the Witness Appliance directly reflects the number of objects associated with the virtual machines running on the on-premises hosts. For instance, because each virtual machine requires at least one virtual disk (VMDK), one namespace object, and one swap object, there is a minimum of three objects per virtual machine, and each snapshot adds one further object per VMDK.
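As a rough illustration of this sizing arithmetic, the sketch below combines the figures quoted above: 16 MB of witness capacity per component, a minimum of three objects per virtual machine, one additional object per snapshot per VMDK, and the 750-component ceiling of the “Tiny” appliance profile. The virtual machine counts used are hypothetical.

```python
# Rough witness sizing estimate based on the figures quoted in this section.
# 16 MB per witness component, one witness component per object.
MB_PER_WITNESS_COMPONENT = 16
TINY_PROFILE_COMPONENT_LIMIT = 750  # ceiling of the "Tiny" appliance profile

def objects_per_vm(vmdk_count=1, snapshots_per_vmdk=0):
    """Namespace + swap + one object per VMDK, plus one object per snapshot per VMDK."""
    namespace = 1
    swap = 1
    disks = vmdk_count
    snapshots = vmdk_count * snapshots_per_vmdk
    return namespace + swap + disks + snapshots

# Example: 25 VMs, each with 2 VMDKs and 1 snapshot per VMDK (illustrative numbers).
vm_count = 25
components = vm_count * objects_per_vm(vmdk_count=2, snapshots_per_vmdk=1)

print(f"Witness components required: {components}")
print(f"Witness capacity required: {components * MB_PER_WITNESS_COMPONENT} MB")
print(f"Fits the 'Tiny' profile: {components <= TINY_PROFILE_COMPONENT_LIMIT}")
```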
Because the Witness Appliance does not host virtual machines, and therefore does not have to service virtual machine read and write requests, the network connectivity requirements between the end customer site and the service provider’s data center are minimal. Typically, a WAN interconnect with 1.5 Mb/s of available bandwidth and latency as high as 500 ms Round-Trip Time (RTT) is sufficient to provide network communication between the two-node cluster at the customer's offices and the Witness Appliance located at the service provider’s data center. However, like a traditional vSAN deployment, multicast must be enabled for communication between hosts in the two-node on-premises cluster, although there is no requirement for multicast to be enabled for WAN communication with the Witness Appliance.
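As a minimal illustration, the check below qualifies a candidate WAN link against the connectivity thresholds quoted above; the link figures passed in are hypothetical.

```python
# Qualify a candidate WAN link against the witness connectivity figures quoted above.
REQUIRED_BANDWIDTH_MBPS = 1.5   # minimum available bandwidth to the Witness Appliance
MAX_LATENCY_RTT_MS = 500        # maximum round-trip latency to the Witness Appliance

def witness_link_ok(available_mbps, rtt_ms):
    return available_mbps >= REQUIRED_BANDWIDTH_MBPS and rtt_ms <= MAX_LATENCY_RTT_MS

# Hypothetical tenant site with a 10 Mb/s uplink and 120 ms RTT to the provider.
print(witness_link_ok(available_mbps=10, rtt_ms=120))   # True
```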
vSAN Storage Policy Based Management (SPBM) includes a capability referred to as the Number of Failures to Tolerate (FTT), which is central to this architecture. This capability defines the standard RAID 1 mirrored configuration that gives a virtual machine n+1 redundancy. In a vSAN two-node architecture, the FTT=1 policy is explicitly required, because there are exactly three configured Fault Domains. With this configuration applied, a mirrored copy of each virtual machine is created and automatically maintained on the separate physical nodes. It is this mechanism that allows one host within the two-node architecture to fail, with users either retaining continuous access to the application, or the workload being restarted by the vSphere HA process. Whether the application remains continuously available depends on whether it was hosted on the affected node, or was configured for high availability at the application layer and deployed and load balanced across both nodes in the cluster. Another option that provides application availability across the two-node cluster is VMware vSphere Fault Tolerance, which is also compatible with this vSAN architecture and can provide continuous availability, in the event of a host failure, to workloads with up to four virtual CPUs.
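The relationship between FTT and Fault Domains can be sketched as follows. This shows the standard mirroring arithmetic (n+1 data replicas plus n witness components, or 2n+1 Fault Domains in total) only to illustrate why FTT=1 is the value used in a two-node design; it is not a representation of the SPBM policy engine itself.

```python
# With RAID 1 mirroring, tolerating n failures needs n+1 data replicas plus n
# witness components, i.e. 2n+1 Fault Domains in total.
def fault_domains_required(ftt):
    return 2 * ftt + 1

def max_ftt(fault_domains):
    """Highest Number of Failures to Tolerate a given Fault Domain count can support."""
    return (fault_domains - 1) // 2

# Two data nodes plus the Witness Appliance = three Fault Domains.
print(fault_domains_required(1))  # 3 -> FTT=1 fits the two-node architecture
print(max_ftt(3))                 # 1 -> FTT=1 is the highest supportable value
```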
As previously outlined, vSphere HA is a critical component of this architecture and these use cases, because shared storage is tightly integrated with the process of restarting virtual machines after a host outage. With vSphere HA enabled, if a host fails, the virtual machines impacted by the outage are rebooted on other hosts in the cluster, minimizing downtime. However, in a two-node architecture, to make sure that enough CPU and memory resources are available to restart all impacted virtual machines (effectively running 100 percent of the workload on a single host), the vSphere HA Admission Control Policy must be configured to reserve 50 percent of both memory and CPU, irrespective of the amount of available storage. As a result, in this configuration only 50 percent of the compute resources in the two-node architecture are available to run workloads.
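As a simple illustration of the sizing impact, the sketch below applies the 50 percent Admission Control reservation described above to a hypothetical pair of identical hosts.

```python
# Effective compute capacity of a two-node cluster with the 50% Admission Control
# reservation described above. Host specifications are hypothetical.
HA_RESERVED_FRACTION = 0.50

def usable_cluster_capacity(hosts_ghz, hosts_gb):
    """Return (usable CPU in GHz, usable memory in GB) after the HA reservation."""
    total_ghz = sum(hosts_ghz)
    total_gb = sum(hosts_gb)
    return (total_ghz * (1 - HA_RESERVED_FRACTION),
            total_gb * (1 - HA_RESERVED_FRACTION))

# Two identical hosts, each with 44 GHz of aggregate CPU and 256 GB of memory.
ghz, gb = usable_cluster_capacity([44, 44], [256, 256])
print(f"Usable CPU: {ghz} GHz, usable memory: {gb} GB")  # 44.0 GHz, 256.0 GB
```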