Architecting VMware vSAN 6.2 : Eight Common Service Provider Use Cases : 7.3 vSAN Stretched Cluster Deployments
   
7.3 vSAN Stretched Cluster Deployments
Stretched storage with vSAN allows you to split a vSAN cluster across two sites so that, if one site fails, workloads can fail over to the other site without any loss of data. vSAN accomplishes this by synchronously mirroring data across the two sites, so a write is acknowledged only after it has been committed at both sites. A witness, which resides in a central location accessible from both sites, acts as a tiebreaker that determines which site retains access to the data after a failure. This is a specific implementation for environments where disaster avoidance and the prevention of unscheduled downtime are key requirements.
A vSAN stretched cluster is a specific deployment in which the provider sets up a vSAN cluster across two disparate Active/Active data sites, with an identical number of ESXi hosts at each site. The data sites are connected by a high-bandwidth, low-latency link. The witness host resides at a third site that is connected to both Active/Active data sites. The sites can be any combination of VMware Cloud Provider Program data centers, customer data centers, and third-party data centers. See the following figure for an example.
Figure 31. VMware Cloud Provider Program Stretched Cluster Example
 
In a vSAN stretched cluster implementation, each site is configured as a vSAN fault domain, and a maximum of three sites (two data sites and one witness site) is supported. Each configuration has exactly one witness host. For deployments that manage multiple stretched clusters, each cluster must have its own unique witness host.
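The witness rules above lend themselves to a simple pre-deployment check. The following Python sketch is a hypothetical helper, not a VMware tool: the `StretchedCluster` class, its fields, and the host names are assumptions for illustration. It flags a witness host that is shared by more than one stretched cluster, and a witness that is also listed as a data host.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StretchedCluster:
    """Hypothetical description of one stretched cluster (names are illustrative)."""
    name: str
    site_a_hosts: tuple[str, ...]
    site_b_hosts: tuple[str, ...]
    witness: str


def validate_witnesses(clusters: list[StretchedCluster]) -> list[str]:
    """Return a list of problems; an empty list means the witness layout passes."""
    problems = []
    # Each stretched cluster must have its own unique witness host.
    for witness, count in Counter(c.witness for c in clusters).items():
        if count > 1:
            problems.append(f"witness {witness} is shared by {count} clusters")
    # The witness lives in its own (third) fault domain, never in a data site.
    for c in clusters:
        if c.witness in set(c.site_a_hosts) | set(c.site_b_hosts):
            problems.append(f"{c.name}: witness {c.witness} is also a data host")
    return problems
```

For example, two clusters that both name `w1` as their witness produce the single problem `witness w1 is shared by 2 clusters`.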
A virtual machine deployed on a vSAN stretched cluster has one copy of its data at site A and a second copy at site B, while its witness components are placed on the witness host at site C. This placement is achieved through fault domains and affinity rules. If an entire data site fails, a full copy of the virtual machine data remains available, together with more than 50 percent of the components (the copy at the surviving site plus the witness components). This allows the virtual machine to remain available on the vSAN datastore. If the virtual machine needs to be restarted at the surviving data site, vSphere HA performs the restart.
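To make the availability rule concrete, the following Python sketch models one object protected across the three fault domains. It is a simplified model under stated assumptions: one replica per data site, a single witness component at site C, and one vote per component. Real vSAN objects can consist of more components with different vote counts. The object stays accessible only while a full copy survives and more than half of the votes remain.

```python
# Simplified, hypothetical model of one object mirrored across two data sites.
# Assumption: one component per fault domain and one vote per component.
PLACEMENT = {"replica-1": "site-A", "replica-2": "site-B", "witness": "site-C"}
VOTES = {"replica-1": 1, "replica-2": 1, "witness": 1}


def object_accessible(placement: dict[str, str], votes: dict[str, int],
                      failed_sites: set[str]) -> bool:
    """True if a full data copy survives and surviving votes exceed 50 percent."""
    surviving = [c for c, site in placement.items() if site not in failed_sites]
    has_full_copy = any(c.startswith("replica") for c in surviving)
    surviving_votes = sum(votes[c] for c in surviving)
    has_quorum = 2 * surviving_votes > sum(votes.values())
    return has_full_copy and has_quorum


for failed in ({"site-A"}, {"site-C"}, {"site-A", "site-C"}):
    print(sorted(failed), object_accessible(PLACEMENT, VOTES, failed))
```

In this model, the loss of any single site leaves the object accessible, while losing site A and site C together does not, because the remaining replica holds only one of three votes.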
Geographic distance is not, in itself, a concern when designing a vSAN stretched cluster; the key requirement is the latency between the sites. VMware requires a round-trip time (RTT) of no more than 5 ms between the data sites and no more than 200 ms between the data sites and the witness host. As long as these latency requirements are met, there is no restriction on geographic distance.
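A provider could check measured round-trip times against these limits before building the cluster. In this Python sketch, the site names and RTT figures are hypothetical inputs (for example, values gathered with `vmkping` between VMkernel interfaces); only the 5 ms and 200 ms thresholds come from the requirements above.

```python
# Thresholds from the stretched cluster latency requirements (round-trip time).
DATA_SITE_MAX_RTT_MS = 5.0
WITNESS_MAX_RTT_MS = 200.0


def latency_violations(rtt_ms: dict[tuple[str, str], float],
                       witness_site: str) -> list[str]:
    """Compare measured RTTs between site pairs with the applicable limit."""
    violations = []
    for (a, b), rtt in sorted(rtt_ms.items()):
        # Links touching the witness site use the looser witness limit.
        limit = WITNESS_MAX_RTT_MS if witness_site in (a, b) else DATA_SITE_MAX_RTT_MS
        if rtt > limit:
            violations.append(f"{a}<->{b}: {rtt} ms RTT exceeds {limit} ms")
    return violations


# Hypothetical measurements in milliseconds.
measured = {
    ("site-A", "site-B"): 3.2,
    ("site-A", "site-C"): 45.0,
    ("site-B", "site-C"): 210.0,
}
print(latency_violations(measured, witness_site="site-C"))
```

Here only the site B to witness link is reported, because 210 ms exceeds the 200 ms witness limit; the 3.2 ms inter-site link is within the 5 ms data-site limit.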
As discussed earlier, a vSAN stretched cluster requires three disparate sites. The data sites must communicate over the management, vSAN, VM, and vSphere vMotion networks, while the witness site requires only the management and vSAN networks. Detailed networking design is beyond the scope of this document. However, to minimize uncertainty during implementation, VMware recommends that providers implement a stretched L2 network between the data sites and an L3 configuration between the data sites and the witness site.
In the example illustrated in the following figure, a virtual witness connected over L3 with static routes was chosen. The witness is deployed on a physical ESXi host and has two preconfigured networks, one for management and one for vSAN. The data sites are connected by a stretched L2 network that backs the management, vSAN, VM, and vSphere vMotion networks. All hosts in the cluster must be able to communicate with one another. To make this possible, static routes must be configured on each host, between the data hosts at sites A and B and the witness host at site C, so that vSAN traffic can flow between the data sites and the witness site.
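The per-host static routes described above could be generated with a small script. The following Python sketch is an illustration built on assumptions: the host names, subnets, and gateways are hypothetical, the data sites are assumed to share one stretched L2 vSAN subnet, and vSAN traffic is assumed to use the default TCP/IP stack. The script only builds `esxcli network ip route ipv4 add` command strings; it does not run them on the hosts.

```python
import ipaddress


def static_route_commands(data_hosts: list[str], witness_host: str,
                          data_vsan_net: str, witness_vsan_net: str,
                          data_gateway: str, witness_gateway: str) -> dict[str, str]:
    """Build one hypothetical esxcli static-route command per host."""
    data_net = ipaddress.ip_network(data_vsan_net)
    witness_net = ipaddress.ip_network(witness_vsan_net)
    # Each gateway must sit on the local vSAN subnet of the hosts that use it.
    if ipaddress.ip_address(data_gateway) not in data_net:
        raise ValueError("data-site gateway is not on the data-site vSAN subnet")
    if ipaddress.ip_address(witness_gateway) not in witness_net:
        raise ValueError("witness gateway is not on the witness vSAN subnet")
    commands = {}
    # Data hosts (sites A and B) route to the witness vSAN subnet.
    for host in data_hosts:
        commands[host] = (f"esxcli network ip route ipv4 add "
                          f"--network {witness_net} --gateway {data_gateway}")
    # The witness host (site C) routes back to the data-site vSAN subnet.
    commands[witness_host] = (f"esxcli network ip route ipv4 add "
                              f"--network {data_net} --gateway {witness_gateway}")
    return commands


routes = static_route_commands(["esx-a1", "esx-b1"], "witness-c",
                               "172.16.10.0/24", "192.168.50.0/24",
                               "172.16.10.1", "192.168.50.1")
for host, cmd in routes.items():
    print(f"{host}: {cmd}")
```

Generating the commands from one definition keeps the data-site and witness-site routes symmetric, which is easy to get wrong when routes are entered by hand on each host.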
Figure 32. Network Connectivity for Stretched Cluster
 
Ultimately, the success of a specific vSAN stretched cluster implementation, and the considerations that shape its design, depend on many factors, ranging from the choice of topology to the physical capabilities of a provider's networking infrastructure.
With vSAN 6.2, stretched clusters have been enhanced to simplify configuration: a new graphical configuration wizard assists with the setup. The maximum number of hosts in a stretched cluster configuration remains 31: 15 hosts at site 1, 15 hosts at site 2, and the witness host at site 3, which can be a physical host or a virtual appliance.
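The sizing limits above can be expressed as a quick validation function. This Python sketch is hypothetical: the function name and messages are illustrative, and the even-distribution check follows the guidance earlier in this section that each data site has an identical number of hosts.

```python
MAX_HOSTS_PER_DATA_SITE = 15
MAX_TOTAL_HOSTS = 31  # 15 + 15 data hosts plus 1 witness


def sizing_problems(site1_hosts: int, site2_hosts: int,
                    witness_hosts: int = 1) -> list[str]:
    """Return sizing problems for a proposed stretched cluster; empty means it fits."""
    problems = []
    if witness_hosts != 1:
        problems.append("a stretched cluster needs exactly one witness host")
    if site1_hosts != site2_hosts:
        problems.append("data sites should have an identical number of hosts")
    for site, hosts in (("site 1", site1_hosts), ("site 2", site2_hosts)):
        if hosts < 1:
            problems.append(f"{site} needs at least one host")
        elif hosts > MAX_HOSTS_PER_DATA_SITE:
            problems.append(f"{site} has {hosts} hosts; the maximum is {MAX_HOSTS_PER_DATA_SITE}")
    total = site1_hosts + site2_hosts + witness_hosts
    if total > MAX_TOTAL_HOSTS:
        problems.append(f"{total} hosts exceeds the maximum of {MAX_TOTAL_HOSTS}")
    return problems


print(sizing_problems(15, 15))
print(sizing_problems(16, 16))
```

A 15+15+1 layout returns an empty list, while 16 hosts per site trips both per-site limits and the 31-host total.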
For detailed guidance on designing a vSAN stretched cluster, consult the VMware Virtual SAN 6.1 Stretched Cluster & 2 Node Guide at https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/VMware-Virtual-SAN-6.1-Stretched-Cluster-Guide.pdf.