Architecting VMware vSAN 6.2 : vSAN Design Overview : 5.6 vSAN Requirements : 5.6.2 vSAN Cluster and Datastore Design : 5.6.2.2 Disk Group Design
   
5.6.2.2 Disk Group Design
Disk groups can be thought of as storage “containers” on vSAN hosts. Each disk group contains exactly one flash cache device and between one and seven capacity devices. Capacity devices are either mechanical disks (hybrid configuration) or flash devices (all-flash configuration). The single cache device in a disk group provides the cache for all of the capacity devices in that group. The recommendation is a cache-to-capacity ratio of at least 10 percent. Because this ratio is fixed by how each disk group is built, disk group configuration gives a degree of control over performance. The ratio must also be taken into account when planning future growth: the cache devices must be large enough to support the capacity tier as it grows, or the minimum cache-to-capacity ratio cannot be maintained. Depending on the use case, it might be necessary to design with additional cache up front to allow for future growth of the capacity layer.
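The 10 percent guideline reduces to simple arithmetic. The following Python sketch assumes the ratio is applied per disk group, comparing the cache device size with the raw size of that group's capacity devices. The function names and the 1,200 GB disk size are hypothetical and chosen only for illustration.

```python
# Minimal sketch: cache sizing against a cache-to-capacity ratio.
# Assumption: the ratio is applied per disk group, using raw capacity.
CACHE_RATIO_PCT = 10  # minimum cache-to-capacity ratio, in percent


def min_cache_gb(capacity_gb: int, ratio_pct: int = CACHE_RATIO_PCT) -> float:
    """Smallest cache device (GB) that keeps a disk group at the target ratio."""
    return capacity_gb * ratio_pct / 100


def supports_growth(cache_gb: int, planned_capacity_gb: int,
                    ratio_pct: int = CACHE_RATIO_PCT) -> bool:
    """True if an existing cache device still meets the ratio after growth."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return cache_gb * 100 >= planned_capacity_gb * ratio_pct


if __name__ == "__main__":
    today = 4 * 1200   # hypothetical: four 1,200 GB capacity disks now
    future = 7 * 1200  # hypothetical: grown to the seven-device maximum
    print(min_cache_gb(today), min_cache_gb(future))
    print(supports_growth(480, future), supports_growth(960, future))
```

Sizing the cache device for the planned capacity (840 GB in this example) rather than for today's capacity (480 GB) corresponds to designing with additional cache up front.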
To rebuild components after a failure, the design must be sized so that one free host’s worth of capacity is available for each failure to be tolerated. In addition, at least one full host’s worth of capacity must be kept free for maintenance. The number of failures to tolerate (FTT) therefore determines whether additional host capacity is required. For example, to rebuild components after one failure (FTT=1), one full host’s worth of capacity must be available; to rebuild components after a second failure (FTT=2), two full hosts’ worth of capacity must be free.
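The host-level reserve can be checked with a short calculation. This Python sketch assumes identical hosts and treats `used_gb` as raw consumed capacity (replicas included); the function names and cluster figures are hypothetical. It encodes the rule as stated: one host's worth per failure to tolerate, and never less than one host for maintenance.

```python
# Minimal sketch: does the cluster keep enough free capacity to rebuild?
# Assumptions: identical hosts, and used_gb is raw consumed capacity
# (replicas included). Reserve rule: one host's worth per failure to
# tolerate, and never less than one host's worth for maintenance.


def reserved_hosts(ftt: int) -> int:
    """Number of hosts' worth of capacity that must stay free."""
    return max(1, ftt)


def has_rebuild_headroom(hosts: int, host_capacity_gb: int,
                         used_gb: int, ftt: int) -> bool:
    """True if free raw capacity covers the host-level reserve."""
    free_gb = hosts * host_capacity_gb - used_gb
    return free_gb >= reserved_hosts(ftt) * host_capacity_gb


if __name__ == "__main__":
    # Hypothetical cluster: six hosts with 10,000 GB raw each, 45,000 GB used.
    print(has_rebuild_headroom(6, 10_000, 45_000, ftt=1))  # 15,000 free >= 10,000
    print(has_rebuild_headroom(6, 10_000, 45_000, ftt=2))  # 15,000 free < 20,000
```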
When evaluating hardware for cluster nodes in a hybrid cloud environment, use identical hardware across nodes, paying special attention to the storage I/O controllers. Controller queue depth should be as large as possible and, at a minimum, able to accommodate the throughput of current and future devices. SATA drives generally have the lowest queue depth of the supported mechanical disks, and for this reason they are not recommended in a cloud environment. Equally important, verify that the storage I/O controller supports pass-through mode. RAID 0 mode is not recommended in a hybrid cloud environment because setting up and replacing disks requires additional maintenance effort.
Key design decisions must be made about the number of disk groups and the flash-to-mechanical-disk or flash-to-flash ratio in vSAN. Consider that vSAN:
Supports one flash device for cache and a maximum of seven mechanical disks for capacity per disk group in a hybrid configuration.
Supports one flash device for cache and a maximum of seven flash devices for capacity per disk group in an all-flash configuration.
Supports a maximum of five disk groups per host.
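The limits listed above can be expressed as a simple validation routine. This Python sketch is hypothetical: the `DiskGroup` type and `validate_host` function are illustrations, not a VMware API. It assumes each disk group must have exactly one cache device and between one and seven capacity devices, with no more than five disk groups per host.

```python
# Minimal sketch: check a host's disk group layout against the vSAN limits
# (one cache device and 1-7 capacity devices per group, at most five groups
# per host). DiskGroup is a hypothetical type, not a VMware API.
from dataclasses import dataclass
from typing import List

MAX_CAPACITY_DEVICES = 7
MAX_DISK_GROUPS = 5


@dataclass
class DiskGroup:
    cache_devices: int     # flash devices claimed for cache
    capacity_devices: int  # HDDs (hybrid) or flash devices (all-flash)


def validate_host(groups: List[DiskGroup]) -> List[str]:
    """Return a list of problems; an empty list means the layout is valid."""
    problems = []
    if len(groups) > MAX_DISK_GROUPS:
        problems.append(f"{len(groups)} disk groups exceeds the limit of {MAX_DISK_GROUPS}")
    for i, g in enumerate(groups):
        if g.cache_devices != 1:
            problems.append(f"group {i}: needs exactly one cache device, has {g.cache_devices}")
        if not 1 <= g.capacity_devices <= MAX_CAPACITY_DEVICES:
            problems.append(f"group {i}: {g.capacity_devices} capacity devices "
                            f"(allowed 1-{MAX_CAPACITY_DEVICES})")
    return problems


if __name__ == "__main__":
    print(validate_host([DiskGroup(1, 3), DiskGroup(1, 3)]))  # []
    print(validate_host([DiskGroup(1, 8)]))
```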
The number of mechanical disks matters in hybrid configurations because data in the cache tier’s write buffer is eventually destaged to the capacity disks. Multiple disk spindles can speed up this process, so more, smaller mechanical disks often provide better performance than fewer, larger disks in hybrid configurations.
Allow 30 percent slack space when designing capacity.
vSAN begins automatic rebalancing when a disk reaches the 80 percent full threshold.
Target configurations should stay approximately 10 percent below the 80 percent threshold (that is, around 70 percent utilization, consistent with 30 percent slack space).
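The slack-space guidance can be turned into a planning figure and a per-device status check. This Python sketch assumes a 70 percent design target (30 percent slack) and treats 80 percent as the point where rebalancing starts; the status labels and function names are hypothetical.

```python
# Minimal sketch: classify a capacity device's utilization against a
# 70 percent design target (30 percent slack) and the 80 percent
# threshold at which vSAN starts automatic rebalancing.
TARGET_PCT = 70     # 100 - 30 percent slack
REBALANCE_PCT = 80  # automatic rebalancing threshold


def utilization_status(used_gb: int, capacity_gb: int) -> str:
    """Return 'ok', 'above target', or 'rebalancing' for one device."""
    # Compare scaled integers to avoid floating-point rounding.
    if used_gb * 100 >= capacity_gb * REBALANCE_PCT:
        return "rebalancing"
    if used_gb * 100 > capacity_gb * TARGET_PCT:
        return "above target"
    return "ok"


def design_usable_gb(raw_gb: int) -> int:
    """Raw capacity that can be planned for use while keeping 30 percent slack."""
    return raw_gb * TARGET_PCT // 100


if __name__ == "__main__":
    print(design_usable_gb(7200))         # 5040
    print(utilization_status(700, 1000))  # ok
    print(utilization_status(750, 1000))  # above target
    print(utilization_status(800, 1000))  # rebalancing
```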
Multiple disk groups typically provide better performance and smaller fault domains, but they can increase cost and consume additional disk slots.
The more capacity disks that are configured per disk group, the more capacity is available for virtual machines, and the more cache is needed to maintain the cache-to-capacity ratio. However, because each disk group is limited to seven capacity devices, growing beyond that limit requires additional disk groups, which adds cost: each disk group requires its own flash device for cache and at least one device for capacity.
Disk group sizing is also important to consider when designing the datastore. Include the following data points when deciding on the number of disk groups per host:
Available space on the vSAN datastore.
Number of failures you want to tolerate in the cluster.
The optimal number of disk groups is a balance between hardware and space requirements for the vSAN datastore. More disk groups increase space and provide higher availability. However, adding disk groups can be cost-prohibitive.
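A rough estimate shows how the number of disk groups per host and the failures to tolerate interact. This Python sketch assumes identical hosts, RAID-1 mirroring (FTT + 1 copies of each object), and 30 percent slack. It ignores on-disk format overhead and the host-level rebuild reserve, and the figures are hypothetical.

```python
# Minimal sketch: how disk groups per host and FTT change the capacity left
# for virtual machines. Assumptions: identical hosts, RAID-1 mirroring
# (FTT + 1 copies of each object), 30 percent slack; on-disk format
# overhead and the host-level rebuild reserve are ignored.
SLACK_PCT = 30


def vm_capacity_gb(hosts: int, groups_per_host: int,
                   group_capacity_gb: int, ftt: int) -> int:
    """Approximate logical capacity (GB) available for VM data."""
    raw_gb = hosts * groups_per_host * group_capacity_gb
    plannable_gb = raw_gb * (100 - SLACK_PCT) // 100
    return plannable_gb // (ftt + 1)


if __name__ == "__main__":
    # Hypothetical: 4 hosts, 3,600 GB raw per disk group, FTT=1.
    for groups in (1, 2, 3):
        print(groups, vm_capacity_gb(4, groups, 3600, ftt=1))
```

Each additional disk group adds capacity linearly, while each increase in FTT divides the result by one more copy, which is why both data points belong in the decision.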
The total amount of space in the configuration can be significant, depending on the number of disks configured in the hosts. Configure the disk groups based on:
The end size of the datastore.
The tolerance for failure required for the design (both failures-to-tolerate and fault domains).
A good starting point is two disk groups per host, each containing one SSD for cache and three HDDs for capacity, so that each host contains two SSDs and six HDDs. However, you can use more or fewer disks to meet the sizing requirements for vSAN (and the estimated sizing required for the overall design).
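The starting layout can be expanded into device counts, disk slots, and raw capacity per host. This Python sketch assumes one cache SSD per disk group, identical capacity disks, and hypothetical drive sizes (400 GB SSD, 1,200 GB HDD); `host_layout` is an illustrative helper, not part of any VMware tool.

```python
# Minimal sketch: device counts and raw capacity for a per-host layout.
# Assumptions: one cache SSD per disk group, identical capacity disks,
# and hypothetical drive sizes (400 GB SSD, 1,200 GB HDD) in the example.
from typing import Dict


def host_layout(disk_groups: int, hdds_per_group: int,
                ssd_gb: int, hdd_gb: int, ratio_pct: int = 10) -> Dict[str, object]:
    group_capacity_gb = hdds_per_group * hdd_gb
    return {
        "ssds": disk_groups,                              # one cache device per group
        "hdds": disk_groups * hdds_per_group,
        "disk_slots": disk_groups * (1 + hdds_per_group),
        "raw_capacity_gb": disk_groups * group_capacity_gb,
        # 10 percent cache-to-capacity check, applied per disk group.
        "cache_ratio_ok": ssd_gb * 100 >= group_capacity_gb * ratio_pct,
    }


if __name__ == "__main__":
    print(host_layout(disk_groups=2, hdds_per_group=3, ssd_gb=400, hdd_gb=1200))
```

Changing `hdds_per_group` or `disk_groups` shows the trade-off: more disks per group need a larger cache device, while more groups consume more disk slots.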