8.1.8 Heartbeat Datastores
When the master host in a vSphere HA cluster is unable to communicate with a slave host over the ESXi management network, the master host falls back on datastore heartbeats to establish whether the slave host has failed, is in a network partition, or is network isolated. If the slave host has also stopped issuing datastore heartbeats, it is considered to have failed, and its virtual machines are restarted on the surviving hosts.
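The decision described above can be sketched as a small classification function. This is a simplified illustration only, not the logic of the vSphere HA agent itself: the names `HostState` and `classify_host` are hypothetical, and the `host_reports_isolation` flag is an assumed stand-in for whatever signal lets the master distinguish an isolated host from a partitioned one.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    FAILED = "failed"
    PARTITIONED = "partitioned"
    ISOLATED = "isolated"


def classify_host(network_heartbeat: bool,
                  datastore_heartbeat: bool,
                  host_reports_isolation: bool = False) -> HostState:
    """Classify a slave host from the master's point of view (simplified)."""
    if network_heartbeat:
        # Management network heartbeats still arrive: nothing to decide.
        return HostState.LIVE
    if not datastore_heartbeat:
        # No network heartbeat and no datastore heartbeat: treat as failed,
        # which is the case where virtual machines are restarted elsewhere.
        return HostState.FAILED
    if host_reports_isolation:
        # Assumption: the host has signalled that it considers itself isolated.
        return HostState.ISOLATED
    # Host is alive on storage but unreachable over the management network.
    return HostState.PARTITIONED
```

For example, `classify_host(False, False)` yields `HostState.FAILED`, while `classify_host(False, True)` yields `HostState.PARTITIONED`.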
By default, vCenter Server selects a preferred set of heartbeat datastores. The selection aims to maximize the number of hosts that have access to each heartbeat datastore and to minimize the likelihood that the selected datastores are backed by the same storage array. You can override the default selection by specifying the heartbeat datastores in the Cluster Settings dialog box in the vSphere Web Client.
It is also possible to use the advanced attribute das.heartbeatDsPerHost to change the number of heartbeat datastores that vCenter Server selects for each host in the cluster. The default is two and the maximum valid value is five.
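As a minimal sketch of the limits stated above, the following helper validates a requested heartbeat datastore count before it is recorded as an advanced option. The function name `heartbeat_ds_option` is hypothetical, the option key is written in the camel-case form used by VMware product documentation, and treating the default of two as the lower bound is an assumption. The helper only builds a key/value pair; it does not talk to vCenter Server.

```python
from typing import Dict

HEARTBEAT_DS_OPTION = "das.heartbeatDsPerHost"
DEFAULT_COUNT = 2   # default stated in the text
MAX_COUNT = 5       # maximum valid value stated in the text


def heartbeat_ds_option(count: int = DEFAULT_COUNT) -> Dict[str, str]:
    """Return the advanced option as a key/value pair after range checking."""
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError("count must be an integer")
    # Assumption: the default (2) is also the smallest accepted value.
    if not DEFAULT_COUNT <= count <= MAX_COUNT:
        raise ValueError(
            f"{HEARTBEAT_DS_OPTION} must be between "
            f"{DEFAULT_COUNT} and {MAX_COUNT}, got {count}"
        )
    # Advanced options are entered as strings in the cluster settings.
    return {HEARTBEAT_DS_OPTION: str(count)}
```

Checking the value in change tooling before it reaches the cluster configuration keeps an out-of-range setting from being submitted in the first place.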
When vSphere HA is enabled, it creates a directory named .vSphere-HA at the root of each selected datastore. Make sure that your operational teams never delete or modify this directory, because doing so will almost certainly have a serious impact on the ability of vSphere HA to function.
In a vSAN environment, vSphere HA behaves slightly differently from the traditional mechanism. Datastore heartbeats are no longer relevant, and the vSphere HA agent communicates over the vSAN network instead of the host management network. However, the host still uses the management gateway to detect whether it has become isolated.
Key design implications of datastore heartbeats include:
Allow vCenter Server to select the preferred set of heartbeat datastores. vCenter Server applies the following guidelines when choosing that set:
o Choose a datastore that is accessible by the maximum number of hosts.
o Prefer VMFS datastores to NFS datastores.
o Prefer datastores that are backed by different storage arrays.
vSphere HA uses approximately 3 MB of disk space on each heartbeat datastore, which is negligible for most environments.
The vSphere HA datastore heartbeat mechanism adds negligible overhead to the storage system and has no performance effect on other storage operations.
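The selection guidelines listed above can be illustrated with a short ranking sketch. This is not the algorithm vCenter Server actually runs: the `Datastore` type, the `array_id` field, and the order in which the three guidelines are applied are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Datastore:
    name: str
    kind: str        # "VMFS" or "NFS"
    array_id: str    # hypothetical identifier of the backing storage array
    host_count: int  # number of cluster hosts that can access the datastore


def pick_heartbeat_datastores(candidates: List[Datastore],
                              count: int = 2) -> List[Datastore]:
    """Pick heartbeat datastores following the three stated guidelines."""
    # Most accessible first, then VMFS ahead of NFS, then name for stability.
    ranked = sorted(
        candidates,
        key=lambda d: (-d.host_count, d.kind != "VMFS", d.name),
    )
    chosen: List[Datastore] = []
    used_arrays = set()
    # First pass: take at most one datastore per storage array.
    for ds in ranked:
        if len(chosen) == count:
            break
        if ds.array_id not in used_arrays:
            chosen.append(ds)
            used_arrays.add(ds.array_id)
    # Second pass: fill any remaining slots even if the array repeats.
    for ds in ranked:
        if len(chosen) == count:
            break
        if ds not in chosen:
            chosen.append(ds)
    return chosen
```

With two equally accessible VMFS datastores on one array and an NFS datastore on a second array, this sketch picks one VMFS datastore and the NFS datastore, because array diversity is applied before filling the remaining slots.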
The following points are considered best practices for a service provider when designing a solution that employs vSphere HA:
Always configure strict admission control to protect tenants' workloads. Although this reserves resources that cannot be used under normal operating conditions, and therefore increases hardware costs, it protects critical business services for tenants. Always explain the risks of not enabling a strict admission control policy to the key stakeholders and SMEs.
The size of the cluster and percentage of reserved capacity are closely interrelated.
Verify that the amount of resources reserved for failover is not proportionally too high, and that it does not negatively affect the resources available to tenants.
Always reserve sufficient failover capacity to accommodate host failures during scheduled maintenance and unplanned downtime.
Place strict admission control under a change control policy, so that powering on unprotected virtual machines is a decision made by managers and not by operational staff or administrators.
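The relationship between cluster size and reserved capacity noted above can be made concrete with a small calculation. This is a rough sketch that assumes all hosts in the cluster are identically sized and that capacity is reserved as a simple percentage; the function name `reserved_capacity_percent` is hypothetical.

```python
def reserved_capacity_percent(host_count: int,
                              host_failures_to_tolerate: int = 1) -> float:
    """Percentage of cluster capacity to reserve for failover.

    Assumes uniformly sized hosts, so tolerating k failures out of n hosts
    means reserving k/n of the total cluster capacity.
    """
    if host_count < 2:
        raise ValueError("a failover cluster needs at least two hosts")
    if not 0 < host_failures_to_tolerate < host_count:
        raise ValueError("failures to tolerate must be between 1 and host_count - 1")
    return 100.0 * host_failures_to_tolerate / host_count
```

Under these assumptions, a 4-host cluster that tolerates one host failure reserves 25 percent of its capacity, whereas a 16-host cluster reserves 6.25 percent, which is why smaller clusters carry a proportionally higher failover cost.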