8.1.6 Heartbeat Network Path Redundancy

Redundancy between cluster nodes provides the best reliability for the mechanism that protects the virtual machines. A single host management network is potentially a single point of failure and might result in failover scenarios, even though only a single network component has failed. Without heartbeat path redundancy, any failure between the host and the cluster could potentially cause an unnecessary failover event.

Hardware failures, including network interface card failures, network cable failures, network cable removal, and switch resets, are possible sources of failure between hosts. A redundant design mitigates this risk and minimizes the impact of a failure. Typically, this is achieved by providing redundancy for every component of the physical network.

You can implement network redundancy either at the NIC level with NIC teaming, or at the management network level with a secondary management network. In most service provider implementations, NIC teaming provides sufficient redundancy, but you can use or add a secondary management network if required. Redundant management networking allows reliable detection of host failures and helps prevent isolation conditions from occurring, because heartbeat traffic can be sent over multiple networks.

Aim to configure as few hardware segments as possible between the servers in a cluster. The goal is to limit single points of failure, which is best achieved through simplicity. In addition, too many network hops can delay heartbeat packets and increase the number of possible points of failure.

Option 1: Heartbeat Network Path Redundancy (NIC Teaming)

You can create a network interface card (NIC) team for vSphere HA management network redundancy, which is typically the recommended configuration for service providers. Each NIC in the team must be connected to a separate physical switch.
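The rule that each NIC in the team connects to a separate physical switch can be checked mechanically. The following is a minimal Python sketch, not a vSphere API call: the `Uplink` type, the `team_is_switch_redundant` helper, and the NIC and switch names are hypothetical and stand in for data you would collect from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uplink:
    """One physical NIC in a team (hypothetical inventory record)."""
    nic: str              # for example "vmnic0"
    physical_switch: str  # upstream switch the NIC is cabled to


def team_is_switch_redundant(uplinks):
    """Return True if the team has at least two NICs and no two NICs
    share the same physical switch."""
    switches = [u.physical_switch for u in uplinks]
    return len(uplinks) >= 2 and len(set(switches)) == len(switches)


# Hypothetical teams: the first follows the rule, the second does not.
good_team = [Uplink("vmnic0", "switch-a"), Uplink("vmnic1", "switch-b")]
bad_team = [Uplink("vmnic0", "switch-a"), Uplink("vmnic1", "switch-a")]

print(team_is_switch_redundant(good_team))  # True
print(team_is_switch_redundant(bad_team))   # False
```

A team that fails this check still survives a NIC or cable failure, but a reset of the shared switch would interrupt all heartbeat paths at once.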

In this design, the NIC team helps prevent a switch failure from initiating a vSphere HA isolation response. See the following figure for an example of NIC teaming.

Option 2: Heartbeat Network Path Redundancy (Secondary Management Network)

This second option creates a second VMkernel port for ESXi, which is attached to a separate virtual switch or port group. In this design, all management network interfaces are employed to send heartbeats.

As shown in the following figure, the virtual switches are uplinked to separate physical switches, eliminating all single points of failure. For this design, you must also use the das.isolationaddress parameter to add an isolation address for each additional management network segment, which removes the isolation address as a single point of failure.
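One way to picture the isolation address requirement is as a set of vSphere HA advanced options, one per management network segment. The sketch below is illustrative only. It assumes the numbered option form `das.isolationaddress0`, `das.isolationaddress1`, and so on, and the segment names and IP addresses are hypothetical placeholders. It only builds the key-value pairs and does not apply them to a cluster.

```python
def isolation_address_options(segment_addresses):
    """Map each management network segment to one isolation address option.

    segment_addresses: dict of segment name -> pingable IP address on that
    segment (hypothetical values). Returns a dict of advanced option name
    -> address, numbered from 0 in sorted segment-name order.
    """
    options = {}
    for index, segment in enumerate(sorted(segment_addresses)):
        options[f"das.isolationaddress{index}"] = segment_addresses[segment]
    return options


# Placeholder addresses from the documentation ranges, not real gateways.
segments = {
    "mgmt-primary": "192.0.2.1",
    "mgmt-secondary": "198.51.100.1",
}

for name, value in isolation_address_options(segments).items():
    print(f"{name} = {value}")
```

Each address should be reachable only through its own management segment, so that the loss of one segment does not make every isolation address unreachable.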