11.2.1 General Considerations
Verify that the cluster contains three NSX Controller nodes; NSX for vSphere 6.1.x supports only three-node clusters. NSX Controller nodes are deployed as virtual appliances from the NSX Manager user interface. Each appliance has an IP address that is used for all control plane interactions, and a fixed configuration of 4 vCPUs and 4 GB of RAM that currently cannot be modified.
VMware recommends placing the cluster nodes on separate ESXi hosts for increased reliability, so that the failure of a single ESXi host does not cause the cluster to lose its majority. VMware NSX does not currently provide a built-in mechanism to enforce this, so use native vSphere anti-affinity rules to prevent more than one controller node from running on the same ESXi host. For more information on how to create a VM-to-VM anti-affinity rule, see the “Create a VM-VM Affinity Rule in the vSphere Web Client” section of the VMware vSphere Resource Management Guide.
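The majority arithmetic behind this recommendation can be checked with a short Python sketch. It assumes you have already collected a mapping of controller name to ESXi host (for example, from the vSphere Web Client); the names `majority`, `colocated_hosts`, and `survives_single_host_failure`, and the controller and host names in the usage example, are hypothetical and are not part of any NSX or vSphere API.

```python
from collections import Counter


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def colocated_hosts(placement: dict[str, str]) -> list[str]:
    """Return ESXi hosts running more than one controller (anti-affinity violations).

    placement maps a (hypothetical) controller name to the ESXi host it runs on.
    """
    per_host = Counter(placement.values())
    return sorted(host for host, count in per_host.items() if count > 1)


def survives_single_host_failure(placement: dict[str, str]) -> bool:
    """True if losing any one ESXi host still leaves a cluster majority running."""
    total = len(placement)
    per_host = Counter(placement.values())
    return all(total - count >= majority(total) for count in per_host.values())


if __name__ == "__main__":
    spread = {"controller-1": "esxi-01", "controller-2": "esxi-02", "controller-3": "esxi-03"}
    packed = {"controller-1": "esxi-01", "controller-2": "esxi-01", "controller-3": "esxi-02"}
    print(survives_single_host_failure(spread), colocated_hosts(spread))
    print(survives_single_host_failure(packed), colocated_hosts(packed))
```

With three controllers the majority is two, so a single host may run at most one controller: if esxi-01 in the `packed` example fails, only one controller remains and the cluster loses its majority.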
Verify that all NSX Controller nodes display a Connected status. If any Controller node displays a Disconnected status, run the show control-cluster status command on all NSX Controller nodes and confirm that each one reports the consistent state shown in Table 9.
Table 9. NSX Controller Status
Type | Status |
Join status | Join complete |
Majority status | Connected to cluster majority |
Cluster ID | Same information on all Controller nodes |
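When comparing the output from several nodes, the checks in Table 9 reduce to two rules: each node must report the expected join and majority status, and all nodes must report the same Cluster ID. The Python sketch below applies those rules to values you have already read from each node's show control-cluster status output. It does not parse the CLI output itself, and the function name `status_problems` and the dictionary layout are hypothetical.

```python
# Expected values taken from Table 9.
EXPECTED = {
    "Join status": "Join complete",
    "Majority status": "Connected to cluster majority",
}


def status_problems(nodes: dict[str, dict[str, str]]) -> list[str]:
    """Return a human-readable list of deviations from Table 9.

    nodes maps a (hypothetical) controller name to the field/value pairs
    transcribed from that node's show control-cluster status output.
    """
    problems = []
    for node, status in sorted(nodes.items()):
        for field, expected in EXPECTED.items():
            actual = status.get(field)
            if actual != expected:
                problems.append(f"{node}: {field} is {actual!r}, expected {expected!r}")
    # The Cluster ID has no fixed expected value; it only has to match on every node.
    cluster_ids = {status.get("Cluster ID") for status in nodes.values()}
    if len(cluster_ids) != 1:
        problems.append(f"Cluster ID differs across nodes: {sorted(map(str, cluster_ids))}")
    return problems
```

An empty list means every node matches Table 9; any entry identifies the node and field to investigate.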
In addition, check that every role is enabled and activated consistently on all NSX Controller nodes.
Table 10. NSX Controller Node Role Status
Role | Configured Status | Active Status |
api_provider | enabled | activated |
persistence_server | enabled | activated |
switch_manager | enabled | activated |
logical_manager | enabled | activated |
directory_server | enabled | activated |
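The role check in Table 10 can be expressed the same way. This Python sketch assumes you have recorded, for each node, the configured and active status of each role; the function name `role_problems` and the input layout are hypothetical, while the role names and the expected `enabled`/`activated` values come from Table 10.

```python
# Role names from Table 10; every role should be enabled and activated on every node.
ROLES = (
    "api_provider",
    "persistence_server",
    "switch_manager",
    "logical_manager",
    "directory_server",
)
EXPECTED_STATE = ("enabled", "activated")


def role_problems(nodes: dict[str, dict[str, tuple[str, str]]]) -> list[str]:
    """List roles whose (configured, active) status differs from Table 10.

    nodes maps a (hypothetical) controller name to {role: (configured, active)}.
    A role missing from a node's entry is reported as None.
    """
    problems = []
    for node, roles in sorted(nodes.items()):
        for role in ROLES:
            state = roles.get(role)
            if state != EXPECTED_STATE:
                problems.append(f"{node}: {role} is {state}, expected {EXPECTED_STATE}")
    return problems
```

Checking every role against a fixed list, rather than only the roles a node reports, also catches a role that is missing entirely from one node.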
• Verify that the vnet-controller process is running. Run the show process command on all Controller nodes and check that the java-dir-server service is running.
• Verify the system status and resource utilization for each NSX Controller. Run the show status command on all nodes and confirm that none of them shows abnormally high load.
• Review the cluster history for signs of host connection flapping, VNI join failures, or abnormal cluster membership changes. Run the show control-cluster history command.
• Verify that SSL is enabled on the NSX Controller cluster. Run the show log cloudnet/cloudnet_java-vnet-controller*.log filtered-by sslEnabled command on each of the NSX Controller nodes.
• Check for host connectivity errors. Run the show log cloudnet/cloudnet_java-vnet-controller*.log filtered-by host_IP command on each of the NSX Controller nodes, where host_IP is the IP address of the ESXi host you are checking.
• Check for any abnormal error statistics. Run the following commands on each of the NSX Controller nodes:
o show control-cluster core stats: overall stats
o show control-cluster core stats-sample: latest stats samples
o show control-cluster core connection-stats <ip>: per connection stats
• Check logical switch and logical router message statistics for errors or unusually high message rates. Run these commands on each of the NSX Controller nodes:
o show control-cluster logical-switches stats
o show control-cluster logical-routers stats
o show control-cluster logical-switches stats-sample
o show control-cluster logical-routers stats-sample
o show control-cluster logical-switches vni-stats <vni>
o show control-cluster logical-switches vni-stats-sample <vni>
o show control-cluster logical-switches connection-stats <ip>
o show control-cluster logical-routers connection-stats <ip>
For more information, see the VMware NSX Command Line Interface Reference Guide at http://pubs.vmware.com/NSX-6/topic/com.vmware.ICbase/PDF/nsx_60_cli.pdf.
• Verify that your environment is not experiencing high storage latencies. ZooKeeper logs fsync warning messages when storage latency exceeds one second; to find them, run the show log cloudnet/cloudnet_java-zookeeper*.log filtered-by fsync command. However, for the VMware NSX for vSphere 6.x control cluster, these latencies are a concern only if they exceed 10 seconds. In that case, VMware recommends dedicating a LUN to the control cluster and/or moving the controllers to storage with lower latency.
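The two thresholds in the last check are easy to confuse: a one-second latency is enough for ZooKeeper to log a message, but only latencies above 10 seconds matter for the control cluster. The Python sketch below encodes both thresholds. It assumes you have already extracted the latency values, in seconds, from the fsync log messages; the function names and labels are hypothetical, and the sketch does not parse ZooKeeper log lines.

```python
LOG_THRESHOLD_S = 1.0       # ZooKeeper logs a message above this storage latency
CONCERN_THRESHOLD_S = 10.0  # NSX for vSphere 6.x control cluster: a concern only above this


def classify_fsync_latency(seconds: float) -> str:
    """Classify one storage latency value against the two documented thresholds."""
    if seconds > CONCERN_THRESHOLD_S:
        return "concern"
    if seconds > LOG_THRESHOLD_S:
        return "logged, not a concern"
    return "normal"


def latencies_of_concern(latencies: list[float]) -> list[float]:
    """Return only the latency values that exceed the control-cluster threshold."""
    return [s for s in latencies if classify_fsync_latency(s) == "concern"]
```

For example, a value of 3.5 seconds is logged by ZooKeeper but falls below the control-cluster threshold, whereas a value of 12 seconds warrants the storage changes described above.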