Architecting a VMware NSX Solution : Operational Considerations : 11.2 NSX Controller Operational Considerations : 11.2.1 General Considerations
   
11.2.1 General Considerations
Verify that a minimum of three NSX Controller nodes are deployed in a cluster. NSX for vSphere 6.1.x supports only clusters with three nodes. NSX Controller nodes are deployed as virtual appliances from the NSX Manager user interface. Each appliance communicates through a distinct IP address used for all control plane interactions and is deployed with a fixed configuration of 4 vCPUs and 4 GB of RAM, which currently cannot be modified.
VMware recommends spreading the cluster nodes across separate ESXi hosts for increased reliability, so that the failure of a single ESXi host does not cause the loss of the cluster majority. VMware NSX does not currently provide an embedded capability to enforce this, so the recommendation is to leverage native vSphere anti-affinity rules to avoid deploying more than one controller node on the same ESXi host. For more information on how to create a VM-to-VM anti-affinity rule, see the “Create a VM-VM Affinity Rule in the vSphere Web Client” section of the VMware vSphere Resource Management Guide.
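The anti-affinity goal above can also be audited periodically. The sketch below checks a controller-to-host mapping for violations; the inventory shown is hypothetical, and in practice the mapping would be gathered from vCenter (for example with pyVmomi or PowerCLI).

```python
# Illustrative sketch: verify that no two NSX Controller VMs share an ESXi host.
from collections import Counter

def violates_anti_affinity(placement):
    """Return the list of ESXi hosts running more than one controller node."""
    host_counts = Counter(placement.values())
    return [host for host, count in host_counts.items() if count > 1]

# Hypothetical inventory: controller name -> ESXi host
placement = {
    "NSX_Controller_1": "esxi-01.corp.local",
    "NSX_Controller_2": "esxi-02.corp.local",
    "NSX_Controller_3": "esxi-02.corp.local",  # violation: shares a host
}

print(violates_anti_affinity(placement))  # ['esxi-02.corp.local']
```

An empty result means the placement satisfies the recommendation; any host listed should trigger a vMotion and a review of the anti-affinity rule.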
Verify that all NSX Controller nodes display a Connected status. If any of the Controller nodes displays a Disconnected status, run the show control-cluster status command on all NSX Controller nodes to verify a consistent state.
Table 9. NSX Controller Status
Type               Status
Join status        Join complete
Majority status    Connected to cluster majority
Cluster ID         Same information on all Controller nodes
 
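The checks in Table 9 can be scripted against the captured output of show control-cluster status from each node. The sketch below parses a hypothetical capture and verifies all three conditions; the exact line layout varies by NSX version, so treat the format (and the sample output) as an assumption to be adjusted.

```python
# Illustrative sketch: validate 'show control-cluster status' captures per Table 9.
import re

def parse_cluster_status(output):
    """Extract the fields checked in Table 9 (line layout is an assumption)."""
    fields = {}
    for key, pattern in [
        ("join", r"Join status:\s+(.+?)(?:\s{2,}|$)"),
        ("majority", r"Majority status:\s+(.+?)(?:\s{2,}|$)"),
        ("cluster_id", r"Cluster ID:\s+(\S+)"),
    ]:
        m = re.search(pattern, output, re.MULTILINE)
        fields[key] = m.group(1).strip() if m else None
    return fields

def cluster_healthy(per_node_outputs):
    """True when every node reports a complete join, cluster majority,
    and all nodes agree on the cluster ID."""
    parsed = [parse_cluster_status(o) for o in per_node_outputs]
    return (all(p["join"] == "Join complete" for p in parsed)
            and all(p["majority"] == "Connected to cluster majority" for p in parsed)
            and len({p["cluster_id"] for p in parsed}) == 1)

# Hypothetical capture, repeated for three controller nodes
sample = """Type                Status
Join status:        Join complete
Majority status:    Connected to cluster majority
Cluster ID:         af2e9dec-19b9-4f3e-b885-cb133a8a1d2c
"""
print(cluster_healthy([sample, sample, sample]))  # True
```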
In addition, check that all roles are consistent on all NSX Controller nodes.
Table 10. NSX Controller Node Role Status
Role                 Configured Status    Active Status
api_provider         enabled              activated
persistence_server   enabled              activated
switch_manager       enabled              activated
logical_manager      enabled              activated
directory_server     enabled              activated
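The role consistency check in Table 10 can be expressed as a simple comparison across nodes. The per-node role dictionaries below are hypothetical captures of the controller role output; a minimal sketch:

```python
# Illustrative sketch: verify the roles in Table 10 are enabled/activated
# on every controller node, and that all nodes agree.
EXPECTED_ROLES = {
    "api_provider": ("enabled", "activated"),
    "persistence_server": ("enabled", "activated"),
    "switch_manager": ("enabled", "activated"),
    "logical_manager": ("enabled", "activated"),
    "directory_server": ("enabled", "activated"),
}

def roles_consistent(per_node_roles):
    """True when every node reports exactly the expected role states."""
    return all(node == EXPECTED_ROLES for node in per_node_roles)

healthy_node = dict(EXPECTED_ROLES)
degraded_node = dict(EXPECTED_ROLES, switch_manager=("enabled", "not activated"))

print(roles_consistent([healthy_node] * 3))                            # True
print(roles_consistent([healthy_node, healthy_node, degraded_node]))   # False
```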
Verify that the vnet-controller process is running. Run the show process command on all Controller nodes and check that the java-dir-server service is running.
Verify the system status and resource utilization for each NSX Controller node. Run the show status command and confirm that the load is within acceptable limits on all nodes.
Verify the cluster history and check that there is no sign of host connection flapping, VNI join failures, or abnormal cluster membership changes. Run the show control-cluster history command.
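Host connection flapping can be hard to spot by eye in a long history. The sketch below counts disconnect events per host in captured show control-cluster history output; the event line format is hypothetical and should be adapted to the actual output of your NSX version.

```python
# Illustrative sketch: flag hosts that repeatedly disconnect (flapping)
# in captured 'show control-cluster history' output.
from collections import Counter
import re

def flapping_hosts(history_lines, threshold=3):
    """Return hosts with more than `threshold` disconnect events."""
    disconnects = Counter()
    for line in history_lines:
        m = re.search(r"host (\d+\.\d+\.\d+\.\d+) disconnected", line)
        if m:
            disconnects[m.group(1)] += 1
    return [host for host, n in disconnects.items() if n > threshold]

# Hypothetical history: one host flaps four times, another disconnects once
history = [
    "05/19 10:00:01 host 10.1.1.21 disconnected",
    "05/19 10:00:05 host 10.1.1.21 connected",
] * 4 + ["05/19 11:00:00 host 10.1.1.30 disconnected"]

print(flapping_hosts(history))  # ['10.1.1.21']
```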
Verify that VXLAN Network Identifier (VNI) is configured. For more information, see the “VXLAN Preparation Steps” section of the VMware VXLAN Deployment Guide at https://www.vmware.com/resources/techresources/10356.
Verify that SSL is enabled on the NSX Controller cluster. Run the show log cloudnet/cloudnet_java-vnet-controller*.log filtered-by sslEnabled command on each of the NSX Controller nodes.
Check for host connectivity errors. Run the show log cloudnet/cloudnet_java-vnet-controller*.log filtered-by host_IP command on each of the NSX Controller nodes.
Check for any abnormal error statistics. Run the following commands on each of the NSX Controller nodes:
o show control-cluster core stats: overall stats
o show control-cluster core stats-sample: latest stats samples
o show control-cluster core connection-stats <ip>: per connection stats
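A single snapshot of these counters is rarely conclusive; what matters is whether error counters are still growing. The sketch below compares two snapshots taken some time apart. The counter names are hypothetical placeholders for whatever the stats commands report.

```python
# Illustrative sketch: report error counters that increased between two
# snapshots of 'show control-cluster core stats' output.
def growing_counters(before, after):
    """Return {counter: increase} for counters that grew between snapshots."""
    return {name: after[name] - before[name]
            for name in before
            if after.get(name, 0) > before[name]}

# Hypothetical counter snapshots taken a few minutes apart
before = {"rx_errors": 0, "tx_errors": 2, "dropped": 5}
after = {"rx_errors": 0, "tx_errors": 9, "dropped": 5}

print(growing_counters(before, after))  # {'tx_errors': 7}
```

A stable (non-growing) error counter usually reflects a past, resolved event; a growing one warrants investigation.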
Verify logical switch/router message statistics or high message rate. Run these commands on each of the NSX Controller nodes:
o show control-cluster logical-switches stats
o show control-cluster logical-routers stats
o show control-cluster logical-switches stats-sample
o show control-cluster logical-routers stats-sample
o show control-cluster logical-switches vni-stats <vni>
o show control-cluster logical-switches vni-stats-sample <vni>
o show control-cluster logical-switches connection-stats <ip>
o show control-cluster logical-routers connection-stats <ip>
For more information, see the VMware NSX Command Line Interface Reference Guide at http://pubs.vmware.com/NSX-6/topic/com.vmware.ICbase/PDF/nsx_60_cli.pdf.
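To decide whether a message rate is "high", two vni-stats snapshots taken a known interval apart can be converted into messages per second. In the sketch below, the per-VNI message counts and the 100 msg/s threshold are hypothetical; pick a threshold appropriate to your environment's baseline.

```python
# Illustrative sketch: estimate per-VNI message rates from two snapshots
# of 'show control-cluster logical-switches vni-stats' output and flag
# VNIs exceeding a chosen threshold.
def high_rate_vnis(sample1, sample2, interval=60, threshold=100.0):
    """Return {vni: messages_per_second} for VNIs exceeding the threshold."""
    rates = {}
    for vni, count2 in sample2.items():
        rate = (count2 - sample1.get(vni, 0)) / interval
        if rate > threshold:
            rates[vni] = rate
    return rates

# Hypothetical VNI -> cumulative message count, 60 seconds apart
sample1 = {5000: 1000, 5001: 40000}
sample2 = {5000: 1600, 5001: 52000}

print(high_rate_vnis(sample1, sample2))  # {5001: 200.0}
```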
Verify that your environment is not experiencing high storage latencies. ZooKeeper logs a warning whenever a storage latency exceeds one second; check for these messages by running the show log cloudnet/cloudnet_java-zookeeper*.log filtered-by fsync command. For the VMware NSX for vSphere 6.x control cluster, however, these messages are only a concern when the latencies exceed 10 seconds. VMware recommends dedicating a LUN specifically to the control cluster and/or reducing the storage latency between the controller cluster and the storage array.
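The fsync warnings surfaced by the filtered log command can be triaged automatically against the 10-second threshold. The log line shape below follows ZooKeeper's typical fsync warning, but treat the exact wording as an assumption for your version.

```python
# Illustrative sketch: extract fsync latencies from ZooKeeper log lines and
# flag anything over the 10-second threshold noted for the control cluster.
import re

def slow_fsyncs(log_lines, threshold_ms=10000):
    """Return the fsync latencies (in ms) that exceed the threshold."""
    latencies = []
    for line in log_lines:
        m = re.search(r"fsync-ing the write ahead log .* took (\d+)ms", line)
        if m:
            latencies.append(int(m.group(1)))
    return [ms for ms in latencies if ms > threshold_ms]

# Hypothetical filtered log output
logs = [
    "WARN fsync-ing the write ahead log in SyncThread:0 took 1500ms ...",
    "WARN fsync-ing the write ahead log in SyncThread:0 took 12000ms ...",
]
print(slow_fsyncs(logs))  # [12000]
```

The 1500 ms entry would be logged by ZooKeeper but, per the guidance above, only the 12000 ms entry is a concern for the NSX control cluster.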