5.9 Performance Tuning with NUMA
When sizing virtual machines for a service catalog, keep non-uniform memory access (NUMA) recommendations in mind to optimize platform performance.
Most modern servers have CPUs with directly attached memory. When the hardware exposes this physical architecture, the CPU scheduler is aware of it and places processes on a CPU that has fast access to its local memory, as shown in the following figure. Because accessing memory attached to a different CPU is less efficient, the scheduler tries to keep processes on the same physical CPU to take advantage of the CPU cache and local memory (unless you manually override this behavior). Some NUMA systems provide a BIOS option that disables NUMA by enabling node interleaving. Typically, you get optimum performance by disabling node interleaving (that is, by leaving NUMA enabled).
VMware recommends not assigning more vCPUs to a virtual machine than a single physical CPU has cores. Consider this recommendation when designing your virtual machine service catalogs and virtual machine “t-shirt” sizing. Applying this recommendation as part of your service design means that the scheduler does not have to split a virtual machine's vCPUs across the cores of multiple physical CPUs.
Likewise, in the service design, do not assign more memory to a virtual machine than is available to a single NUMA node. If necessary, check the server configuration to see how much memory each CPU can access directly. Under CPU contention, the scheduler might move vCPUs to other NUMA nodes, which has a temporary performance impact.
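The two sizing rules above can be expressed as a simple check. The following Python sketch is illustrative only: the class and function names are hypothetical, the host figures are invented, and it assumes one NUMA node per CPU socket with memory populated evenly across sockets. It also ignores hypervisor memory overhead, so treat the result as an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    cores: int        # physical cores in one NUMA node (assumed: one node per socket)
    memory_gb: float  # memory directly attached to that node


@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    vram_gb: float


def fits_single_node(vm: VmSize, node: NumaNode) -> bool:
    """Return True if the VM's vCPUs and vRAM both fit within one NUMA node."""
    return vm.vcpus <= node.cores and vm.vram_gb <= node.memory_gb


# Hypothetical host: 2 sockets, 16 cores each, 512 GB split evenly between them.
node = NumaNode(cores=16, memory_gb=512 / 2)

print(fits_single_node(VmSize("large", 8, 64), node))          # True
print(fits_single_node(VmSize("xlarge", 24, 128), node))       # False: 24 vCPUs > 16 cores
print(fits_single_node(VmSize("memory-heavy", 8, 384), node))  # False: 384 GB > 256 GB
```

Both conditions must hold at the same time: a virtual machine with few vCPUs can still span NUMA nodes if its memory exceeds what one node can access directly.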
Figure 15. NUMA Architecture
 
As the architect, examine the service design for high-performance, oversized virtual machines and recommend to stakeholders that each virtual machine stay within the size of a single physical NUMA node for both vRAM and vCPU. This way, the virtual machine uses local memory with the fastest access, and its CPU and memory are served from a single NUMA node where possible.
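As a rough aid for that review, the following Python sketch flags catalog sizes that exceed one NUMA node and estimates the minimum number of nodes each would span. The function name, the catalog layout, and the example sizes are all hypothetical, and the per-node figures are assumed to come from your own server configuration.

```python
import math


def review_catalog(catalog, cores_per_node, mem_gb_per_node):
    """Return {size name: [findings]} for sizes that exceed a single NUMA node.

    catalog maps a size name to a (vcpus, vram_gb) tuple.
    """
    findings = {}
    for name, (vcpus, vram_gb) in catalog.items():
        issues = []
        if vcpus > cores_per_node:
            issues.append(f"vCPU: {vcpus} > {cores_per_node} cores per node")
        if vram_gb > mem_gb_per_node:
            issues.append(f"vRAM: {vram_gb} GB > {mem_gb_per_node} GB per node")
        if issues:
            # Minimum node count needed to satisfy the larger of the two demands.
            nodes = max(
                math.ceil(vcpus / cores_per_node),
                math.ceil(vram_gb / mem_gb_per_node),
            )
            issues.append(f"spans at least {nodes} NUMA nodes")
            findings[name] = issues
    return findings


# Hypothetical t-shirt sizes, checked against 16 cores and 256 GB per node.
catalog = {"S": (2, 8), "M": (4, 32), "L": (8, 64), "XL": (24, 128)}
for size, issues in review_catalog(catalog, 16, 256).items():
    print(size, "->", "; ".join(issues))
```

Sizes that appear in the output are candidates for the stakeholder conversation: either resize them to fit one node, or accept that part of their memory access will be remote.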