5.7 VMware Cloud Provider Program Compute Sizing Example
In this sizing example, the VMware Cloud Provider Program product owner has provided the following design requirements to the business stakeholders and the architect, based on commissioned market research.
The following table summarizes the anticipated workload requirements for a new VMware Cloud Provider Program platform over three years. The numbers illustrate a sample design based on a mean calculation derived from small (35 percent), medium (30 percent), large (25 percent), and x-large (10 percent) virtual machines.
Table 3. Sample Design Scaling Requirements

| Growth Metric | Values |
|---|---|
| Anticipated Total Number of VMs in Year 1 | 5,000 |
| Anticipated Total Number of VMs in Year 2 (140% Growth) | 12,000 |
| Anticipated Total Number of VMs in Year 3 (66.5% Growth) | 20,000 |
 
Figure 12. Sample Design Scaling
 
 
Table 4. Mean Virtual Machine Requirement Metrics

| Performance Metric for T-Shirt Size Templates | Small (35% of 5,000 VM Load) | Medium (30% of 5,000 VM Load) | Large (25% of 5,000 VM Load) | X-Large (10% of 5,000 VM Load) | Mean VM Resource |
|---|---|---|---|---|---|
| Projected average number of vCPUs per VM | 1 | 2 | 4 | 8 | 2.75 vCPUs |
| Projected average utilization per vCPU | 350 MHz | 350 MHz | 350 MHz | 350 MHz | 350 MHz |
| Projected peak vCPU utilization | 600 MHz | 600 MHz | 600 MHz | 600 MHz | 600 MHz |
| Projected average vRAM per VM | 2 GB | 4 GB | 8 GB | 16 GB | 7.5 GB |
| Projected average memory utilization per VM | 60% (1.3 GB) | 60% (2.5 GB) | 60% (4.92 GB) | 60% (9.83 GB) | 60% (4.64 GB) |
| Projected peak memory utilization per VM | 72% (1.5 GB) | 72% (2.95 GB) | 72% (5.9 GB) | 72% (11.8 GB) | 72% (5.5 GB) |
| Assumed memory-sharing benefit when enabled (TPS)* | 10% | 10% | 10% | 10% | 10% |
| Projected average network utilization per VM | 2.2 Mbps (3.8 Gbps) | 4.2 Mbps (6.2 Gbps) | 8.4 Mbps (10.25 Gbps) | 16.2 Mbps (7.91 Gbps) | 7.75 Mbps (7.05 Gbps) |
| Projected peak network utilization per VM | 6.0 Mbps (10.5 Gbps) | 7.0 Mbps (10.5 Gbps) | 12.0 Mbps (14.6 Gbps) | 32 Mbps (15.6 Gbps) | 9.20 Mbps (12.5 Gbps) |
| Projected average VM I/O requirement | 24 IOPS (42k) | 48 IOPS (72k) | 60 IOPS (75k) | 120 IOPS (60k) | 63 IOPS (62k) |
| Projected peak VM I/O requirement | 48 IOPS (84k) | 60 IOPS (90k) | 100 IOPS (125k) | 200 IOPS (100k) | 102 IOPS (100k) |

* TPS is now disabled by default.
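
The 2.75 vCPU figure in the "Mean VM Resource" column can be reproduced as a weighted mean of the t-shirt-size mix. The following sketch (plain Python, using only the shares and per-size vCPU counts from Table 4) shows that calculation; the remaining mean values are taken directly from the commissioned research data rather than recomputed here.

```python
# Weighted-mean vCPU calculation for the t-shirt-size mix in Table 4.
# Shares and per-size vCPU counts are taken from the table above.
workload_mix = {           # share of the 5,000 VM estate
    "small":   {"share": 0.35, "vcpus": 1},
    "medium":  {"share": 0.30, "vcpus": 2},
    "large":   {"share": 0.25, "vcpus": 4},
    "x_large": {"share": 0.10, "vcpus": 8},
}

mean_vcpus = sum(size["share"] * size["vcpus"] for size in workload_mix.values())
print(f"Projected average vCPUs per VM: {mean_vcpus:.2f}")   # 2.75
```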
 
Table 5. Year 1, 2, and 3 Scaling Requirements

| Performance Metric | Year 1 Required Resources | Year 2 Required Resources | Year 3 Required Resources |
|---|---|---|---|
| Total CPU resources for all virtual machines at peak | 3,000 GHz | 7,200 GHz | 12,000 GHz |
| Projected total RAM for all virtual machines at peak | 27,500 GB (26.9 TB) | 66,000 GB (64.5 TB) | 110,000 GB (107.4 TB) |
| Total RAM for all virtual machines at peak (including TPS memory-sharing benefit) | 24,750 GB (24.17 TB) | 59,400 GB (58 TB) | 99,000 GB (96.7 TB) |
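
The totals in Table 5 follow directly from the anticipated VM counts in Table 3 and the per-VM peak figures in Table 4 (600 MHz peak CPU and 5.5 GB mean peak memory per VM, with a 10 percent saving where TPS is enabled). A minimal sketch of that arithmetic:

```python
# Reproduce the Table 5 totals from the Table 3 VM counts and the Table 4 per-VM peaks.
vm_count = {"Year 1": 5_000, "Year 2": 12_000, "Year 3": 20_000}

PEAK_CPU_MHZ_PER_VM = 600     # projected peak vCPU utilization (Table 4)
PEAK_RAM_GB_PER_VM = 5.5      # projected mean peak memory per VM (Table 4)
TPS_SAVING = 0.10             # assumed memory-sharing benefit when TPS is enabled

for year, vms in vm_count.items():
    cpu_ghz = vms * PEAK_CPU_MHZ_PER_VM / 1_000
    ram_gb = vms * PEAK_RAM_GB_PER_VM
    ram_gb_tps = ram_gb * (1 - TPS_SAVING)
    print(f"{year}: {cpu_ghz:,.0f} GHz CPU, "
          f"{ram_gb:,.0f} GB RAM ({ram_gb / 1024:.1f} TB), "
          f"{ram_gb_tps:,.0f} GB with TPS ({ram_gb_tps / 1024:.1f} TB)")
```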
 
Using this performance information provided by the cloud platform product manager, it is possible to derive the high-level CPU, memory, network bandwidth, and disk requirements that the platform must deliver to fulfill the design. The following table details the high-level specification of the server hardware that the service provider has selected to deliver compute resources to tenant workloads, as it pertains to this analysis.
Table 6. Server Hardware Specification

| Hardware Attribute | Specification |
|---|---|
| Hardware vendor | Vendor X |
| Form factor | 1U rackmount |
| Number of CPUs (sockets) per host | 2 |
| CPU model and cores per CPU (Intel) | Intel Xeon Processor E5-2687W (20 MB cache, 3.10 GHz, 8.00 GT/s Intel QPI), 8 cores, 16 threads |
| Hyperthreading | Enabled (16 logical cores per CPU) |
| MHz per CPU core | 3.10 GHz |
| Total CPU GHz per CPU | 24.8 GHz |
| Total CPU GHz per host | 49.6 GHz |
| Proposed maximum host CPU utilization | 80% |
| Available CPU GHz per host | 39.68 GHz |
| Total RAM per host | 512 GB |
| Proposed maximum host RAM utilization | 80% |
| Available RAM per host | 409.6 GB |
| Number of Ethernet adapter ports for network | 2 x 10 GbE |
| Installation destination | Boot from SAN (20 GB boot LUN) |
| ESXi server version | ESXi 6.0, Build 2494585 |
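
The "available per host" figures above are simply the raw hardware capacity reduced to the proposed 80 percent maximum utilization. A brief sketch of that calculation, using the specification in Table 6:

```python
# Usable capacity per host at the proposed 80% maximum utilization (Table 6).
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 8
GHZ_PER_CORE = 3.10
RAM_GB_PER_HOST = 512
MAX_UTILIZATION = 0.80        # proposed maximum host CPU/RAM utilization

total_cpu_ghz = SOCKETS_PER_HOST * CORES_PER_SOCKET * GHZ_PER_CORE   # 49.6 GHz
available_cpu_ghz = total_cpu_ghz * MAX_UTILIZATION                  # 39.68 GHz
available_ram_gb = RAM_GB_PER_HOST * MAX_UTILIZATION                 # 409.6 GB

print(f"Available CPU per host: {available_cpu_ghz:.2f} GHz")
print(f"Available RAM per host: {available_ram_gb:.1f} GB")
```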
 
In determining the compute nodes required, the service provider has compared the CPU requirements, the memory requirements, and the hardware cost to establish the “sweet spot” for the chosen server type. For instance, in the example shown in the following table, while the memory requirements could be met with 61 hosts, meeting the CPU requirements takes 76 hosts. Therefore, 76 hosts would be required to meet the workload requirement.
Alternatively, depending on other design factors, you might modify your CPU choice or consider hardware that can accommodate higher memory capacity to achieve the required balance. For instance, if the current calculations for CPU were based on 8 cores per socket, would moving to a 12-core alternative balance the CPU/memory requirements and maintain the availability calculations while reducing costs?
Also, remember to allow for growth and the vSphere HA admission control policy. While 76 hosts cover the memory and CPU requirement for year one, to meet the service provider’s availability SLA with its consumers, the final server count for the design is likely to be 85 nodes.
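
The 61-host memory figure and 76-host CPU figure quoted above can be reproduced from the year 1 peaks in Table 5 and the per-host capacity in Table 6. The sketch below shows that comparison; the additional HA hosts in Table 7 are the design's availability allowance on top of this result, not an output of the calculation.

```python
import math

# Year 1 host count: compare CPU-driven and memory-driven requirements.
# Demand figures are the Year 1 peaks from Table 5; per-host capacity is from Table 6.
cpu_demand_ghz = 3_000
ram_demand_gb = 24_750          # peak RAM including the TPS benefit
cpu_per_host_ghz = 39.68
ram_per_host_gb = 409.6

hosts_for_cpu = math.ceil(cpu_demand_ghz / cpu_per_host_ghz)   # 76
hosts_for_ram = math.ceil(ram_demand_gb / ram_per_host_gb)     # 61
hosts_required = max(hosts_for_cpu, hosts_for_ram)             # CPU is the constraint

print(f"Hosts for CPU: {hosts_for_cpu}, hosts for RAM: {hosts_for_ram}")
print(f"Hosts required for workload: {hosts_required}")
```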
Table 7. Compute Hardware Requirements

| Available per Host | Compute Nodes Year 1 | Compute Nodes Year 2 | Compute Nodes Year 3 |
|---|---|---|---|
| CPU: 39.68 GHz; Memory: 409.6 GB | 76 + 9 for HA = 85 nodes (4 x 24-node clusters) | 182 + 23 for HA = 205 nodes (9 x 24-node clusters) | 303 + 39 for HA = 342 nodes (15 x 24-node clusters) |
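
The cluster counts in Table 7 follow from capping clusters at 24 nodes. The sketch below takes the workload and HA node counts from the table as inputs (the HA uplift is the design's own allowance) and derives the number of 24-node clusters per year.

```python
import math

# Cluster layout per year, using the node counts from Table 7.
# The "+ HA" uplift is the design's own availability allowance, taken as an input here.
MAX_NODES_PER_CLUSTER = 24

yearly_nodes = {                      # (workload hosts, HA hosts) from Table 7
    "Year 1": (76, 9),
    "Year 2": (182, 23),
    "Year 3": (303, 39),
}

for year, (workload_hosts, ha_hosts) in yearly_nodes.items():
    total = workload_hosts + ha_hosts
    clusters = math.ceil(total / MAX_NODES_PER_CLUSTER)
    print(f"{year}: {total} nodes -> {clusters} x {MAX_NODES_PER_CLUSTER}-node clusters")
```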
 
 
Other factors that can affect host sizing include:
- When workloads must be split across two different data centers or availability zones, the vSphere HA N+x requirements and the growth factor both increase.
- Lower utilization targets on CPU and memory resources.
- Licensing costs, which might be affected by the number of CPU sockets.
- Virtual CPU-to-physical CPU ratios. A higher vCPU-to-pCPU ratio might mean you cannot meet the consumers’ SLAs (see the sketch after this list).
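
As an illustration of the last point, the year 1 estate can be checked against a consolidation-ratio ceiling. The target ratio below is an assumed example, not a figure from this design; the VM count, mean vCPU figure, and core count come from Tables 3, 4, and 6.

```python
# Illustrative vCPU-to-pCPU ratio check for the year 1 estate.
# The target ratio below is an assumed example, not a figure from the design.
vms = 5_000
mean_vcpus_per_vm = 2.75                  # from Table 4
workload_hosts = 76                       # year 1 hosts excluding HA capacity
cores_per_host = 2 * 8                    # sockets x cores per socket (Table 6)

total_vcpus = vms * mean_vcpus_per_vm
total_pcores = workload_hosts * cores_per_host
ratio = total_vcpus / total_pcores

TARGET_RATIO = 8.0                        # example SLA-driven ceiling (assumption)
print(f"vCPU:pCPU ratio ~ {ratio:.1f}:1 "
      f"({'within' if ratio <= TARGET_RATIO else 'above'} the {TARGET_RATIO:.0f}:1 target)")
```

A ratio above the agreed ceiling would be another reason to add hosts or revisit the CPU choice, independent of the raw GHz calculation.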
Another critical consideration when evaluating vendor hardware for host resources is ongoing hardware maintenance. Upgrading and patching can be time-consuming for operational teams and is simpler with fewer vendors, as is maintaining firmware and BIOS levels on the servers and other associated components.
Also, keep in mind the hardware cost. Two smaller servers can often be procured for a lower cost than a comparable larger server. An example of this type of design decision is highlighted in the following flowchart.
Figure 13. Design Decision Example
 
In this design decision example of a vSphere cluster that is required to support HA, the admission control policy must be configured with its default setting of allowing a single host failure (N+1). Each of the two options in the flowchart meets the customer’s CPU and memory requirements for the workload that will utilize the cluster’s resources. However, while the two-node cluster might appear less expensive, 50 percent of its total available resources are reserved by the admission control policy for failover capacity. The four-node cluster option is more expensive, but only 25 percent of its total available resources are reserved by admission control to provide the appropriate failover capacity. Therefore, the likely design decision would be to scale out the solution with the four-node option, reducing the total reserved failover capacity and, as such, lowering the amount of resource that sits unused under normal operating conditions.
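
The reserved-capacity comparison in this example reduces to simple arithmetic: with an N+1 admission control policy, the reserved fraction is one host's worth of capacity divided by the cluster size. A brief sketch:

```python
# Fraction of cluster capacity reserved by N+1 admission control (Figure 13 example).
def reserved_failover_fraction(hosts: int, host_failures_to_tolerate: int = 1) -> float:
    """Share of total cluster resources held back for failover."""
    return host_failures_to_tolerate / hosts

for cluster_size in (2, 4):
    pct = reserved_failover_fraction(cluster_size) * 100
    print(f"{cluster_size}-node cluster: {pct:.0f}% of resources reserved for failover")
```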
This example demonstrates a design decision that is based on sound, rational best practices. It is important that the architect involve the project stakeholders who understand the business goals in these types of design decisions, because they are the best people to help create a design that meets both the requirements and the business goals. It is also important that all design decisions are documented and that the rationale behind each decision is made clear to the project team.