5.7 VMware Cloud Provider Program Compute Sizing Example
In this sizing example, the VMware Cloud Provider Program product owner has provided the following design requirements to the business stakeholders and the architect, based on commissioned market research.
The following table summarizes the anticipated workload requirements for a new VMware Cloud Provider Program platform over three years. The numbers illustrate a sample design based on a mean calculation derived from small (35 percent), medium (30 percent), large (25 percent), and x-large (10 percent) virtual machines.
Table 3. Sample Design Scaling Requirements

| Growth Metric | Values |
|---|---|
| Anticipated Total Number of VMs in Year 1 | 5,000 |
| Anticipated Total Number of VMs in Year 2 (140% Growth) | 12,000 |
| Anticipated Total Number of VMs in Year 3 (66.5% Growth) | 20,000 |
 
Figure 12. Sample Design Scaling
 
 
Table 4. Mean Virtual Machine Requirement Metrics

| Performance Metric for T-Shirt Size Templates | Small (35% of 5,000 VM Load) | Medium (30% of 5,000 VM Load) | Large (25% of 5,000 VM Load) | X-Large (10% of 5,000 VM Load) | Mean VM Resource |
|---|---|---|---|---|---|
| Projected average number of vCPUs per VM | 1 | 2 | 4 | 8 | 2.75 vCPUs |
| Projected average utilization per vCPU | 350 MHz | 350 MHz | 350 MHz | 350 MHz | 350 MHz |
| Projected peak vCPU utilization | 600 MHz | 600 MHz | 600 MHz | 600 MHz | 600 MHz |
| Projected average vRAM per VM | 2 GB | 4 GB | 8 GB | 16 GB | 7.5 GB |
| Projected average memory utilization per VM | 60% (1.3 GB) | 60% (2.5 GB) | 60% (4.92 GB) | 60% (9.83 GB) | 60% (4.64 GB) |
| Projected peak memory utilization per VM | 72% (1.5 GB) | 72% (2.95 GB) | 72% (5.9 GB) | 72% (11.8 GB) | 72% (5.5 GB) |
| Assumed memory-sharing benefit when enabled (TPS)* | 10% | 10% | 10% | 10% | 10% |
| Projected average network utilization per VM | 2.2 Mbps (3.8 Gbps) | 4.2 Mbps (6.2 Gbps) | 8.4 Mbps (10.25 Gbps) | 16.2 Mbps (7.91 Gbps) | 7.75 Mbps (7.05 Gbps) |
| Projected peak network utilization per VM | 6.0 Mbps (10.5 Gbps) | 7.0 Mbps (10.5 Gbps) | 12.0 Mbps (14.6 Gbps) | 32 Mbps (15.6 Gbps) | 9.20 Mbps (12.5 Gbps) |
| Projected average VM I/O requirement | 24 IOPS (42k) | 48 IOPS (72k) | 60 IOPS (75k) | 120 IOPS (60k) | 63 IOPS (62k) |
| Projected peak VM I/O requirement | 48 IOPS (84k) | 60 IOPS (90k) | 100 IOPS (125k) | 200 IOPS (100k) | 102 IOPS (100k) |

* TPS is now disabled by default.
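
The 2.75 vCPU figure in the "Mean VM Resource" column can be reproduced as a weighted mean of the t-shirt-size mix. The following sketch (plain Python, using only the shares and per-size vCPU counts from Table 4) shows that calculation; the remaining mean values are taken directly from the commissioned research data rather than recomputed here.

```python
# Weighted-mean vCPU calculation for the t-shirt-size mix in Table 4.
# Shares and per-size vCPU counts are taken from the table above.
workload_mix = {           # share of the 5,000 VM estate
    "small":   {"share": 0.35, "vcpus": 1},
    "medium":  {"share": 0.30, "vcpus": 2},
    "large":   {"share": 0.25, "vcpus": 4},
    "x_large": {"share": 0.10, "vcpus": 8},
}

mean_vcpus = sum(size["share"] * size["vcpus"] for size in workload_mix.values())
print(f"Projected average vCPUs per VM: {mean_vcpus:.2f}")   # 2.75
```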
 
Table 5. Year 1, 2, and 3 Scaling Requirements

| Performance Metric | Year 1 Required Resources | Year 2 Required Resources | Year 3 Required Resources |
|---|---|---|---|
| Total CPU resources for all virtual machines at peak | 3,000 GHz | 7,200 GHz | 12,000 GHz |
| Projected total RAM for all virtual machines at peak | 27,500 GB (26.9 TB) | 66,000 GB (64.5 TB) | 110,000 GB (107.4 TB) |
| Total RAM for all virtual machines at peak (including TPS memory-sharing benefit) | 24,750 GB (24.17 TB) | 59,400 GB (58 TB) | 99,000 GB (96.7 TB) |
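
The totals in Table 5 follow directly from the anticipated VM counts in Table 3 and the per-VM peak figures in Table 4 (600 MHz peak CPU and 5.5 GB mean peak memory per VM, with a 10 percent saving where TPS is enabled). A minimal sketch of that arithmetic:

```python
# Reproduce the Table 5 totals from the Table 3 VM counts and the Table 4 per-VM peaks.
vm_count = {"Year 1": 5_000, "Year 2": 12_000, "Year 3": 20_000}

PEAK_CPU_MHZ_PER_VM = 600     # projected peak vCPU utilization (Table 4)
PEAK_RAM_GB_PER_VM = 5.5      # projected mean peak memory per VM (Table 4)
TPS_SAVING = 0.10             # assumed memory-sharing benefit when TPS is enabled

for year, vms in vm_count.items():
    cpu_ghz = vms * PEAK_CPU_MHZ_PER_VM / 1_000
    ram_gb = vms * PEAK_RAM_GB_PER_VM
    ram_gb_tps = ram_gb * (1 - TPS_SAVING)
    print(f"{year}: {cpu_ghz:,.0f} GHz CPU, "
          f"{ram_gb:,.0f} GB RAM ({ram_gb / 1024:.1f} TB), "
          f"{ram_gb_tps:,.0f} GB with TPS ({ram_gb_tps / 1024:.1f} TB)")
```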
 
Using this performance information provided by the cloud platform product manager, it is possible to derive the high-level CPU, memory, network bandwidth, and disk requirements that the platform must deliver to fulfill the design. The following table details the high-level specification of the server hardware that the service provider has selected to deliver compute resources to tenant workloads, as it pertains to this analysis.
Table 6. Server Hardware Specification

| Hardware Attribute | Specification |
|---|---|
| Hardware vendor | Vendor X |
| Form factor | 1U rackmount |
| Number of CPUs (sockets) per host | 2 |
| CPU model and cores per CPU (Intel) | Intel Xeon Processor E5-2687W (20 MB cache, 3.10 GHz, 8.00 GT/s Intel QPI), 8 cores, 16 threads |
| Hyperthreading | Enabled (16 logical cores per CPU) |
| MHz per CPU core | 3.10 GHz |
| Total CPU GHz per CPU | 24.8 GHz |
| Total CPU GHz per host | 49.6 GHz |
| Proposed maximum host CPU utilization | 80% |
| Available CPU GHz per host | 39.68 GHz |
| Total RAM per host | 512 GB |
| Proposed maximum host RAM utilization | 80% |
| Available RAM per host | 409.6 GB |
| Number of Ethernet adapter ports for network | 2 x 10 GbE |
| Installation destination | Boot from SAN (20 GB boot LUN) |
| ESXi server version | ESXi 6.0, Build 2494585 |
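
The "available per host" figures above are simply the raw hardware capacity reduced to the proposed 80 percent maximum utilization. A brief sketch of that calculation, using the specification in Table 6:

```python
# Usable capacity per host at the proposed 80% maximum utilization (Table 6).
SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 8
GHZ_PER_CORE = 3.10
RAM_GB_PER_HOST = 512
MAX_UTILIZATION = 0.80        # proposed maximum host CPU/RAM utilization

total_cpu_ghz = SOCKETS_PER_HOST * CORES_PER_SOCKET * GHZ_PER_CORE   # 49.6 GHz
available_cpu_ghz = total_cpu_ghz * MAX_UTILIZATION                  # 39.68 GHz
available_ram_gb = RAM_GB_PER_HOST * MAX_UTILIZATION                 # 409.6 GB

print(f"Available CPU per host: {available_cpu_ghz:.2f} GHz")
print(f"Available RAM per host: {available_ram_gb:.1f} GB")
```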
 
In determining the compute nodes required, the service provider has compared the CPU requirements, the memory requirements, and the hardware cost to establish the “sweet spot” for the chosen server type. For instance, in the example shown in the following table, while the memory requirements could be met with 61 hosts, meeting the CPU requirements takes 76 hosts. Therefore, 76 hosts would be required to meet the workload requirement.
Alternatively, depending on other design factors, you might modify your CPU choice or consider hardware that can accommodate higher memory capacity to achieve the required balance. For instance, if the current calculations for CPU were based on 8 cores per socket, would moving to a 12-core alternative balance the CPU/memory requirements and maintain the availability calculations while reducing costs?
Also, remember to allow for growth and the vSphere HA admission control policy. While 76 hosts cover the memory and CPU requirement for year one, to meet the service provider’s availability SLA with its consumers, the final server count for the design is likely to be 85 nodes.
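
The 61-host memory figure and 76-host CPU figure quoted above can be reproduced from the year 1 peaks in Table 5 and the per-host capacity in Table 6. The sketch below shows that comparison; the additional HA hosts in Table 7 are the design's availability allowance on top of this result, not an output of the calculation.

```python
import math

# Year 1 host count: compare CPU-driven and memory-driven requirements.
# Demand figures are the Year 1 peaks from Table 5; per-host capacity is from Table 6.
cpu_demand_ghz = 3_000
ram_demand_gb = 24_750          # peak RAM including the TPS benefit
cpu_per_host_ghz = 39.68
ram_per_host_gb = 409.6

hosts_for_cpu = math.ceil(cpu_demand_ghz / cpu_per_host_ghz)   # 76
hosts_for_ram = math.ceil(ram_demand_gb / ram_per_host_gb)     # 61
hosts_required = max(hosts_for_cpu, hosts_for_ram)             # CPU is the constraint

print(f"Hosts for CPU: {hosts_for_cpu}, hosts for RAM: {hosts_for_ram}")
print(f"Hosts required for workload: {hosts_required}")
```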
Table 7. Compute Hardware Requirements

| Available per Host | Compute Nodes Year 1 | Compute Nodes Year 2 | Compute Nodes Year 3 |
|---|---|---|---|
| CPU: 39.68 GHz; Memory: 409.6 GB | 76 + 9 for HA = 85 nodes (4 x 24-node clusters) | 182 + 23 for HA = 205 nodes (9 x 24-node clusters) | 303 + 39 for HA = 342 nodes (15 x 24-node clusters) |
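
The cluster counts in Table 7 follow from capping clusters at 24 nodes. The sketch below takes the workload and HA node counts from the table as inputs (the HA uplift is the design's own allowance) and derives the number of 24-node clusters per year.

```python
import math

# Cluster layout per year, using the node counts from Table 7.
# The "+ HA" uplift is the design's own availability allowance, taken as an input here.
MAX_NODES_PER_CLUSTER = 24

yearly_nodes = {                      # (workload hosts, HA hosts) from Table 7
    "Year 1": (76, 9),
    "Year 2": (182, 23),
    "Year 3": (303, 39),
}

for year, (workload_hosts, ha_hosts) in yearly_nodes.items():
    total = workload_hosts + ha_hosts
    clusters = math.ceil(total / MAX_NODES_PER_CLUSTER)
    print(f"{year}: {total} nodes -> {clusters} x {MAX_NODES_PER_CLUSTER}-node clusters")
```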
 
 
Other factors that can affect host sizing include:
- When workloads must be split across two different data centers or availability zones, the vSphere HA N+x requirements and the growth factor both increase.
- Lower utilization targets on CPU and memory resources.
- Licensing costs, which might be affected by the number of CPU sockets.
- Virtual CPU-to-physical CPU ratios. A higher vCPU-to-pCPU ratio might mean you cannot meet the consumers’ SLAs (see the sketch after this list).
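
As an illustration of the last point, the year 1 estate can be checked against a consolidation-ratio ceiling. The target ratio below is an assumed example, not a figure from this design; the VM count, mean vCPU figure, and core count come from Tables 3, 4, and 6.

```python
# Illustrative vCPU-to-pCPU ratio check for the year 1 estate.
# The target ratio below is an assumed example, not a figure from the design.
vms = 5_000
mean_vcpus_per_vm = 2.75                  # from Table 4
workload_hosts = 76                       # year 1 hosts excluding HA capacity
cores_per_host = 2 * 8                    # sockets x cores per socket (Table 6)

total_vcpus = vms * mean_vcpus_per_vm
total_pcores = workload_hosts * cores_per_host
ratio = total_vcpus / total_pcores

TARGET_RATIO = 8.0                        # example SLA-driven ceiling (assumption)
print(f"vCPU:pCPU ratio ~ {ratio:.1f}:1 "
      f"({'within' if ratio <= TARGET_RATIO else 'above'} the {TARGET_RATIO:.0f}:1 target)")
```

A ratio above the agreed ceiling would be another reason to add hosts or revisit the CPU choice, independent of the raw GHz calculation.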
Another critical consideration when evaluating vendor hardware for host resources is ongoing hardware maintenance. Upgrading and patching can be time-consuming for operational teams and is simpler with fewer vendors, as is maintaining firmware and BIOS levels on the servers and other associated components.
Also, keep in mind the hardware cost. Two smaller servers can often be procured for a lower cost than a comparable larger server. An example of this type of design decision is highlighted in the following flowchart.
Figure 13. Design Decision Example
 
In this design decision example of a vSphere cluster that is required to support HA, the admission control policy must be configured with its default setting of allowing a single host failure (N+1). Each of the two options in the flowchart meets the customer’s CPU and memory requirements for the workload that will utilize the cluster’s resources. However, while the two-node cluster might appear less expensive, 50 percent of its total available resources are reserved by the admission control policy for failover capacity. The four-node cluster option is more expensive, but only 25 percent of its total available resources are reserved by admission control to provide the appropriate failover capacity. Therefore, the likely design decision would be to scale out the solution with the four-node option, reducing the total reserved failover capacity and, as such, lowering the amount of resource that sits unused under normal operating conditions.
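
The reserved-capacity comparison in this example reduces to simple arithmetic: with an N+1 admission control policy, the reserved fraction is one host's worth of capacity divided by the cluster size. A brief sketch:

```python
# Fraction of cluster capacity reserved by N+1 admission control (Figure 13 example).
def reserved_failover_fraction(hosts: int, host_failures_to_tolerate: int = 1) -> float:
    """Share of total cluster resources held back for failover."""
    return host_failures_to_tolerate / hosts

for cluster_size in (2, 4):
    pct = reserved_failover_fraction(cluster_size) * 100
    print(f"{cluster_size}-node cluster: {pct:.0f}% of resources reserved for failover")
```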
This example demonstrates a design decision that is based on sound, rational best practices. It is important that the architect involve the project stakeholders who understand the business goals in these types of design decisions, because they are the best people to help create a design that meets both the requirements and the business goals. It is also important that all design decisions are documented and that the rationale behind each decision is made clear to the project team.