5.7 VMware Cloud Provider Program Compute Sizing Example
In this sizing example, the VMware Cloud Provider Program product owner has provided the business stakeholders and architect with the following design requirements, based on commissioned market research.
The following table summarizes the anticipated workload requirements for a new VMware Cloud Provider Program platform over three years. The numbers illustrate a sample design based on a mean calculation derived from small (35%), medium (30%), large (25%), and x-large (10%) virtual machines.
Table 3. Sample Design Scaling Requirements
| Growth Metric | Value |
| --- | --- |
| Anticipated total number of VMs in Year 1 | 5,000 |
| Anticipated total number of VMs in Year 2 (140% growth) | 12,000 |
| Anticipated total number of VMs in Year 3 (66.5% growth) | 20,000 |
Figure 12. Sample Design Scaling
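As a quick sanity check, the scaling figures in Table 3 can be reproduced with a few lines of Python (an illustrative sketch, not part of the original design; the Year 3 step works out to approximately 66.7 percent growth, in line with the rounded figure in the table):

```python
# Illustrative sketch: reproduce Table 3's year-over-year growth figures.
vms_per_year = {1: 5000, 2: 12000, 3: 20000}

for year in (2, 3):
    prev = vms_per_year[year - 1]
    growth = (vms_per_year[year] - prev) / prev * 100
    print(f"Year {year}: {vms_per_year[year]:,} VMs ({growth:.1f}% growth)")
```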
Table 4. Mean Virtual Machine Requirement Metrics
| Performance Metric for T-Shirt Size Templates | Small (35% of 5,000 VM load) | Medium (30% of 5,000 VM load) | Large (25% of 5,000 VM load) | X-Large (10% of 5,000 VM load) | Mean VM Resource |
| --- | --- | --- | --- | --- | --- |
| Projected average number of vCPUs per VM | 1 | 2 | 4 | 8 | 2.75 vCPUs |
| Projected average utilization per vCPU | 350 MHz | 350 MHz | 350 MHz | 350 MHz | 350 MHz |
| Projected peak vCPU utilization | 600 MHz | 600 MHz | 600 MHz | 600 MHz | 600 MHz |
| Projected average vRAM per VM | 2 GB | 4 GB | 8 GB | 16 GB | 7.5 GB |
| Projected average memory utilization per VM | 60% (1.3 GB) | 60% (2.5 GB) | 60% (4.92 GB) | 60% (9.83 GB) | 60% (4.64 GB) |
| Projected peak memory utilization per VM | 72% (1.5 GB) | 72% (2.95 GB) | 72% (5.9 GB) | 72% (11.8 GB) | 72% (5.5 GB) |
| Assumed memory-sharing benefit when enabled (TPS) (*) | 10% | 10% | 10% | 10% | 10% |
| Projected average network utilization per VM | 2.2 Mbps (3.8 Gbps) | 4.2 Mbps (6.2 Gbps) | 8.4 Mbps (10.25 Gbps) | 16.2 Mbps (7.91 Gbps) | 7.75 Mbps (7.05 Gbps) |
| Projected peak network utilization per VM | 6.0 Mbps (10.5 Gbps) | 7.0 Mbps (10.5 Gbps) | 12.0 Mbps (14.6 Gbps) | 32 Mbps (15.6 Gbps) | 9.20 Mbps (12.5 Gbps) |
| Projected average VM I/O requirement | 24 IOPS (42k) | 48 IOPS (72k) | 60 IOPS (75k) | 120 IOPS (60k) | 63 IOPS (62k) |
| Projected peak VM I/O requirement | 48 IOPS (84k) | 60 IOPS (90k) | 100 IOPS (125k) | 200 IOPS (100k) | 102 IOPS (100k) |
(*) TPS is now disabled by default.
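The mean vCPU figure in Table 4 is the size-mix weighted average of the four templates; the sketch below (illustrative only) shows the calculation. Note that the same weighting applied to vRAM gives 5.5 GB, which is the per-VM figure implied by Table 5's peak RAM totals (27,500 GB / 5,000 VMs), whereas the 7.5 GB shown in Table 4 is the simple average of the four template sizes.

```python
# Illustrative sketch: weighted mean VM resources from the T-shirt size mix.
mix     = [0.35, 0.30, 0.25, 0.10]   # small, medium, large, x-large
vcpus   = [1, 2, 4, 8]               # vCPUs per template
vram_gb = [2, 4, 8, 16]              # vRAM per template (GB)

mean_vcpu = sum(w * v for w, v in zip(mix, vcpus))
mean_vram = sum(w * v for w, v in zip(mix, vram_gb))
print(f"{mean_vcpu} vCPUs, {mean_vram} GB vRAM")
```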
Table 5. Year 1, 2, and 3 Scaling Requirements
| Performance Metric | Year 1 Required Resources | Year 2 Required Resources | Year 3 Required Resources |
| --- | --- | --- | --- |
| Total CPU resources for all virtual machines at peak | 3,000 GHz | 7,200 GHz | 12,000 GHz |
| Total RAM for all virtual machines at peak | 27,500 GB (26.9 TB) | 66,000 GB (64.5 TB) | 110,000 GB (107.4 TB) |
| Total RAM for all virtual machines at peak (including TPS memory-sharing benefit) | 24,750 GB (24.17 TB) | 59,400 GB (58 TB) | 99,000 GB (96.7 TB) |
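Table 5's totals follow directly from the per-VM peak figures: 600 MHz of CPU and 5.5 GB of RAM per VM, with the assumed 10 percent TPS benefit applied to the memory totals. A minimal sketch of the arithmetic:

```python
# Illustrative sketch: derive Table 5's aggregate peaks from per-VM figures.
peak_mhz_per_vm = 600    # projected peak CPU per VM (MHz)
peak_gb_per_vm  = 5.5    # projected peak RAM per VM (GB)
tps_saving      = 0.10   # assumed TPS memory-sharing benefit

for year, vms in ((1, 5000), (2, 12000), (3, 20000)):
    cpu_ghz = vms * peak_mhz_per_vm / 1000
    ram_gb  = vms * peak_gb_per_vm
    print(f"Year {year}: {cpu_ghz:,.0f} GHz, {ram_gb:,.0f} GB RAM, "
          f"{ram_gb * (1 - tps_saving):,.0f} GB with TPS")
```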
Using this performance information provided by the cloud platform product manager, it is possible to derive the high-level CPU, memory, network bandwidth, and disk requirements that the platform must deliver to fulfill the design. The following table details the high-level specification of the server hardware the service provider has selected to deliver compute resources to the tenant workload.
Table 6. Server Hardware Specification
| Hardware Attribute | Specification |
| --- | --- |
| Hardware vendor | Vendor X |
| Form factor | 1U rackmount |
| Number of CPUs (sockets) per host | 2 |
| CPU model | Intel Xeon Processor E5-2687W (20 MB cache, 3.10 GHz, 8.00 GT/s Intel QPI), 8 cores / 16 threads |
| Hyperthreading | Enabled (16 logical cores per CPU) |
| GHz per CPU core | 3.10 GHz |
| Total CPU GHz per CPU | 24.8 GHz |
| Total CPU GHz per host | 49.6 GHz |
| Proposed maximum host CPU utilization | 80% |
| Available CPU GHz per host | 39.68 GHz |
| Total RAM per host | 512 GB |
| Proposed maximum host RAM utilization | 80% |
| Available RAM per host | 409.6 GB |
| Number of Ethernet adapter ports for network | 2 x 10 GbE |
| Installation destination | Boot from SAN (20 GB boot LUN) |
| ESXi server version | ESXi 6.0, build 2494585 |
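The usable-capacity rows of Table 6 derive from the raw hardware specification and the proposed 80 percent utilization ceiling; a small illustrative calculation:

```python
# Illustrative sketch: usable per-host capacity from Table 6's specification.
sockets          = 2
cores_per_socket = 8
ghz_per_core     = 3.10
ram_gb_per_host  = 512
max_utilization  = 0.80   # proposed maximum host utilization

total_ghz  = sockets * cores_per_socket * ghz_per_core   # 49.6 GHz per host
usable_ghz = total_ghz * max_utilization                 # 39.68 GHz available
usable_ram = ram_gb_per_host * max_utilization           # 409.6 GB available
print(f"{usable_ghz:.2f} GHz, {usable_ram:.1f} GB per host")
```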
In determining the compute nodes required, the service provider has compared the CPU requirements, memory requirements, and hardware cost to establish the "sweet spot" for the chosen server type. For instance, in the example shown in the following table, the memory requirements could be met with 61 hosts, but the number of hosts required to meet the CPU requirements is higher, at 76. Therefore, 76 hosts must be implemented to meet the workload requirement.
Alternatively, depending on other design factors, you might be able to look at modifying your CPU choice or look at hardware that could facilitate higher memory capacity to achieve the required balance. For instance, if the current calculations for CPU were based on 8 cores per socket, would modifying the choice in favor of 12-core alternatives balance the CPU/memory requirements and maintain availability calculations while reducing costs?
Also, remember to allow for growth and the vSphere HA admission control policy. While 76 hosts cover the memory and CPU requirement for year one, to meet the service provider’s availability SLA with its consumers, the final server count for the design is likely to be 85 nodes.
Table 7. Compute Hardware Requirements
| Type | Available per Host | Compute Nodes Year 1 | Compute Nodes Year 2 | Compute Nodes Year 3 |
| --- | --- | --- | --- | --- |
| CPU / Memory | 39.68 GHz / 409.6 GB | 76 + 9 for HA = 85 (4 x 24-node clusters) | 182 + 23 for HA = 205 (9 x 24-node clusters) | 303 + 39 for HA = 342 (15 x 24-node clusters) |
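The node counts in Table 7 can be reproduced by dividing each year's peak demand by the per-host usable capacity, taking the larger of the CPU- and memory-driven counts, and adding the HA buffer. The sketch below is illustrative; the HA buffer sizes are taken from the table rather than derived:

```python
import math

# Illustrative sketch: hosts required per year, CPU- or memory-bound,
# plus the vSphere HA buffer (figures from Tables 5, 6, and 7).
usable_ghz_per_host = 39.68
usable_gb_per_host  = 409.6
cluster_size        = 24

demand = {  # year: (peak GHz, peak GB including TPS benefit)
    1: (3000, 24750), 2: (7200, 59400), 3: (12000, 99000),
}
ha_buffer = {1: 9, 2: 23, 3: 39}   # N+x hosts reserved for vSphere HA

for year, (ghz, gb) in demand.items():
    cpu_hosts = math.ceil(ghz / usable_ghz_per_host)
    ram_hosts = math.ceil(gb / usable_gb_per_host)
    hosts     = max(cpu_hosts, ram_hosts)   # CPU is the binding constraint here
    total     = hosts + ha_buffer[year]
    clusters  = math.ceil(total / cluster_size)
    print(f"Year {year}: {hosts} + {ha_buffer[year]} for HA = {total} "
          f"({clusters} x {cluster_size}-node clusters)")
```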
Other factors that can affect host sizing include:
• Workloads that must be split across two data centers or availability zones, which increases both the vSphere HA N+x requirements and the growth factor.
• Lower utilization targets on CPU and memory resources.
• Licensing costs, which might be affected with more CPU sockets.
• Virtual CPU-to-host CPU ratios. A higher vCPU-to-pCPU ratio might mean you cannot meet the consumers' SLAs.
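As an illustration of the last point, a rough consolidation-ratio check for the Year 1 workload on the proposed 85-host estate (assuming the 16 physical cores per host from Table 6 and the 2.75 mean vCPUs from Table 4):

```python
# Illustrative sketch: approximate vCPU-to-pCPU ratio for the Year 1 estate.
vms             = 5000
mean_vcpus      = 2.75
hosts           = 85
pcores_per_host = 16     # 2 sockets x 8 cores; hyperthreads not counted

ratio = (vms * mean_vcpus) / (hosts * pcores_per_host)
print(f"~{ratio:.1f}:1 vCPU-to-pCPU")
```

Whether a ratio of roughly 10:1 is acceptable depends on the service tier and the SLAs offered to consumers.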
Another consideration that is critical when evaluating vendor hardware for the implementation of host resources is ongoing hardware maintenance. Upgrading and patching can be time consuming for operational teams and is simpler with fewer vendors, as is maintaining firmware and the BIOS on the servers and other associated components.
Also, keep in mind the hardware cost. Two smaller servers can often be procured for a lower cost than a comparable larger server. An example of this type of design decision is highlighted in the following flowchart.
Figure 13. Design Decision Example
In this design decision example, a vSphere cluster is required to support HA, with the admission control policy configured at its default setting of tolerating a single host failure (N+1). Each of the two options in the flowchart meets the customer's CPU and memory requirements for the workload that will utilize the cluster's resources. However, while the two-node cluster appears less expensive, 50 percent of its total available resources are reserved by the admission control policy for failover capacity. The four-node cluster option costs more, but only 25 percent of its total available resources are reserved by admission control to provide the appropriate failover capacity. Therefore, the likely design decision would be to scale out with the four-node option, reducing the total reserved failover capacity and, as such, lowering the amount of resource that sits unused under normal operating conditions.
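The reserved-capacity trade-off described above is simply 1/N of an N-node cluster under the N+1 admission control policy; a brief illustration:

```python
# Illustrative sketch: capacity reserved by N+1 admission control.
for nodes in (2, 4):
    reserved_pct = 100 / nodes   # one host's worth of capacity held back
    print(f"{nodes}-node cluster: {reserved_pct:.0f}% reserved for failover")
```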
This example demonstrates a design decision based on sound, rational best practices. It is important that the architect involve the project stakeholders who understand the business goals in these types of design decisions, because they are best placed to help create a design that meets both the requirements and the business goals. It is also important that every design decision is documented and that the rationale behind each decision is made clear to the project team.