9.1 DRS Automation
vSphere DRS uses vSphere vMotion to migrate virtual machines in a fully automated, partially automated, or manual manner, depending on the configured automation level.
Figure 30. vSphere DRS Configuration Options
 
As shown in the preceding figure, there are three different automation levels. When Fully automated is selected, a migration threshold must be configured.
Table 20. DRS Automation Levels
| Setting | Initial VM Placement | Load Balancing |
| --- | --- | --- |
| Manual | Recommendation is displayed to the administrator | DRS makes a recommendation but does not migrate virtual machines until the administrator approves it. |
| Partially Automated | Automatic placement | DRS completes the initial placement automatically, but migrates powered-on virtual machines only after the administrator approves the recommendation in vCenter Server. |
| Fully Automated | Automatic placement | DRS migrates powered-on virtual machines automatically. How aggressively it does so depends on the migration threshold, which has five levels ranging from conservative (5 stars) to aggressive (1 star). |
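
The behavior of the three automation levels can be summarized in a few lines of code. The following is a minimal illustrative sketch, not the vSphere API: the `AutomationLevel` enum and both helper functions are hypothetical names introduced here to restate the table above.

```python
from enum import Enum


class AutomationLevel(Enum):
    # Hypothetical model of the three cluster automation levels.
    MANUAL = "Manual"
    PARTIALLY_AUTOMATED = "Partially Automated"
    FULLY_AUTOMATED = "Fully Automated"


def placement_is_automatic(level: AutomationLevel) -> bool:
    """Initial placement is automatic for every level except Manual."""
    return level is not AutomationLevel.MANUAL


def migration_is_automatic(level: AutomationLevel) -> bool:
    """Only Fully Automated migrates powered-on VMs without approval."""
    return level is AutomationLevel.FULLY_AUTOMATED


for level in AutomationLevel:
    print(
        f"{level.value}: "
        f"automatic placement={placement_is_automatic(level)}, "
        f"automatic migration={migration_is_automatic(level)}"
    )
```

In this model, Partially Automated differs from Manual only at power-on and differs from Fully Automated only for running virtual machines.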
 
Table 21. Migration Threshold Options
| Level | Stars | Description |
| --- | --- | --- |
| Level 1 | 5 | Conservative. Migrations take place only if rules are not being respected or if a host is placed into maintenance mode. |
| Level 2 | 4 | Migrations take place under the Level 1 conditions, or if a migration will bring about a significant improvement in performance. |
| Level 3 | 3 | Migrations take place under the Level 1 and Level 2 conditions, or if a migration will bring about a good improvement in virtual machine performance. |
| Level 4 | 2 | Migrations take place under the Level 1 to Level 3 conditions, or if a migration will bring about a moderate improvement in virtual machine performance. |
| Level 5 | 1 | Aggressive. Migrations take place under the Level 1 to Level 4 conditions, or if a migration will bring about even a minor improvement in virtual machine performance. |
 
As you would expect, the conservative setting leads to fewer migrations whereas the aggressive configuration will lead to more frequent virtual machine migrations.
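
The cumulative nature of the threshold levels can be sketched as a simple filter. This is a simplified model under stated assumptions, not the DRS algorithm: it assumes each recommendation carries a priority from 1 (mandatory, such as a rule violation or maintenance mode) to 5 (minor gain), and that threshold level N applies priorities 1 through N. The `Recommendation` class, the function names, and the virtual machine and host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    vm: str
    target_host: str
    priority: int  # assumed scale: 1 = mandatory ... 5 = minor improvement


def stars_for_level(level: int) -> int:
    """Map a threshold level to the star rating used in the table (Level 1 = 5 stars)."""
    if not 1 <= level <= 5:
        raise ValueError("migration threshold level must be between 1 and 5")
    return 6 - level


def recommendations_to_apply(recs, threshold_level):
    """Keep recommendations whose priority falls within the threshold level."""
    stars_for_level(threshold_level)  # reuse the range validation
    return [r for r in recs if r.priority <= threshold_level]


# Hypothetical recommendations for illustration only.
recs = [
    Recommendation("web-01", "esx-02", 1),
    Recommendation("db-01", "esx-03", 2),
    Recommendation("app-01", "esx-04", 3),
    Recommendation("batch-01", "esx-05", 5),
]

for level in (1, 3, 5):
    applied = [r.vm for r in recommendations_to_apply(recs, level)]
    print(f"Level {level} ({stars_for_level(level)} stars): {applied}")
```

With these sample inputs, Level 1 applies only the mandatory move, Level 3 applies three of the four moves, and Level 5 applies all of them, which mirrors the increase in migration activity described above.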
For a service provider environment, in which clusters are usually heterogeneous in nature, it is typically desirable to set the vSphere DRS automation level to Fully Automated and the migration threshold to the default moderate (Level 3) configuration. This avoids automatic vSphere vMotion actions that would have only a minimal or short-term performance benefit.
Figure 31. Migration Threshold Slider – Recommendation for VMware Cloud Providers
 
By default, an automation level is specified for the whole cluster. However, you can also specify a custom automation level for individual virtual machines where appropriate, for instance when specific virtual machines in the cluster require the Manual setting. For those virtual machines, vCenter Server does not take automatic balancing actions. Instead, the vSphere DRS Summary page indicates that migration recommendations are available, and the Migration page displays the recommendations.
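
The relationship between the cluster default and a per-virtual-machine override can be sketched as a lookup with a fallback. This is an illustrative model only: the dictionary, the function names, and the virtual machine names are hypothetical and do not correspond to vCenter Server objects.

```python
# Hypothetical cluster-wide default and per-VM overrides.
CLUSTER_DEFAULT = "Fully Automated"
VM_OVERRIDES = {"licensed-db-01": "Manual"}


def effective_automation_level(vm_name, overrides=VM_OVERRIDES,
                               cluster_default=CLUSTER_DEFAULT):
    """Return the VM's own automation level if one is set, else the cluster default."""
    return overrides.get(vm_name, cluster_default)


def handle_migration_recommendation(vm_name):
    """Describe what happens to a load-balancing recommendation for this VM."""
    if effective_automation_level(vm_name) == "Fully Automated":
        return "migrate automatically"
    # Manual and Partially Automated both wait for administrator approval.
    return "display recommendation for approval"


for vm in ("web-01", "licensed-db-01"):
    print(vm, "->", effective_automation_level(vm), "->",
          handle_migration_recommendation(vm))
```

Here `web-01` inherits the cluster setting, while `licensed-db-01` only ever produces a recommendation that an administrator must approve.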
Although it is not essential that you configure vSphere DRS on a service provider’s payload cluster, VMware recommends using this mechanism as a way of balancing workloads across hosts in the cluster for optimal performance.
Figure 32. vSphere DRS Automation Workflow
 
A service provider’s VMware Cloud Provider Program platform typically configures all DRS cluster parameters consistently across all clusters in all data centers. This helps to limit variability and simplify operational management. The following table provides design guidance for service providers by listing the standard settings and options configured for vSphere DRS attributes.
Table 22. Sample vSphere DRS Settings
| Attribute | Configuration |
| --- | --- |
| Cluster Name | boston-dc-01-payload-003 |
| Number of ESXi Hosts | 24 |
| DRS | Enabled |
| Automation Level | Fully Automated |
| Migration Threshold | Moderate, Level 3 (Default) |
| vSphere Distributed Power Management (DPM) | N/A |
| Enhanced vSphere vMotion Compatibility | Disabled |
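
Because these settings are meant to be identical across clusters, a baseline comparison is one way to detect drift. The following is a minimal sketch: the dictionary keys and the `find_drift` function are hypothetical, and the per-cluster settings are assumed to have been collected by some other means. Cluster name and host count are excluded because they legitimately differ between clusters.

```python
# Hypothetical baseline derived from the sample settings table.
BASELINE = {
    "drs_enabled": True,
    "automation_level": "Fully Automated",
    "migration_threshold": 3,
}


def find_drift(cluster_settings, baseline=BASELINE):
    """Return {setting: (expected, actual)} for every setting that differs."""
    return {
        key: (expected, cluster_settings.get(key))
        for key, expected in baseline.items()
        if cluster_settings.get(key) != expected
    }


# Hypothetical collected settings for two clusters.
clusters = {
    "boston-dc-01-payload-003": {
        "drs_enabled": True,
        "automation_level": "Fully Automated",
        "migration_threshold": 3,
    },
    "example-payload-cluster": {
        "drs_enabled": True,
        "automation_level": "Fully Automated",
        "migration_threshold": 5,
    },
}

for name, settings in clusters.items():
    print(name, find_drift(settings) or "matches baseline")
```

An empty result means the cluster matches the baseline. A missing setting is reported with an actual value of `None`.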
 
For a service provider’s implementation, DRS is typically enabled to automate workload balancing. DRS generally benefits the overall infrastructure by improving performance, scalability, and manageability while providing transparent maintenance to consumers. The main exception is a tenant on a dedicated cluster running applications that scale and balance at the application level.
 
The following table examines use cases, business benefits, and design requirements for incorporating vSphere DRS as part of a VMware Cloud Provider Program platform.
Table 23. DRS Use Cases, Business Benefits, and Design Requirements
Use Cases
- Redistribute CPU and/or memory load between ESXi hosts in the cluster.
- Migrate virtual machines off an ESXi host when it is placed into maintenance mode.
- Apply rules that keep virtual machines together on the same host (affinity rules), which optimizes communication by keeping the virtual machines adjacent, or that separate virtual machines onto different ESXi hosts (anti-affinity rules), which maximizes availability of services.
- Use anti-affinity rules to increase availability for service workloads as appropriate, such as in rare cases where applications with highly transactional I/O workloads might require an anti-affinity rule to avoid an I/O bottleneck on the local host.

Business Benefits
- vSphere DRS collects resource usage information for all hosts and virtual machines in the cluster and places or migrates virtual machines in one of two situations:
  - Initial placement: When you first power on a virtual machine in the cluster, DRS places that virtual machine on the most appropriate host.
  - Load balancing: DRS aims to improve resource utilization across the cluster by performing automatic migrations of running virtual machines (through vSphere vMotion).
- Configuring DRS for full automation, using the default migration threshold:
  - Reduces daily monitoring and management requirements.
  - Provides sufficient balance without excessive migration activity.

Design Requirements
- vMotion migration requirements must be met by all hosts in the DRS cluster.
- Decide whether to enable Enhanced vMotion Compatibility (EVC) on the hosts and, if so, at which EVC level.
- DRS load balancing benefits from having a larger number of hosts in the cluster (scale-out cluster) rather than a smaller number of hosts.
- DRS affinity and anti-affinity rules should be the exception rather than the norm. Configuring many affinity and anti-affinity rules limits migration choices and could collectively have a negative effect on workload balance. An affinity rule is typically beneficial in the following situations:
  - Virtual machines on the same network exchange significant network traffic. The affinity rule localizes that traffic within the host’s virtual switch, which reduces traffic on the physical network components.
  - Applications share a large memory working set, so Transparent Page Sharing (TPS) can reduce the actual amount of memory used.
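
The effect of affinity and anti-affinity rules on placement can be sketched as a check over a mapping of virtual machines to hosts. This is an illustrative model, not how DRS evaluates rules internally: the `rule_violations` function, its arguments, and all virtual machine and host names are hypothetical.

```python
def rule_violations(placement, keep_together, keep_apart):
    """Return the rules that a given VM-to-host placement breaks.

    placement:      dict mapping VM name -> host name
    keep_together:  list of VM groups that must share one host (affinity)
    keep_apart:     list of VM groups that must all be on different hosts
                    (anti-affinity)
    """
    violations = []
    for group in keep_together:
        hosts = {placement[vm] for vm in group}
        if len(hosts) > 1:  # group is spread over more than one host
            violations.append(("affinity", sorted(group)))
    for group in keep_apart:
        hosts = [placement[vm] for vm in group]
        if len(set(hosts)) < len(hosts):  # at least two VMs share a host
            violations.append(("anti-affinity", sorted(group)))
    return violations


# Hypothetical placement and rules for illustration only.
placement = {
    "web-01": "esx-01",
    "app-01": "esx-01",
    "db-01": "esx-02",
    "db-02": "esx-02",
}
keep_together = [{"web-01", "app-01"}]  # chatty tiers kept on one host
keep_apart = [{"db-01", "db-02"}]       # database pair split for availability

print(rule_violations(placement, keep_together, keep_apart))
```

In this sample, the affinity rule is satisfied, but both database virtual machines sit on `esx-02`, so the anti-affinity rule is reported as violated. Each additional rule removes candidate hosts in the same way, which is why a large number of rules limits the balancing choices available to DRS.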