7.1.2 Service Level Management
Service Level Management defines the Service Level Agreement (SLA) associated with a vCloud service offering or a tier of service, negotiates corresponding Operational Level Agreements with the service provider to support the SLAs, and regularly monitors service levels and reports on the results.
Definition of Service Level Agreement
A service level agreement (SLA) is a predetermined agreement between the service consumer and the service provider that defines how the quality and performance of the available services are measured. SLAs can be of many types, ranging from those that measure pure service availability to those that measure response time for service components and process workflows as experienced by users.
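As an illustration of the simplest SLA type, the following Python sketch computes measured availability as a percentage of a measurement window, along with the downtime budget that an availability target implies. The 30-day window, the 99.9 percent target, and the function names are illustrative assumptions, not values or tools from this document.

```python
from datetime import timedelta


def availability_pct(window: timedelta, downtime: timedelta) -> float:
    """Share of the measurement window, in percent, during which the service was up."""
    if window <= timedelta(0):
        raise ValueError("measurement window must be positive")
    if not timedelta(0) <= downtime <= window:
        raise ValueError("downtime must lie between zero and the window length")
    # timedelta / timedelta yields a float ratio
    return 100.0 * ((window - downtime) / window)


def downtime_budget(window: timedelta, target_pct: float) -> timedelta:
    """Longest total downtime in the window that still meets target_pct."""
    # timedelta * float yields a timedelta, rounded to microseconds
    return window * (1 - target_pct / 100.0)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumed 30-day measurement window
    print(downtime_budget(month, 99.9))  # 0:43:12
    print(round(availability_pct(month, timedelta(minutes=90)), 3))  # 99.792
```

A 99.9 percent target over 30 days leaves a downtime budget of about 43 minutes, so a single 90-minute outage is enough to breach it.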
Services run at every layer of the vCloud stack, so service consumers can be business users or internal IT groups that access the vCloud primarily for technology and infrastructure services. Operational Level Agreements (OLAs) are SLAs for base technology services that business users do not consume directly but that downstream operations and infrastructure components need in order to support the business users' SLAs.
vCloud Layers and SLAs
A typical vCloud computing environment consists of multiple layers (IaaS, PaaS, SaaS, and possibly others). The customer chooses how to implement the vCloud stack based on business requirements. Options include creating a private vCloud, using a public vCloud provider, or creating a hybrid vCloud model that uses both private and public vCloud resources. What enables this flexibility is the organization's ability to guarantee availability and performance at every vCloud layer. The organization does this by signing SLAs with external service providers or, for a private vCloud, by creating SLAs with internal user organizations and supporting OLAs with the IT organization.
Example
The following figure shows an example use case for an organization with an IaaS layer hosted by a public vCloud provider and the PaaS and SaaS layers maintained internally.
Figure 12. Example Organization with Public vCloud IaaS and Private vCloud PaaS/SaaS Layers
The SLAs shown are for illustration purposes only and are a subset of the total number of SLAs created within an organization in such a case.
The example includes the following SLAs:
*IaaS layer:
  *Uptime/availability SLA signed with the external vCloud service provider.
  *Network performance SLA signed with the external service provider.
  *Request fulfillment SLA – Measure of response time for provisioning and access configuration requests.
  *Restore time SLA.
*PaaS layer:
  *Uptime/availability SLA for the development environment.
  *Uptime/availability SLA for critical development environment components.
  *Restore time SLA for the development environment.
*SaaS layer:
  *Uptime/availability SLA specific to an application.
  *Application response time SLA – Measure of how the application is performing for the business users.
  *Time to resolution SLA – Time to recover an application in case of a failure.
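The interdependence in this example can be quantified. If an application at the SaaS layer needs the PaaS and IaaS layers to be up at the same time, its best-case end-to-end availability is roughly the product of the layer availabilities, assuming the layers fail independently. The following Python sketch shows that arithmetic. The layer targets are hypothetical values chosen for illustration, not figures from the example.

```python
from math import prod

# Hypothetical availability targets for the example's layers (fractions).
# These are illustrative assumptions, not values from the example.
LAYER_TARGETS = {
    "IaaS (external provider)": 0.9995,
    "PaaS platform (internal)": 0.999,
    "SaaS application (internal)": 0.999,
}


def composite_availability(availabilities) -> float:
    """Best-case availability of a service that needs every layer up at once.

    Assumes the layers are in series and fail independently, so the
    composite value is the product of the individual availabilities.
    """
    return prod(availabilities)


if __name__ == "__main__":
    overall = composite_availability(LAYER_TARGETS.values())
    print(f"End-to-end availability: {overall:.2%}")  # 99.75%
```

With these assumed targets, the result is about 99.75 percent, which is lower than the availability of any individual layer. For that reason, an SLA at a higher layer cannot be set independently of the agreements beneath it.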
This example leads to the following key conclusions:
*SLAs, OLAs, and key performance indicators (KPIs) are relevant at all levels of a vCloud stack. These agreements are required to provide efficiency and accountability at every layer, for both external providers and internal IT groups.
*These SLAs, OLAs, and KPIs need to be managed within every layer to help isolate systemic problems and eliminate delays.
*SLAs can be established with external vendors or providers of vCloud services, or with internal IT groups. An organization can choose whether to implement a private, public, or hybrid vCloud. At every layer, SLAs give organizations flexibility by guaranteeing availability and quality of service.
*SLAs set up at different vCloud layers are interrelated. A change in quality of service or a breach of an SLA at a lower vCloud layer can affect multiple SLAs at a higher layer. In the example, if the external vCloud provider cannot meet operating system performance needs and therefore breaches a performance SLA, the breach ripples up to the SaaS layer, degrading application performance and response time for business users.
*SLAs need to be continuously managed and evaluated to maintain quality of service in a vCloud. Business needs evolve, which changes vCloud business requirements, so SLAs must be updated regularly to reflect current requirements.
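The ripple effect between layers can be modeled as a dependency graph in which each SLA points to the higher-layer SLAs that rely on it. A breadth-first traversal from a breached SLA then lists every agreement that may be affected. In this minimal Python sketch, the SLA names and dependency edges are hypothetical and only loosely based on the example. `impacted_slas` is an illustrative helper, not part of any vCloud product.

```python
from collections import deque

# Hypothetical dependency map loosely based on the example: each SLA lists the
# higher-layer agreements that rely on it. Names and edges are illustrative.
DEPENDENTS = {
    "IaaS uptime": ["PaaS environment uptime"],
    "IaaS performance": ["PaaS platform performance OLA"],
    "PaaS environment uptime": ["SaaS application uptime"],
    "PaaS platform performance OLA": ["SaaS application response time"],
}


def impacted_slas(breached: str, dependents: dict) -> set:
    """Return every SLA that a breach of `breached` may affect, directly or transitively."""
    impacted = set()
    queue = deque([breached])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in impacted:  # skip agreements already visited
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_slas("IaaS performance", DEPENDENTS)))
    # ['PaaS platform performance OLA', 'SaaS application response time']
```

Keeping a map like this up to date is one way to see, when a lower-layer SLA is at risk, which higher-layer SLAs might also be breached.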
Consider the impact of adding another 1,000 users to an application, making it mission critical. SLAs supporting the application might need to be updated to provide increased uptime and availability. This might increase demands at the IaaS layer, so SLAs with the external IaaS provider might also have to be expanded.
vCloud SLA Considerations
vCloud SLA considerations include the following:
*Uptime/availability SLA:
  *Business hours – To what timeframe does the SLA pertain? Timeframes are generally divided into tiers by business criticality (for example, 9 to 5 or 24 by 7).
  *Maintenance windows – Are maintenance windows (for configuration changes, capacity changes, and OS and application patch management) included in or excluded from availability SLAs?
  *Single versus multi-virtual machine vApps – Do multi-virtual machine vApps need to be treated as a single entity from an SLA perspective?
*End-user response time SLA – This is generally focused on overall user experience, measuring response time from local and major remote sites to get a representative view. Measurement is implemented with remote simulators and by running automated robotic scripts.
*Recovery (system, data) SLA – What recovery time objectives (RTOs) and recovery point objectives (RPOs) need to be met?
  *Are backups required?
  *Is high availability required?
  *Is fault tolerance required within the management cluster?
  *Is automated disaster recovery failover required within certain time parameters?
*Privacy SLA (data security, access and control, compliance):
  *Do data privacy requirements (encryption or others) exist?
  *Are there regulatory requirements?
  *Are specific roles and permission groups required?
*Provisioning SLA – Are there provisioning time requirements?
*SLA penalties:
  *How are SLA penalties applied?
  *Are they applied as service credits?
  *What legal liabilities apply, and how are they covered?
  *Is there a termination-for-cause clause in the SLA?
  *What defines an outage, and who bears the burden of claim?
  *What is the track record for delivering on SLAs?
These SLA considerations should be applied to external service providers.
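The business-hours and maintenance-window questions directly change how availability is calculated. The following Python sketch measures availability over one day's SLA window, such as a 9 to 5 tier, by sampling once per minute. It either excludes maintenance windows from the calculation or counts them as downtime. Several choices here are illustrative assumptions rather than a prescribed method: the function name, the per-minute sampling, the restriction to windows within a single day, and the assumption that the service is unavailable during maintenance.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Half-open interval [start, end) of wall-clock time.
Interval = Tuple[datetime, datetime]


def _covers(intervals: List[Interval], t: datetime) -> bool:
    """True if instant t falls inside any of the half-open intervals."""
    return any(start <= t < end for start, end in intervals)


def business_hours_availability(
    day: datetime,
    open_hour: int,
    close_hour: int,
    outages: List[Interval],
    maintenance: List[Interval],
    exclude_maintenance: bool = True,
) -> float:
    """Availability (percent) over one day's SLA window, sampled per minute.

    Only minutes in [open_hour, close_hour) count; close_hour must be 0..23
    because of datetime.replace, so this sketch covers intra-day windows only.
    When exclude_maintenance is True, maintenance minutes are dropped from the
    measured window; otherwise they count as downtime (assumption: the service
    is unavailable during maintenance).
    """
    step = timedelta(minutes=1)
    t = day.replace(hour=open_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=close_hour, minute=0, second=0, microsecond=0)
    measured = down = 0
    while t < end:
        in_maintenance = _covers(maintenance, t)
        if not (in_maintenance and exclude_maintenance):
            measured += 1
            if in_maintenance or _covers(outages, t):
                down += 1
        t += step
    if measured == 0:
        raise ValueError("no measurable minutes in the SLA window")
    return 100.0 * (measured - down) / measured


if __name__ == "__main__":
    day = datetime(2024, 3, 4)  # arbitrary illustrative date
    outages = [
        (day.replace(hour=10), day.replace(hour=10, minute=30)),  # inside 9 to 5
        (day.replace(hour=20), day.replace(hour=21)),  # outside 9 to 5, ignored
    ]
    maintenance = [(day.replace(hour=12), day.replace(hour=13))]
    print(round(business_hours_availability(day, 9, 17, outages, maintenance), 2))  # 92.86
    print(business_hours_availability(day, 9, 17, outages, maintenance, exclude_maintenance=False))  # 81.25
```

With the same 30-minute outage and 60-minute maintenance window, excluding maintenance yields about 92.86 percent, while counting maintenance as downtime yields 81.25 percent. Whether an outage record meets an SLA can therefore depend on how the agreement treats maintenance windows.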