8.4.4 Roles and Responsibilities for Event, Incident and Problem Management
The vCloud Center of Excellence (COE) model supports Event, Incident and Problem Management for the vCloud services and the supporting infrastructure. The structure depends on the size and vCloud maturity of the organization. Larger organizations typically manage events, incidents, and problems through three levels. Each level has an escalation path to the next, up to the point where SMEs are required to help resolve incidents or problems.
1. Initial responsibility for any incident lies with the Level 1 Service Desk or an Operations Center such as a NOC, whose goal is to resolve as many incidents as possible at this level. Track the share of incidents resolved at Level 1 as a KPI.
2. Level 2 support is typically provided by a NOC whose staff have general vCloud knowledge and skills.
3. Finally, Level 3 support is provided by the vCloud COE Subject Matter Experts (SMEs) and by other technology specialists, such as network, storage, and security specialists, who contribute resources and knowledge of the vCloud environment. See section 5, Organizing for vCloud Operations, for more information about the COE.
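The three-level escalation path and the Level 1 resolution KPI described above can be sketched as follows. This is a minimal, hypothetical Python illustration: the `SupportLevel`, `Incident`, and `first_level_resolution_rate` names are assumptions and are not part of any VMware product or API. In practice, this data would come from the service desk or incident management tool.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class SupportLevel(IntEnum):
    """Hypothetical encoding of the three support levels."""
    SERVICE_DESK = 1  # Level 1: Service Desk or Operations Center (NOC)
    NOC = 2           # Level 2: NOC staff with general vCloud knowledge
    COE_SME = 3       # Level 3: vCloud COE SMEs and other technology specialists


@dataclass
class Incident:
    incident_id: str
    level: SupportLevel = SupportLevel.SERVICE_DESK  # every incident starts at Level 1
    resolved_at: Optional[SupportLevel] = None       # level that resolved it, if any

    def escalate(self) -> SupportLevel:
        """Move the incident to the next level; Level 3 is the last stop."""
        if self.level < SupportLevel.COE_SME:
            self.level = SupportLevel(self.level + 1)
        return self.level

    def resolve(self) -> None:
        """Record the level at which the incident was resolved."""
        self.resolved_at = self.level


def first_level_resolution_rate(incidents: List[Incident]) -> float:
    """KPI: fraction of resolved incidents that were resolved at Level 1."""
    resolved = [i for i in incidents if i.resolved_at is not None]
    if not resolved:
        return 0.0
    at_level_1 = sum(1 for i in resolved if i.resolved_at == SupportLevel.SERVICE_DESK)
    return at_level_1 / len(resolved)


# Usage with made-up incidents:
sample = [Incident("INC-1"), Incident("INC-2"), Incident("INC-3"), Incident("INC-4")]
sample[0].resolve()                                            # resolved by the Service Desk
sample[1].resolve()                                            # resolved by the Service Desk
sample[2].escalate(); sample[2].resolve()                      # resolved by the NOC
sample[3].escalate(); sample[3].escalate(); sample[3].resolve()  # resolved by a COE SME
print(first_level_resolution_rate(sample))  # 0.5
```

Unresolved incidents are excluded from the KPI denominator in this sketch. Whether to count them is a reporting choice each organization makes for itself.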
The COE Analyst works with the Event Management Analyst to make sure that the event routing rules, runbook entries, and workflows that define event handling are well defined and accurate. They implement additional monitoring for events that indicate an incident has occurred, and they define event routing rules, runbook entries, and/or workflows to handle known events. They need to understand the monitoring implications of new Event Management rules, automations, and workflows. They also provide requirements for implementing, modifying, and maintaining automations and workflows, including their integration with other systems. Finally, they categorize events for promotion to a workflow or support queue.
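Event routing rules of the kind described above can be pictured as an ordered list that maps each event to a runbook entry, a workflow, or a support queue. The following Python sketch assumes a hypothetical rule format (a source pattern, a minimum severity, an action, and a target) and hypothetical runbook, workflow, and queue names. It is an illustration, not the rule syntax of vCenter Operations Manager, vFabric Hyperic, or any other tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoutingRule:
    """One hypothetical event routing rule."""
    name: str
    source_pattern: str  # regular expression matched against the event source
    min_severity: int    # events below this severity do not match the rule
    action: str          # "runbook", "workflow", or "queue"
    target: str          # runbook entry, workflow name, or support queue


@dataclass(frozen=True)
class Event:
    source: str
    severity: int        # hypothetical scale: 1 (informational) to 5 (critical)
    message: str


def route(event: Event, rules: List[RoutingRule]) -> Optional[RoutingRule]:
    """Return the first matching rule, or None if no rule handles the event."""
    for rule in rules:
        if re.search(rule.source_pattern, event.source) and event.severity >= rule.min_severity:
            return rule
    return None


# Hypothetical rules, evaluated in order from most to least specific.
RULES = [
    RoutingRule("datastore-capacity", r"^storage\.", 3, "runbook", "RB-STORAGE-CAPACITY"),
    RoutingRule("host-down", r"^esxi\.", 5, "workflow", "restart-host-workflow"),
    RoutingRule("catch-all-critical", r".*", 4, "queue", "NOC"),
]

print(route(Event("storage.datastore01", 3, "Datastore usage at 85%"), RULES).target)  # RB-STORAGE-CAPACITY
print(route(Event("esxi.host07", 5, "Host not responding"), RULES).target)             # restart-host-workflow
print(route(Event("vcd.cell02", 2, "Minor warning"), RULES))                           # None
```

Events that return `None` are unhandled. They are the candidates the COE Analyst would categorize for promotion to a new rule, workflow, or support queue.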
For Incident Management specifically, the COE Analyst and Administrator work with the Incident Management Analyst so that event routing rules, runbook entries, and workflows defining event handling are well defined and accurate. They identify recurring or high-priority incidents that Problem Management should investigate for root cause analysis. They also work with the infrastructure and application teams to categorize, manage, and resolve incidents, and with the Service Desk to communicate the status of incidents.
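Identifying recurring or high-priority incidents for Problem Management can be sketched as a simple filter over incident records. In this hypothetical Python example, an incident category is referred to Problem Management when it recurs at least a threshold number of times or when any of its incidents has the highest priority. The record fields, the priority scale, and the default threshold of 3 are assumptions chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    category: str  # category assigned when the incident was categorized
    priority: int  # hypothetical scale: 1 (highest) to 4 (lowest)


def problem_candidates(incidents: List[IncidentRecord],
                       recurrence_threshold: int = 3,
                       high_priority: int = 1) -> Set[str]:
    """Return the incident categories to refer to Problem Management.

    A category qualifies if it occurs at least `recurrence_threshold` times,
    or if any of its incidents has a priority of `high_priority` or higher.
    """
    counts = Counter(i.category for i in incidents)
    recurring = {cat for cat, n in counts.items() if n >= recurrence_threshold}
    urgent = {i.category for i in incidents if i.priority <= high_priority}
    return recurring | urgent


# Made-up incident history:
incidents = [
    IncidentRecord("INC-101", "vapp-deploy-timeout", 3),
    IncidentRecord("INC-102", "vapp-deploy-timeout", 3),
    IncidentRecord("INC-103", "vapp-deploy-timeout", 2),
    IncidentRecord("INC-104", "catalog-sync-failure", 1),
    IncidentRecord("INC-105", "console-proxy-slow", 4),
]
print(sorted(problem_candidates(incidents)))  # ['catalog-sync-failure', 'vapp-deploy-timeout']
```

A real process would also weigh the time window and the business impact. The count-plus-priority rule here is only the simplest version of the idea.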
For Problem Management specifically, the COE Analyst works with the Problem Management Analyst to identify recurring or high-priority problems that need root cause analysis, and helps identify the root cause. They also implement monitoring, event routing rules, runbook entries, and/or workflows to handle problem events. Finally, they develop a plan to address the root cause of a problem. The plan might include a permanent solution, or it might require a workaround that is coordinated with Event Management.
For information about vCenter Operations Manager, refer to the latest VMware vCenter Operations Management Suite documentation (http://www.vmware.com/products/datacenter-virtualization/vcenter-operations-management/technical-resources.html).
For information about VMware vFabric Hyperic, see the latest product documentation (http://support.hyperic.com/display/DOC/HQ+Documentation).