8.4.1 Event, Incident and Problem Management Process Definition and Component
The following must be in place for successful vCloud Event, Incident, and Problem Management:
*Monitoring of the vCloud environment.
*An event management system, such as a Manager of Managers (MoM), that applies rules to events in order to launch workflows or route events to the appropriate support teams.
*A ticketing system and methodology so that various support teams are allocated tickets in an efficient manner.
*Defined incident priorities and severities.
*Well-understood roles and responsibilities.
*The ability to view KPI status.
The following figure shows the overall Event, Incident, and Problem Management process and how its components relate to one another. The three subject areas are shown together because they are intrinsically linked: Event Management feeds into Incident Management, which in turn feeds into Problem Management, and Problem Management feeds back into Event Management to complete the cycle. Because IT environments change continually, Event, Incident, and Problem Management must be updated continually to keep pace.
Figure 24. High-Level Event, Incident, and Problem Management Processes
One of the first steps in Event Management is to monitor components and services. Events can then be fed into an event management system, such as a MoM, and metrics can be fed into an analytics engine, such as vCenter Operations Manager, for processing.
A key component of Event Management is event categorization. After an event is categorized, rules and documentation such as runbooks and workflows can be developed to handle the event the next time it occurs. This proactive approach leads to fewer new incidents and reduces the duration and severity of the outages and performance incidents that do occur.
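As a minimal sketch of how categorization rules might map events to runbooks, the following Python example matches each incoming event against an ordered rule list. The `Event` fields, the severity scale, the rule categories, and the runbook identifiers (`RB-001`, `RB-002`) are all hypothetical and chosen only for illustration; a real MoM would express rules in its own rule language.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Event:
    source: str    # component that emitted the event (hypothetical field)
    message: str   # free-text event message
    severity: str  # assumed scale: "info", "warning", or "critical"


@dataclass(frozen=True)
class Rule:
    category: str                     # category assigned to matching events
    matches: Callable[[Event], bool]  # predicate that decides whether the rule applies
    runbook: str                      # hypothetical runbook identifier


# Illustrative rules only; the first matching rule wins.
RULES = [
    Rule(
        "storage-capacity",
        lambda e: "datastore" in e.message.lower() and "full" in e.message.lower(),
        "RB-001",
    ),
    Rule(
        "host-down",
        lambda e: e.severity == "critical" and "host" in e.source.lower(),
        "RB-002",
    ),
]


def categorize(event: Event, rules: Sequence[Rule] = RULES) -> Optional[Rule]:
    """Return the first rule that matches the event, or None if it is uncategorized."""
    for rule in rules:
        if rule.matches(event):
            return rule
    return None


def handle(event: Event) -> str:
    """Launch the runbook for a known event; otherwise hand off to Incident Management."""
    rule = categorize(event)
    if rule is None:
        return "open-incident"
    return f"run:{rule.runbook}"
```

An event that matches no rule falls through to a new incident, which reflects the idea that each newly categorized event, once it has a rule and a runbook, no longer needs manual handling the next time it occurs.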
Core process areas of Incident Management include managing support tickets by determining priority and impact, communicating with customers, facilitating technical and management communication (including phone bridges), and closing out tickets.
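One common way to determine ticket priority is an impact and urgency matrix. The sketch below assumes a three-level scale for each dimension and a five-level priority scheme (`P1` through `P5`); neither scale is prescribed by this document, and each organization should substitute its own defined priorities and severities.

```python
# Assumed three-level scales, ordered from most to least severe.
IMPACT_LEVELS = ("high", "medium", "low")
URGENCY_LEVELS = ("high", "medium", "low")


def priority(impact: str, urgency: str) -> str:
    """Derive a ticket priority from impact and urgency.

    The two level indexes are summed, giving a score from 0 to 4,
    which maps to P1 (most urgent) through P5 (least urgent).
    Unknown level names raise ValueError.
    """
    score = IMPACT_LEVELS.index(impact) + URGENCY_LEVELS.index(urgency)
    return f"P{score + 1}"
```

Under these assumptions, an incident with high impact and high urgency becomes `P1`, while one with low impact and low urgency becomes `P5`.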
When an incident is recurring or high priority, it is sent to Problem Management to identify the root cause. After the root cause is identified, a solution is developed that either fixes the problem or establishes monitoring or event handling to eliminate the problem or reduce its severity the next time it occurs.
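The hand-off criterion described above (recurring or high priority) could be sketched as follows. The `Incident` record, the recurrence threshold of three, and the choice of `P1` as the only high priority are assumptions made for illustration, not values defined in this document.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, NamedTuple


class Incident(NamedTuple):
    category: str  # hypothetical category label assigned during event categorization
    priority: str  # assumed priority label such as "P1" through "P5"


def problem_candidates(
    incidents: Iterable[Incident],
    recurrence_threshold: int = 3,                    # assumption: 3 or more occurrences is "recurring"
    high_priorities: FrozenSet[str] = frozenset({"P1"}),  # assumption: only P1 counts as high priority
) -> List[str]:
    """Return the sorted categories that should be sent to Problem Management."""
    incidents = list(incidents)
    counts = Counter(incident.category for incident in incidents)
    recurring = {cat for cat, n in counts.items() if n >= recurrence_threshold}
    high = {incident.category for incident in incidents if incident.priority in high_priorities}
    return sorted(recurring | high)
```

For example, a history with three `storage-capacity` incidents at `P3`, one `host-down` incident at `P1`, and one `login-failure` incident at `P4` would yield `host-down` and `storage-capacity` as candidates for root cause analysis.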