8.4 Event, Incident, and Problem Management
Traditionally, Event, Incident, and Problem Management focused on monitoring the services offered from the vCloud, minimizing the impact of unplanned events, restoring service as rapidly as possible, and preventing repeat events from affecting services. Today, there is an increased emphasis on reducing vCloud operating expenses (OpEx) and increasing reliability. This can be achieved by increasing automation, enabling operators to handle more routine tasks, and proactively detecting and eliminating incidents before they impact end users.
Event Management focuses on how to categorize and handle outputs from monitoring and analytics tools. These inputs to event management are called events. Based on predefined rules, an event can be associated with a range of possible actions: suppressing the event, triggering an automated workflow, or creating an incident in the case of a performance issue or an actual outage.
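The rule-based handling described above can be sketched as a small first-match rule table. This is only an illustration, not part of any vCloud product API: the `Event` fields, the severity scale, the specific rules, and the choice to create an incident for unmatched events are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Action(Enum):
    SUPPRESS = "suppress"
    RUN_WORKFLOW = "run_workflow"
    CREATE_INCIDENT = "create_incident"


@dataclass(frozen=True)
class Event:
    source: str    # monitoring or analytics tool that emitted the event
    kind: str      # hypothetical categories, e.g. "heartbeat", "disk_usage", "outage"
    severity: int  # hypothetical scale: 0 = informational ... 3 = critical


@dataclass(frozen=True)
class Rule:
    matches: Callable[[Event], bool]  # predicate deciding whether the rule applies
    action: Action                    # action taken when the predicate is true


# Hypothetical predefined rules, evaluated in order; the first match wins.
RULES: List[Rule] = [
    Rule(lambda e: e.kind == "outage" or e.severity >= 3, Action.CREATE_INCIDENT),
    Rule(lambda e: e.kind == "disk_usage" and e.severity == 2, Action.RUN_WORKFLOW),
    Rule(lambda e: e.severity == 0, Action.SUPPRESS),
]


def classify(
    event: Event,
    rules: List[Rule] = RULES,
    default: Action = Action.CREATE_INCIDENT,
) -> Action:
    """Return the action of the first rule that matches the event.

    Unmatched events fall back to `default`. Escalating them to an incident
    is an assumption of this sketch, chosen so unknown events are not lost.
    """
    for rule in rules:
        if rule.matches(event):
            return rule.action
    return default
```

Keeping the rules in an ordered list makes precedence explicit: an outage is always escalated before any suppression rule is considered.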
Incident Management focuses on how to handle performance incidents or outages. Such occurrences are referred to as incidents. The primary focus of incident management is to manage the incident until it is resolved. Recurring incidents or incidents that are high priority can be referred to Problem Management for further investigation.
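As a minimal sketch of the referral step, the following groups incidents by a symptom key and selects those that recur or are high priority. The `Incident` fields, the `signature` grouping key, the recurrence threshold of three, and the priority scale are hypothetical values chosen for illustration; a real deployment would take them from its own service desk policy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Incident:
    incident_id: str
    signature: str  # hypothetical key grouping incidents with the same symptom
    priority: int   # hypothetical scale: 1 = highest priority


def select_for_problem_management(
    incidents: Iterable[Incident],
    recurrence_threshold: int = 3,
    high_priority: int = 1,
) -> List[str]:
    """Return the signatures that should be referred to Problem Management.

    A signature is referred when it has occurred at least
    `recurrence_threshold` times, or when any of its incidents has a
    priority at or above `high_priority` (numerically less than or equal).
    """
    incident_list = list(incidents)
    counts = Counter(i.signature for i in incident_list)

    referred: List[str] = []
    for signature, count in counts.items():  # first-seen order is preserved
        is_recurring = count >= recurrence_threshold
        is_high_priority = any(
            i.priority <= high_priority
            for i in incident_list
            if i.signature == signature
        )
        if is_recurring or is_high_priority:
            referred.append(signature)
    return referred
```

Grouping by signature rather than by individual incident keeps the referral list short: Problem Management investigates one underlying cause per signature, not every ticket.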
Problem Management focuses on identifying root causes for recurring and high priority incidents. After a root cause has been identified, a plan of action is generated that, ideally, repairs the underlying problem. If the problem cannot be fixed, additional monitoring and event management handling might be implemented to minimize or eliminate future occurrences of the problem.
One of the main benefits of implementing a vCloud environment is lower ongoing OpEx. A key to realizing this benefit is automating the vCloud Event, Incident, and Problem Management process, which consists of the following:
*Automating responses to events when possible.
*Creating highly automated workflows for other events where some operator input is required as part of decision support.
*Creating runbook entries, workflows, and automations so that operators can handle many more events (instead of administrators or subject matter experts).
*Automating interaction between the vCloud Event, Incident, and Problem Management process and other required processes and associated systems.
*Identifying, instrumenting, and developing key performance indicators (KPIs) that can be used to develop workflows and automations.