8.4.3 Process Automation and Tool Alignment/Integration
The vCloud Event, Incident, and Problem Management processes depend on tooling. Without the appropriate tools in place, it is difficult to manage and operate the environment while sustaining the required service levels. Event, incident, and problem management has traditionally relied heavily on tooling, and in a vCloud the scope of the required tools increases because of additional requirements, such as a greater need for early warning of impending incidents and a higher level of automation. For early warning, increased tool functionality (for example, smart alerts, dynamic thresholds, and intelligent analytics) helps fulfill the requirement. For a higher level of automation, additional tools, such as vCenter Orchestrator, are required.
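To make the idea of a dynamic threshold concrete, the following is a minimal sketch of one generic approach: deriving alert bounds from the recent history of a metric instead of using a fixed limit. It is not the analytics used by vCenter Operations Manager. The function names, the sample data, and the choice of mean plus or minus three standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def dynamic_bounds(history, k=3.0):
    """Return (lower, upper) bounds derived from recent samples.

    Hypothetical helper: bounds are mean +/- k standard deviations,
    which is only one of many ways to define a dynamic threshold.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """Return True when value falls outside the bounds learned from history."""
    lower, upper = dynamic_bounds(history, k)
    return value < lower or value > upper


# Illustrative CPU usage samples (percent); not real measurements.
cpu_history = [41.0, 43.5, 39.8, 42.2, 40.9, 44.1, 42.7, 41.6]

print(is_anomalous(43.0, cpu_history))  # within the learned range
print(is_anomalous(78.0, cpu_history))  # far outside the learned range
```

Because the bounds follow the observed behavior of each metric, a value that is normal for one object can be flagged for another, which is what allows earlier warning than a single static limit.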
To realize the vCloud benefits of reliability and lower OpEx, it is not sufficient only to interpret events to highlight incidents and problems. It is also necessary to establish how incidents can be identified more efficiently, how remediation can be put in place quickly, and how the root cause can be identified to prevent the problem from happening again.
Because the vCloud resources and services supplied to vCloud customers are based on underlying vSphere resources, it is possible to use tools that manage and monitor at the vSphere level.
As shown in the following figure, vCenter Operations Manager can be used to provide an up-to-date understanding of the health of the vSphere environment as it relates to the vCloud provider virtual datacenters.
Figure 25. vCenter Operations Manager Event and Incident Management
The Health badge shows a score that indicates the overall health of the selected object. The object can be a vCenter instance, vSphere datacenter, cluster, host, or datastore. The monitoring mechanism proactively analyzes the performance of the environment and determines when the health of the object reaches a level that indicates an incident may be about to occur. To support effective management, the vCloud NOC can be provided with a dashboard that shows key metrics that indicate the health of the environment.
The score shown for the Health badge is calculated from the following sub-badges:
*Workload – Provides a view of how hard the selected object is working.
*Anomalies – Provides an understanding of metrics that are outside of their expected range.
*Faults – Provides detail of any infrastructure events that may impact the selected object's availability.
For faults, active vCenter events or alerts are used. These can include host hardware events, virtual machine FT and HA issues, vCenter health issues, cluster HA issues, and so on. The vCenter alerts are supplied through the vSphere adapter into vCenter Operations Manager, and can be used to identify root cause. Additionally, alerts are generated by vCenter Operations Manager if a sub-badge score hits a predefined value.
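The rule that an alert is generated when a sub-badge score reaches a predefined value can be sketched as a simple threshold check. This is an illustration only, not the vCenter Operations Manager implementation. The `SubBadge` class, the `alerts_for` function, the 0 to 100 score scale, the threshold values, the host name, and the assumption that a higher score is worse for every sub-badge are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubBadge:
    name: str
    score: int     # assumed 0-100 scale, higher assumed to be worse
    alert_at: int  # hypothetical predefined value that triggers an alert


def alerts_for(object_name, badges):
    """Return one alert message per sub-badge whose score reached its threshold."""
    alerts = []
    for badge in badges:
        if badge.score >= badge.alert_at:
            alerts.append(
                f"{object_name}: {badge.name} score {badge.score} "
                f"reached threshold {badge.alert_at}"
            )
    return alerts


# Illustrative values for a single host; not taken from a real environment.
badges = [
    SubBadge("Workload", score=62, alert_at=80),
    SubBadge("Anomalies", score=85, alert_at=75),
    SubBadge("Faults", score=50, alert_at=50),
]

for message in alerts_for("esx-host-01", badges):
    print(message)
```

With these sample values, the Anomalies and Faults sub-badges produce alerts and Workload does not, because only those two scores are at or above their thresholds.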
The events or alerts appear as faults, as shown in the following figure.
Figure 26. vCenter Operations Manager Faults
Any fault can be selected to gain further information. In the following figure, the event is associated with a host and indicates that an uplink has been lost.
Figure 27. vCenter Operations Alert
In addition to using vCenter Operations Manager for vSphere metrics and events, VMware vFabric™ Hyperic® can be used to provide operating system and application metrics. Providing these metrics to vCenter Operations Manager further enhances the incident management toolset.