A view of the health of the IT infrastructure is obtained using monitoring tools, lots of them. This has been the approach for decades, with differences revolving around how the data is collected (the age-old agent vs. agentless argument), the integration provided, how data is processed, how the tools are purchased, and increasingly creative ways to display red, green and yellow. However, whether you use a high-cost, low-cost or no-cost monitoring tool, the objective remains the same: get a view of the health of IT.
IT infrastructure monitoring is not glamorous, but it is required. How else is IT operations going to confirm an issue reported through the service desk? However, the way monitoring is used today is not suited to many of the requirements that will be placed on monitoring going forward.
Monitoring is splitting into two distinct approaches, and what you need from monitoring will determine which tools are used and how. The first approach is the traditional one: monitoring IT health. The second is using monitoring to enable an action, which means collecting and analyzing specific information and using it to support an automation procedure or trigger an action.
An example of the second approach is monitoring the performance of a cloud IT infrastructure stack, where the objective is not simply to understand the health of the cloud environment but to enable capacity to be dynamically allocated and changed in line with usage or need (aka cloud elasticity). Add to this the fact that cloud environments are moving from providing server and storage capacity to providing application services, and making changes in the cloud becomes far more complex. For example, making a change to a cloud database may make sense for one application but have a detrimental impact on others.
Even though traditional performance monitoring tools are being used to provide a view of cloud health, their ability to support decision making is problematic for three reasons (see diagram 1).
1. Performance policy is defined within each monitoring tool focused on specific element and element type thresholds – not on overall cloud service performance.
2. Performance monitoring data does not show how one element’s performance impacts another (e.g. how changing a server or network configuration impacts multiple applications) – creating an inability to make trusted changes.
3. Pulling together multiple performance feeds (in real time) into a coherent service or application view is a challenge, creating a ‘lag’ in making changes and the need for multiple tools and teams to be involved.
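Point 2 above is, at heart, a dependency question: which applications share the element you are about to change? A minimal sketch of that lookup might look like the following. The application names, element names and dependency data are all hypothetical and exist only to illustrate the idea; a real environment would source this from a CMDB or discovery tool rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical dependency data: the infrastructure elements each application uses.
APP_DEPENDENCIES = {
    "billing": ["db-cluster-1", "web-pool-a"],
    "reporting": ["db-cluster-1", "batch-node-3"],
    "storefront": ["web-pool-a", "cache-2"],
}


def build_impact_index(app_dependencies):
    """Invert application -> elements into element -> set of applications."""
    index = defaultdict(set)
    for app, elements in app_dependencies.items():
        for element in elements:
            index[element].add(app)
    return index


def impacted_applications(index, element):
    """Return the applications that depend on the given element, sorted by name."""
    return sorted(index.get(element, set()))


index = build_impact_index(APP_DEPENDENCIES)
print(impacted_applications(index, "db-cluster-1"))
```

Per-element threshold data alone cannot answer this question, which is why a change that looks safe for one application can still surprise another.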
Integrating multiple performance feeds to assess overall application/service impact requires a highly sophisticated performance consolidation tool that normalizes, consolidates and filters data, and provides an accurate service impact assessment that can be used to support or trigger an action. This tool does not exist.
However, there are sophisticated capacity tools able to take performance data from multiple performance sources and optimize IT resources as a service (e.g. BMC’s BCO product). The best results are achieved when the data received from the supporting performance tools focuses specifically on the environment being provisioned or updated. This enables services to be changed (e.g. orchestrated through a service governor) with greater accuracy (e.g. supporting service placement or making decisions on requested changes in the context of their impact).
The future of IT infrastructure monitoring includes processing specific information to make trusted decisions. For example, cloud monitoring will have policy derived from the cloud blueprint (the cloud service component architecture), with possible input from other sources such as a service catalog to guide service levels (see diagram 2). This will result in one set of policy aimed specifically at the IT components supporting the cloud services, used both to assess cloud health and to provide the actionable information needed to make safe changes. This differs from the traditional monitoring described earlier, which collects data from everything and then tries to apply filters and rules to reduce the content to a true view of IT health.
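To make the idea of blueprint-derived policy a little more concrete, here is a minimal sketch. It assumes a blueprint can be represented as a mapping from a cloud service to its supporting components, and a service catalog as a mapping from a service to its targets. The names `BLUEPRINT`, `SERVICE_CATALOG`, `derive_policy` and `in_scope`, along with all of the sample values, are hypothetical and not taken from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # e.g. "database", "web", "storage"


# Hypothetical blueprint: cloud service -> the components that support it.
BLUEPRINT = {
    "order-service": [Component("db-1", "database"), Component("web-1", "web")],
}

# Hypothetical service catalog: service-level targets per service.
SERVICE_CATALOG = {"order-service": {"max_response_ms": 500}}


def derive_policy(blueprint, catalog, service):
    """Build one policy scoped to the components behind a single service."""
    components = blueprint[service]
    targets = catalog.get(service, {})
    return {
        "service": service,
        "monitored_components": [c.name for c in components],
        "service_targets": dict(targets),
    }


def in_scope(policy, component_name):
    """Only components named in the policy are monitored at all."""
    return component_name in policy["monitored_components"]


policy = derive_policy(BLUEPRINT, SERVICE_CATALOG, "order-service")
print(policy)
```

The design point is the direction of travel: scope is decided up front from the blueprint, rather than collecting from everything and filtering afterwards.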
Focusing on an outcome by monitoring a specific set of components allows the capacity tool to provide accurate placement decisions that can be executed through the governor and the provisioning and configuration tools.
Automated cloud decision making is just one example of the way monitoring is evolving. The same value could be attributed to any IT infrastructure automation initiative including agile development practices (e.g. DevOps).
Today we are at a crossroads. IT operations tools developed to monitor IT infrastructure health are increasingly being considered as a source of highly accurate information to support automated decision-making. Even though this re-purposing can be achieved, the effort, cost and complexity are going to be prohibitive, which reminds me of a line from an old Irish joke: “Sir, to get there, I wouldn’t have started off from here.”
It’s not as if a totally new set of tools is required, although for the cloud, IT decisions may be supported by monitoring embedded in cloud management solutions.
Diagram 3 describes 5 areas of differentiation between tools used for monitoring health and ones designed or used for aiding decisions. The most important differentiation is the objective. This dictates policy, the environment monitored and the integrations required. If you want to make decisions, you set policy based on the decision being made; if you want to check infrastructure health, you set policy based on component thresholds.
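The contrast between the two kinds of policy can be sketched in a few lines. This is an illustration only: the function names, the 70/90 percent thresholds and the 20 percent headroom figure are assumptions I have made up for the example, not recommendations or values from any tool.

```python
def health_status(cpu_percent, warn=70.0, crit=90.0):
    """Health-style policy: classify one component against fixed thresholds."""
    if cpu_percent >= crit:
        return "red"
    if cpu_percent >= warn:
        return "yellow"
    return "green"


def can_place(host_free_cpu, host_free_mem_gb, need_cpu, need_mem_gb, headroom=0.2):
    """Decision-style policy: would this workload fit while keeping headroom?"""
    cpu_ok = host_free_cpu * (1 - headroom) >= need_cpu
    mem_ok = host_free_mem_gb * (1 - headroom) >= need_mem_gb
    return cpu_ok and mem_ok


print(health_status(75.0))        # a colour for a dashboard
print(can_place(8, 32, 4, 16))    # a yes/no answer for a governor to act on
```

The first function answers "how is this component doing?", while the second answers "is this specific change safe to make?", and that second answer is what an automation procedure needs.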
Just when you thought monitoring was already complex, it’s about to get more interesting.