proactive sounds cool, but being reactive is just easier.

Recently I’ve been involved in discussions about how new IT monitoring tools will make IT support teams smarter and far more proactive. By smarter I mean having a greater understanding of IT health, and by proactive I mean being aware of situations before or as they occur.

I’d argue that becoming smarter is a prerequisite to becoming proactive. Monitoring for issues is so much easier when you know what you are looking for and understand the ramifications. The best way for IT support to become smarter is to hire the smartest, most experienced people. Becoming proactive is not so straightforward.

The idea that tools will turn a reactive, crisis-driven IT operations team into a proactive one is nonsense. For decades monitoring tools have been able to set policies that forewarn of events and give support staff a heads-up on potential issues. The reasons this capability has not delivered on its promise are numerous: events that are ‘potential issues’ or ‘warnings’ are rarely classified as high-priority items, support staff do not notice them (or ignore them), or the method of event delivery is the wrong one. It has had little to do with the monitoring tools. The reality is that most IT organizations are not measured on outage avoidance but on fixing issues once outages occur.

It’s easier to be the hero who got the order processing application back up than the person who says they helped avoid the problem occurring in the first place (“you did what?” “oh sure you did, well done – help yourself to a medal”). If an organization wants to be proactive then it needs to have people goaled and measured on finding issues before they become problems. Security officers actively monitor and analyze data and events to proactively identify anomalies, irregular activity and behaviors in order to stop hackers, cyber attacks, viruses etc. Apparently, it is not acceptable to wait for security problems to occur before they get addressed. For IT support to do this will require a number of changes, including:

  1. an organization measured against outage avoidance.
  2. information delivered in ways that the support team will take notice of.
  3. information that means something and is actionable.

an organization measured against outage avoidance. An IT organization that prides itself on being proactive but measures itself against MTTR or MTBF is not fully proactive. The speed with which IT operations responds to and fixes an issue is not a good measure of proactive efficiency without factoring in how quickly the issue was detected in the first place. IT operations effectiveness would have greater relevance if it were tied to outage avoidance. This type of metric is not easy to capture using monitoring tool reporting (too many sources, limited business impact assessment), so it requires a way to immediately consolidate, log and track the identification-to-remediation process. The easiest way to do this is using a service desk. This information would demonstrate how IT operations provides value, while showing increases in IT operational efficiency.
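To make the idea concrete, here is a minimal sketch of what such a metric might look like. It assumes the service desk can export one record per issue with three timestamps and a flag for whether the business was impacted; the `Issue` fields and the function names are hypothetical and not taken from any particular service desk product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Issue:
    """One service desk record (hypothetical fields, for illustration only)."""
    started: datetime      # when the underlying condition began
    detected: datetime     # when IT operations became aware of it
    resolved: datetime     # when it was remediated
    caused_outage: bool    # True if the business was impacted before the fix


def mean_delta(deltas):
    """Average a sequence of timedeltas; returns zero for an empty sequence."""
    deltas = list(deltas)
    if not deltas:
        return timedelta(0)
    return sum(deltas, timedelta(0)) / len(deltas)


def summarize(issues):
    """Report detection time alongside repair time, plus an avoidance rate."""
    avoided = [i for i in issues if not i.caused_outage]
    return {
        "mean_time_to_detect": mean_delta(i.detected - i.started for i in issues),
        "mean_time_to_repair": mean_delta(i.resolved - i.detected for i in issues),
        "outage_avoidance_rate": len(avoided) / len(issues) if issues else 0.0,
    }


# Two made-up issues with identical repair times but very different detection times.
t0 = datetime(2013, 7, 1, 9, 0)
issues = [
    Issue(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=35), False),
    Issue(t0, t0 + timedelta(minutes=55), t0 + timedelta(minutes=85), True),
]
report = summarize(issues)
```

Both sample issues took 30 minutes to repair, so a repair-time metric alone cannot tell them apart. Only the detection time and the avoidance rate show that one was caught before the business noticed and the other was not.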

information delivered in ways that the support team will take notice of. IT organizations invest a lot of time and effort trying to detect and process events, but few put the same effort into ensuring events are immediately delivered to the right IT personnel. A proactive state dictates that event data is delivered and owned as soon as it is detected. This means the mechanism chosen to deliver the data is as important as the effort associated with collecting the information in the first place. Most IT organizations still rely on event management tool consoles; however, an unwatched console will result in missed events. Sending events to mobile devices (e.g. in the form of an IM) and/or using alert notification tools can reduce the time it takes to become event-aware. Alert notification tools support a proactive objective by automating the delivery of alerts to the appropriate IT operations personnel through the most effective communications channel, in support of established escalation and outage procedures. They also provide the mechanism for an event to be delivered, acknowledged and owned.
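The deliver, acknowledge and own cycle can be sketched in a few lines. This is an illustration only: the `Alert` class, the `escalate` function, the people and the channels are all hypothetical, and a real notification tool would wait for a timeout between steps rather than asking for an acknowledgement immediately.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    message: str
    owner: Optional[str] = None                    # set once someone acknowledges
    notified: list = field(default_factory=list)   # audit trail of deliveries


def escalate(alert, chain, send, acknowledged):
    """Walk the escalation chain until someone acknowledges the alert.

    chain        -- ordered list of (person, channel) pairs (hypothetical)
    send         -- callable(person, channel, message) that delivers the alert
    acknowledged -- callable(person) -> bool, True if that person took ownership
    """
    for person, channel in chain:
        send(person, channel, alert.message)
        alert.notified.append((person, channel))
        if acknowledged(person):
            alert.owner = person
            break
    return alert


# Example with stand-in delivery and acknowledgement functions.
sent = []
chain = [("on-call engineer", "sms"), ("team lead", "phone"), ("ops manager", "phone")]
alert = escalate(
    Alert("disk 90% full on order-processing database"),
    chain,
    send=lambda person, channel, msg: sent.append((person, channel, msg)),
    acknowledged=lambda person: person == "team lead",
)
```

In this run the on-call engineer does not respond, the team lead acknowledges, and the operations manager is never contacted. The alert ends up with a named owner and a record of who was notified, which is exactly what an unwatched console cannot give you.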

information that means something and is actionable. If you are not actively looking for something, it’s unlikely you’ll find it. A blindingly obvious statement, but when monitors are used in IT operations they are typically used to aid root-cause analysis on known, reported issues, where support knows there’s an issue and understands the sort of thing they need to look for. However, when there is no obvious problem it takes skill and experience to scroll through long lists of technical event data and identify the most critical, business-impacting issues. Knowing how things relate to the bigger picture requires the skill to assess the overall impact of multiple unassociated events, and that means taking the yellow ones as seriously as the red ones. This approach is the new way IT support must work: looking for subtle changes and behaviors in the IT infrastructure, applications and IT consumers, analyzing potential impacts and executing a plan to remediate the issue before it affects the business. This approach demands dedicating support personnel to IT analysis and moving away from a model where they watch monitoring consoles only when they have time or are motivated to do so by complaints from IT consumers.

If you ignore the price to the business, being reactive doesn’t cost a thing.

if NASA monitored like IT operations would they have made it to the moon?

In nearly every job I’ve had, IT monitoring has been somewhere, either core to my day job or peripherally around the edge. Even though monitoring has been with us for decades it still attracts massive amounts of attention from IT organizations, vendors and venture capital. Red, green, yellow, yellow, green, red, how hard can it be? There have been major shifts in finding new ways to understand the health of IT, including SNMP monitors in the early 1990s and, more recently, the various flavors of APM products. For a software company to make a difference and be successful selling a product in this space it really needs to innovate and provide something better. A lot better. So I get tired when people say, “monitoring, it’s done isn’t it?”

It’s not. Not by a long long way.

Gartner published a report in May 2013 titled Market Share Analysis: IT Operations Management Software, Worldwide, 2012 (ID: G00249133). The report says that the 2012 application performance monitoring (APM) market is over $2 billion, growing at 6.5%, with the availability and performance monitoring market (IT infrastructure monitoring) being $2.8 billion, growing at 7.6%. Even though these IT monitoring areas are considered separate market spaces, the ideal is to combine them, allowing IT organizations to understand the impact the IT infrastructure has on the applications and vice versa. When both areas are combined they become the largest IT management market segment, with over 25% of the $18B total market. To put this into perspective, the joint APM/Availability and Performance revenues (~$4.8B) are larger by over $1B than those of configuration management, the second largest market segment, which is also growing at a slower rate (6.3%).
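For anyone who wants to check the arithmetic behind the combined figure, a quick sketch follows. It uses only the numbers quoted above and treats "over $2 billion" as exactly $2,000 million, which is an assumption that makes the results lower bounds.

```python
# Figures as quoted above, in millions of US dollars.
# "over $2 billion" is treated as exactly 2,000 here, so results are lower bounds.
apm = 2_000
infrastructure = 2_800
total_market = 18_000

combined = apm + infrastructure
share = combined / total_market

print(f"combined: ${combined / 1000:.1f}B")
print(f"share of total market: {share:.1%}")
```

On those assumptions the combined segment comes to $4.8B, a little under 27% of the total, which is consistent with the "over 25%" claim.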

Large, small, service provider, telco, SMB or enterprise, everybody has monitoring, so the fact that it remains the highest growth IT management space is amazing. Even though it’s a huge market, it is not dominated by a few vendors. It is a highly fragmented space with dozens of vendors and hundreds of tools.

Monitoring remains one of the most fragmented IT management spaces, with tools from dozens of vendors ranging from $free to $hundreds of thousands. Remaining relevant demands constant innovation, and that innovation comes from many areas including event collection, event consolidation, event processing, event reporting, ease of use, low complexity, high sophistication, product delivery, and product pricing and licensing. With the need to get clarity on IT services while reducing the cost and effort to achieve it, better ways to monitor are constantly being sought.

all monitoring is not the same
When people think of monitoring, an image that comes to mind is of NASA and the way it monitors a moon launch. Dozens of people intensely watching monitors, anxiously looking for irregularities and working closely with all their colleagues to identify potential issues that may impact the success of the objective and the safety of the astronauts. Even though each person may have a different view of the health of the mission, collaboration between the team members ensures that a holistic view is understood at all times. Throughout the mission priorities change, and so does what is monitored at each stage and how. In addition, the information displayed on the monitors is continually analyzed and correlated with other data, with the objective of seeking out potential issues that the individual monitoring displays may not make clear. NASA monitors space missions on the assumption that something will go wrong, demanding an immediate response to remediate the problem and ensure the success of the mission.

putting too much emphasis on the tools
For decades IT professionals have used products to give them visibility into the health of the IT infrastructure, which is monitored in fragmented piece parts by disparate, non-collaborative teams all providing different views on the health of IT. For many, monitoring is accomplished when resources are available and, unlike NASA, most IT organizations assume everything is fine and look to monitoring to confirm a reported outage and to aid root-cause analysis.

IT organizations depend on tools to provide an understanding of the state of IT. Unfortunately, IT continues to fragment and increase in complexity, driving organizations to employ more monitoring tools in an attempt to gain clarity on overall IT health. However, instead of making things easier to understand, this creates additional challenges, with each IT support organization providing increasingly different and potentially conflicting views on the health of the IT infrastructure. Some organizations using dozens of monitoring tools covering every aspect of their IT environment have no ability to clearly identify issues and the impact they have on the business. With each IT support team looking through different monitoring lenses, gaining a holistic, trusted view becomes almost impossible.

avoid liability and attribute blame
When the business is impacted by an IT issue, many organizations bring together the different IT support teams to help identify what the issue was, how it was detected and how to avoid the issue occurring again. Even though the senior IT executives do this to pacify the business and assure it of IT’s competency and value, each IT support organization will use its monitoring tools as evidence to prove either that it was not their issue or that the issue was identified and resolved in line with company policy and service levels. This behavior changes monitoring from a proactive, issue-avoidance practice to one where it is used to prove innocence and assign blame.

infrastructure availability does not equal application availability
Routinely IT support organizations use the statistics gathered by their monitoring tools to show effectiveness, IT availability and business value. Each IT component is monitored to a set of policies primarily derived from how each IT team associates value with the components. The traditional 99.9% availability objective is still used by IT operations as a way to show IT availability. Unfortunately the business does not equate availability with how each component is functioning. For the business, IT availability is measured by the performance and availability of the applications and the support the IT organization provides. These two viewpoints on how IT value is measured create confusion and conflict, with IT support teams unable to comprehend the fact that the business does not care about the individual health of each IT component. A business manager will assess the value of the IT organization based on the opinions and input of the people who consumed the IT resource and not on a mountain of confusing, irrelevant technical detail that conflicts with the IT consumer experience. In some cases this situation will drive the business to seek alternative IT providers for new applications and IT services.
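A little arithmetic shows why component numbers and application numbers diverge. The sketch below assumes an application that only works when every one of its components is up, and that component failures are independent; both are simplifications, and the ten-component application is hypothetical.

```python
def chain_availability(component_availabilities):
    """Availability of an application that needs every component to be up.

    Assumes component failures are independent, which real environments
    only approximate.
    """
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result


HOURS_PER_YEAR = 24 * 365

# Hypothetical application that depends on ten components, each at 99.9%.
components = [0.999] * 10
app = chain_availability(components)

component_downtime = (1 - 0.999) * HOURS_PER_YEAR
app_downtime = (1 - app) * HOURS_PER_YEAR

print(f"each component: 99.9% ({component_downtime:.1f} hours down per year)")
print(f"application:    {app:.2%} ({app_downtime:.1f} hours down per year)")
```

Under these assumptions every team can truthfully report 99.9% for its own component while the application the business actually uses is available only about 99.0% of the time, which is roughly ten times as many hours of downtime per year.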

how much are IT service quality problems costing business?
The reality is that while monitoring is employed in nearly every business that uses IT, it is not used effectively. While tools for monitoring are designed to provide proactive warnings of issues, the effectiveness of the tools can only be realized when they are used to show business impact, augmented by an organization focused on proactive monitoring practices and collaborative teamwork. Being proactive requires more than just monitoring tools; it requires:

  1. an organization that actually seeks out issues
  2. information delivery mechanisms that the support teams will take notice of
  3. information delivered in meaningful ways, preferably associated with service levels and business impact

monitoring evolved
Even though monitoring continues to be updated, it’s an evolution, not a set of dramatic changes. In the 1990s the focus was on the data center elements because, for many, that is where the majority of the IT resources were. Over time the need to understand how IT resources were being provided moved monitoring from basic availability to measuring performance, and to a set of processes and best practices to ensure specific outages and IT service degradations did not occur again. More recently monitoring has evolved in multiple directions. The dynamic nature of the IT infrastructure demands that monitoring is able to keep up with constant change and business priorities. This demand has created a new set of monitoring tools that dynamically discover IT components, establish relationships through various communication methods and dynamically map, in real time, how IT resources are used in support of the changing needs of the business. The highly distributed and fragmented IT infrastructure created a demand for tools that can actively search and associate disparate data from disparate sources and then provide, through analysis, information on IT health that could not be achieved by the more traditional monitoring approaches. And lastly, the way business consumes IT has forced many IT organizations to focus on the end-user experience. Only by focusing on how end-users consume IT resources will the IT organization be able to fully understand and support the business.

Summarizing all this…
IT and business are synonymous. Monitoring IT like it’s a network and a bunch of servers is going to result in the business demanding more relevant and accurate service measurement – specific to application availability and performance and the IT consumer experience. The critical impact IT has on business means executives continually evaluate the support and services provided by the IT organization and assess ways for improvement. For the business, IT value is a very easy metric to measure: availability, performance, responsiveness, flexibility and support. In addition, IT consumers have become major influencers of how IT services are evaluated, delivered and consumed, demanding a different view to understand the health of IT services. As IT consumers use IT resources beyond the corporate data center, the value of IT is assessed as an overall experience, no matter where applications are sourced, what access methods are used or where support is located. The only way to fully understand how the business views IT services is to monitor how IT consumers use IT.

High volumes of disparate event data create confusion and conflict, demanding technology that consolidates, correlates and prioritizes issues aligned with how the business consumes IT services.
IT organizations will still use tools that monitor specific IT elements, as these give specialists a deeper understanding and the ability to identify a problem’s root cause; however, these types of monitoring tools are used as event sources feeding monitoring products able to consolidate, filter, correlate and prioritize issues in line with IT service delivery. The ability to achieve this objective demands technology that can easily integrate and associate data into information relevant to both the IT organization and the business.
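As a rough illustration of what consolidate, filter, correlate and prioritize could mean in practice, here is a minimal sketch. Everything in it is an assumption made for the example: the event fields, the component-to-service mapping, the business weights and the scoring rule are hypothetical and do not describe any specific product.

```python
from collections import defaultdict

SEVERITY = {"info": 0, "warning": 1, "critical": 2}

# Hypothetical mapping of components to the business services they support,
# and a weight for how much each service matters to the business.
SERVICE_OF = {"db01": "order processing", "web03": "order processing", "nas02": "file archive"}
BUSINESS_WEIGHT = {"order processing": 10, "file archive": 2}


def prioritize(events):
    """Turn raw events from several tools into a ranked list of service issues."""
    # Consolidate: keep the worst severity reported per (component, check),
    # regardless of which tool reported it.
    worst = {}
    for e in events:
        key = (e["component"], e["check"])
        level = SEVERITY[e["severity"]]
        worst[key] = max(worst.get(key, 0), level)

    # Filter: drop informational noise, but keep warnings (the yellow ones).
    # Correlate: group what remains by the business service it affects.
    by_service = defaultdict(list)
    for (component, check), level in worst.items():
        if level == 0:
            continue
        service = SERVICE_OF.get(component, "unmapped")
        by_service[service].append((component, check, level))

    # Prioritize: score each service by business weight times summed severity.
    ranked = []
    for service, items in by_service.items():
        score = BUSINESS_WEIGHT.get(service, 1) * sum(level for _, _, level in items)
        ranked.append((score, service, sorted(items)))
    return sorted(ranked, reverse=True)


events = [
    {"tool": "snmp", "component": "db01", "check": "disk", "severity": "warning"},
    {"tool": "apm", "component": "db01", "check": "disk", "severity": "warning"},
    {"tool": "apm", "component": "web03", "check": "latency", "severity": "warning"},
    {"tool": "snmp", "component": "nas02", "check": "fan", "severity": "critical"},
    {"tool": "snmp", "component": "web03", "check": "uptime", "severity": "info"},
]
ranked = prioritize(events)
```

With this sample data, two yellow warnings on the order processing service outrank a single red critical on the file archive, because the ranking follows business impact rather than the color of the individual event.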