Author Archives: Opsleuth

Finding Value in Automation

Unmeasured IT automation has little value. Even though IT organizations appreciate what automation provides, once deployed it is rarely measured against its value to the business. When automation works it’s ignored; when automation breaks, it’s to blame. When automation is developed internally, the effort and cost are absorbed as part of a project or the work is done without any real accountability. When IT organizations want to buy an automation product, the justification is typically evaluated against head-count or time savings. Initially this is fine, but if ongoing savings are not captured, additional investment will need new justification, ideally based on business value.

Over the past year I’ve spoken with dozens of IT organizations to try to understand how they captured IT automation value. Responses varied but included “we capture it by logging activations” and “we don’t capture it”. Capturing automation activations is fine, but when something runs 300 times, who cares? The reason given for not capturing automation value was that the value of automation should be obvious. The problem is it isn’t.

IT automation continues to be a major initiative, but for many it fails to deliver on its promise because the value it provides is not understood.

In February 2014 I started work on a practitioner’s guide to automation. I didn’t want it to be something that had to be read cover to cover (who has time for that?) or something that tried to be a definitive, all-encompassing automation book (too many automation variations). I wanted to write something that gave companies a way to gain a quick understanding of their automation state, assess value using real calculations and then plot a strategic path forward.

To meet these objectives the following was done:

  • To keep the guide focused, it addresses three specific automation use-cases: Provisioning & Configuration, Patching & Compliance, and Cloud.
  • To assess automation state and set a strategic path, the guide breaks each use-case into five levels (ad-hoc to advanced).
  • To ensure all aspects of automation are covered, the guide covers process, people and tools.
  • To measure automation value, each level has specific calculations for cost, speed and risk.


Automation Use-Cases Aligned to Automation Levels

Even though this document was written while I was working for BMC, it is not focused specifically on BMC products and services. BMC decided to call the guide the Automation Passport. It focuses on what needs to be done irrespective of what technology is being used or planned. Companies are using it to plan an automation strategy, with the process maps defining what management tools are needed, how they should integrate and what data and information is passed between them. If a company’s automation approach is to use open source automation toolkits (e.g. Puppet and Chef), embedded automation tools and home-grown talent for development, the passport provides a view on the scope of the automation required and the associated effort.

The following diagrams are examples of the detail contained within the guide.

Process

Provides a view of what needs to be done at each automation level within each use-case. This level of detail can be used as a foundation and tuned to meet each IT organization’s specific requirements.

Example: Patch Process (a level within Patching & Compliance)

People

Provides a view on how an organization and roles change as automation maturity increases.


Example: Roles for Provisioning & Configuration

Technology

Explains the types of technology used at each level in line with the automation process (orange boxes indicate the technology introduced at this level).


Example: Technology supporting the Govern level process for the Cloud automation use-case

Last year BMC released the first edition of the Automation Passport. It contained the automation model mapped to the Provisioning & Configuration and Patching & Compliance use-cases.

In January 2015 BMC released the Automation Passport Early Release Edition. This updated version contains the Cloud Automation use-case, value calculations for each use-case level, greater detail on automation roles and responsibilities (including job descriptions), cloud type definitions and explanations on how capacity, performance and availability management tools support and evolve to support the automation of cloud environments.

More detail on the passport can be found at the following location: http://www.bmc.com/it-solutions/automation-passport.html

The latest automation passport can be downloaded (no strings attached) from: http://documents.bmc.com/products/documents/36/96/453696/453696.pdf

Congratulations, your IT might be less sick today

I came across an article in Computerworld titled “The Help Desk is Hot Again” articulating the revived popularity of the Help Desk. It explains that the Help Desk “serves as a vital liaison between employees’ mobile technologies and the networks, servers and applications that support them.” Help Desks certainly serve an important purpose; however, this positioning feels slightly askew. For most IT organizations the Help Desk is where you go when you have a problem and need help. Help Desks do not understand how IT consumers are experiencing IT and are certainly not a liaison. I can see how there is a logical leap from issue management to evaluating the health of IT, but do you go to the doctor when you are well?

Until recently, visibility into the consumer side of IT was not considered essential when measuring IT service availability. The assumption was that maniacally monitoring data center health provided enough data to show how effectively IT supported the business. For most organizations, IT availability and ‘end-user’ satisfaction are evaluated with metrics provided by the help desk, showing what went wrong and when. From the perspective of issues this may be acceptable, but it hardly provides an accurate view of how the business is using IT. It would be like asking a doctor “so, how healthy does the world look today?”, where the answer would be “It looks pretty sick”.

This whole situation has been exacerbated by the use of mobile devices, the growth in non-corporate cloud-based application sources and the influx of people entering the industry who were born digital.  These new market entrants have learned to become more self-sufficient than any generation before and would rather have the flu than call the service desk. Many of today’s mobile issues are ‘fleeting’ with performance being a variable impacted by increasingly complex and congested network connectivity. For many, it’s easier just to wait it out.  Does the help desk capture this experience? No.

So, if the objective is to understand how IT is used and experienced, then you don’t start from the data center. The starting place is the IT consumer. This requires more than a set of tools giving visibility from ‘the edge’; it will require IT support to organize and focus teams on IT consumer activity. Measuring experience means understanding how IT is used, when it is used and where it is used, not just when there is an issue. Capturing, monitoring and analyzing IT consumer activity allows IT organizations to assess the true IT business impact, regardless of where the user is, what they are using or where their applications are sourced.

This approach is not going to be easy for IT departments that have spent decades focusing on siloed data center elements and back-end application transactions. But IT consumer activity monitoring is not optional. Users do not use one device, do not remain in one place and do not use just one application. IT innovation, mobility and IT consumer creativity will continue to push the limits of IT operations management, with those able to adjust their IT management focus benefiting from better IT decision making and business alignment.

The service desk must evolve to be a true high-touch solution, and this can only be done when it is also used to monitor how all IT consumers are experiencing IT. IT organizations that do not plan to focus on their IT consumers will be left struggling, trying to manage increasingly diverse IT needs using tools that provide a datacenter-centric application performance snapshot, and stumbling their way towards the edge by trying to see through increasingly complex third-party service black holes.

proactive sounds cool, but being reactive is just easier.

Recently I’ve been involved in discussions about how new IT monitoring tools will make IT support teams smarter and far more proactive. By smarter I mean having a greater understanding of IT health, and by proactive I mean being aware of situations before or as they occur.

I’d argue that becoming smarter is a prerequisite to becoming proactive. Monitoring for issues is so much easier when you know what you are looking for and understand the ramifications. The best way for IT support to become smarter is to hire the smartest, most experienced people. Becoming proactive is not so straightforward.

The idea that tools will turn a reactive, crisis-driven IT operations team into a proactive one is nonsense. For decades monitoring tools have been able to set policy forewarning of events and giving support staff a heads-up on potential issues. The reasons this capability has not delivered on the promise are numerous, including events that are ‘potential issues’ or ‘warnings’ rarely being classified as high priority items, support staff not noticing them (or ignoring them) and the method of event delivery being the wrong one. It has had little to do with the monitoring tools. The reality is that most IT organizations are not measured on outage avoidance but on fixing issues once outages occur.

It’s easier to be the hero who got the order processing application back up than the person who said they had helped avoid the problem occurring in the first place (“you did what?” “oh sure you did, well done – help yourself to a medal”). If an organization wants to be proactive then it needs to have people goaled and measured on finding issues before they become problems. Security officers actively monitor and analyze events to proactively identify anomalies and irregular activity and behaviors, in order to stop hackers, cyber attacks, viruses and so on. Apparently, it is not acceptable to wait for security problems to occur before they get addressed. For IT support to do the same will require a number of changes, including:

  1. an organization measured against outage avoidance.
  2. information delivered in ways that the support team will take notice of.
  3. information that means something and is actionable.

an organization measured against outage avoidance. An IT organization that prides itself on being proactive but measures itself against MTTR or MTBF is not fully proactive. The speed with which IT operations responds to and fixes an issue is not a good measure of proactive efficiency unless it also factors in how quickly the issue was detected in the first place. IT operations effectiveness would have greater relevance if it were tied to outage avoidance. This type of metric is not easy to capture using monitoring tool reporting (too many sources, limited business impact assessment), so it requires a way to immediately consolidate, log and track the identification-to-remediation process. The easiest way to do this is using a service desk. This information would demonstrate how IT operations provides value, while showing increases in IT operational efficiency.
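
As a rough illustration of that kind of measurement, the sketch below reports time-to-detect and an outage-avoidance rate alongside time-to-resolve. The record fields (`occurred`, `detected`, `resolved`, `caused_outage`) and the sample values are hypothetical and are not taken from any particular service desk product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical fields; a real service desk record will differ.
    occurred: datetime    # when the condition first appeared
    detected: datetime    # when IT operations became aware of it
    resolved: datetime    # when remediation completed
    caused_outage: bool   # False if it was fixed before the business was impacted


def minutes(delta: timedelta) -> float:
    """Convert a time difference to minutes."""
    return delta.total_seconds() / 60


def summarize(incidents: list[Incident]) -> dict[str, float]:
    """Report detection speed and outage avoidance alongside repair speed."""
    return {
        "mean_minutes_to_detect": mean(minutes(i.detected - i.occurred) for i in incidents),
        "mean_minutes_to_resolve": mean(minutes(i.resolved - i.detected) for i in incidents),
        "outage_avoidance_rate": sum(not i.caused_outage for i in incidents) / len(incidents),
    }


if __name__ == "__main__":
    t0 = datetime(2015, 1, 5, 9, 0)
    sample = [
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=35), False),
        Incident(t0, t0 + timedelta(minutes=45), t0 + timedelta(minutes=75), True),
    ]
    print(summarize(sample))
```

Both sample incidents take 30 minutes to fix once detected, so a resolve-time metric alone would rate them the same; only the detection time and the avoidance flag tell them apart.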

information delivered in ways that the support team will take notice of. IT organizations invest a lot of time and effort trying to detect and process events, but few put the same effort into ensuring events are immediately delivered to the right IT personnel. A proactive state dictates that event data is delivered and owned as soon as it is detected. This means the mechanism chosen to deliver the data is as important as the effort associated with collecting the information in the first place. Most IT organizations still rely on event management tool consoles; however, an unwatched console will result in missed events. Sending events to mobile devices (e.g. in the form of an IM) and/or the use of alert notification tools can reduce the time it takes to become event-aware. Alert notification tools help support a proactive objective by automating the delivery of alerts to the appropriate IT operations personnel through the most-effective communications channel, in support of established escalation and outage procedures and also provide the mechanism for an event to be delivered, acknowledged and owned.

information that means something and is actionable. If you are not actively looking for something, it’s unlikely you’ll find it. A blindingly obvious statement, but when monitors are used in IT operations they are typically used to aid root-cause analysis on known, reported issues, where support knows there’s an issue and understands the sort of thing they need to look for. However, when there is no obvious problem it takes skill and experience to scroll through long lists of technical event data and identify the most critical, business-impacting issues. Knowing how things relate to the bigger picture requires the skill to assess the overall impact of multiple unassociated events, and that means taking the yellow ones as seriously as the red ones. This is the new way IT support must work: looking for subtle changes and behaviors in the IT infrastructure, applications and IT consumers, analyzing potential impacts and executing a plan to remediate the issue before it affects the business. This approach demands dedicating support personnel to IT analysis, rather than having them watch monitoring consoles only when they have time or are prompted to do so by complaints from IT consumers.

If you ignore the price to the business, being reactive doesn’t cost a thing.

if NASA monitored like IT operations would they have made it to the moon?

In nearly every job I’ve had, IT monitoring has been somewhere, either core to my day job or peripherally around the edge. Even though monitoring has been with us for decades, it still attracts massive amounts of attention from IT organizations, vendors and venture capital. Red, green, yellow, yellow, green, red, how hard can it be? There have been major shifts in finding new ways to understand the health of IT, including SNMP monitors in the early 1990s and, more recently, the various flavors of APM products. For a software company to make a difference and be successful selling a product in this space, it really needs to innovate and provide something better. A lot better. So I get tired when people say, “monitoring, it’s done isn’t it?”

It’s not. Not by a long long way.

Gartner published a report in May 2013 titled Market Share Analysis: IT Operations Management Software, Worldwide, 2012 (ID: G00249133). The report says that the 2012 application performance monitoring (APM) market was over $2 billion, growing at 6.5%, with the availability and performance monitoring market (IT infrastructure monitoring) at $2.8 billion, growing at 7.6%. Even though these IT monitoring areas are considered separate market spaces, the ideal is to combine them, allowing IT organizations to understand the impact the IT infrastructure has on the applications and vice versa. When both areas are combined they become the largest IT management market segment, with over 25% of the $18B total market. To put this into perspective, the joint APM/Availability and Performance revenue (~$4.8B) is larger by over $1B than that of configuration management, the second largest market segment, which is also growing at a slower rate (6.3%).
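
The arithmetic behind the combined figure can be checked quickly. The sketch below uses only the rounded numbers quoted above (the APM figure is stated as “over $2 billion”, so 2.0 is treated as a lower bound), which means the results are approximations and not Gartner’s own totals.

```python
# Rounded figures quoted in the text, in billions of US dollars.
APM = 2.0                # "over $2 billion", so treated as a lower bound
AVAIL_AND_PERF = 2.8
TOTAL_MARKET = 18.0


def combined_share(apm: float, avail_perf: float, total: float) -> tuple[float, float]:
    """Return (combined revenue, combined share of the total market as a fraction)."""
    combined = apm + avail_perf
    return combined, combined / total


if __name__ == "__main__":
    combined, share = combined_share(APM, AVAIL_AND_PERF, TOTAL_MARKET)
    print(f"combined: ${combined:.1f}B, share of market: {share:.1%}")
```

With these inputs the combined segment comes to about $4.8B, or roughly 26.7% of the $18B market, which is consistent with the “over 25%” figure.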

Large, small, service provider, telco, SMB or enterprise, everybody has monitoring, so the fact that it remains the highest growth IT management space is amazing. Even though it’s a huge market, it is not dominated by a few vendors. It is a highly fragmented space with dozens of vendors and hundreds of tools.

Monitoring remains one of the most fragmented IT management spaces, with tools from dozens of vendors ranging in price from free to hundreds of thousands of dollars. Remaining relevant demands constant innovation, and that innovation comes from many areas, including event collection, event consolidation, event processing, event reporting, ease of use, low complexity, high sophistication, product delivery, and product pricing and licensing. With the need to get clarity on IT services while reducing the cost and effort to achieve it, better ways to monitor are constantly being sought.

all monitoring is not the same
When people think of monitoring, an image that comes to mind is of NASA and the way it monitors a moon launch. Dozens of people stare intensely at monitors, anxiously looking for irregularities and working closely with their colleagues to identify potential issues that may impact the success of the objective and the safety of the astronauts. Even though each person may have a different view of the health of the mission, collaboration between the team members ensures that a holistic view is understood at all times. Throughout the mission priorities change, and so does what is monitored at each stage and how. In addition, the information displayed on the monitors is continually analyzed and correlated with other data, with the objective of seeking out potential issues that the individual monitoring displays may not make clear. NASA monitors space missions on the assumption that something will go wrong, demanding an immediate response to remediate the problem and ensure the success of the mission.

putting too much emphasis on the tools
For decades IT professionals have used products to give them visibility into the health of the IT infrastructure, which is monitored in fragmented piece parts, with disparate non-collaborative teams all providing different views on the health of IT. For many, monitoring is accomplished when resources are available and, unlike NASA, most IT organizations assume everything is fine and look to monitoring to confirm a reported outage and to aid root-cause analysis.

IT organizations depend on tools to provide an understanding of the state of IT. Unfortunately, IT continues to fragment and increase in complexity, driving organizations to employ more monitoring tools in an attempt to gain clarity on overall IT health. However, instead of making things easier to understand, this creates additional challenges, with each IT support organization providing increasingly different and potentially conflicting views on the health of the IT infrastructure. Some organizations using dozens of monitoring tools covering every aspect of their IT environment have no ability to clearly identify issues and the impact they have on the business. With each IT support team looking through different monitoring lenses, the ability to gain a holistic, trusted view becomes almost impossible.

avoid liability and attribute blame
When the business is impacted by an IT issue, many organizations bring together the different IT support teams to help identify what the issue was, how it was detected and how to avoid the issue occurring again. Even though the senior IT executives do this to pacify the business and assure it of IT’s competency and value, each IT support organization will use its monitoring tools as evidence to prove either that it was not their issue or that the issue was identified and resolved in line with company policy and service levels. This behavior changes monitoring from a proactive, issue avoidance practice to one where it is used to prove innocence and assign blame.

infrastructure availability does not equal application availability
Routinely, IT support organizations use the statistics gathered by their monitoring tools to show effectiveness, IT availability and business value. Each IT component is monitored to a set of policies primarily derived from how each IT team associates value with the components. The traditional 99.9% availability objective is still used by IT operations as a way to show IT availability. Unfortunately, the business does not equate availability with how each component is functioning. For the business, IT availability is measured by the performance and availability of the applications and the support the IT organization provides. These two viewpoints on how IT value is measured create confusion and conflict, with IT support teams unable to comprehend the fact that the business does not care about the individual health of each IT component. A business manager will assess the value of the IT organization based on the opinions and input of the people who consumed the IT resource and not on a mountain of confusing, irrelevant technical detail that conflicts with the IT consumer experience. In some cases this situation will drive the business to seek alternative IT providers for new applications and IT services.
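
One way to see why component availability and application availability diverge is to compose them. The sketch below assumes an application depends on several components in series and that their failures are independent, which is a simplification; the component count is illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365


def serial_availability(component_availabilities: list[float]) -> float:
    """Availability of an application that needs every component to be up.

    Assumes independent failures and a strictly serial dependency chain.
    """
    return math.prod(component_availabilities)


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    components = [0.999] * 5  # five hypothetical components, each at 99.9%
    app = serial_availability(components)
    print(f"application availability: {app:.3%}")
    print(f"expected downtime per year: {annual_downtime_hours(app):.1f} hours")
```

Under these assumptions every component individually meets its 99.9% objective (under nine hours of downtime a year each), yet the application as experienced by the business is closer to 99.5%, or roughly 44 hours a year.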

how much are IT service quality problems costing business?
The reality is that while monitoring is employed in nearly every business that uses IT, it is not used effectively. While tools for monitoring are designed to provide proactive warnings of issues, the effectiveness of the tools can only be realized when they are used to show business impact, augmented by an organization focused on proactive monitoring practices and collaborative teamwork. Being proactive requires more than just monitoring tools; it requires:

  1. an organization that actually seeks out issues
  2. information delivery mechanisms that the support teams will take notice of
  3. information delivered in meaningful ways, preferably associated with service levels and business impact

monitoring evolved
Even though monitoring continues to be updated it’s an evolution not a set of dramatic changes.  In the 1990s the focus was on the data center elements because for many that is where a majority of the IT resources were. Over time the need to understand how IT resources were being provided moved monitoring from basic availability to measuring performance and a set of processes and best practices to ensure specific outages and IT service degradations did not occur again.  More recently monitoring has evolved in multiple directions. The dynamic nature of the IT infrastructure demands that monitoring is able to keep up with constant change and business priorities.  This demand has created a new set of monitoring tools that dynamically discover IT components, establish relationships through various communication methods and dynamically map, in real-time, how IT resources are used in support of the changing needs of the business. The highly distributed and fragmented IT infrastructure created a demand for tools that can actively search and associate disparate data from disparate sources and then provide, through analysis, information on IT health that could not be achieved by the more traditional monitoring approaches.  And lastly, the way business consumes IT has forced many IT organizations to focus on the end-user experience.  Only by focusing on how end-users consume IT resources will the IT organization be able to fully understand and support the business.

Summarizing all this…
IT and business are synonymous. Monitoring IT like it’s a network and a bunch of servers is going to result in the business demanding more relevant and accurate service measurement, specific to application availability and performance and to IT consumer experience. The critical impact IT has on business means executives continually evaluate the support and services provided by the IT organization and assess ways for improvement. For the business, IT value is a very easy metric to measure: availability, performance, responsiveness, flexibility and support. In addition, IT consumers have become major influencers of how IT services are evaluated, delivered and consumed, demanding a different view to understand the health of IT services. As IT consumers use IT resources beyond the corporate data center, the value of IT is assessed as an overall experience no matter where applications are sourced, what access methods are used or where support is located. The only way to fully understand how the business views IT services is to monitor how IT consumers use IT.

High volumes of disparate event data create confusion and conflict, demanding technology that consolidates, correlates and prioritizes issues aligned with how the business consumes IT services.
IT organizations will still use tools that monitor specific IT elements, as these give specialists a deeper understanding and the ability to identify a problem’s root cause; however, these types of monitoring tools are used as event sources feeding monitoring products able to consolidate, filter, correlate and prioritize issues in line with IT service delivery. Achieving this objective demands technology that can easily integrate and associate data into information relevant to both the IT organization and the business.
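
A minimal sketch of that consolidate, filter, correlate and prioritize flow is shown below. The severity ranks, service weights, component-to-service mapping and scoring rule are all hypothetical, chosen only to illustrate the idea and not drawn from any specific product.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity ranking and business weights, for illustration only.
SEVERITY_RANK = {"warning": 1, "minor": 2, "critical": 3}
SERVICE_WEIGHT = {"order-processing": 10, "intranet": 2}
# Hypothetical mapping from IT component to the business service it supports.
COMPONENT_TO_SERVICE = {
    "db01": "order-processing",
    "web01": "order-processing",
    "wiki01": "intranet",
}


@dataclass(frozen=True)
class Event:
    source: str      # monitoring tool that raised the event
    component: str
    severity: str


def prioritize(events: list[Event]) -> list[tuple[str, int]]:
    """Consolidate events per service and rank services by weighted severity."""
    # Filter: drop events for components that map to no known service.
    # Consolidate: keep one entry per (component, severity), whatever the source.
    unique = {
        (e.component, e.severity)
        for e in events
        if e.component in COMPONENT_TO_SERVICE
    }
    # Correlate: add up severity per service, so several warnings can
    # outrank a single critical event elsewhere.
    scores: dict[str, int] = defaultdict(int)
    for component, severity in unique:
        scores[COMPONENT_TO_SERVICE[component]] += SEVERITY_RANK[severity]
    # Prioritize: weight each service score by its business importance.
    ranked = [(svc, score * SERVICE_WEIGHT[svc]) for svc, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    feed = [
        Event("tool-a", "db01", "warning"),
        Event("tool-b", "db01", "warning"),      # duplicate from a second tool
        Event("tool-a", "web01", "warning"),
        Event("tool-c", "wiki01", "critical"),
        Event("tool-c", "printer9", "critical"),  # not mapped to any service
    ]
    print(prioritize(feed))
```

In this sample two yellow warnings on the order processing service rank above a single red event on the intranet, because the ranking follows business weight and not raw event severity.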

a path to improving end user experience

I don’t believe anyone can dispute the growing influence end users have on how IT services are chosen, sourced and evaluated. This does not mean IT operations organizations are ready to fully embrace the end user as a specific focus. Many assume application transaction monitoring and mobile device software update support are enough, at least for the time being. The reality is that they aren’t enough, and treating the end user like peripheral hardware is not to their benefit. This is managing the situation, not enabling the end user.

Improving end user experience is not about keeping an eye on users or trying to support their mobile devices; it’s about removing IT barriers, reducing complexity and making users more self-sufficient and productive. This objective is best broken down into logical areas:

  1. Support
  2. Social Enablement
  3. Security & Resilience
  4. Productivity

Each area has a set of activities and objectives:

  • Support: Identify, address and report common/local issues, pre-emptive problem management and real-time end user IT status specific to their individual needs and priorities.
  • Social Enablement: Social, communication and collaboration tools to foster and enable information flow between different users with common interests, goals and objectives.
  • Security & Resilience: End user and device authentication, content protection, and data protection and recovery.
  • Productivity: BYOD enablement allows business to be conducted from any device and location. Users download and are given access to applications, local resources and information on company facilities based on their specific needs and within company policy.

It is unrealistic to think the objectives for each activity can be accomplished all at once. They are only achievable if each activity has a path containing logical, measurable steps.  This is also needed as each activity can have ties to others (e.g. to deliver a level of support requires a level of security and resilience).

In the paper Path to Improving the End-User Experience the activities are explained and broken down into the five levels (undefined, reactive, proactive, service and business) providing objectives to assess the current end user environment and improve upon it.

A barrier to success is IT operations’ need to enable the users from the datacenter perspective.  If the end user is the focus then the starting point is the end user (do IT users care about the datacenter?).  However, to show value a plan must have two perspectives, one IT operations and the other the end user.  In the paper each level describes the activity and value to both IT operations and the end user.  This allows IT operations to associate effort and investment directly with end user productivity.

Improving end-user experience and satisfaction, and making end users more productive, increases a company’s effectiveness and makes it more competitive. It’s a no-brainer.

IT Infrastructure monitoring. Red, green, yellow is no longer enough.

A view on the health of the IT infrastructure is accomplished using monitoring tools – lots of them. This has been the approach for decades, with differences revolving around how the data is collected (the age-old agent vs. agentless argument), the integration provided, how data is processed, how the tools are purchased, and increasingly creative ways to display red, green and yellow. However, it doesn’t matter if you use a high-cost, low-cost or no-cost monitoring tool; the objective remains the same: get a view of the health of IT.

IT infrastructure monitoring is not glamorous but it is required. How else is IT operations going to confirm an issue reported through the service desk? However, the way monitoring is used today is not suitable for many of the requirements placed on monitoring going forward.

Monitoring is splitting into two distinct approaches, and what you need from monitoring will determine how it is done and what tools are used. The first approach is the traditional one: monitoring IT health. The second is using monitoring to enable an action, which means collecting and analyzing specific information and using it to support an automation procedure or run an action.

An example of the second approach is monitoring the performance of a cloud IT infrastructure stack, where the objective is not simply to understand the health of the cloud environment but to enable capacity to be dynamically allocated and changed in line with usage or need (aka cloud elasticity). Add to this the fact that cloud environments are moving from server and storage capacity to application services, and the ability to make changes in the cloud becomes far more complex. For example, making a change to a cloud database may make sense for one application but have a detrimental impact on others.

Even though traditional performance monitoring tools are being used to provide a view on cloud health, their ability to support decision making is problematic (see diagram 1).

1. Performance policy is defined within each monitoring tool focused on specific element and element type thresholds – not on overall cloud service performance.
2. Performance monitoring data does not show how one element’s performance impacts another (e.g. how changing a server or network configuration impacts multiple applications) – creating an inability to make trusted changes.
3. Challenges in pulling together (in real-time) multiple performance feeds into a coherent service or application view – creating a ‘lag’ in making changes and the need for multiple tools and teams to be involved.

Integrating multiple performance feeds to assess overall application/service impact requires a highly sophisticated performance consolidation tool that normalizes, consolidates, filters data and provides an accurate service impact that can be used to support or trigger an action.  This tool does not exist.

However, there are sophisticated capacity tools able to take performance data from multiple performance sources and optimize IT resources as a service (e.g. BMC’s BCO product). The best results are achieved when the data received from the supporting performance tools focuses specifically on the environment being provisioned or updated. This enables services to be changed (e.g. orchestrated through a service governor) with greater accuracy (e.g. supporting service placement or making decisions on requested changes in the context of impact).

The future of IT infrastructure monitoring includes processing specific information to make trusted decisions. For example, cloud monitoring will have policy derived from the cloud blueprint (cloud service component architecture) with possible input from other sources (e.g. a service catalog to guide service levels); see diagram 2. This will result in one set of policies aimed specifically at the IT components supporting the cloud services, both to assess cloud health and to provide the actionable information needed to make safe changes. This differs from the traditional monitoring I mentioned previously, which collects data from everything and then tries to apply filters and rules to reduce content to provide a true view of IT health.

Focusing on an outcome by monitoring a specific set of components allows the capacity tool to provide accurate placement decisions that can be executed through the governor and the provisioning and configuration tools.
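To make the idea a little more concrete, here is a minimal sketch in Python of blueprint-scoped monitoring feeding a placement decision. It is an illustration only, not how any particular capacity tool or governor works: the `Sample` record, the component names and the `required_headroom_pct` threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical reading from a monitoring feed."""
    component: str        # component name as reported by the feed
    cpu_used_pct: float   # CPU utilisation, 0-100


def scoped_samples(blueprint, samples):
    """Keep only samples for components named in the service blueprint."""
    return [s for s in samples if s.component in blueprint]


def choose_placement(blueprint, samples, required_headroom_pct):
    """Return the blueprint component with the most CPU headroom, or None."""
    candidates = [
        (100.0 - s.cpu_used_pct, s.component)
        for s in scoped_samples(blueprint, samples)
        if 100.0 - s.cpu_used_pct >= required_headroom_pct
    ]
    return max(candidates)[1] if candidates else None


# The blueprint defines what is monitored; everything else is ignored.
blueprint = {"web-01", "web-02", "db-01"}
samples = [
    Sample("web-01", 85.0),
    Sample("web-02", 40.0),
    Sample("db-01", 70.0),
    Sample("legacy-mainframe", 5.0),  # outside the blueprint, never considered
]
placement = choose_placement(blueprint, samples, required_headroom_pct=25.0)
```

The point of the sketch is the order of operations: policy (the blueprint) decides what data is looked at before any decision is made, rather than collecting everything and filtering afterwards.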

Automated cloud decision making is just one example of the way monitoring is evolving. The same value could be attributed to any IT infrastructure automation initiative including agile development practices (e.g. DevOps).

Today we are at a crossroads. IT operations tools developed to monitor IT infrastructure health are increasingly being considered to provide highly accurate information to support automated decision-making. Even though this re-purposing can be achieved, the effort, cost and complexity are going to be prohibitive, which reminds me of a line from an old Irish joke: “sir, to get there, I wouldn’t have started off from here”.

It’s not as if a totally new set of tools is required, although for the cloud, IT decisions may be supported by monitoring embedded in cloud management solutions.


Diagram 3

Diagram 3 describes 5 areas of differentiation between tools used for monitoring health and ones designed/used for aiding decisions. The most important differentiator is the objective. This dictates policy, the environment monitored and the integrations required. If you want to make decisions you set policy based on the decision being made; if you want to check for infrastructure health you set policy based on component thresholds.

Just when you thought monitoring was already complex, it’s about to get more interesting.

do IT users care about the datacenter?

The datacenter – the IT business hub. Or it used to be.

End users could not care less about it. What they care about is application availability, response times and the ability to get IT access from whatever device they choose and from wherever they want. The business increasingly makes decisions on what applications are used, where the applications are sourced and who supports them. There’s no love affair between the business and the organization called IT operations, because it’s not about technology, it’s about getting the job done. It’s not that datacenter availability isn’t important; it’s just not important to users. The business measures IT value against the quality of support and application availability and performance, not servers, storage and networks.

Some will struggle with how monitoring the datacenter does not equate to understanding and measuring business availability. It was not so long ago companies providing datacenter outsourcing services would have a huge display in reception with topology maps showing a red, green, yellow status of the datacenter infrastructure. I can only assume it was designed to show control and understanding because I’d argue the computer room could spontaneously combust and no-one would be any the wiser until the end-users reported problems accessing their applications.

How many times have you thought “I wonder if the servers are performing well today?” or “I hope my files are backed up and secure”? What you probably think is “email is slow, IT needs to fix it now” and, if data is lost or corrupted, “IT had better get it back now”. My point is this: the datacenter will continue to be critical to the IT organization responsible for managing it – not to the businesses that use it. For the business it’s all about the application, no matter where it resides or who manages it, and the fact that an application requires hardware and software to live is, from the user perspective, irrelevant. It’s assumed.

In late December 2012 Netflix had issues. The fact it was over a holiday period made the problem even more annoying. It was a Netflix problem and Twitter lit up with customer feedback for Netflix. Netflix blamed the issue on Amazon Web Services servers and said Amazon was addressing it. So, that’s ok then? It’s not a Netflix application problem – it’s an Amazon server problem. It doesn’t matter if Amazon’s servers were the real problem; it is Netflix’s job to make sure its applications are not plagued by a weakness in server capacity, performance, architecture or design, no matter who it decided to outsource this critical task to. Subscribers to Netflix do not pay Amazon.

It’s the same for any IT organization delivering IT application services whether they are internal or external to a business. Monitoring the datacenter to identify and solve issues is one thing – using the same element monitoring to try and demonstrate value to the business is another.  Managing the datacenter is mandatory, however using element based availability metrics as proof of IT business value and application availability is no longer acceptable.

From a business perspective the value of IT is assessed through the lens of its business users – not the datacenter. This will increasingly result in IT value being assessed from the end user to the application source, measured against service levels, which means datacenter components can go up and down all they like as long as it doesn’t have a detrimental effect on business service levels. With the growing trend to use applications from cloud-based service providers, who can tell where all the parts of an application are? Netflix is hardly unique, architecturally, in the way it provides services. As more applications are made available in the cloud, the location of the supporting infrastructure is likely to be in the hands of one or more additional cloud service providers.
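A small, hedged illustration of the difference between the two views, written in Python. The polling results, response times, the 2-second target and the 95% attainment goal are invented numbers for the example, not measurements from any real environment.

```python
def element_availability(up_flags):
    """Datacenter view: share of polling intervals in which a component was up."""
    return sum(up_flags) / len(up_flags)


def service_level_attainment(response_times_s, target_s):
    """Business view: share of end-user transactions completed within target."""
    if not response_times_s:
        raise ValueError("no transactions observed")
    within = sum(1 for t in response_times_s if t <= target_s)
    return within / len(response_times_s)


# One of two redundant servers was down for half of the polling intervals.
server_a_up = [True, True, False, False]
element = element_availability(server_a_up)

# The end users, served by the surviving server, never noticed.
response_times_s = [0.8, 1.1, 0.9, 1.4, 1.0]
attainment = service_level_attainment(response_times_s, target_s=2.0)
service_ok = attainment >= 0.95
```

In this made-up case the element metric reports 50% availability while every user transaction met its target, which is exactly why element metrics make poor proof of business value.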

So, who cares about the datacenter? The people responsible for managing it, developers, testers and business unit personnel who pay for capacity. For users and the business – it’s all about the application.

service intelligence transforms the service desk

For decades IT has struggled to understand how end-users use IT. The only point of reference is the service desk, which is the only place where the user community interfaces regularly with IT. However, it’s not easy to provide a view on end user IT value when all you have as reference are issues.

So, you call the service desk and you get the standard interrogation: a number of questions to help identify your issue and send it, with a degree of accuracy, to the right support team. Even though updates have been made to service desks for decades, the core capability, managing problems and incidents, remains the same (this situation is made clear by Chris Dancy and his example of ‘Form Based Work Flow’). Irrespective of how functionally rich your service desk is, tickets opened and problem resolution metrics are still used to show how effectively IT supports the business. This ‘suck less’ metric is not good. The problem with problems is problems are not the same. And that’s the problem.

Things that prevent people doing their job are typically reported (e.g. connectivity or passwords) but things that are not show-stoppers and are just annoying (e.g. sporadic performance problems, jammed printers etc) are not. For many it’s just way too much hassle. The reality is, end users suffer from poor application performance more often than any service desk log shows. The user will talk with their colleagues to make sure they are not the only victim and possibly just wait, because it’s easier to assume IT operations knows about it or to wait for the problem to fix itself (e.g. less user traffic, moving to a different location or using a different device).

There is no place to go to understand the overall end user experience, leaving IT operations to make the assumption that if there are no major issues then the user must be fine. The problem with using the service desk as a way to determine user satisfaction is that it’s not a monitoring system. It simply logs incidents and manages them in line with established escalation and outage procedures. The use of infrastructure monitoring tools provides a view of the health of one datacenter (or one component type) and the use of most APM tools provides a partial view of the end user’s application performance. There have been attempts to provide end-user visibility to the service desk to create a more intelligent, business-aware solution. The attempts include providing self-help options, end user keystroke logging and control over end-point devices (primarily Windows). However, even with some of these capabilities being offered, the service desk remains a reactive incident management solution focused on supporting issues already impacting the end-user.

As the end-user environment becomes more complex (agile application releases, cloud based apps, BYOD, increased mobility etc) it will become harder for service managers to support the business, and the use of internal datacenter performance metrics alone will not be relevant in a world where the IT user is using applications from disparate sources on a multitude of different devices. Service managers must be able to understand both what the business uses and how the business uses IT. The ability to understand end-user behavior will move the service desk from a passive incident reporting system into a solution that provides the IT support organization with visibility into how the business uses IT. This visibility will enable service managers to manage incidents more effectively and identify business trends which will impact the IT services provided to the business. Understanding how the business uses IT should enable service managers to plan how the support organization is staffed to provide service quality.

If you are not looking you will not find it

IT operations remains a reactive practice, hoping that technology will make it more proactive. The truth is if IT operations is not focused on being proactive then it will remain in a reactive state no matter what tools are used. The same can be said for the service desk. For it to become a service intelligence solution also requires a change in how the service managers use it. Products that provide visibility into how IT is used also require the service managers to take an active role in looking for trends that indicate something abnormal is occurring (e.g. people using an application on a specific device dealing with poor performance).

The path to intelligence

Service desks have yet to evolve into the intelligent solution I’ve talked about; however, forward-thinking IT organizations are already starting to think this way. It requires traditional organizational barriers between the service desk and IT operations to come down. A hybrid role is created that uses APM tools (primarily end-user focused products – EUAM) to look for potential issues. The information is then passed to the service desk, automatically or manually, through the opening of a ticket and a dashboard at the service desk showing specific performance trends as they pertain to applications and end users. Even though the service desk and APM tools remain separate today, using them together should provide benefits once collaboration has been established between service managers and IT operations.

The value

  • End-user experience is tracked against service levels with tickets opened proactively when an end-user (or end-user group) experiences a degradation in service.
  • The service desk understands the current end-user experience, the devices being used, their location, their normal activity and the applications being used providing greater visibility into how the business is using IT.
  • The service desk is made aware of the end-user experience no matter where the applications are sourced (locally, internal or external). This enables accurate incident ticket assignment.
  • End-user activity is available for ‘play-back’ to help understand and identify what was being done at the time an issue occurred enabling effective root-cause analysis.
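The first of these points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the measurement format, the user group and application names, the 2-second service level and the ticket fields are all hypothetical, and a real service desk would open the ticket through its own interface rather than return a dictionary.

```python
from statistics import median


def proactive_tickets(measurements, service_levels_s):
    """Open a ticket for each user group whose median response time
    for an application exceeds that application's service level.

    measurements:     {(user_group, application): [response times in seconds]}
    service_levels_s: {application: target response time in seconds}
    """
    tickets = []
    for (group, app), times in sorted(measurements.items()):
        target = service_levels_s.get(app)
        if target is None or not times:
            continue  # nothing to compare against
        observed = median(times)
        if observed > target:
            tickets.append({
                "group": group,
                "application": app,
                "observed_s": observed,
                "target_s": target,
                "source": "end-user monitoring",  # not reported by a user
            })
    return tickets


measurements = {
    ("London sales", "crm"): [1.2, 4.8, 5.1],
    ("Boston finance", "crm"): [0.9, 1.0, 1.1],
}
tickets = proactive_tickets(measurements, {"crm": 2.0})
```

Here only the London group breaches the service level, so one ticket is raised before anyone has picked up the phone.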

is there IT guidance without bias?

In the post titled community vs. the analysts I wrote about how I believed IT organizations use social and analyst content. It’s relatively easy to explain why companies use content from both these sources; however, when looking for guidance or answers to IT business questions, is there anywhere unbiased advice can be found?

Everyone has a list of favorite and disliked vendors and products. A bad experience with a product can taint a vendor’s reputation and that of their entire portfolio. However, in the world of enterprise computing a bias doesn’t have to be related to a product and can be created because of poor support, poor service or a bad sales engagement. As an analyst it was common to find that something I’d recommended came under attack just because the client had a historical problem with one of the products or vendors among the options given. So IT bias can be towards anything: hardware, software, a vendor, a product, an approach, an organization, a best practice, or even a set of standards. So where do you go if you are seeking advice with no bias? The obvious answer is the analyst community, but is it possible to be truly unbiased?


There’s no such thing as unbiased IT opinion. 

Analyst companies claim to provide opinion with no bias, social community opinion is fueled by bias and vendors have an obvious bias. All have bias; it’s just that some is out in the open and some is hidden. Vendors and social communities do little to disguise bias whereas analysts do everything they can to hide it.

Analyst bias can take many forms. Analyst firms are not equal – there are very large ‘tier 1’ companies and there are ‘tier 2’ or ‘boutique’/‘specialist’ companies. For tier 2 analyst firms covering IT operations management, revenue primarily comes from vendors. Beyond basic research services this can take many forms, including paid-for product endorsements or sponsored primary research. This shows blatant bias and few would consider this type of content as more than just interesting.

Tier 1 analyst firms deny any form of bias and, as corporations, may not overtly exhibit any. However, they need to demonstrate a complete grasp of the market and this means establishing thought leadership to help mold how markets are viewed and addressed. This typically results in the creation of best practices, methodologies, terminology and models. Any vendor wishing to be taken seriously and be positioned correctly must conform to how the analyst company defines and articulates the market. If they don’t then they risk being described and positioned in a way that may not be to their liking and this will emerge in research, presentations and client engagements.

Bias is normal and should be expected, so when it comes to evaluating and understanding content the bias must be factored into your thinking. Vendor content is designed to show their products in the best light, social community bias is driven by the content providers (which explains the diversity) and analyst company research contains bias based on their belief systems and the individual analyst’s experience. As a result analyst research should not be taken at face value with the assumption that it’s based on a neutral standpoint supported by facts. Analysts are tweeting and starting to emerge in social communities, unshackled from their editorial processes and less policed and protected by their logo. So when you evaluate analyst content alongside that found in social communities and vendor web sites, I would recommend the following:

1. Understand how each analyst company defines the market and positions the vendors (even if you don’t agree)

2. Read multiple reports written by the analyst to understand their position on a number of related research papers (look for themes/consistencies)

3. Be aware of the analyst’s history (the author’s resume)

4. Search for the analyst’s comments in social media (e.g. tweets, Facebook, LinkedIn, blogs)

5. Compare the analyst research with competing analyst firm research to get a different perspective

crowd sourcing and the self-sufficient digital native

IT savvy is no longer the exclusive domain of the IT organization. IT plays a pivotal role in many end users’ day-to-day activity and is as natural as breathing in and out. The arrival of digital natives in the market has led to the creation of a new type of IT user, one where self-sufficiency has become a way of life. Social IT activity rarely includes a support organization ready to leap to your aid in times of trouble; instead the user relies on support found from using search engines, blogs, on-line documentation and through social collaboration. When the IT savvy digital native enters the job market their ability to deal with (or at least attempt to deal with) IT issues (e.g. connectivity, access or file sharing) is significantly greater when compared with people entering the market only a decade ago.

Working with digital natives I find them to be more self-sufficient and believe IT problems can be solved faster if they are given the ability to do it themselves. This emerging environment ‘should’ create changes in how users are enabled and supported. For example, if the end user is more self-sufficient then service management tools should provide a lot more than a hotline support number. Service management tools could be enhanced with self help, intelligent search, automated recovery and, importantly, crowd sourced information. Crowd sourced information could allow users to understand how IT is being experienced by their colleagues while also helping the IT support organization understand end user experience and aid root cause analysis. This capability is especially important with applications sourced from diverse locations and the prolific use of mobile devices. The reality is a service desk has no clue where you are and what you are doing, so when problems occur it’s just the beginning. The only view of IT service and what’s really being experienced can be attained from the end user and, increasingly, by the end user. All this information is collected, analyzed and delivered without a single communication with a datacenter.

Crowd sourced application experience data provides a far greater understanding of overall end user status, more easily and with far less complexity, cost and effort than any traditional datacenter centric IT management tool. Of course, it does not give the deep-dive information many APM tools provide, but in this case it’s not just about application availability (e.g. downrightnow.com); it’s about helping the end user become more productive and self-sufficient.

This is not something found in IT operations management today; however, the concept has been used in other types of applications (e.g. Waze and GPS navigation) where crowd-sourced data provides work-arounds and options. For road navigation it could advise taking an alternative route due to an accident; for IT it could be to use an alternative printer or avoid using a mobile device in an area where performance is being impacted.
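As a rough sketch of what the printer example might look like, here is a small Python function that turns colleagues’ reports into a suggested work-around. Everything in it is an assumption for illustration: the report format, the printer names and the `min_reports` cut-off are invented, and a real crowd-sourced service would also need to weigh how recent and how local each report is.

```python
from collections import defaultdict


def suggest_alternative(reports, avoid, min_reports=2):
    """Suggest the resource colleagues report working most often.

    reports: iterable of (resource_name, worked) pairs submitted by users
    avoid:   the resource the user is currently having trouble with
    Returns a resource name, or None if nothing has enough good reports.
    """
    stats = defaultdict(lambda: [0, 0])  # resource -> [worked count, total]
    for resource, worked in reports:
        stats[resource][1] += 1
        if worked:
            stats[resource][0] += 1

    best, best_rate = None, 0.0
    for resource, (worked_count, total) in sorted(stats.items()):
        if resource == avoid or total < min_reports:
            continue  # skip the failing resource and thinly reported ones
        rate = worked_count / total
        if rate > best_rate:
            best, best_rate = resource, rate
    return best


reports = [
    ("printer-3f", False), ("printer-3f", False),
    ("printer-2f", True), ("printer-2f", True),
    ("printer-4f", True), ("printer-4f", False),
    ("printer-5f", True),  # only one report, too few to trust
]
suggestion = suggest_alternative(reports, avoid="printer-3f")
```

Note that nothing here touches the datacenter: the suggestion comes entirely from what other end users are experiencing.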

What I have described is a future state, so for the time being digital natives are going to continue to find ways to support themselves, whether it’s for their own personal IT or for the IT provided by their employers.