Maintenance & Reliability November 21, 2016

Effective Root Cause Analysis Means Accepting We Could Be Part of the Problem

By: Bob Latino

Author: Robert J. Latino, CEO, Reliability Center, Inc.

Figure 1 summarizes what I call ‘the germination of failure’.

Germination of a ‘Failure’

No matter where we work, we will experience failures or ‘undesirable outcomes’ of some kind. As long as we work with other humans, this will indeed be the case. These failures may surface in the form of production delays, injuries, customer complaints, missed deadlines, lost profits, legal claims and the like.

In order to prevent recurrence of any such undesirable outcome, we have to truly understand the causes that led up to it. In many of our worlds, the process used to analyze and understand what went wrong is called Root Cause Analysis, or RCA. For the sake of this article, however, call this process whatever you want: problem solving, brainstorming, troubleshooting, etc. The common denominator of these terms is that they all aim to resolve a failure and ensure it does not happen again.

Let’s get away from labels and specific industries and focus on the anatomy of a ‘failure’. Where does a failure come from? Think about this no matter where you work and see if it applies.

The ‘Root’ System

The seeds of this germination are what we will call our management (or organizational) systems. These are the rules and guidelines under which our organizations operate, much like the laws that govern how our countries operate. Since these systems are created and maintained by humans, they are not flawless. They can be insufficient, inadequate, wrong and even non-existent (for unforeseen situations). We refer to these management system flaws as Latent Root Causes because they are always there, lying dormant, waiting to be activated by a human.

When our management systems are flawed in some fashion, we feed incomplete information to people who must process the information to make their decisions. Ultimately, this will likely result in a bad decision! We refer to these ‘decision errors’ as Human Root Causes.

When humans make a wrong decision, it is expressed in one of two ways: 1) errors of commission or 2) errors of omission. This means we either took an action that was inappropriate (error of commission) or failed to take an appropriate action that we should have taken (error of omission).

There are endless examples of these two situations. An error of commission may be that we closed a valve in a manufacturing operation that we should have left open. An error of omission may be that an ER nurse improperly triaged a patient, and as a result, the patient died waiting for care in the waiting room.

When humans make decision errors, they often result in observable consequences. Up to this point, the error chain has not been visible because it existed only in the mind of the decision-maker. Only after the decision is made are the consequences observable. We will refer to these consequences as Physical Root Causes.

Let’s follow through with our examples used earlier. A manufacturing plant operator turns off a valve that cuts off water flow that would have cooled an overheated process. As a result, the overheated process causes an unexpected interruption that automatically shuts down the entire operation.

In the emergency room of the hospital, the improperly triaged patient flatlines and a Code Blue is called, forcing a rapid response team to tend to the patient. The patient had an underlying condition that was not detected during the initial triage assessment, and as a result, the patient had a stroke and passed away.

In both of these scenarios, after the decisions were made, the consequences of the decisions became apparent.
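
For readers who track their analyses in software, the three levels of the root system can be sketched as a small data structure. The Python below is only an illustrative sketch: the names `Level`, `Cause`, `deepest_level` and `reaches_latent` are hypothetical and do not come from any RCA tool. The chain records the valve example only as far as it has been described, so no latent cause is entered.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """Depth in the root system; higher values lie deeper below the surface."""
    PHYSICAL = 1  # observable consequence of a decision
    HUMAN = 2     # decision error (commission or omission)
    LATENT = 3    # flaw in a management system


@dataclass
class Cause:
    level: Level
    description: str
    error_type: Optional[str] = None  # "commission" or "omission"; human causes only


def deepest_level(chain: List[Cause]) -> Level:
    """Return the deepest level the analysis has reached so far."""
    return max(cause.level for cause in chain)


def reaches_latent(chain: List[Cause]) -> bool:
    """True only if at least one management system flaw has been identified."""
    return deepest_level(chain) == Level.LATENT


# The valve example, recorded only as far as the text describes it.
valve_chain = [
    Cause(Level.PHYSICAL, "Cooling water flow cut off; process overheats and shuts down"),
    Cause(Level.HUMAN, "Operator closed a valve that should have been left open", "commission"),
]

if not reaches_latent(valve_chain):
    print(f"Analysis stops at the {deepest_level(valve_chain).name} level; "
          "latent causes are still unidentified.")
```

A check like `reaches_latent` is one simple way to flag an analysis that ends with the decision-maker rather than with the system that informed the decision.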

RCA Effectiveness – Facing reality that we could be part of the problem

Now that we understand where a failure comes from and how the error chain grows, how can we make our RCA processes more effective? Why do we often seem to be doing RCA on the same events, over and over again? Are we not learning from the past? Is it that our Root Cause Analyses just aren’t that good?

Having been an RCA practitioner for over 30 years in various industry sectors, I have observed that we have a difficult time looking in the mirror and accepting that we could be part of the problem!

Many organizations seem content with their RCA processes when their analyses pass some kind of regulatory audit. This means the regulators are off their backs.

However, passing an audit is not a true measure of RCA effectiveness, and treating it as one is misleading. RCA effectiveness should be measured based on quantifiable and meaningful bottom-line metrics that correlate to corporate dashboards or KPIs.

In our hospital scenario, just because we passed an RCA audit or survey, is the patient any safer? Almost all 6,000 hospitals in the U.S. are accredited, yet deaths due to medical error continue to rise, to the point that medical error is the third leading killer of Americans today at over 1,000 deaths/day (source: http://www.healthcareitnews.com/news/deaths-by-medical-mistakes-hit-records).

The key to RCA effectiveness is facing the truth. Unfortunately, we are not very good at accepting the truth when it involves ourselves. The ‘truth’ is embedded in the management systems we spoke about earlier. Oftentimes we focus on the decision-makers and then levy discipline for making a poor decision. However, RCA is not about ‘who’ made the poor decision. We are more interested in why the person felt his or her decision was appropriate at the time. This is what RCA is all about!

When we get into decision-makers’ heads and understand their reasoning for their decisions, most of the time their rationales are perfectly logical. Their decisions are most often well-intended. And more importantly, others would likely make the same decision given the same information.

When we delve this deep, we are brought right back to the flawed management systems that provide these people with such information. These systems are supposed to be in place to help our people make better decisions. So when they are flawed, they are at risk of not performing as intended.

Let’s reflect back on our hospital scenario described earlier. A patient comes to their local ER and is assessed by the nurse, PA or MD. Those conducting the triage certainly did not intend for the patient to be harmed while waiting for care. So what could have led them to believe that this particular patient could wait, relative to the acuity of the other patients in the ER? Here are just a couple of possibilities:

Inexperienced person conducting triage.
ER was overloaded, and staff were time-pressured and understaffed.

From a management system standpoint, if the above existed, we would have to drill deeper and understand the systems that permitted those conditions to exist.

Why would we have an inexperienced person conducting triage in the ER?
1. Person scheduled to do triage was unavailable due to another emergency that pulled them away (either at the hospital or a family emergency), so someone who was available was pulled in from another department.
2. Person was a new hire and new to the position.

Why would we be understaffed when the ER was overloaded?
1. We did not anticipate the overload.
2. We did not have a plan in place to activate under such conditions.
3. We had a plan in place to handle the overload but we did not follow it.
4. We had a plan in place to handle the overload and followed it, but it was obsolete. It had not been updated since the addition of new technologies and the expansion of the ER.
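
The drill-down above is essentially a tree of ‘why’ questions. As a rough sketch, it could be recorded in Python as follows. The `WhyNode` class and the `leaves` helper are hypothetical names chosen for illustration, the root statement paraphrases the triage scenario, and the entries are taken only from the possibilities listed above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyNode:
    statement: str
    children: List["WhyNode"] = field(default_factory=list)


def leaves(node: WhyNode) -> List[str]:
    """Collect the deepest hypotheses, where the management system questions sit."""
    if not node.children:
        return [node.statement]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


triage_tree = WhyNode("Patient was judged able to wait", [
    WhyNode("Inexperienced person conducting triage", [
        WhyNode("Scheduled person was pulled away; replacement came from another department"),
        WhyNode("Person was a new hire and new to the position"),
    ]),
    WhyNode("ER overloaded and understaffed", [
        WhyNode("Overload was not anticipated"),
        WhyNode("No plan existed for overload conditions"),
        WhyNode("Plan existed but was not followed"),
        WhyNode("Plan was followed but was obsolete"),
    ]),
])

# Each leaf is a possibility to verify with evidence, not a conclusion.
for hypothesis in leaves(triage_tree):
    print("Verify:", hypothesis)
```

Keeping the hypotheses in a tree makes it easy to see which branches have been taken down to the management system level and which stop short.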

Certainly this is not a comprehensive listing, but it makes the point. This is where a mirror comes into play.

What if we were the person who:

Allowed the inexperienced triage person to work in that capacity, because things were hectic and confusing at the time?
Did not update the procedure for handling an overload condition, when the ER was updated and expanded?
Did not follow the procedure for an overload condition?
Trained the person conducting the triage and they were not ready yet?

These are the sensitive issues that a true RCA would seek to understand and uncover. This is the hard part of RCA: uncovering the truth. This is where most RCAs lack depth, because people prefer not to deal with these sensitive but absolutely necessary issues.

Think about it: if we choose to ignore these deeper issues (because it is easier and more comfortable to do so), then the ‘seeds’ of failure remain implanted in our systems. They will simply be activated by someone else at a later time, and the patient or operation will be put in peril once again.

For RCAs to be truly effective, we have to look in the mirror and face the possibility that we could have unintentionally contributed to the bad outcome. That is the only way we will make progress. This type of openness and non-punitive environment is a key principle of a High Reliability Organization (HRO).

Remember, “We NEVER seem to have the time and budget to do things right, but we ALWAYS seem to have the time and budget to do them again!”

For an abbreviated example of this root system related to a hand injury, view a short video case study.


About the Author

Robert J. Latino is CEO of Reliability Center, Inc. Mr. Latino has been a practitioner, trainer, author and international speaker on the topics of Reliability and Root Cause Analysis for over 30 years. He can be contacted at 800/457-0645 or blatino@reliability.com. Visit our website at www.reliability.com to learn more.
