Why Recurrent Problems Persist: Getting to the Root Cause

If recurrent problems are occurring at your plant, it could be an organizational issue rather than an equipment design deficiency. There are a number of reasons that root causes are overlooked. Understanding and avoiding the barriers to real problem-solving can help save time, money, and frustration.

Most people see recurrent equipment or process problems as purely equipment or process problems. The real problem is often missed. If an organization has recurrent equipment or process problems, it’s an organizational problem and not an equipment or process problem at all. The recurrent problem is an indictment of the organization. The organizational problem is the problem behind the problem. The organization lacks the will, structure, vision, or ability to solve the problem.

What is a problem? For the purposes of this article, it’s something that is destroying value. What is a recurrent problem? It is a problem that keeps occurring too frequently.

If a problem happens once and there is a reason to suspect that it might happen again, decisive action should be taken to prevent recurrence. If a problem happens twice, then the organization missed the opportunity to avoid it. If corrective action didn’t occur, there is an organizational problem that is enabling, and indeed creating, the recurrent problem.

Recurrent problems destroy value and cost a great deal over time (Figure 1). As with an annuity or a bond, there is a present value to that stream of payments. That present value can be enormous. It is often orders of magnitude higher than the cost of a single flare-up of the problem, and nothing of value is received in return. When the drip-drip-drip of problems is hidden and unrecognized, the enormity of the total cost isn’t grasped.
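To make the annuity comparison concrete, the short Python sketch below discounts a level stream of failure costs to a present value using the standard ordinary-annuity formula. The function name, the assumption that each year’s costs are lumped at year-end, and the example inputs (a hypothetical $100,000 per flare-up, four flare-ups per year, an 8% discount rate, 20 years) are illustrative assumptions, not figures from any particular plant.

```python
def present_value_of_recurring_cost(cost_per_event: float,
                                    events_per_year: float,
                                    annual_discount_rate: float,
                                    years: int) -> float:
    """Present value of a recurring failure cost treated as an ordinary annuity.

    Hypothetical helper: assumes each year's failure costs are paid as one
    lump at year-end and that cost and frequency stay constant.
    """
    annual_cost = cost_per_event * events_per_year
    r = annual_discount_rate
    if r == 0:
        # No discounting: the present value is just the undiscounted total.
        return annual_cost * years
    # Ordinary-annuity factor: (1 - (1 + r)^-n) / r
    annuity_factor = (1 - (1 + r) ** -years) / r
    return annual_cost * annuity_factor


# Hypothetical example inputs, chosen only for illustration.
single_event = 100_000
pv = present_value_of_recurring_cost(single_event, events_per_year=4,
                                     annual_discount_rate=0.08, years=20)
print(f"One flare-up: ${single_event:,.0f}")
print(f"Present value of the stream: ${pv:,.0f}")
print(f"Ratio: {pv / single_event:.0f}x")
```

With these assumed inputs, the stream is worth roughly $3.9 million today, close to 40 times the cost of any single repair, even though each individual purchase order looks small.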

1. Recurrent problems are recurrent costs. The present value of recurrent problems can be enormous but often goes unnoticed as a steady drip-drip-drip loss. Source: POWER

Ideal Versus Real Organizations

As a thought exercise, consider an idealized organization: one with a single perfect employee who is also the owner. In this perfect scenario, the single employee/owner performs all roles, is tireless, has unlimited bandwidth and ability, and has complete latitude with respect to decisions and approvals. The individual has access to all information. There is no miscommunication or lack of communication. The employee/owner doesn’t care about overtime or recognition. The person doesn’t posture or politic. He/she doesn’t care about optics or spin, and isn’t worried about turf, jurisdiction, or stepping on toes. The individual isn’t concerned with blame or credit, and the person’s time horizon is long.

In this idealized organization, do recurrent problems exist? How could they? The employee/owner in this case always recognizes that problems destroy value and must not be allowed to continue. The person figures out a way to eliminate the problem, buys what is necessary, and does whatever is required to remedy the situation. The would-be recurrent problem is resolved before it becomes a recurrent problem.

Real organizations are very different. Organizations are made up of people, and with that come both positive and negative attributes. People do peculiar and value-destroying things when interacting with other people. People add complications. People have agendas, motivations, and limitations that don’t always align with the best interests of the organization. From that flow financial controls, engineering controls, silos, political turf, unions, management, overtime, short-term thinking, personalities, feelings, rivalries, cronyism, incompetence, narrow points of view, optics, spin, and fear. Those are but a few of the complications and contributors that can cause recurrent problems to emerge, and then persist, in real organizations.

This is not to say that budget controls, engineering controls, management, and unions are all bad or unneeded, but rather that there are trade-offs and downsides that must be recognized and effectively countered. Leaders must ensure organizational controls and structure don’t stifle innovation, vision, and action.

Decision-Making Challenges

Because real organizations rarely contain all of the ideal elements, people face a number of difficulties when trying to solve problems.

Blind Spots. Esteemed psychologists Amos Tversky and Daniel Kahneman conducted groundbreaking work in the area of judgment and decision-making under uncertainty. Essentially, they identified a number of glitches in human thinking (systematic cognitive biases, or blind spots), including the bias whereby people’s thinking tends to be dominated by recent events or those easiest to recall.

Looking at more-abstract and less-immediate trends and concepts tends to be difficult. People neglect to extend the current episode and the last one into projections of future occurrences. The natural focus tends to be weighted toward what just happened and not on longer-term trends. The recurring problem finds fertile ground in the blind spots and biases of human thinking.

Sunk Costs. When money is spent in a planned way, there are well-defined, formal approval processes and controls. For example, a supervisor might be authorized to approve up to $5,000; a maintenance manager may be able to approve up to $50,000; a plant manager might have a $250,000 limit; and a vice president could perhaps authorize $1,000,000.

Often, no such approval process exists for spending money on recurrent problems. The lack of a process around the failure-correction spend makes it seem less like real money. For one thing, it is done, sunk, baked in the cake, and there is nothing to approve or not approve. The failure must be fixed and it costs what it costs. It’s in the rearview mirror. It’s irreversible. Spilt milk. For another thing, the purchase order for the failed component might be less than 1% of the cost of the lost production. The lost production doesn’t show on the purchase order to fix the failure. The costs of the future and previous occurrences also do not show on the purchase order for the current flare-up. The purchase order may be less than $10,000 for what is a million-dollar recurring problem.

It is a deceptive point of view. After all, the money is just as real as any other outlay, and it’s an expenditure and loss that wasn’t approved, planned, or budgeted. Perhaps the same hardwired criteria that apply to approving traditional planned spending should apply to investigating recurring problems. Perhaps the vice president should personally oversee the investigation of the large recurrent problem, the plant manager the medium, and so on.
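One way to read this suggestion is to route each recurrent problem to an owner using the same dollar limits as purchasing authority, but keyed to the present value of the whole stream rather than to the current purchase order. The sketch below reuses the example limits given earlier; the function name and the rule that anything above the highest limit stays with the most senior listed role are assumptions for illustration only.

```python
# Example approval limits from the text (illustrative, not a standard).
APPROVAL_LIMITS = [
    (5_000, "supervisor"),
    (50_000, "maintenance manager"),
    (250_000, "plant manager"),
    (1_000_000, "vice president"),
]


def investigation_owner(total_cost: float) -> str:
    """Return the lowest listed role whose approval limit covers total_cost.

    Hypothetical helper: pass the present value of the recurring problem,
    not the purchase order for the latest repair.
    """
    for limit, role in APPROVAL_LIMITS:
        if total_cost <= limit:
            return role
    # Above the highest example limit, keep ownership with the most senior listed role.
    return APPROVAL_LIMITS[-1][1]


purchase_order = 8_000              # hypothetical cost of the current repair
stream_present_value = 1_000_000    # hypothetical present value of all flare-ups

print(investigation_owner(purchase_order))
print(investigation_owner(stream_present_value))
```

Keyed to the purchase order, the $8,000 repair lands with a maintenance manager; keyed to the million-dollar stream, the same problem lands on a vice president’s desk.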

Phases of Death. In the beginning there is always an excess of bandwidth (available man-hours). In the early life of a plant or organization, there is a tendency to overstaff. This is for a variety of reasons. Some new hires will quit. Some will wash out. Most repairs are still being handled under warranty. Field representatives, trainers, and consultants augment the staff. The plant is still in the honeymoon period of new equipment that has yet to see any wear and tear. All of these things result in more manpower than is needed in the initial phase of an organization’s life.

What happens to all that slack bandwidth? According to “Parkinson’s Law,” work will tend to expand to fill up the time made available to do the work. In other words, waste, inefficiencies, and unneeded practices will grow to fill the space. A recurrent problem can easily take root in this environment and no one will resist it or make note of it because there is plenty of slack capacity to absorb it. In fact, the staff wants to be busy and so they will welcome the work. It is ushered in and goes unquestioned.

The next phase is “normalization of deviance.” What might have seemed out of the ordinary (a deviation) in the beginning becomes normal after a period of time. Phrases like “It’s always been that way”; “That bearing just tends to fail more than others”; or “The vibration has always run high on that pump” are heard.

Finally, all the slack is consumed and there isn’t any time to question why a specific component fails so often. At this point, the folks who have the point of view and desire to resolve the recurrent problem don’t have the time to invest in the endeavor (Figure 2).

2. Death spiral. Organizations evolve over time. In the beginning, they are often overstaffed. Work fills the void, problems become normalized, and eventually little time is available to resolve underlying issues. Artwork: POWER / Source: Gene Grindle, PE

Visibility, Optics, and Communications. Many times what gets reported up the chain of command is sanitized. Sometimes what gets reported is downright fiction. Each transfer of information is done in a way that makes the one reporting, and the person’s team, look as good as possible.

Recurrent problems can go unresolved because communications within the organization keep them hidden from most of the organization. I once resolved a decades-long recurrent problem, and the fix created an enormous net present value (cost avoidance) for the power plant where I worked. There was no awards ceremony. I didn’t receive a piece of the action in the form of a finder’s fee. There was nothing.

Why? The solution didn’t register as added value. Management hadn’t noticed the “drip-drip-drip loss” in the first place. The same organizational defect that had allowed the recurring problem to develop made the group unable to comprehend and appreciate its elimination.

The Default Narrative. Sometimes conventional wisdom or a leader will quickly chalk up a failure to a default go-to explanation. In the first plant where I worked, the plant manager would immediately blame “lack of lubrication” when a bearing failed. Being new, I assumed he was always right.

My planner and I eventually discovered, however, that some bearings never failed and some failed like clockwork. One bearing, the 20A conveyor take-up bearing, failed every 60 to 90 days. This happened even though the same crew lubricated all bearings on the same schedule. And it happened even though the bearing in the identical application on the adjacent sister unit had not failed in 20 years. Still, the plant manager and plant staff always chalked it up to lack of lubrication. No one ever questioned it.

My planner and I stood in the cold wind of an ice storm at 11 p.m. one February night watching as the maintenance crew once again was at work changing out the 20A take-up bearing. Suddenly the millwright began to shout and curse. “What happened?” I asked. “I dropped the wedge washers,” he said. “Wedge washers?” I asked. “Yeah, this take-up assembly isn’t aligned squarely, so there are wedge washers under the pillow block to square it up,” he explained.

After further research we discovered that the take-up assembly was badly twisted, and therefore, the bearings (wedge washers or not) were always in a severe mechanical bind. David, my planner, in five short minutes of research found a hybrid bearing that allowed a small amount of swivel and up to a 10-degree misalignment. The cost was the same as the old bearing assembly. We made that our new standard, and after the new bearings were installed, we never had another take-up bearing failure in all the years I worked there.

Over the decades, that recurrent bearing failure had cost the plant millions of dollars (the 900-MW unit was dropped to half load for 10 to 12 hours every time the bearing failed). The fix was simple and cost nothing extra.
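The arithmetic behind the “millions of dollars” can be sketched from the figures given: a 900-MW unit dropped to half load for 10 to 12 hours, every 60 to 90 days. The helper below (a hypothetical name) converts those figures into lost megawatt-hours per year; it assumes a 365-day year and evenly spaced failures, and it leaves out the replacement-power price, which the article does not state.

```python
def lost_energy_mwh_per_year(unit_mw: float, derate_fraction: float,
                             outage_hours: float, interval_days: float) -> float:
    """Energy lost per year to a recurring derate, in MWh.

    Hypothetical helper: assumes a 365-day year and failures spaced evenly
    every interval_days.
    """
    lost_mw = unit_mw * derate_fraction          # capacity unavailable during each event
    mwh_per_event = lost_mw * outage_hours       # energy lost per event
    return mwh_per_event * 365 / interval_days   # scaled by events per year


# Range implied by the figures in the text.
best_case = lost_energy_mwh_per_year(900, 0.5, outage_hours=10, interval_days=90)
worst_case = lost_energy_mwh_per_year(900, 0.5, outage_hours=12, interval_days=60)
print(f"{best_case:,.0f} to {worst_case:,.0f} MWh lost per year")
```

That works out to about 18,250 to 32,850 MWh per year, or roughly 365,000 to 657,000 MWh over 20 years. Multiplied by even a modest price per MWh (which varies by market and is not given here), the total reaches into the millions of dollars.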

The equipment problem was a symptom of an organizational problem. The organizational problem in that case was a lack of inquiry and analysis caused by the dismissive default explanation.

Validation. In most organizations, the majority of folks want to do a good job. What they consider to be a “good job,” however, is tied to what the organization frames as being a “good job.” We get more of what we validate.

If, however, organizations mostly validate and celebrate epic effort, for example, of a work crew that put in an 18-hour workday to fix a flare-up of a recurrent problem, then they may be asking for the recurrent problem to persist. The group has unconsciously defined the act of completing the repair as being “a good job.”

Certainly, all good work and effort should be appreciated, but caution must be taken not to inadvertently telegraph the wrong message. Most of the celebration and validation should be focused on the long-term elimination of the problem. The repair and restoration should mostly be viewed as a cost, a regrettable loss.

The message should not be that we like it and want more of it, but that we want to solve the problem and avoid future repairs. As management guru Peter Drucker pointed out, “There is surely nothing quite so useless as doing with great efficiency what should not be done at all.”

Motivation. Each person in an organization has different motivations, different objectives, and different things they are trying to accomplish or not accomplish. Workers may have ambitions they are trying to fulfill or they may just be trying to stay under the radar.

Take a tradesman such as a welder as an example. He is highly trained and presumably motivated to do what he chose to do as his profession. When a welder is welding he is doing what he likes to do. When a problem arises and he is called upon to make a weld repair, he is never as valuable to the organization as he is in that moment.

He then gets validation when the boss comes by and mentions his great welding, thanks him for working through his break, and for staying late. When he cashes his paycheck, he has substantially more money due to the overtime the recurring problem created. What motivation does the welder have to prevent such a repair? He loses overtime. His value is diminished. That’s not to say that the welder is actively undermining the plant, as he most likely isn’t, but that everything is pushing and nudging him to be an expert at fixing the flare-up of the recurrent problem. There is little incentive to eliminate it.

Of all the organizations of which I have been a part, the ones with more incentives tied to bottom-line results tend to have fewer recurring problems. Organizations must focus incentives, attention, talent, validation, and resources on eliminating problems, particularly recurrent problems. Create a system where everyone has skin in the game that aligns with the organizational needs. Base everyone’s dominant incentives on eliminating problems rather than on addressing consequences of problems.

People notice events and tend to not notice the absence of events. Events automatically pull our attention and awareness. Instead, managers must give praise and validation to the absence of problems. Resist the human tendency to mostly focus on the real-time shiny object rather than the long-term extension and costs of the recurrent problem. Know your own blind spots and tendencies, and compensate in the other direction.

Where engineering, financial, and other controls are necessary, organizations must ensure that the controls don’t stifle innovation and destroy value. The controls must be no more than the minimum needed. Anything more destroys value.

Make certain there remain resources that won’t be swallowed by the crisis du jour. The best resources must not be taken out of the game for constant “firefighting.” At times, it makes sense to go outside the organization to bring in focused resources that aren’t “in the arena.” Outside resources benefit from “cold eyes” that haven’t yet accepted the normalization of deviance. Outsiders are not infected with the organizational problems.

Defend and preserve your slack bandwidth. Once one is buried in full-time crisis there is no chance to be strategic or innovative.

Finally, when a recurrent problem is found, address it, but then hold the organization up to the light and ask, “What is it about my organization that allowed the problem to take root and persist in the first place?” ■

Gene Grindle, PE (gene.grindle@gmail.com) is a mechanical engineer with more than 25 years of experience in the power and energy sector.