Productive maintenance turnarounds are vital to long, successful online periods. Getting the work done quickly and properly takes more than just luck. Good planning and excellent communication can lead to safe and efficient outages with fewer hiccups and less stress. 

It’s no secret that equipment requires maintenance. Just as your personal vehicle needs an oil change periodically, a power plant needs to shut down regularly for a tune-up to take care of all those little issues that would otherwise turn into big problems.

Most generating companies are preparing for outages continuously. When one outage ends, plans begin for the next. Often planning is in progress for major projects years into the future (Figure 1). Work list items that could not be completed for one reason or another are added to the next outage schedule. The entire process is a full-time job, often with entire departments focused on the task.

1. Major projects require extensive coordination. An air quality control system upgrade can take years and cost hundreds of millions of dollars. Planning the outage to tie it all in is no small feat. Courtesy: Aaron Larson

But what makes some maintenance periods go smoothly—coming in on budget and on time—while other outages hemorrhage with cost overruns, unexpected delays, and continual surprises? It is very difficult to foresee every possible problem, but planning for contingencies and scheduling appropriately are two vital tasks required to get all involved parties on the same page.

Planning for Success

In a career spanning more than 30 years, Guy Starr, president of DZ Atlantic, a Day & Zimmermann (D&Z) company, has been involved in hundreds of outages at nuclear and fossil-fueled power plants, as well as process and industrial plants throughout the U.S. He has seen what works and what doesn’t.

“You must have a good scope—know what you’re going to do—you must have a good plan, and a good team. That’s really what makes for a successful project,” Starr said. “Now that sounds really simple, but there are a lot of activities that must go on to make that happen.”

If you’ve been involved in many outages, you’ve probably experienced maintenance periods that were run more like a mishmash of disconnected activities. Perhaps multiple contractors were brought in, each with different goals and schedules, none of which were integrated into an overall timetable. Managers who weren’t used to the process or in touch with the big picture may have directed the work, which resulted in activity conflicts and ultimately outage delays.

A lot has changed as utilities have come to realize how important a well-run outage is to the company’s bottom line. Hundreds of thousands, or even millions, of dollars hang in the balance based on outage durations. With many best practices already implemented, outage schedules have been condensed dramatically over the years. According to the Electric Utility Cost Group, nuclear refueling outage durations averaged 105 days in 1990–1991, but that has been cut to under 40 days for a typical refueling outage today. Plants continue to emphasize shortening outage durations, but companies must also maintain high quality standards and conduct the work safely. Although not all of the time savings can be attributed to improved outage management, Starr believes better execution is a direct result of better planning.

“The most successful outages have the proper planning in place,” said Starr. “Over time the clients have seen that the utility owner and the contractor need to be one team to incorporate for proper outage readiness. We have to be a partner in order to get this accomplished.”

Other factors, such as plant size and location, can also affect outage preparations. For plants located in remote rural areas, finding qualified people can be problematic. Mobilizing a team of workers can take time and can be costly. Early planning is essential to ensure necessary personnel are available.

Plant size also affects staffing and preparation for outages. Small plants rarely have excess people who can focus solely on outage preparations. The tasks of planning, scheduling, and executing the work all fall on a tiny group of core personnel who run day-to-day operations. In some ways, however, having that responsibility and authority makes managing outages easier because it minimizes the red tape that can hog-tie managers at larger plants.

New employees and contractors require special consideration. Outage periods are intense and require additional focus because of all the activities that take place concurrently. If any staff have never been involved in an outage before, it is particularly important to have a well-designed indoctrination program to get them up to speed prior to throwing them to the wolves. Even things as simple as a person’s route from the parking lot can change due to heavy traffic patterns around a facility. Making the effort to ensure everyone understands safety policies and work procedures is time well spent.

Setting the Scope

As Starr pointed out, having a good scope is at the top of the list for running an effective outage. He noted that utilities have learned to freeze scopes early in the process. Once the scope is frozen, the contractor can begin estimating and scheduling activities. Setting early milestones for work package development allows proper safety steps to be incorporated and ensures good instructions are provided. Late planning, scope creep, and unplanned emergent work are a sure recipe for failure.

“Having that core planning and execution team on site early—fixing and freezing the scope as early as possible—is what we have seen that makes an outage successful,” said Starr.

Another key is having an integrated outage schedule. Many tasks require coordination not only between the plant and the contractor, but also among departments within the plant. Having an integrated schedule can help facilitate that coordination.

If a pump rebuild were required, for example, the timing of operators locking out the equipment, technicians removing instrumentation, electricians unwiring the motor, carpenters erecting a scaffold, mechanics uncoupling and disassembling components, and perhaps a contractor repairing internal defects on the housing could all be planned accordingly. Integrating that schedule has been paramount in condensing outage durations over the years.

Having a good team—motivated employees with the skill and desire to perform quality work in a safe and efficient manner—with a well-defined project organization and clear expectations is important. Not only are leadership abilities necessary, but good communication skills also are vital. For example, the managers making final decisions often don’t have intimate knowledge of what is necessary to complete a job in the field, but a picture showing physical obstructions in a work area can instantly explain why a job takes longer than one might otherwise have expected. Explaining the details and providing drawings and diagrams with specifics can lead to better, more confident decision-making.

Technology Tools Help

Technological advances have improved the work management process too. There are a number of software tools available today for basic project management and scheduling. Many computerized maintenance management systems have these functions built into their programs, but there are also off-the-shelf programs, such as Microsoft Project, that can be used. The technology allows easier tracking of tasks with scheduled start times, durations, and finish times.

Oracle’s Primavera solution is another option—reportedly used at 60% of the world’s top utilities—to manage outages and capital projects. The platform offers a single set of solutions for managing outages, daily maintenance, and capital expansion projects of all sizes. It also helps companies institutionalize best practices and methodologies to achieve repeatable success. When changes inevitably occur, the program enables managers to pinpoint potential delays and develop contingency plans.

Dashboards (Figure 2) offer a wealth of visual information, including Gantt charts, to manage timelines. They also allow critical path activities (the jobs and sequence that must be completed for the outage to finish in the shortest possible time with no change in scope) to be easily identified. Employee productivity can be increased through better planning and allocation of resources.

2. Oracle Primavera P6 project management software. The program can be used to organize projects with up to 100,000 activities. Graphical timelines provide managers with a visual depiction of outage progress. Courtesy: Oracle Corp.
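
As a rough illustration of how scheduling software identifies a critical path, the following Python sketch runs a forward pass over the pump rebuild example described earlier. The task names follow that example, but the durations (in hours) and the dependency links are assumptions made up for illustration. They are not figures from a real outage, and the code is not how Primavera or any other product works internally.

```python
# Hypothetical pump rebuild tasks: name -> (duration in hours, predecessors).
# Durations and dependencies are illustrative assumptions only.
TASKS = {
    "lock_out": (2, []),
    "remove_instruments": (3, ["lock_out"]),
    "unwire_motor": (2, ["lock_out"]),
    "erect_scaffold": (4, ["lock_out"]),
    "disassemble": (6, ["remove_instruments", "unwire_motor", "erect_scaffold"]),
    "repair_housing": (12, ["disassemble"]),
}


def critical_path(tasks):
    """Return (total_hours, ordered task names) for the longest dependency chain.

    Assumes positive durations and no circular dependencies.
    """
    finish = {}  # earliest finish time of each task
    via = {}     # predecessor that determines each task's start

    def earliest_finish(name):
        if name not in finish:
            duration, preds = tasks[name]
            start, driver = 0, None
            for p in preds:
                if earliest_finish(p) > start:
                    start, driver = earliest_finish(p), p
            finish[name] = start + duration
            via[name] = driver
        return finish[name]

    # The task that finishes last ends the critical path; walk back from it.
    last = max(tasks, key=earliest_finish)
    total = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = via[last]
    return total, path[::-1]


hours, path = critical_path(TASKS)
print(hours, " -> ".join(path))
```

With these made-up numbers, the scaffold is the longest of the three parallel preparation tasks, so it sits on the critical path, not the instrument removal or motor unwiring. Shortening either of those two would not shorten the job.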

In addition to scheduling software, D&Z utilizes estimating systems, historical databases for lessons learned and best practices, and its own internally developed outage readiness program called the Project Execution Tool (PET).

Pre-outage milestones are ranked in PET using color codes—green, yellow, or red—based on whether a job is ready to go or has issues that could delay or preclude completion. A yellow task may have parts on backorder that are being expedited by the supplier and are expected to arrive in time to conduct the activity on schedule. A job needing parts that haven’t yet been fabricated or have long lead times would likely be classified as red and may ultimately delay outage completion.
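
The color-coding logic can be pictured with a short sketch. PET is D&Z’s internal tool and its actual rules are not described here, so the `Milestone` fields, the `readiness_color` function, and the dates below are hypothetical stand-ins that simply mirror the green, yellow, and red descriptions above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Milestone:
    name: str
    needed_by: date                          # date the job is scheduled to start
    parts_on_hand: bool
    expected_arrival: Optional[date] = None  # None if no delivery date is known


def readiness_color(m: Milestone) -> str:
    """Hypothetical rule set mirroring the green/yellow/red description."""
    if m.parts_on_hand:
        return "green"    # ready to go
    if m.expected_arrival is not None and m.expected_arrival <= m.needed_by:
        return "yellow"   # open issue, but parts are expected in time
    return "red"          # parts late or undated; may delay the outage


# Hypothetical job: parts on backorder, expected before the scheduled start.
job = Milestone("valve rebuild", date(2024, 4, 1), False, date(2024, 3, 20))
print(readiness_color(job))
```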

The PET is constantly being updated with the client. Meetings are held 90 days, 60 days, and 30 days prior to shutdown, with tasks appraised right up to the outage date to make sure that all items are ready to be executed. If something isn’t ready to be performed, D&Z places the task on a risk register, assigns it an opportunity value, and starts developing a contingency plan to minimize the risk.
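
The review cadence is easy to lay out once a shutdown date is fixed. The sketch below is only an illustration: the 90-, 60-, and 30-day offsets come from the description above, while the shutdown date, the task names, and the simple ready/not-ready flag are assumptions. It does not attempt to model the opportunity value or the contingency plans.

```python
from datetime import date, timedelta

REVIEW_OFFSETS_DAYS = (90, 60, 30)  # days before shutdown, as described above


def review_dates(shutdown):
    """Dates of the readiness reviews ahead of a planned shutdown."""
    return [shutdown - timedelta(days=d) for d in REVIEW_OFFSETS_DAYS]


def risk_register(task_ready):
    """Names of tasks not yet ready, which would need a contingency plan."""
    return sorted(name for name, ready in task_ready.items() if not ready)


shutdown = date(2024, 10, 1)  # hypothetical outage start
for d in review_dates(shutdown):
    print(d.isoformat())

print(risk_register({"pump rebuild": True, "valve rebuild": False}))
```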

The most important thing that PET does is force communication. People get together around a table and discuss the status of the scheduled jobs. The individuals who are responsible for the work must provide answers about what is being done to overcome obstacles. PET prompts the discussion, allowing members to understand their roles. Expectations can be set, and the process drives accountability. It allows the entire organization to respond when necessary and allows for more of a team effort, which leads to a successful outcome.

Trust but Verify

In the end though, the tools are only as good as the people using them. The old saying, “garbage in equals garbage out,” still applies. Having trust in your employees is obviously very important, but verifying their reports is imperative to keep systems updated and reports accurate. If incorrect information is relied upon, proper decisions cannot be made.

“When you’ve got your supervisors coming in and telling you what percent they’re complete, how many welds they’ve made or how many feet of pipe they’ve put in, that’s good and fine, but we’ve got our project controls group and they’re out there and they’re verifying. We trust, but we verify constantly to make sure the data that we’re getting in is good data. That we don’t just have pipe hanging in the rack with ropes versus actual hangers and welded together or screwed together,” said Starr.
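
A verification pass of the kind Starr describes amounts to reconciling what was reported against what a field check finds. The sketch below is a hypothetical illustration: the work items, the quantities, and the 5% tolerance are assumptions, not D&Z’s actual project controls procedure.

```python
def find_discrepancies(reported, verified, tolerance=0.05):
    """Return items whose reported quantity exceeds the verified quantity
    by more than `tolerance`, expressed as a fraction of the reported value."""
    flagged = {}
    for item, claimed in reported.items():
        checked = verified.get(item, 0)  # work nobody verified counts as zero
        if claimed > 0 and (claimed - checked) / claimed > tolerance:
            flagged[item] = (claimed, checked)
    return flagged


# Hypothetical supervisor reports versus field verification counts.
reported = {"welds_completed": 40, "pipe_installed_ft": 500}
verified = {"welds_completed": 39, "pipe_installed_ft": 380}
print(find_discrepancies(reported, verified))
```

In this made-up case the weld count is within tolerance, but the installed pipe footage is overstated by about a quarter, which is the sort of gap that would prompt a closer look before the schedule is updated.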

When talking about outages, words that seem to come up time and time again are “expectations,” “responsibility,” and “accountability.” For the team to complete tasks on schedule, the expectations must be defined and the people responsible for meeting milestones must be identified on the team’s organizational chart. Ultimately, that process designates a single person with the authority to sign off on work for both the client and the contractor. Having that single point of contact drives proper communication, eliminates confusion, minimizes finger-pointing, and reduces delays.

Forced Outages Force Action

Of course, some outages pop up unexpectedly. A tube rupture can spoil everyone’s holiday plans and throw managers into crisis mode (Figure 3). But there are things that can be done to make even those situations go more smoothly. First and foremost is having an updated contingency plan and work list available for on-call personnel to reference.

3. And they’re off! As in a race, speed is important during a forced outage. When sootblower erosion brought this biomass-fired boiler offline suddenly, having a list of dependable contractors saved time and reduced stress on plant supervisors. Courtesy: Aaron Larson

Keeping a list of reliable external contacts that can help provide needed resources in an emergency is also important. The time to select your trusted partner is not when your back is against the wall following a tube leak, but well in advance of any unexpected outage. Developing trust and confidence with a contractor is important for getting a quick response, which is helpful in getting a unit back online in a minimal amount of time. Most plants have a number of go-to contractors on speed dial.

Managing the activities during a forced outage requires planning on the fly. Safety and quality must never be compromised, so managers need to take a step back and monitor the big picture. It is very easy to become caught up in the task of getting the work done and returning the unit to service, but following well-defined procedures can help maintain focus and prevent accidents.

Having a continually updated work list is another best practice. If a unit is forced offline, there are frequently additional maintenance activities that can be done in parallel with the repair that brought the plant down. For this reason, scheduling work and defining the critical path is just as important during a forced outage as it is for an outage that has been planned for months or years. Understanding the work progression allows sequenced actions to be prepared in advance and staged so that handoffs take place without a hitch.

If nondestructive testing is to be conducted during a forced outage, it should be done as soon as possible. Identifying issues early helps establish the scope for the shutdown and allows unexpected deficiencies to be caught sooner rather than later (Figure 4). Although it may not always be possible, getting early results from hydrostatic, ultrasonic, radiographic, or dye penetrant tests can save a lot of time by allowing work items to be completed in parallel.

4. Early testing can identify previously undetected issues. This tube leak on a superheater U-bend was found while conducting a hydrostatic test during a forced outage for a completely different problem. Detecting problems promptly can allow repairs to be scheduled without extending outage durations. Courtesy: Aaron Larson

A common saying in maintenance organizations is that there is always time to do a job right the first time. Rarely is placing time pressure on workers going to result in the highest quality and fastest task completion. Supervisors are expected to get tasks done as expeditiously as possible, but they must walk a fine line, because placing too much emphasis on completing jobs quickly often doesn’t turn out well. Quality can suffer, and performing rework will never be the most efficient work process, not to mention that it is always a morale-killer. (Sometimes it is even a literal killer. See “Safety Is Not an Accident” in the April 2014 issue.)

Keeping track of the hours worked by essential employees is also imperative. Fatigue sets in very quickly during forced outages. Often, workers have already been awake for many hours and may have worked an entire normal shift before being called back for emergency repairs. A proper plan can minimize wasted time, allowing workers to maximize their rest without unnecessarily prolonging the outage.

Most companies have policies restricting the number of hours that an employee can work without a rest period, but even those limitations can border on excessive. The quality of work that you can expect from an exhausted employee is much lower than normal. Additionally, some jobs can only be done by a select group of qualified individuals.

For example, if you only have one welder qualified to perform a certain weld that is required to complete a repair, you can’t afford to waste that person’s time because the job could be shut down when they have to take a rest break.

In nuclear outages, radiation dose must be tracked even more closely than time in some cases. Personnel with a skill set needed on a high-dose-rate job must have their total dose managed conservatively on other jobs to maximize their availability for the work that only they can do. Accurate scheduling and time estimating are crucial to maximizing their productive hours and minimizing overall outage duration.
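
The dose-budgeting idea reduces to simple arithmetic. In the sketch below, the function name and every number are hypothetical. Actual administrative limits and dose estimates are set by each plant’s radiation protection program and are not given here.

```python
def dose_available_for_other_jobs(admin_limit, dose_to_date, reserved_for_critical_job):
    """Dose a worker can still receive on other jobs while keeping enough
    margin for the job only they can do. All values share the same units."""
    remaining = admin_limit - dose_to_date
    return max(0, remaining - reserved_for_critical_job)


# Hypothetical values in mrem: administrative limit, dose received so far,
# and the estimated dose for the critical job that must be held in reserve.
print(dose_available_for_other_jobs(2000, 600, 900))
```

A result of zero means the worker should not be assigned to any other dose-bearing job until the critical one is complete.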

Of course, fluff is a scheduler’s nightmare. When workers provide a cushion for themselves within their estimates, it will frequently throw the entire timeline off track. Realistically, finishing a job early is just as bad as finishing a job late. The goal should always be to have accurate estimates based on the averages.
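
One way to see why an early finish can hurt is to simulate a chain of sequential jobs in which each crew shows up at its planned start time. The sketch below rests on that assumption. The 10-hour padded estimates and 8-hour actual durations are invented for illustration.

```python
def simulate_chain(estimates, actuals):
    """Run sequential tasks where each crew starts no earlier than its planned
    start (based on the estimates). Returns (finish_hour, idle_hours)."""
    clock = 0          # when the previous task actually finished
    idle = 0           # hours the job sat waiting for the next crew
    planned_start = 0  # scheduled start of the current task
    for est, act in zip(estimates, actuals):
        start = max(clock, planned_start)
        idle += start - clock
        clock = start + act
        planned_start += est
    return clock, idle


padded = simulate_chain([10, 10, 10], [8, 8, 8])   # estimates include a cushion
accurate = simulate_chain([8, 8, 8], [8, 8, 8])    # estimates match reality
print(padded, accurate)
```

Under these assumptions the padded plan finishes four hours later than the accurate one, even though every crew beat its estimate, because the time saved by each early finish is lost while waiting for the next crew.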

Postgame Analysis

During an outage, there should be a constant flow of information (updating statuses and projecting outcomes), but when the outage is over, it’s also important to go back and review the results. It’s no secret that there can be a lot of stress and pressure when a unit is offline. Corporate folks want to know whether the unit will be up on time, finance types are curious whether the projections are right, and people need parts “yesterday,” so emotions can run high.

Conducting a post-outage critique two to three weeks after startup, with the plant back online and making money, allows cooler heads to prevail. At that point, people can look at the lessons learned, provide constructive feedback, and come away with a better understanding of what can be done to improve the process in the future.

According to Starr, D&Z shares lessons learned and industry best practices with its clients. These include safety, quality, cost, and scheduling practices.

“At the end of the day, the project management tools and the principles are the same across all industries,” Starr said. “We try to drive the same planning concepts, as far as the estimating, the scheduling, having the core planning and execution team onsite early, having an integrated schedule, your lessons learned database and your historical estimating, all of that is the same across all of these industries.”

Good communication, a well-defined scope, solid personnel who buy into the plan, and the right checks and balances are all needed. In the end, what works is having a team effort. ■

Aaron Larson is a POWER associate editor (@AaronL_Power, @POWERmagazine).