Where Are We After 10 Years of Bulk Electric System Reliability Standards?

As concerns about grid security increase globally, it’s a good time to review the history, scope, and effect of North American electric system reliability standards. As the threat landscape changes, standards alone are not enough.

Mandatory. That’s the key word in the Energy Policy Act of 2005 (EPAct) where grid security is concerned. Within two years of that legislative action, the electric power industry was welcomed into the world of operation and planning regulation far beyond anything that had come before. As of June 18, 2017, we will have 10 years of experience with mandatory reliability standards. Are we better off as a result? Are the costs of compliance commensurate with the benefits? Will changes in approach to compliance currently under way enable a more efficient regulatory environment going forward? This article examines how we got to the current system and where we may go from here.

The Triggering Event

The August 14, 2003, blackout of much of the northeastern United States and parts of Canada was the impetus for EPAct and the accompanying changes to Section 215 of the Federal Power Act. The power to enforce mandatory reliability standards was granted to the Federal Energy Regulatory Commission (FERC).

This was not the first cascading event on the bulk power system resulting in a widespread blackout, and despite the best intentions of regulators and the law itself, it will not be the last. Laws and regulations are about managing risks to acceptable levels and providing an incentive for compliance with requirements. Managing the risk of a major event to a near-zero probability would be prohibitively expensive. So, we are left with a system of regulations intended to minimize the risks while incorporating cost considerations into the equation.

The amendment to the Federal Power Act included a description of an Electric Reliability Organization (ERO) that would develop and enforce compliance with reliability standards. FERC is not granted the ability to develop standards; instead, it contracts that activity to the ERO, which is the North American Electric Reliability Corp. (NERC). (For a short history of NERC, visit nerc.com/AboutNERC/Documents/History_Dec12.pdf).

In early December, FERC staff released a 78-page electric “Reliability Primer” that provides an overview of FERC’s role in overseeing the reliable operation of the nation’s bulk power system. The primer outlines the basic structure of how the bulk power system operates, with an explanation of fundamental concepts and functions related to power system operations. As FERC explains, reliability standards impose requirements on the users, owners, and operators of the bulk power system “to assure that they fulfill their responsibilities in reliable grid operations, consistent with the basic engineering functions and concepts discussed in the primer.”

Many industry participants wrongly believe that the NERC staff develops reliability standards. That particular activity is actually carried out by industry volunteers who participate—sometimes for years—on standard-drafting teams.

The period between introduction of the mandatory reliability standards and where we are today mirrors the five stages of grief:

■ We experienced denial that there was actually a need for enforceable standards, especially given the documented potential for sanctions of $1 million per day per violation. (As of the end of 2016, approximately $47,800,000 in sanctions had been assessed through the NERC compliance monitoring and enforcement program, with additional penalties in conjunction with FERC.)

■ We felt anger at the amount of extra time and effort required for the industry to produce and archive actual evidence of compliance.

■ Bargaining came into play as Registered Entities experienced their first audits and learned that being perfectly capable of verbally describing processes or activities was not the same as having documented evidence.

■ Depression was experienced most deeply by the health, safety, and environmental staffers or junior engineers who had the NERC standards thrown into their basket of duties by management teams that had a less-than-firm understanding of what they were asking for.

■ As company compliance programs began to form and be implemented, a semblance of order emerged. Acceptance came with the realization that mandatory compliance is here to stay; now the industry figures out how to comply in the most efficient way possible.

As we near the 10-year anniversary of mandatory standards, we have to ask whether we are better off now, from a Bulk Electric System (BES) reliability perspective, than when we started. There are dozens of measures available to help us in our assessment. Some examples are the Reliability Indicators from NERC that track the components of the Adequate Level of Reliability definition. These include Reserve Margin, BPS Transmission Related Events Resulting in Loss of Load, Interconnection Frequency Response, and Energy Emergency Alerts, among others.

One of the most compelling data sets is included in the NERC “State of Reliability 2016 Report.” The metric for bulk power system events resulting in a loss of load (excluding weather-related events), M-2, “is considered to be improving,” according to NERC (Figure 1). Certainly, when you look at the load affected over time, the improvement since the 2003 blackout is notable (Figure 2).

1. Bulk power system transmission-related events resulting in load loss. Source: NERC

2. Total annual load loss. Each band of color represents a different event. The vertical axis scale has been truncated due to the large value (see number) of the 2003 Northeast blackout event. Source: NERC

Cyber and Physical Attacks

Although no North American service outages have been attributed to cyberattacks to date, the inherent risk of such an occurrence continues to be at the top of the list of risk elements identified in the NERC Compliance Monitoring and Enforcement Implementation Plans.

At the same time, the Critical Infrastructure Protection (CIP) standards are among those most violated by Registered Entities. Of the 10 most violated standards through the first three quarters of 2016, half are CIP standards (Figure 3).

3. Most violated standards. Critical infrastructure protection (CIP) standards were, in aggregate, the most violated North American Electric Reliability Corp. (NERC) standards in the first three quarters of 2016. Source: NERC Compliance Monitoring and Enforcement Program Report Q3 2016

The relatively high incidence of violations can be attributed to two factors: the complexity of the CIP measures and their newness to a large portion of the industry. (See sidebar “An Important Cybersecurity Resource.”)

An Important Cybersecurity Resource

The Electricity Information Sharing and Analysis Center (E-ISAC), operated by the North American Electric Reliability Corp., establishes situational awareness, incident management, coordination, and communication capabilities within the electricity sector through timely, reliable, and secure information exchange. The E-ISAC, in collaboration with the Department of Energy and the Electricity Subsector Coordinating Council, serves as the primary security communications channel for the electricity sector and enhances the sector’s ability to prepare for and respond to cyber and physical threats, vulnerabilities, and incidents.

Among E-ISAC’s recent work is an eight-page “Internet of Things DDoS White Paper,” published for a public audience in response to the fall 2016 distributed denial of service (DDoS) attack that used consumer Internet of Things (IoT) devices. As E-ISAC notes, “existing attack surfaces and new malware payloads were exploited in unique ways, using custom attack software. The E-ISAC developed the following recommendations for defensive capabilities in the Electricity Subsector with suggestions to improve the overall posture of network security and cybersecurity within our community.”

Although the October 21, 2016, DDoS attack on the Dyn Managed Domain Name System used consumer IoT devices, the white paper notes that “Devices using high bandwidth connections, such as security cameras in plants, facilities, substations, and switchyards have the potential to create a substantive impact on the Electricity Subsector.”

“There are several factors highlighting the wide attack surface that similar devices provide, including:

■ usually open access to the Internet;

■ the use of default login credentials and weak passwords that are implemented across entire product lines;

■ implementation of common operating systems without the benefit of deactivated daemons or services, and removed executable files that could be remotely or programmatically activated.”

To learn more, download the white paper from https://www.esisac.com.

—Gail Reitenbach, PhD, Editor

Although the latest versions of the CIP standards have clarified expectations, a learning curve remains in effect, particularly for small and medium-sized entities. While large vertically integrated utilities enjoy the benefits of skilled and dedicated CIP experts on their staffs, smaller participants often struggle to assimilate the constantly changing requirements and various implementation timelines into their programs. In addition, the smaller programs are often designed, executed, and updated by a single individual responsible for regulatory compliance as a small part of their overall duties.

No one disputes the importance of the CIP standards, but the history of the standards and the constantly changing requirements represent a unique challenge even for those who strive for 100% compliance 100% of the time. For small to medium-sized Registered Entities, a combination of networking with other entities, selecting which notices to receive from NERC based on the entity’s compliance risk, and outsourcing certain services (such as updating procedures and monitoring changes in standards and requirements) has proven effective.

Threats to the physical security of certain elements of the interconnected systems are also a significant concern. Current standards require utilities to identify the most operationally significant areas on transmission systems and have a plan to protect those elements and locations. While a successful cyberattack might take months or years to plan and execute, a physical attack can be carried out quickly, with very little preparation time. The physical security standards continue to be implemented, and we can look forward to ongoing enhancements as lessons are learned and best practices identified in the current work.

An early admonition from industry stakeholders on the content of reliability standards was, “Tell us what, don’t tell us how.” This ideal is largely preserved to this day. For both the CIP and the Operations and Planning standards, there is ample opportunity to fulfill the reliability requirements in different ways. The ability to account for and describe specific or unique approaches to compliance is built into auditor tools like the Reliability Standard Audit Worksheets. The questions pertaining to how one complies with the standard or requirement in question are a legacy of the “what, not how” industry advice.

Responding to a question about the diversity of compliance measures, Rocky Sease, CEO of SOS International, puts it this way: “I don’t think anyone realized how many different solutions would be applicable to any one standard. Compliance solutions vary from utility to utility based on governance, culture, technology, history, and many other factors. There is no one-size-fits-all solution, and each utility strives for a resolution compatible with its unique challenges.”

The best approach is to achieve a full and complete understanding of the scope of standards and requirements applicable to your specific operation. With that achieved, a unique Internal Compliance Program can be developed to guide the day-to-day activities around compliance. This program should address the “who, what, when, and where” of compliance activities and the production of documented evidence. Another essential feature is to identify the subject matter experts, compliance performers, standards owners, and a repository for evidence.
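As a rough illustration of the “who, what, when, and where” idea, the sketch below models a compliance task register and flags activities that have come due without documented evidence. Everything in it is hypothetical: the field names, the placeholder requirement identifiers (EXAMPLE-001 and so on), the roles, and the repository paths are assumptions made for illustration, not part of any NERC tool or standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ComplianceTask:
    """One recurring compliance activity (all field names are hypothetical)."""
    requirement: str        # the "what": placeholder standard/requirement ID
    owner: str              # the "who": standards owner or compliance performer
    sme: str                # subject matter expert who can explain the work
    due: date               # the "when": next date the activity must be done
    evidence_path: str      # the "where": location in the evidence repository
    evidence_on_file: bool = False


def missing_evidence(tasks, as_of):
    """Return tasks due on or before `as_of` that lack documented evidence."""
    return [t for t in tasks if t.due <= as_of and not t.evidence_on_file]


# Illustrative entries only; identifiers and roles are made up.
tasks = [
    ComplianceTask("EXAMPLE-001 R1", "Plant manager", "Relay technician",
                   date(2017, 3, 1), "evidence/EXAMPLE-001/R1/", True),
    ComplianceTask("EXAMPLE-002 R3", "Compliance lead", "IT administrator",
                   date(2017, 4, 15), "evidence/EXAMPLE-002/R3/"),
    ComplianceTask("EXAMPLE-003 R2", "Operations supervisor", "System operator",
                   date(2017, 9, 30), "evidence/EXAMPLE-003/R2/"),
]

gaps = missing_evidence(tasks, as_of=date(2017, 6, 18))
for task in gaps:
    print(f"{task.requirement}: {task.owner} owes evidence at {task.evidence_path}")
```

A register like this would only be a starting point. The value lies in reviewing it on a schedule so that a gap in documented evidence surfaces before an audit does.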

Shift to a Risk-Based Approach

A fairly recent development has brought a degree of sanity to the mechanics of compliance. In 2014, NERC launched the Reliability Assurance Initiative to develop and implement a risk-based approach to compliance monitoring. This program entails an Inherent Risk Assessment for each Registered Entity to bring clarity to the relative risk each represents to the reliability of the BES based on size, operation, and other factors. Although this assessment does nothing to affect the population of standards and requirements applicable to each Registered Entity, it does lay out those standards and requirements that will be monitored for compliance.

This approach enables more efficiency in monitoring efforts and rightly assigns the greatest degree of monitoring to those entities with the highest relative risk to reliability. This concept generally results in smaller entities, like renewable generation (see sidebar, “Yes, Renewables Are Included”) and distribution providers, being monitored on a reduced set of standards, while larger utilities retain the greater share of monitoring attention. These measures will help alleviate the backlog of compliance monitoring projects at the various Regional Entities and allow the greatest monitoring efforts to be directed toward organizations that have the greatest potential to affect overall reliability. (For more on risk-based management, see “Risk-Based NERC Compliance: Assessing Risk to Bulk Power System Generation” in the June 2016 issue.)

Yes, Renewables Are Included

Renewable generation continues to represent new operating and compliance considerations. While many solar and wind facilities are inherently very dependable within the envelope of their intermittent nature, it is becoming increasingly apparent that the one-size-fits-all attributes of the reliability standards do not always deal realistically with renewable generation.

From a standards perspective, for example, a generator is a generator, whether it is a 1,500-MW baseload facility or an 80-MW wind farm in a cornfield. Measures have been taken in the area of protection system maintenance and testing, and even in the BES definition, to bring clarity and a recognition of relative risk to the discrete components that make up renewable generation facilities.

The industry as a whole has done an incredible job of incorporating intermittent renewables into dependable system operation and power market scenarios. As for reliability, both revisions to existing standards and new ones should recognize the sometimes unique operating attributes of renewables.

Internal Controls

Another fairly recent development has been the formal introduction of “internal control” into the mix of compliance considerations for Registered Entities. Internal controls should relate to the inherent risk posed by a particular Registered Entity and any associated NERC Reliability Standards. The assessment of one’s internal controls plays a significant role in the Compliance Oversight Plan used by the Regional Entity to monitor for compliance. Internal controls are generally categorized by types that define where in the compliance process they can be employed. These categories are Preventive, Detective, and Corrective.

Internal controls are not mandatory, but their usefulness in managing any process with inherent risks of underperformance is well established, and they can be highly useful in managing the risk of noncompliance in a regulatory environment. They may be relatively new to the power industry, but they are common practice in other sectors, such as finance and healthcare. A Registered Entity can elect to have an Internal Control Evaluation (ICE) performed by its Regional Entity. The ICE allows the Registered Entity to give its Regional Entity information about the internal controls it uses to address its applicable risks and to identify, assess, and correct noncompliance with reliability standards, and to demonstrate the effectiveness of those controls.

Internal controls should have a stated purpose or goal, such as “Maintain effective error-free communications” or “Ensure testing activities are accomplished on time.” Examples of internal controls include training (to help enhance human performance), peer review of procedures (to ensure broad input and achievable metrics), and review of communication logs (to verify proper and timely communications and highlight areas for improvement). From a compliance perspective, strong internal control conveys a high level of confidence that measures are in place to promptly prevent, detect, and correct any gaps in compliance.
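To make the three categories concrete, the sketch below records each internal control with a stated purpose and a category, then reports any category that has no control assigned. The class names, the purpose wording for peer review, and the assignment of each example control to Preventive or Detective are assumptions made for illustration. The article does not say which category each example belongs to, and a real program might classify them differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ControlType(Enum):
    PREVENTIVE = "Preventive"
    DETECTIVE = "Detective"
    CORRECTIVE = "Corrective"


@dataclass(frozen=True)
class InternalControl:
    name: str
    purpose: str
    control_type: ControlType


# Category assignments below are illustrative assumptions, not NERC guidance.
controls = [
    InternalControl("Training",
                    "Help enhance human performance",
                    ControlType.PREVENTIVE),
    InternalControl("Peer review of procedures",
                    "Ensure broad input and achievable metrics",
                    ControlType.PREVENTIVE),
    InternalControl("Review of communication logs",
                    "Verify proper and timely communications",
                    ControlType.DETECTIVE),
]


def group_by_type(items):
    """Map each control type to the names of the controls assigned to it."""
    grouped = defaultdict(list)
    for control in items:
        grouped[control.control_type].append(control.name)
    return grouped


def uncovered_types(items):
    """List control types that have no control assigned."""
    present = {control.control_type for control in items}
    return [t for t in ControlType if t not in present]


grouped = group_by_type(controls)
for control_type in ControlType:
    names = ", ".join(grouped[control_type]) or "(none)"
    print(f"{control_type.value}: {names}")

print("Types with no control:", [t.value for t in uncovered_types(controls)])
```

In this made-up register, no Corrective control is listed. That is the kind of gap an entity might want to find for itself before describing its controls to a Regional Entity.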

What’s Next?

Our 10 years invested in the creation, revision, and assimilation of the mandatory reliability standards allow us to make some logical assumptions about likely developments over the coming years.

First, we can look forward to ever-increasing complexity in the standards, especially in the areas of cybersecurity and system modeling. The constantly changing cyber threat landscape is not unique to the electric power industry; it is an issue of global significance. More accurate data, both in system monitoring and planning, and greater participation in programs like the Generator Availability Data System will enable planners to both identify trends in BES element reliability and implement more precise modeling of system operating limits.

Second, training will continue to escalate as a critical issue for Registered Entities. Factors that contribute to this escalation include the retirement of highly capable system operators and the need to train their replacements, additional training requirements that the standards impose on generator owners and operators, and CIP awareness and procedure measures for Low Impact Cyber Systems. Under the bright line criteria of the CIP-002-5.1 BES Cyber System Categorization standard, any BES element that employs Cyber Systems is at least a Low Impact Cyber System. This would include all wind and solar generators that meet the BES registration criteria and use some form of cyber control or monitoring of their facilities.

Third, as compliance programs continue to evolve and mature, and as Registered Entities implement comprehensive compliance programs, the risk of the requirements becoming a matter of routine presents other challenges. David Hilt, a former vice president of compliance at NERC, and a key contributor to the 2003 blackout investigation and report, was asked what he sees as the greatest threat to reliability in the coming years. He answered: “Complacency. As NERC and the regulators throughout North America have implemented a regime of mandatory reliability standards that have focused on improving reliability of this most critical infrastructure, we cannot become complacent about reliability and believe it has been taken care of. We will continue to face emerging issues and new challenges to the reliable operation of the Bulk Electric System.”

Fourth, the cost of compliance remains a high priority both among regulators and Registered Entities. In 2016, NERC initiated a pilot program to consider the balance of cost and benefits for the implementation of reliability standards. Reliability Standard TPL-001-4 Transmission System Planning Performance Requirements was selected as the pilot standard. The pilot sought to define the relative risks of not complying with the requirements within the standard, followed by industry input on costs. There was a wide variation in the responses that ranged from “one additional FTE” to “considerable costs.”

On an industry-wide basis, a fair average is $40,000 to $50,000 for small Registered Entities to manage their compliance risks through additional staff duties or staff augmentation from outside advisors. For larger, transmission-oriented entities, a core compliance staff of four to six experts and supporting personnel, with input and associated time contributed by dozens of subject matter experts and process owners, can easily run into the hundreds of thousands of dollars.

Finally, as the electric power industry continues to improve its approach to system reliability, we come back to where we started—with the people who operate, monitor, and maintain the interconnected grids (Figure 4). Human performance was identified as one of the key risk factors to reliability in the NERC Compliance Monitoring and Enforcement Implementation Plan, because the performance of personnel who make the decisions, both in real time and in the planning of the system, is perhaps the single greatest risk factor.

4. People remain critical. Regardless of the number and nature of reliability standards, well-trained and diligent staff are the most important element of a reliability program. Courtesy: Getty Images

Until we have a system monitored and operated by robots or software (which could present its own unintended consequences), the people who manage the production and delivery of electric power are absolutely vital to everything the regulators and the reliability standards are trying to achieve. Human performance is, of course, a vital issue in many industries, but the BES can pose unique challenges.

When asked if there is a difference in how electric system operation approaches human performance when compared to other industries, Pam Ey, PhD, a widely acknowledged expert on human performance, describes it this way: “Absolutely. Our electric grid is an amazing complex system, plugged into business architectures that make optimization of Human Performance initiatives particularly challenging. We can see impacts in our industry in the way that work instructions are written, information is shared, and metrics are used, for instance in training programs. But I am energized by discussions with leaders in utilities concerned with creating proactive HP practices as a sturdy framework for resilient operations. Thankfully, both the medical and airline industry have paved the way.”

Standards and processes, procedures and audits are all important, but the talented and dedicated people who keep our lights on and our security intact will always be the critical element. ■

—James Stanton (james.stanton@sosintl.com) is director of advisory services at SOS International.
