Banking on resilience: PwC

The FFIEC’s recent release of its Business Continuity Management handbook sets critical new paradigms for FS examiners, signaling a shift to operational resilience.

Guidance from the Federal Financial Institutions Examination Council (FFIEC) makes it clear that, in the financial services industry, recovering IT systems quickly after an outage is no longer good enough.

Bank regulators are expanding the old business continuity planning and disaster recovery (BCP/DR) model to encompass all aspects of resilience (i.e., operational and cyber), effectively setting a new bar for regulated entities.

Rethinking resilience

As financial services (FS) regulators around the world shift their focus, PwC has done the same. We’ve been calling for a rethinking of resilience for a number of reasons:

With globalization and increased competitive pressures leading to more outsourcing, offshoring and automation, FS firms are now more interconnected and complex than ever before. A breakdown at any one step can disrupt the entire chain.
Financial institutions are innovating in new areas—migrating more and more services and data to the cloud, for example—but managers’ understanding of these technologies doesn’t keep pace with the speed of change. Too often they don’t update their risk and resilience programs to account for critical dependencies that emerge.
Since the last financial crisis, enhanced risk management, stress testing, capital planning and liquidity management have generally improved financial resilience. But traditional BCP/DR activities have received less attention in some firms, and often are focused on maintaining existing capabilities, rather than continuously improving in maturity and depth.
Regulators increasingly expect boards of directors to don the mantle of operational resilience oversight, a task for which they may not be adequately prepared.

The FFIEC addresses these concerns and sets parameters for regulatory examiners of financial institutions and their third-party service providers.

Issued in November 2019, the FFIEC’s Business Continuity Management booklet represents the council’s first significant update in more than four years. It expands its focus to business continuity management, not just business continuity planning. In doing so, it echoes some of the key tenets of the Bank of England’s (BoE) influential 2018 discussion paper, Building the UK financial sector’s operational resilience.

The update formalizes a definition of resilience found in the National Institute of Standards and Technology (NIST) glossary: “The ability to prepare for and adapt to changing conditions and withstand and recover rapidly from disruptions. Resilience includes the ability to withstand and recover from deliberate attacks, accidents, or naturally occurring threats or incidents.”

It also enjoins examiners to home in on FS enterprises’ and service providers’ ability to keep their most important business functions operating and available to customers and other stakeholders. And it wants to see FS entities working to minimize any ripple effects an outage might have on others in their business ecosystem and on overall financial systems.

Subtle but significant shifts to resilience that the FFIEC will trigger

While the BoE’s paper introduced bold new concepts, the 2019 FFIEC update appears to aim for a more nuanced pivot from BCP/DR to operational resilience.

Here are the shifts in a nutshell:

1. Moves emphasis away from business continuity planning (BCP) to business continuity management (BCM)
2. Provides a repeatable process for identifying critical business functions
3. Introduces the term “maximum tolerable downtime”
4. Emphasizes need for more meaningful testing
5. Allows more flexibility in testing
6. Refers to entities, not just “financial institutions”
7. Expands the role of Business Impact Analysis (BIA)
8. Spells out resilience duties of management and boards

1. Moves emphasis away from business continuity planning (BCP) to business continuity management (BCM)

The 2015 FFIEC document spoke of systems recovery, whereas the new booklet emphasizes the continuity of operations throughout the overall entity: technology, operations, testing and communication, focusing on the "continued maintenance of systems and controls for the resilience of operations."

2. Provides a repeatable process for identifying critical business functions

The new document provides a clear, repeatable process for identifying critical business functions and analyzing their interdependencies internally and externally (also known as “mapping”). It also says that entities should understand how a disruption of these functions could affect markets and the entity’s larger community.

3. Introduces the term “maximum tolerable downtime”

The FFIEC booklet directs entities to determine how much disruption they can tolerate, including data loss as well as downtime. It also clarifies how entities should establish their targets for post-cyber-event systems recovery and data restoration, advising organizations to be realistic: “Establishing realistic RTOs (recovery time objectives) assists management in determining a critical path and hierarchy for recovery. For example, a process with a shorter RTO that is dependent upon a process with a longer RTO may indicate a gap that should be analyzed further,” the document states. The concept appears similar to the BoE discussion paper’s “impact tolerances.”
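The RTO gap described in that quote lends itself to a simple automated check. The following is a minimal sketch, not anything prescribed by the FFIEC: the `BusinessProcess` record, the `find_rto_gaps` function and the sample process names and RTO values are all hypothetical, and the check looks only at direct dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BusinessProcess:
    """Hypothetical record for one process in a dependency map."""
    name: str
    rto_hours: float
    depends_on: List[str] = field(default_factory=list)


def find_rto_gaps(processes: List[BusinessProcess]) -> List[Tuple[str, str]]:
    """Return (process, dependency) pairs where the dependency has a longer RTO."""
    by_name: Dict[str, BusinessProcess] = {p.name: p for p in processes}
    gaps = []
    for process in processes:
        for dep_name in process.depends_on:
            dependency = by_name[dep_name]
            # A process cannot reliably recover before the things it relies on.
            if dependency.rto_hours > process.rto_hours:
                gaps.append((process.name, dependency.name))
    return gaps


# Illustrative data only; names and RTO values are assumptions.
processes = [
    BusinessProcess("core_ledger", rto_hours=4),
    BusinessProcess("payments_processing", rto_hours=2, depends_on=["core_ledger"]),
    BusinessProcess("customer_portal", rto_hours=8, depends_on=["payments_processing"]),
]

gaps = find_rto_gaps(processes)
for process_name, dependency_name in gaps:
    print(f"{process_name} depends on {dependency_name}, which has a longer RTO")
```

In this sample, only the payments process is flagged, because it is expected back in two hours while the ledger it relies on has a four-hour target. A flagged pair is a prompt for further analysis, as the booklet suggests, rather than proof of a defect.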

4. Emphasizes need for more meaningful testing

Conducting tabletop exercises is no longer enough: the FFIEC guidance instructs examiners to also look for integrated tests of technology and business functions using multiple, complex and threat-intelligence-driven scenarios with event simulations.

5. Allows more flexibility in testing

While yearly testing of BCP/DR plans has long been the norm, the 2019 FFIEC booklet allows a multi-year testing schedule where appropriate, a change enabled in part by more robust testing. While high-priority business functions might still need annual testing, those deemed less critical could be tested every two or three years, for example. This change recognizes the burden that undifferentiated yearly testing can place on financial institutions, and lets them use periodic tests to build maturity over time.
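A tiered schedule of this kind can be captured in a few lines. The sketch below is illustrative only: the priority labels, the one-, two- and three-year intervals and the `next_test_due` helper are assumptions chosen to mirror the example above, not values taken from the booklet.

```python
from datetime import date

# Assumed mapping of priority tier to testing interval, in years.
TEST_INTERVAL_YEARS = {"high": 1, "medium": 2, "low": 3}


def next_test_due(priority: str, last_tested: date) -> date:
    """Return the date by which a business function should next be tested."""
    years = TEST_INTERVAL_YEARS[priority]
    try:
        return last_tested.replace(year=last_tested.year + years)
    except ValueError:
        # last_tested was 29 February and the target year is not a leap year.
        return last_tested.replace(year=last_tested.year + years, day=28)


high_due = next_test_due("high", date(2019, 11, 15))
low_due = next_test_due("low", date(2019, 11, 15))
print(high_due, low_due)
```

Each entity would set its own tiers and intervals based on its business impact analysis; the point is simply that the cadence follows from criticality rather than from a single blanket rule.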

6. Refers to entities, not just “financial institutions”

Again, this change is subtle, but the language of the FFIEC document now encompasses non-financial organizations such as cloud service providers, establishing that, if they provide services to financial institutions, they must follow the same rules.

7. Expands the role of Business Impact Analysis (BIA)

The new booklet expands the role of BIA from merely identifying risk to also maintaining business continuity with continuous systems monitoring, which can help to ensure that changes in business operations are always accounted for. It also calls for continually improving resilience processes by using metrics to analyze the effects of every disruption and to determine whether recovery objectives are reasonable.

8. Spells out resilience duties of management and boards

The new guidance is clear on the duties and functions of management and the board of directors. “The board and senior management should set the ‘tone at the top’ and consider the entity’s entire operations, including functions performed by affiliates and third-party service providers, when managing business continuity,” the document advises.

The case for proactive action to build resilience

Resilience is taking precedence among FS regulators not only in the US but worldwide. One reason is the escalation of cyberattacks on the FS industry, including nation-state sponsored incidents. Financial institutions globally experienced six nation-state attacks in 2018 alone, up from two each in 2016 and 2017.

On the heels of its influential 2018 discussion paper, the BoE’s decision to stress test UK banks’ operational resilience in fall 2019 prefigured the FFIEC changes. (The BoE published the results of those tests in December 2019.)

But regulators already have been issuing resilience-focused Matters Requiring Attention (MRA) letters directly to financial institutions—even before the FFIEC published its update.

The writing is on the proverbial wall, and every financial entity and service provider would do well to pay attention. Those who embark now on the road to resilience will enjoy many advantages over those forced to contend with an MRA.

Remediating an MRA triggers a costly and stressful process of developing plans and implementing them on a tight schedule. Those so penalized must also satisfy regulators that they can maintain their resilience posture over the longer term, beyond remediation.

In the meantime, savvier organizations worldwide (those who scored high on resilience measures, so-called “high-RQ”) have already been revamping their BCP/DR programs with resilience in mind, according to PwC’s Digital Trust Insights study.

Being proactive on resilience means being able to manage the scope, costs and timing involved in building an organization's operational resilience.

Establish a team to oversee resilience enterprise-wide, ideally under the leadership of a Chief Resilience Officer.
Step up your first-line (management) and business teams’ involvement in responding to threats and disruptions.
Revamp your remediation programs to include all affected functions: business units, operations, technology, recovery and resolution planning (RRP) and your resiliency organization.
Take advantage of existing industry initiatives such as Sheltered Harbor, which the FFIEC booklet mentions as “An example of an industry initiative to assist in addressing the resilience of customer account information.”

Expand the scope of your Business Impact Analysis to include identifying all your business functions, prioritizing them in order of their criticality, setting realistic RTOs, MTDs and data restoration targets, and emphasizing the restoration of operational processes and critical business functions within those targets.
Map your dependencies between functions, processes, technology assets, and other internal and external participants.
Use a common taxonomy enterprise-wide listing recovery plan inputs.

Assess and test impacts of cyber incidents and disruptions using simulations and other more rigorous tests in addition to tabletop exercises. After an incident, ask: Were business functions interrupted? How quickly and effectively were they restored? Did you meet your targets? Why or why not?
Build a dedicated test environment that can handle robust and complex simulations.
Identify and monitor continuity risks, and scrutinize your metrics regarding incidents and disruptions using a variety of dashboards to analyze them from different perspectives. Include a “mandatory adherence to standards” test. Do you pass? Why or why not?
Strengthen your third-party risk management so that you provide the same level of scrutiny to non-FS organizations and service providers as to those in FS.
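The post-incident questions above (were functions interrupted, how quickly were they restored, did you meet your targets) can feed a simple metric. The following is a minimal sketch under stated assumptions: the `Incident` record, the `review_incidents` function and the sample timestamps and RTOs are hypothetical and stand in for whatever an entity's incident log actually holds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical log entry for one disrupted business function."""
    function: str
    disrupted_at: datetime
    restored_at: datetime
    rto: timedelta


def review_incidents(incidents: List[Incident]) -> List[Dict]:
    """Compare actual downtime with the RTO for each incident."""
    results = []
    for incident in incidents:
        downtime = incident.restored_at - incident.disrupted_at
        results.append({
            "function": incident.function,
            "downtime": downtime,
            "met_rto": downtime <= incident.rto,
            # Zero when the target was met, otherwise the amount missed by.
            "overrun": max(downtime - incident.rto, timedelta(0)),
        })
    return results


# Illustrative data only.
incidents = [
    Incident("payments_processing", datetime(2019, 11, 4, 9, 0),
             datetime(2019, 11, 4, 12, 30), rto=timedelta(hours=2)),
    Incident("statement_generation", datetime(2019, 11, 4, 9, 0),
             datetime(2019, 11, 4, 15, 0), rto=timedelta(hours=8)),
]

results = review_incidents(incidents)
for row in results:
    status = "met" if row["met_rto"] else f"missed by {row['overrun']}"
    print(f"{row['function']}: downtime {row['downtime']}, RTO {status}")
```

Numbers like these answer the "did you meet your targets" question; the "why or why not" still calls for a human review, and repeated overruns may indicate that a recovery objective is unrealistic.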

Update your scenario libraries to account for new risks such as cyber attack-related data loss.
Adopt more complex operating models to safeguard third-party services (such as cloud services), remote workforces and increases in mobile end-users.
Automate your recovery. Manual processes take more time, making it more likely that large, complex entities will miss restoration goals.