Designing a resilient contact center in AWS with Amazon Connect

In today's fast-paced business world, customer support is critical to long-term success. As customer expectations continue to rise, businesses increasingly need reliable and efficient contact centers to support their customers. According to PwC’s 2023 US Cloud Business Survey, 93% of cloud-powered companies experience improved resilience compared to organizations that do not prioritize cloud-based technology and solutions. One of the most effective tools for achieving this goal is Amazon Connect, a cloud-based contact center service that offers flexibility, scalability and cost-effectiveness.

What is Amazon Connect?

As a cloud-based contact center service, Amazon Connect makes it easier for businesses to set up and manage a customer contact center. It's designed to be scalable and flexible, allowing businesses to more easily adapt to changing customer needs and seasonal fluctuations in demand. It offers advanced features like automatic call distribution (ACD), interactive voice response (IVR) and speech recognition, which help customers reach the right agent and get the help they need quickly.

Amazon Connect is built on Amazon Web Services (AWS), which provides a high level of security, scalability and reliability. The service is fully managed, which means businesses don't need to worry about managing their own infrastructure or performing maintenance tasks. Amazon Connect offers pay-as-you-go pricing, which makes it cost-effective for businesses of various sizes.

What makes up the Amazon Connect ecosystem?

Supported by several key components, the Amazon Connect ecosystem works to provide a seamless customer support experience. These components include:

  • Amazon Connect instance: A virtual environment that includes all the configuration settings, routing configurations and contact flows for a contact center. It's where customer interactions are managed and the central hub for all the other Amazon Connect components.
  • Amazon Connect configurations: An Amazon Connect instance supports multiple configurations, such as contact flows, queues, routing profiles, hours of operation, user management, chat, reporting and Voice ID. Contact flows are the paths that customer interactions follow when customers call or message a contact center. You can build the routing logic, prompts and other elements that determine how a customer is directed to an agent or self-service option. You can also choose which queues contacts are routed to and set the hours of operation for your contact flows.
  • AWS integrations: Amazon Connect can integrate with multiple AWS services to offer a personalized customer experience. Examples include using Amazon Lex within contact flows to help understand customer intent, or using AWS Lambda functions to call an organization’s backend services and gather customer data such as loyalty information.

A multi-phase plan to implement resilience allows organizations to establish, on day one, the bare minimum abilities needed to reach and service customers during an outage. This plan can help a contact center quickly recover from any outage, reduce the impact on customers and the business, and establish business continuity.

A key to the resilience plan is confirming that all components of the contact center ecosystem, such as the Amazon Connect instance, phone numbers, routing profiles, queues, agents and integrated AWS services are included from the get-go. While having a day one resilience design may be sufficient to maintain business as usual, a more robust resilient design should be pursued and mapped to business critical features for advanced functionality and capabilities.

Benefits of implementing resiliency in a contact center

While Amazon Connect is a reliable and robust service, no technology is immune to unexpected disruptions. That's why implementing resiliency is essential for maintaining business continuity and confirming that customer support is always available.

Four key benefits of implementing resiliency are:

  1. Reduced downtime: With a secondary contact center that has backup systems and redundant connections, agents can continue to answer calls and support customers even if the primary site experiences an outage.
  2. Maintain customer trust: When a contact center is prepared to handle unexpected events, customers are more likely to trust the company and its ability to provide consistent and reliable service. This can result in increased customer loyalty and long-term business success.
  3. Safeguard data: By safeguarding sensitive customer data via secure backups and redundancy, you can help prevent potential data loss or breaches — which can be costly and damaging to a business.
  4. Meet compliance requirements: Many industries have strict regulatory requirements for resilience, including financial services and healthcare. Having a resilience plan in place can help to confirm compliance and avoid penalties.

Six steps to implementing resiliency in Amazon Connect

Clients often take a phased approach to implementing resilience in Amazon Connect, separating the design from the implementation of a resilient environment. A phased approach allows the organization to establish business objectives and a resilience design while incorporating multiple teams across the organization to gain alignment and necessary approvals. Once the business is aligned on a go-forward resilience design, implementation can begin. Organizations often follow these steps to design and implement resiliency in Amazon Connect.

  1. Define the recovery objective. The first step is to define your recovery objectives, including your recovery time objectives (RTO) and recovery point objectives (RPO). These objectives can help you determine how quickly you need to recover your systems and data in the event of an outage and how much data loss is acceptable.
  2. Identify the Amazon Connect components. The next step is to identify all the Amazon Connect components that should be included, such as: Amazon Connect instance, phone numbers, routing profiles, queues and agents.
  3. Choose your resiliency strategy. Choose a strategy that meets your recovery objectives and fits within your budget. Amazon Connect supports several resilience strategies, such as the newly announced Amazon Connect Global Resiliency (ACGR) feature, which provides telephony failover and traffic distribution across AWS regions so that core ecosystem components can fail over.
  4. Configure your Amazon Connect components. Configure your Amazon Connect components to support your resilience strategy. It’s a leading practice to use infrastructure as code to deploy your components to a secondary region and fail over to them, so that RPO/RTO requirements can be achieved.
  5. Test your plan. Once you have configured your Amazon Connect components, you should test your resilience plan to confirm that it works as expected. This involves simulating failure scenarios, such as shutting down your primary instance, and assessing that the backup instance is working correctly.
  6. Monitor the resilient environment. Monitoring your contact center environment helps confirm that it is functioning appropriately and that the secondary environment can be activated quickly in the event of an outage. AWS monitoring tools like Amazon CloudWatch can help detect issues or anomalies.
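
Steps 1 and 5 meet when a failover drill is timed against the RTO. The following is a minimal sketch, assuming the drill can be wrapped in a single command. The function name check_rto, the 300-second RTO and the drill script in the usage line are hypothetical and not part of any AWS tooling.

```bash
# Sketch: run a drill command and compare its duration with the RTO.
# Usage: check_rto RTO_SECONDS COMMAND [ARGS...]
check_rto() {
  rto_seconds="$1"
  shift
  start=$(date +%s)
  # Run the drill; a failing drill is reported separately from a slow one.
  "$@" || { echo "Drill command failed" >&2; return 2; }
  end=$(date +%s)
  elapsed=$((end - start))
  if [ "$elapsed" -le "$rto_seconds" ]; then
    echo "Recovered in ${elapsed}s (RTO ${rto_seconds}s): PASS"
  else
    echo "Recovered in ${elapsed}s (RTO ${rto_seconds}s): FAIL" >&2
    return 1
  fi
}
```

For example, check_rto 300 ./run_failover_drill.sh would flag a drill that takes longer than five minutes, where run_failover_drill.sh stands in for whatever script performs your recovery steps.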

Using Amazon Connect Global Resiliency

By implementing a resilience strategy for Amazon Connect using Amazon Connect Global Resiliency (ACGR), organizations can now help address the impact of AWS service outages, including one as recent as June 2023.

To illustrate this, let’s consider a scenario where a customer calls in and encounters the absence of a dial tone, indicating that the Amazon Connect phone number is down. In such an event, establishing multi-region resilience becomes crucial for resolving the outage. We define the steps to enable and utilize ACGR in the following stages:

  • Pre-recovery: Steps that can be deployed before a disaster. Completing them in advance limits the recovery steps to what must be done during an outage, which helps optimize the RTO.
  • Recovery: Steps taken to fail over to the disaster recovery (DR) region during a disaster.
  • Post-recovery: Steps taken to fail back to the primary region after a disaster is resolved.

The representative steps can be seen below in the graphic and the table.

Recovery steps

Pre-Recovery steps performed to set up ACGR in a secondary region

1. Create a replica instance

aws connect replicate-instance --instance-id <INSTANCEID> --replica-region <REGION> --replica-alias <ALIAS>

2. Create a traffic distribution group (TDG), which is used to map existing toll-free numbers and direct inward dialing (DID) numbers to both Connect instances

aws connect create-traffic-distribution-group --name <NAME> --instance-id <SOURCEINSTANCEID>
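
A traffic distribution group is not usable the moment it is requested, so later steps may need to wait for it. The sketch below polls the group's status with describe-traffic-distribution-group until it reports ACTIVE. The function name wait_for_tdg and the MAX_ATTEMPTS and SLEEP_SECONDS variables are hypothetical, and the defaults are assumptions rather than recommended values.

```bash
# Sketch: wait until a traffic distribution group reports ACTIVE.
# Usage: wait_for_tdg TDG_ID
wait_for_tdg() {
  tdg_id="$1"
  max_attempts="${MAX_ATTEMPTS:-30}"   # assumed default: 30 polls
  sleep_seconds="${SLEEP_SECONDS:-10}" # assumed default: 10 seconds apart
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(aws connect describe-traffic-distribution-group \
      --traffic-distribution-group-id "$tdg_id" \
      --query 'TrafficDistributionGroup.Status' --output text) || return 2
    case "$status" in
      ACTIVE) echo "TDG $tdg_id is ACTIVE"; return 0 ;;
      CREATION_FAILED) echo "TDG $tdg_id failed to create" >&2; return 1 ;;
    esac
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for TDG $tdg_id" >&2
  return 1
}
```

Calling wait_for_tdg "<TDG ID>" in a pipeline before the phone number mapping step keeps that step from running against a group that is still being created.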

3. Update existing CI/CD pipelines to deploy Amazon Connect configurations and AWS resources to the secondary region.

Amazon Connect configurations to account for include queues, routing profiles, security profiles, hours of operation and contact flows.

AWS resources to account for include integrations with Lambda functions, S3 buckets, Kinesis streams and Lex bots.

4. Update phone numbers to map each existing claimed number from its current instance to the traffic distribution group

aws connect update-phone-number --phone-number-id <PNID> --target-arn <TDG ARN>
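
When many numbers are claimed, step 4 can be looped rather than run by hand. The sketch below lists the numbers attached to an instance with list-phone-numbers-v2 and maps each one to the TDG. The function name move_numbers_to_tdg is hypothetical, and the sketch assumes every number on the instance should move, which may not match your design.

```bash
# Sketch: map every phone number claimed to an instance onto a TDG.
# Usage: move_numbers_to_tdg INSTANCE_ARN TDG_ARN
move_numbers_to_tdg() {
  instance_arn="$1"
  tdg_arn="$2"
  # Collect the IDs of all numbers currently targeting the instance.
  ids=$(aws connect list-phone-numbers-v2 --target-arn "$instance_arn" \
    --query 'ListPhoneNumbersSummaryList[].PhoneNumberId' --output text) || return 1
  # IDs come back whitespace-separated, so word splitting is intentional here.
  for id in $ids; do
    echo "Mapping $id to $tdg_arn"
    aws connect update-phone-number --phone-number-id "$id" \
      --target-arn "$tdg_arn" || return 1
  done
}
```

The loop stops at the first failed update so that a partial mapping is noticed rather than silently skipped.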

Recovery steps performed once an outage has been detected

5. Perform 100% traffic distribution to secondary region

aws connect update-traffic-distribution --id <TDG ID> --cli-input-json '{"TelephonyConfig": {"Distributions": [{"Percentage": 100, "Region": "<REPLICA REGION>"}, {"Percentage": 0, "Region": "<SOURCE REGION>"}]}}'
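
The distribution payload can also be generated instead of hand-edited, which reduces the chance of percentages that do not add up to 100. The sketch below is one way to do that. The function names build_distribution_json and shift_traffic are hypothetical, and the region names in the usage note are placeholders. It rejects values that are not multiples of 10, on the assumption that the API accepts percentages only in increments of 10, so check the current API reference before relying on that rule.

```bash
# Sketch: build the TelephonyConfig payload from a single replica percentage.
# Usage: build_distribution_json REPLICA_REGION SOURCE_REGION REPLICA_PERCENT
build_distribution_json() {
  replica_region="$1"
  source_region="$2"
  replica_pct="$3"
  # Accept only plain whole numbers of up to three digits, without leading zeros.
  case "$replica_pct" in
    ''|*[!0-9]*|0[0-9]*|????*)
      echo "Percentage must be a whole number" >&2; return 1 ;;
  esac
  if [ "$replica_pct" -gt 100 ] || [ $((replica_pct % 10)) -ne 0 ]; then
    echo "Percentage must be a multiple of 10 between 0 and 100" >&2
    return 1
  fi
  # The source region receives whatever the replica region does not.
  source_pct=$((100 - replica_pct))
  printf '{"TelephonyConfig":{"Distributions":[{"Percentage":%d,"Region":"%s"},{"Percentage":%d,"Region":"%s"}]}}\n' \
    "$replica_pct" "$replica_region" "$source_pct" "$source_region"
}

# Usage: shift_traffic TDG_ID REPLICA_REGION SOURCE_REGION REPLICA_PERCENT
shift_traffic() {
  tdg_id="$1"
  payload=$(build_distribution_json "$2" "$3" "$4") || return 1
  aws connect update-traffic-distribution --id "$tdg_id" --cli-input-json "$payload"
}
```

With these helpers, shift_traffic "<TDG ID>" us-west-2 us-east-1 100 sends all telephony traffic to the replica region, and passing 0 instead of 100 returns it to the source region.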

6. Perform validation tests

Dial a phone number that you placed into your TDG. Complete any necessary actions to route the call to the queue you are monitoring. Confirm the call is routing to the replica instance.
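
Alongside the test call, the configured distribution can be read back with get-traffic-distribution. The sketch below checks that the replica region holds 100% of telephony traffic. The function names replica_share and confirm_failover are hypothetical, and the check covers configuration only, so it complements the test call rather than replacing it.

```bash
# Sketch: read back the percentage of traffic assigned to one region.
# Usage: replica_share TDG_ID REGION
replica_share() {
  aws connect get-traffic-distribution --id "$1" \
    --query "TelephonyConfig.Distributions[?Region=='$2'].Percentage | [0]" \
    --output text
}

# Usage: confirm_failover TDG_ID REPLICA_REGION
confirm_failover() {
  share=$(replica_share "$1" "$2") || return 2
  if [ "$share" = "100" ]; then
    echo "All telephony traffic is routed to $2"
  else
    echo "Expected 100% in $2 but found ${share}%" >&2
    return 1
  fi
}
```

Running confirm_failover "<TDG ID>" "<REPLICA REGION>" after the traffic shift gives a quick pass or fail signal that a runbook or pipeline can act on.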

Post-Recovery steps performed to failback to primary region

7. Perform 100% traffic distribution to primary region

aws connect update-traffic-distribution --id <TDG ID> --cli-input-json '{"TelephonyConfig": {"Distributions": [{"Percentage": 0, "Region": "<REPLICA REGION>"}, {"Percentage": 100, "Region": "<SOURCE REGION>"}]}}'

Contact us

Ian Willoughby

Principal, Cloud & Digital Transformation, AWS Practice, PwC US

Ross Chernick

Director, Cloud & Digital Transformation, AWS Ambassador, PwC US

Nausheen Jawed

Senior Manager, Cloud & Digital Transformation, PwC US

Connor McCrory

Manager, Cloud & Digital Transformation, AWS Ambassador, PwC US
