Modernizing legacy Java workloads on Amazon EC2 with Amazon EKS

  • Blog
  • 8 minute read
  • May 2026

Nausheen Jawed
Director, Cloud & Digital Transformation, PwC US

Ross Chernick
Director, Cloud & Digital Transformation, AWS Ambassador, PwC US

From monolithic Java to cloud-native on AWS

PwC helped re-architect a legacy Java application running on Amazon EC2 into a scalable, container-based solution using Amazon EKS. By decoupling workloads, externalizing storage, modernizing CI/CD pipelines, and adopting cloud-native security patterns, the organization improved system resilience, accelerated releases, reduced operational risk, and enabled teams to focus on innovation rather than maintenance.

“By moving to a cloud-native architecture on Amazon EKS, the team dramatically improved resilience, security, and release velocity—without disrupting business operations.”

  • Hours: Deployment cycles reduced from weeks to hours
  • Zero: Required downtime for updates after modernization

Source: PwC client case study, Java workload modernization on AWS

Modernizing workloads hosted on Amazon EC2 using Amazon EKS

Modernizing critical applications on aging infrastructure presents formidable challenges. Organizations can’t afford downtime, and even small changes introduce risks. As a result, many enterprises try to maintain the status quo for as long as possible. They update and patch systems here and there—leaving themselves exposed to security vulnerabilities, slowing development, and adding to operational overhead. Often, teams end up spending more time reacting to problems than innovating and adding value.

This case study describes how PwC modernized a legacy Java 1.7 application for a global media and information services client. Starting from a single virtual machine (VM) that hosted every application component, stored all files on local disk, kept hardcoded secrets in its build files, and relied on a single MySQL database, we built a resilient, cloud-native platform on Amazon EKS. The project delivered improvements in deployment speed and frequency, system reliability, security, and team productivity. With these changes, developers shifted from reactive maintenance to a framework of innovation and proactive feature delivery.

The company achieved this transformation through a structured four-step PwC modernization framework: discover, design, modernize, and operate. This approach incrementally rebuilds the platform from the ground up, introducing cloud-native patterns and services with minimal disruption and downtime. The project simplified and streamlined complex tasks, including transferring files to Amazon S3, moving the database to Amazon Aurora (MySQL-compatible), centralizing secrets in AWS Secrets Manager, integrating authentication via Okta SSO (OIDC), and upgrading the runtime to Java 17.

Along the way, the company had to untangle an array of legacy problems, manage complex decisions, weigh trade-offs, and execute the move to Amazon EKS.

Key Business Benefits

For the client, this modernization project delivered critical business benefits:

  • Resilience and Availability: A containerized Amazon EKS architecture has boosted resiliency and uptime. If a container or node fails, Kubernetes automatically spins up a replacement, so applications keep running and users notice no disruption. Rolling updates have also simplified maintenance; there's no need to take the system offline for patching or upgrades, or to schedule after-hours downtime.
  • Scalability and Performance: The platform now scales to meet demand dynamically. By containerizing different parts of the workload and setting up Kubernetes autoscaling on Amazon EKS, the company improved resource management. Processor-intensive tasks now run in their own containers, so they’re not competing with the user-facing parts of the app for resources. The result is an infrastructure that’s equipped to handle traffic spikes; even at peak times, apps remain fast and responsive.
  • Security and Compliance: In the past, hard-coded secrets represented a significant security risk. After moving them to AWS Secrets Manager, the organization centralized user authentication in Okta SSO, gaining multi-factor authentication (MFA) and role-based access control (RBAC). These changes enhance security and simplify audits, helping the company meet strict compliance requirements.
  • Faster Releases and Agility: After moving to a DevOps framework, the company accelerated its release cycle. Automated CI/CD pipelines, using GitHub Actions for builds and Argo CD for deployments, boosted speed and reliability. Updates that previously took weeks now ship in a few hours, with immediate feedback on each change, and it's easy to roll back to a previous state if something goes astray. As a result, the company can respond to essential changes and seize business opportunities without delay.
  • Cost Optimization: A modernized platform running on AWS has cut costs and delivered other benefits. For example, the company now uses auto-scaling to right-size infrastructure and free up staff time; there's no longer a need to pay for idle capacity. Amazon S3 lifecycle policies automatically transition older files to less expensive storage tiers. In the end, the enterprise trimmed AWS costs while gaining significant performance and security benefits.

Legacy Architecture and Challenges

To understand the impact of the modernization, it’s important to first examine the legacy environment:

  • All application components, including the web layer, API tier, admin interface, and file-processing tasks, previously ran on a single large virtual machine.
  • A separate VM hosted the MySQL database. This setup required manual oversight for tasks such as patching and backups.
  • Local disks held all types of files: input, output, archived, and temporary.
  • Sensitive information such as database credentials and API keys was hard-coded directly into the pom.xml and other build configuration files.
  • Scaling had to take place vertically. This meant that whenever traffic increased or large file batches required processing, the system slowed and the UI became less responsive.

Here were some of the key challenges:

  • Because the setup relied on local disks as the primary place to store data, it was prone to data loss. Backups frequently failed, and data recovery could take hours.
  • Any scheduled maintenance caused the entire system to go offline. Unexpected outages often translated into manual fixes to bring services back up.
  • Horizontal scaling wasn’t feasible; file-processing jobs often conflicted with user traffic on the same server. This dragged down performance.
  • Storing secrets in the source code and giving admin access on an ad-hoc basis created significant security and compliance risks.

Legacy Architecture

Target Architecture

The new architecture was designed with a focus on security, scalability, and resilience:
  • It runs on Java 17 LTS, which brings better performance, stronger security defaults, and improved TLS support.
  • Application components run as stateless containers on Amazon EKS, using readiness and liveness probes for health checks and horizontal pod autoscaling to handle variable workloads.
  • Components are split by function: the web application runs in its own pod, and processing jobs execute independently in dedicated pods. This separation improved fault isolation and resource control.
  • Files are stored in Amazon S3, which serves as the durable system of record, while local storage is used only for temporary in-flight processing.
  • The database layer uses Amazon Aurora MySQL to provide managed high availability, quick failover, and efficient read scaling.
  • Secrets are managed through AWS Secrets Manager, with IAM Roles for Service Accounts (IRSA) controlling access securely.
  • User authentication and authorization are handled through Okta SSO (OIDC), providing centralized identity management, MFA, and role-based access tied to user groups.

Architecture Diagram

Modernization Framework

To guide the modernization of this legacy Java application, we applied a four-step framework tailored to the client's challenges and technology landscape. Each step built upon the last, ensuring both strategic alignment and technical rigor throughout the project:

Discover:

  • Conducted a holistic assessment of the legacy environment (Java 1.7 monolith, single VM, local disk storage).
  • Audited file paths, secrets embedded in config, and runtime behaviors.
  • Established performance baselines and key operational challenges.
  • Identified compliance gaps related to secret handling and system reliability.

Design:

  • Created the target cloud-native architecture blueprint using AWS services (EKS, Aurora, S3, AWS Secrets Manager).
  • Defined workload separation between web tier, processing jobs, and system-level services.
  • Designed rollback-friendly upgrade paths for Java runtime, CI/CD, and storage layers.
  • Prioritized security and fault isolation in the new design.

Modernize:

  • Upgraded codebase and runtime to Java 17.
  • Containerized application components with non-root images and multi-stage builds.
  • Built a GitHub Actions-based CI pipeline and deployed to Amazon EKS via Argo CD.
  • Migrated local disk storage to Amazon S3 and the MySQL database to Amazon Aurora.
  • Centralized secrets into AWS Secrets Manager using IAM Roles for Service Accounts.
  • Integrated Okta SSO for federated identity and role-based access.

Operate:

  • Cut over production traffic gradually using DNS and ALB configuration.
  • Established observability with Amazon CloudWatch and New Relic for metrics, logs, and traces.
  • Verified rollback and disaster recovery plans including Amazon S3 and Aurora snapshot restores.
  • Tuned autoscaling, lifecycle policies, and cost controls for sustained operations.
  • Trained client teams on Kubernetes operations and cloud-native incident response.

Deep Dives: Design Rationale, Trade‑offs, and Implementation Notes

Refactoring to Stateless

One of the biggest shifts was moving the application toward a stateless design. Each code path that touched the local filesystem was audited and reworked. Temporary files now use Kubernetes ephemeral volumes, sized specifically per workload, and nothing user-related persists on a pod or node once the process finishes. Sessions were restructured around stateless JWTs signed with rotating keys, removing the need for any server-side session store. Background jobs were split out into separate worker Deployments and CronJobs so they could scale independently of the API layer. This made it much easier to handle I/O-heavy file operations without slowing down user-facing traffic.
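As a minimal sketch of this pattern, the snippet below issues and verifies a stateless session token using the jjwt library (0.12-style API). The class name, key, and 15-minute lifetime are illustrative; in the real system, rotating signing keys come from AWS Secrets Manager.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Date;

import javax.crypto.SecretKey;

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;
import io.jsonwebtoken.security.Keys;

public class StatelessSessions {
    // Illustrative static key; production keys are rotated and fetched
    // from AWS Secrets Manager rather than hard-coded.
    private static final SecretKey KEY = Keys.hmacShaKeyFor(
            "change-me-this-is-a-32-byte-demo-key!!".getBytes(StandardCharsets.UTF_8));

    /** Issues a short-lived session token; no server-side session store needed. */
    static String issue(String userId) {
        Instant now = Instant.now();
        return Jwts.builder()
                .subject(userId)
                .issuedAt(Date.from(now))
                .expiration(Date.from(now.plusSeconds(900))) // 15-minute lifetime
                .signWith(KEY)
                .compact();
    }

    /** Verifies signature and expiry; any pod can validate any token. */
    static Claims verify(String token) {
        return Jwts.parser().verifyWith(KEY).build()
                .parseSignedClaims(token)
                .getPayload();
    }

    public static void main(String[] args) {
        System.out.println(verify(issue("user-123")).getSubject());
    }
}
```

Because verification needs only the key, any replica can handle any request, which is what makes horizontal scaling of the API layer safe.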

Amazon S3 as File System of Record

S3 became the backbone of file storage. Prefix structures were carefully designed to prevent hot partitions; the pattern looks like bucket/tenant/{id}/yyyy/mm/dd/{uuid}. All data is encrypted at rest using AWS KMS, and access to the bucket is locked down through a VPC endpoint with Block Public Access enforced.

Data lifecycle policies automatically transition older files to S3 Standard-IA and, later, to Glacier Instant Retrieval based on retention requirements. For files larger than 16 MB, multipart upload is used to boost throughput and reliability. From the application’s point of view, these uploads are atomic; a file either completes fully or not at all.
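A minimal sketch of the large-file path, assuming the AWS SDK for Java v2 S3 Transfer Manager (which handles the multipart mechanics automatically); the bucket name, key, and file path are hypothetical:

```java
import java.nio.file.Paths;

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedFileUpload;
import software.amazon.awssdk.transfer.s3.model.FileUpload;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

public class LargeFileUpload {
    public static void main(String[] args) {
        // The Transfer Manager splits large files into parallel multipart
        // uploads; the object appears in S3 only once all parts complete.
        try (S3TransferManager tm = S3TransferManager.create()) {
            FileUpload upload = tm.uploadFile(UploadFileRequest.builder()
                    .putObjectRequest(req -> req
                            .bucket("example-files-bucket")          // hypothetical bucket
                            .key("tenant/42/2026/05/01/report.bin")) // prefix pattern from above
                    .source(Paths.get("/tmp/report.bin"))
                    .build());
            CompletedFileUpload done = upload.completionFuture().join();
            System.out.println("Uploaded, ETag: " + done.response().eTag());
        }
    }
}
```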

Each write to S3 includes a checksum, and the consumer validates it either by comparing the ETag or verifying the digest before committing metadata into Aurora.
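That handshake might look like the following sketch (bucket and key again illustrative): the SDK asks S3 to compute a SHA-256 checksum on write, and the caller compares it against a locally computed digest before touching Aurora.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Base64;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectResponse;

public class ChecksummedWrite {
    public static void main(String[] args) throws Exception {
        Path file = Path.of("/tmp/report.bin");

        // Local SHA-256, Base64-encoded the way S3 reports it for
        // single-part uploads.
        String localSha256 = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file)));

        try (S3Client s3 = S3Client.create()) {
            PutObjectResponse resp = s3.putObject(PutObjectRequest.builder()
                            .bucket("example-files-bucket")
                            .key("tenant/42/2026/05/01/report.bin")
                            .checksumAlgorithm(ChecksumAlgorithm.SHA256) // S3 verifies on write
                            .build(),
                    RequestBody.fromFile(file));

            // Commit metadata to Aurora only once the digests agree.
            if (!localSha256.equals(resp.checksumSHA256())) {
                throw new IllegalStateException("Checksum mismatch; metadata not committed");
            }
        }
    }
}
```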

AWS Secrets Manager and IRSA

All sensitive information, such as database credentials, API keys, and signing materials, is now stored securely in AWS Secrets Manager. Each pod assumes a scoped IAM role through IRSA, which grants access only to the specific secret ARNs required for its function.

Secrets are fetched from Secrets Manager during pod startup and injected into the application before it begins serving traffic. This ensures the app always runs with the latest approved credentials while keeping runtime access minimal. Secrets are cached in memory for performance, and retry logic is built in to handle transient retrieval issues cleanly.

Rotations are managed through AWS Secrets Manager. When a secret is updated, the application picks up the new value automatically after a pod restarts or redeploys. Since no credentials are ever baked into build artifacts or container images, this change effectively removed an entire class of audit and compliance risks that existed in the legacy architecture.
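A minimal sketch of the fetch-cache-retry pattern using the AWS SDK for Java v2; the TTL, attempt count, and class name are illustrative rather than the project's exact values. With IRSA, the SDK's default credentials chain picks up the pod's service-account role, so no credentials appear in code or configuration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient;
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest;

public class SecretCache {
    private record Entry(String value, Instant fetchedAt) {}

    private static final Duration TTL = Duration.ofMinutes(5);
    private final SecretsManagerClient client = SecretsManagerClient.create();
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    /** Returns the secret string, serving repeat reads from memory. */
    public String get(String secretArn) {
        Entry hit = cache.get(secretArn);
        if (hit != null && hit.fetchedAt().plus(TTL).isAfter(Instant.now())) {
            return hit.value();
        }
        String value = fetchWithRetry(secretArn, 3);
        cache.put(secretArn, new Entry(value, Instant.now()));
        return value;
    }

    /** Bounded retry with linear backoff for transient retrieval failures. */
    private String fetchWithRetry(String secretArn, int attempts) {
        for (int i = 1; ; i++) {
            try {
                return client.getSecretValue(GetSecretValueRequest.builder()
                        .secretId(secretArn).build()).secretString();
            } catch (RuntimeException e) {
                if (i >= attempts) throw e;
                try {
                    Thread.sleep(200L * i);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```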

Okta SSO (OIDC) Integration

Since Okta was already being used as the client’s SSO platform, the goal was to make the application integrate smoothly with their existing setup rather than introducing a new identity system. The app was configured as an OIDC confidential client, using Okta as the identity provider. It validates each incoming token by checking the issuer, audience, signature, and expiry before allowing access. Group claims from Okta map directly to roles inside the app, such as ROLE_ADMIN or ROLE_USER, which keeps access control simple and predictable.

The login flow uses the standard authorization code grant with PKCE, which is well-suited for browser-based clients. Token lifetimes are intentionally short, and refresh tokens are tied to both the user and the client, reducing the chance of misuse.
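For reference, the S256 challenge derivation that PKCE (RFC 7636) defines can be expressed in a few lines of plain JDK code; the printed names mirror the OAuth parameter names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

public class Pkce {
    public static void main(String[] args) throws Exception {
        Base64.Encoder b64url = Base64.getUrlEncoder().withoutPadding();

        // code_verifier: high-entropy random string the client keeps private.
        byte[] random = new byte[32];
        new SecureRandom().nextBytes(random);
        String verifier = b64url.encodeToString(random);

        // code_challenge: SHA-256 of the verifier (the "S256" method), sent on
        // the authorization request; the verifier is revealed only on the token
        // request, so an intercepted authorization code is useless by itself.
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(verifier.getBytes(StandardCharsets.US_ASCII));
        String challenge = b64url.encodeToString(digest);

        System.out.println("code_verifier=" + verifier);
        System.out.println("code_challenge=" + challenge);
    }
}
```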

A small local cache stores Okta's JWKs (public keys) so that token validation stays fast. The cache also detects when keys rotate and refreshes them automatically. If verification fails, the app blocks access immediately, a fail-closed approach that adds another layer of safety.
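A sketch of that validation path using the Nimbus JOSE+JWT library, whose RemoteJWKSet caches the JWKS and re-fetches it when it sees an unfamiliar key ID. The issuer and audience values are placeholders, and {issuer}/v1/keys follows Okta's usual JWKS endpoint layout.

```java
import java.net.URL;
import java.util.Set;

import com.nimbusds.jose.JWSAlgorithm;
import com.nimbusds.jose.jwk.source.JWKSource;
import com.nimbusds.jose.jwk.source.RemoteJWKSet;
import com.nimbusds.jose.proc.JWSVerificationKeySelector;
import com.nimbusds.jose.proc.SecurityContext;
import com.nimbusds.jwt.JWTClaimsSet;
import com.nimbusds.jwt.proc.DefaultJWTClaimsVerifier;
import com.nimbusds.jwt.proc.DefaultJWTProcessor;

public class TokenValidator {
    private final DefaultJWTProcessor<SecurityContext> processor = new DefaultJWTProcessor<>();

    public TokenValidator(String issuer, String audience) throws Exception {
        // Cached remote JWKS; refreshed automatically when keys rotate.
        JWKSource<SecurityContext> keys = new RemoteJWKSet<>(new URL(issuer + "/v1/keys"));
        processor.setJWSKeySelector(new JWSVerificationKeySelector<>(JWSAlgorithm.RS256, keys));

        // Fail closed: issuer and audience must match exactly, and the
        // standard subject/expiry claims must be present and valid.
        processor.setJWTClaimsSetVerifier(new DefaultJWTClaimsVerifier<>(
                audience,
                new JWTClaimsSet.Builder().issuer(issuer).build(),
                Set.of("sub", "exp", "iat")));
    }

    /** Throws if the signature, issuer, audience, or expiry check fails. */
    public JWTClaimsSet validate(String bearerToken) throws Exception {
        return processor.process(bearerToken, null);
    }
}
```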

Amazon Aurora MySQL Migration

The move to Aurora was one of the most sensitive parts of the modernization. We chose Aurora MySQL mainly for its managed high availability, automated failover, and simpler scaling compared to a self-managed MySQL setup. The cluster was provisioned in private subnets with Multi-AZ enabled, along with automated backups and point-in-time recovery.

After evaluating tools like AWS DMS and native replication, we decided to go with a two-step dump and restore. The first run seeded Aurora early so we could test schema compatibility and performance. The second, done on the day of cutover, captured the latest data before switching over. This approach was straightforward, low-risk, and easy to validate.

Amazon EKS Deployment and Scaling

On the deployment side, each service has its own set of resource requests and limits, including ephemeral storage for file processing. Readiness and liveness probes were added so that traffic only reaches healthy pods, and if something goes wrong, Kubernetes can automatically restart or drain them.

Traffic comes in through an Application Load Balancer managed by the AWS Load Balancer Controller. It handles TLS termination and ties into AWS WAF for optional request filtering. This setup keeps the networking layer simple and consistent across environments.

Scaling is mostly hands-off now. The Horizontal Pod Autoscaler looks at both CPU usage and a custom metric that tracks the depth of the processing queue, so extra pods spin up only when they’re really needed. Behind the scenes, the Cluster Autoscaler keeps the node groups balanced and right-sized. Pod Disruption Budgets and topology-spread rules make sure workloads stay available across multiple Availability Zones, even during upgrades or node replacements.

Observability (Logs, Metrics, Traces)

Observability was built in from the start rather than added at the end. Since the application runs on the JVM, we wanted solid visibility into how it behaved under load: garbage collection, memory usage, and thread activity.

Logs are structured and sent to CloudWatch, tagged with correlation IDs so individual requests and background jobs can be traced across different components. This made debugging much easier when something went wrong or performance dipped unexpectedly.
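A minimal sketch of that correlation-ID pattern using SLF4J's MDC; the field name and worker class are illustrative.

```java
import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class CorrelatedWorker {
    private static final Logger log = LoggerFactory.getLogger(CorrelatedWorker.class);

    public void process(String fileKey) {
        // Every log line in this unit of work carries the same ID, so the
        // request can be followed across components in CloudWatch.
        MDC.put("correlationId", UUID.randomUUID().toString());
        try {
            log.info("processing file {}", fileKey);
            // ... download from S3, transform, write metadata to Aurora ...
            log.info("finished file {}", fileKey);
        } finally {
            MDC.remove("correlationId"); // don't leak IDs across pooled threads
        }
    }
}
```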

The client already used New Relic for monitoring, so we extended that setup instead of introducing new tools. The application and worker pods publish runtime and business metrics directly to New Relic, where dashboards show latency, throughput, and error trends in near real time. JVM metrics are also collected there, giving teams a single view of system health without jumping between tools.

For distributed tracing, New Relic APM captures traces from the API, worker jobs, and S3 or Aurora interactions. These traces help visualize how requests flow through the system and where bottlenecks occur. SLOs were defined around latency, error rate, and file-processing time, and alerts are routed through New Relic to the on-call team with direct links to runbooks for quick recovery.

Security and Network Hardening

Security was treated as part of the design, not an afterthought. Most internal traffic, including calls from the ALB to the pods, now runs over TLS within the cluster. Both Aurora and S3 are accessed entirely through private networking, which keeps data paths off the public internet. Security groups were tightened so only the ALB can reach the application, and Kubernetes Network Policies limit communication between pods to just what's necessary.

Container images are built from trusted base images and signed before they’re pushed. The registry only accepts verified images, so there’s a clear chain of provenance. IAM policies were reviewed and rewritten to follow strict least-privilege rules: each service can only reach the S3 prefixes and Secrets Manager ARNs it actually needs. KMS keys are tightly controlled, with rotation enabled and minimal grants to reduce exposure.

Operations, Cost and Reliability

Automation became one of the key themes during modernization. Most of the delivery pipeline was built around GitHub Actions and Argo CD, which made the whole process cleaner and easier to repeat. GitHub Actions handled the build workflow following a simple branching model. It built Docker images and pushed them to Amazon ECR using an IAM role connected through GitHub’s OIDC provider. This setup meant there were no long-lived credentials lying around, which was a big security win.

Once the images were in ECR, Argo CD picked them up and handled deployments to Amazon EKS. We used Helm charts to keep configurations consistent across environments and to make rollbacks quick if we ever needed them.

Most deployments were done as rolling updates, so new versions went out gradually while old pods drained gracefully. This allowed updates without downtime, and if anything misbehaved, we could stop or roll back halfway through without causing a full outage. It felt like a good middle ground between caution and agility: steady enough for production, but fast enough to keep delivery moving.

Cost management was baked in early. We used S3 lifecycle policies to move older files to cheaper storage, tuned node groups to avoid overprovisioning, and relied on VPC endpoints to avoid unnecessary NAT egress charges. Backups and restore procedures weren't just set up; they were tested regularly. We made sure S3 version recovery and Aurora snapshot and PITR restores actually met the RTO and RPO targets, not just on paper.
 

Outcomes and Lessons Learned

Resilience and Availability

Failures that once caused outages now barely register. If a node or pod fails, Kubernetes automatically replaces it, and the system keeps running without anyone noticing. Maintenance tasks like patching, upgrades, or restarts are all handled through rolling updates, so there are no more late-night downtime windows.

Scalability and Performance

By splitting workloads into separate pods and externalizing files to Amazon S3, the platform now scales horizontally. File surges and user traffic no longer compete for the same resources, and batch jobs don’t slow down the UI. Autoscaling ensures the system reacts smoothly to changing load without manual intervention.

Security and Compliance

Security took a major step forward. Secrets are no longer embedded in code or configs; they're managed through AWS Secrets Manager with IAM-based access control. Authentication is fully centralized through Okta with MFA, and each action is traceable. Regular rotations, signed images, and least-privilege IAM policies have made the environment far more compliant and audit-friendly.

Operational Agility

Releases are now faster and safer. The CI/CD pipelines handle most of the heavy lifting, and if something goes wrong, rolling back is straightforward. Observability through CloudWatch and New Relic means issues can be spotted early, and alerts point directly to where they need attention.

Cost and Efficiency

Right-sizing nodes, S3 lifecycle policies, and VPC endpoints reduce operational costs without compromising performance. The system uses resources more efficiently, and storage costs are better controlled thanks to tiered S3 lifecycle management.

Team and Process Learnings

One of the biggest takeaways was cultural: moving to cloud-native workflows encouraged better collaboration between developers and operations. The team became more comfortable shipping changes quickly while keeping stability in check.
