# How to Scale a System to Millions of Users on AWS: A Step-by-Step Architectural Guide

> Learn how to scale a system to millions of users on AWS with this step-by-step architectural guide. Discover strategies for incremental growth and robust performance.

- Repository: [Donne Martin/system-design-primer](https://github.com/donnemartin/system-design-primer)
- Tags: architecture
- Published: 2026-02-24

---

**Scale a system to millions of users on AWS by starting with a single-server baseline and incrementally adding CloudFront, load balancers, ElastiCache, read replicas, Auto Scaling groups, and DynamoDB while applying horizontal scaling and sharding patterns.**

The `donnemartin/system-design-primer` repository provides a comprehensive, interview-tested roadmap for evolving web architectures from prototypes to high-traffic production systems. According to the scaling guide in [`solutions/system_design/scaling_aws/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/scaling_aws/README.md), successful scaling requires a disciplined, stage-by-stage approach rather than premature optimization.

## Start with a Single-Server Baseline

Before introducing complexity, establish a **single-box baseline**: a web server reachable through DNS at a public IP, plus a single **MySQL** database instance that stores all data. This minimal setup, described in [`solutions/system_design/scaling_aws/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/scaling_aws/README.md), lets you validate core data models and user flows before distributing the architecture. Resist the urge to start with microservices; bottlenecks are easier to identify when they exist in one place.

## The Six Stages of Scaling on AWS

The repository outlines six incremental scaling stages, each introducing specific AWS services to address emerging bottlenecks.

### Stage 1: Users+ — Add a CDN

When traffic first grows, separate static assets from dynamic content. Deploy **Amazon CloudFront** to cache images, CSS, and JavaScript at edge locations. This reduces latency for global users and offloads request processing from your origin server.
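The static/dynamic split can be made concrete by choosing caching headers per request path. The sketch below is a minimal illustration, not something taken from the source guide: the `/static/` prefix, the extension list, and the one-day TTL are all assumptions.

```python
# Hypothetical helper: pick a Cache-Control header so a CDN such as
# CloudFront can cache static assets while dynamic responses bypass it.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")


def cache_control_for(path: str) -> str:
    """Return a Cache-Control value for the given request path."""
    if path.startswith("/static/") or path.endswith(STATIC_EXTENSIONS):
        # Static asset: cacheable at the edge (assumed one-day TTL).
        return "public, max-age=86400"
    # Dynamic content: must not be stored by shared caches.
    return "private, no-store"
```

An origin that sets headers this way lets the CDN absorb asset traffic, while API responses continue to reach the application servers.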

### Stage 2: Users++ — Load Balancing and Database Separation

Introduce an **Elastic Load Balancer (ELB)**, typically an **Application Load Balancer (ALB)** for HTTP/HTTPS traffic, to distribute incoming requests across multiple **EC2** instances. Split the monolith into distinct web and application layers, and migrate the database to **Amazon RDS (MySQL)** running on a dedicated instance. This stage improves availability and keeps web server CPU from contending with database I/O.
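To illustrate what the load balancer does conceptually, here is a toy round-robin balancer that skips unhealthy backends. It is a sketch only: the class and method names are hypothetical, and a managed load balancer runs its own health checks and routing algorithms rather than this exact logic.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates over backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._index = 0

    def mark_down(self, backend):
        """Record a failed health check."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Record a passing health check for a known backend."""
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend in rotation."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._index]
            self._index = (self._index + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends available")
```

Because any instance may receive any request, this only works when the instances behind the balancer are interchangeable.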

### Stage 3: Users+++ — Caching and Read Replicas

Database read pressure becomes the bottleneck. Implement **ElastiCache** (Redis or Memcached) using the **cache-aside pattern** to serve hot data without hitting the disk. Add **RDS Read Replicas** to offload read traffic from the primary database. Note that replication lag introduces eventual consistency, which your application must tolerate.
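The cache-aside flow can be sketched as follows. This is a minimal illustration under stated assumptions: an in-process dictionary stands in for ElastiCache, the `load_from_db` callable stands in for a MySQL query, and the class name and default TTL are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # assumed callable: key -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}            # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]         # cache hit
        value = self._load(key)     # cache miss: read from the database
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)
```

The TTL bounds how stale an entry can become if an invalidation is missed, which is the usual compromise when exact invalidation is hard to guarantee.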

### Stage 4: Users++++ — Auto Scaling and Cost Optimization

Replace static EC2 counts with **Auto Scaling Groups** that adjust capacity based on CloudWatch metrics such as CPU utilization or request count. Use **Spot Instances** for non-critical, interruption-tolerant workloads to reduce compute costs by up to 90% compared with On-Demand pricing. This stage requires stateless application servers: session data must move to ElastiCache or a database, not local disk.
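The idea of scaling on a metric can be illustrated with a simplified proportional rule: grow or shrink the fleet so the average metric moves back toward a target. This is a sketch for intuition, not the algorithm AWS Auto Scaling uses internally; the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Proportional scaling rule, clamped to the group's size limits.

    current: instances running now
    metric:  observed average (for example, CPU percent)
    target:  value the average should return to
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))
```

For example, four instances averaging 90% CPU against a 60% target yield a desired capacity of six, provided the group's maximum allows it.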

### Stage 5: Users+++++ — Sharding and NoSQL

When write throughput exceeds single-node database limits, apply **federation** (splitting databases by function) or **sharding** (partitioning one dataset across multiple database instances by a key such as user ID or geographic region). Migrate high-write, low-latency data (like user posts or activity logs) to **DynamoDB**, a fully managed NoSQL store with auto-scaling throughput. Store immutable objects in **Amazon S3**, optionally with **S3 Intelligent-Tiering** to optimize storage costs.
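Routing by user ID can be sketched with a stable hash. This is a minimal example under stated assumptions: the shard names in `SHARDS` are hypothetical, and a simple modulo scheme is used for brevity.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would map to database endpoints.
SHARDS = ["users-shard-0", "users-shard-1", "users-shard-2", "users-shard-3"]


def shard_index(user_id: int, shard_count: int) -> int:
    """Map a user ID to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_for(user_id: int) -> str:
    """Return the shard that owns all rows for this user."""
    return SHARDS[shard_index(user_id, len(SHARDS))]
```

Modulo hashing is easy to reason about, but changing the shard count remaps most keys; consistent hashing or a lookup table are common alternatives when resharding is expected.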

### Stage 6: Microservices and Async Processing

At massive scale, decompose the monolith into **microservices** using **Amazon ECS** or **Amazon EKS**. Decouple components with **Amazon SQS** for reliable queueing and **Amazon SNS** for pub/sub messaging. Implement service discovery with **AWS Cloud Map** to enable dynamic routing between services. Use **Amazon Kinesis** for real-time data streaming and **Lambda** for event-driven processing.
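The decoupling that a queue provides can be sketched with the standard library. In this illustration `queue.Queue` stands in for SQS and threads stand in for Lambda or ECS workers; the function name, the `handler` callable, and the in-memory "failed" list (loosely analogous to a dead-letter queue) are assumptions.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


def run_pipeline(messages, handler, worker_count=2):
    """Enqueue messages, process them on background workers, return (done, failed)."""
    jobs = queue.Queue()
    done, failed = [], []
    lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()
            if job is _STOP:
                return
            try:
                result = handler(job)
            except Exception:
                with lock:
                    failed.append(job)   # keep the message for later inspection
            else:
                with lock:
                    done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for message in messages:             # the producer never waits on handlers
        jobs.put(message)
    for _ in threads:
        jobs.put(_STOP)
    for thread in threads:
        thread.join()
    return done, failed
```

The producer only pays the cost of an enqueue, so slow or failing consumers do not block the request path.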

## Core Design Patterns and Trade-offs

The [`solutions/system_design/scaling_aws/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/scaling_aws/README.md) file references these fundamental patterns under its *Additional talking points* section:

- **Horizontal Scaling**: Add more EC2 instances behind a load balancer to improve throughput and fault tolerance. Requires stateless services or shared external state stores.
- **Read Replicas**: Asynchronously replicate MySQL writes to secondary nodes. Offloads read traffic but introduces replication lag that may affect consistency-sensitive queries.
- **Cache-Aside**: Application code checks the cache first, then falls back to the database on miss. Reduces database hits but adds cache invalidation complexity.
- **Sharding**: Partition data across multiple database instances by a key like `user_id`. Enables near-linear write scaling but increases operational complexity for cross-shard queries.
- **NoSQL (DynamoDB)**: Key-value storage with automatic partitioning. Handles massive write throughput and geographic distribution, but offers limited query flexibility compared to relational models.
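The read-replica trade-off in the list above can be illustrated with a small router that sends writes to the primary and spreads reads over replicas, with an escape hatch for consistency-sensitive queries. The class name, the `require_fresh` flag, and the endpoint strings are hypothetical.

```python
class ReadWriteRouter:
    """Route writes to the primary and reads to replicas in rotation."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0

    def for_write(self):
        return self.primary

    def for_read(self, require_fresh=False):
        # Reads that cannot tolerate replication lag go to the primary.
        if require_fresh or not self.replicas:
            return self.primary
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica
```

A typical use of `require_fresh=True` is reading a record immediately after the same user has written it.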

## Security, Monitoring, and Cost Controls

Production scaling requires operational rigor beyond pure architecture. Secure your tiers using **IAM roles** for EC2 instances, **Security Groups** for network segmentation, **TLS termination** at the load balancer, and **AWS WAF** to filter malicious web requests, alongside **AWS Shield** for DDoS protection. Monitor system health with **CloudWatch** metrics, trace requests with **X-Ray**, and enforce compliance with **AWS Config**. Optimize costs by purchasing **Reserved Instances** for baseline capacity, using **Spot Instances** for fault-tolerant batch jobs, and applying **S3 Intelligent-Tiering** to automatically move infrequently accessed objects to cheaper storage classes.

## Reference Architecture Diagram

The final architecture can be visualized as a data flow from client to persistence layer. The Mermaid diagram below combines the Users+++++ stack with the asynchronous components introduced in Stage 6:

```mermaid
graph LR
    subgraph Client
        Browser[Browser / Mobile App]
    end
    subgraph DNS
        Route53[Amazon Route 53]
    end
    subgraph Edge
        CloudFront[Amazon CloudFront CDN]
    end
    subgraph LoadBalancing
        ALB[Application Load Balancer]
    end
    subgraph Compute
        EC2["Auto Scaling EC2 (Web / API)"]
        ECS["Amazon ECS (Microservices)"]
    end
    subgraph Cache
        Redis[ElastiCache Redis]
    end
    subgraph Datastore
        RDS["Amazon RDS (MySQL) – Master"]
        Replicas[Read Replicas]
        Dynamo["DynamoDB (User Posts)"]
        S3["Amazon S3 (Objects)"]
    end
    subgraph Async
        SQS[Amazon SQS Queue]
        Lambda[Lambda Workers]
    end

    Browser -->|HTTPS| Route53 --> CloudFront --> ALB --> EC2 --> Redis
    EC2 -->|Read| Replicas
    EC2 -->|Write| RDS
    EC2 -->|Write| Dynamo
    EC2 -->|Publish| S3
    EC2 -->|Enqueue| SQS --> Lambda --> Dynamo

```

## Infrastructure as Code Template

To deploy the Stage 2 (Users++) architecture programmatically, use the following **AWS CloudFormation** template. It provisions a VPC, Application Load Balancer, Auto Scaling Group, and RDS MySQL instance:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Basic scaling architecture for millions of users (Stage Users++)

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true

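  # An internet-facing ALB needs an internet gateway attached to the VPC.
  InternetGateway:
    Type: AWS::EC2::InternetGateway

  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  # The ALB and the RDS subnet group each require subnets in two AZs.
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true

  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true

  PublicSubnetRouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable

  PublicSubnetBRouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref PublicRouteTable

  ALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: GatewayAttachment
    Properties:
      Subnets: [!Ref PublicSubnet, !Ref PublicSubnetB]
      SecurityGroups: []      # add SGs later
      Scheme: internet-facing
      Type: application

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VPC
      Port: 80
      Protocol: HTTP
      HealthCheckPath: /
      TargetType: instance

  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ALB
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup

  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0c02fb55956c7d316   # Amazon Linux 2 (region-specific; update as needed)
        InstanceType: t3.medium
        SecurityGroupIds: []            # add SGs
        # Launch template user data must be base64-encoded.
        UserData:
          Fn::Base64: |
            #!/bin/bash
            yum install -y httpd
            systemctl enable httpd
            systemctl start httpd

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier: [!Ref PublicSubnet, !Ref PublicSubnetB]
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        # CloudFormation does not accept '$Latest' here; use the attribute instead.
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 2
      TargetGroupARNs: [!Ref TargetGroup]

  MySQLInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: 20
      MasterUsername: admin
      MasterUserPassword: 'ChangeMe123!'   # replace with Secrets Manager in prod
      VPCSecurityGroups: []             # add SGs
      DBSubnetGroupName: !Ref DBSubnetGroup

  DBSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet group for RDS
      SubnetIds: [!Ref PublicSubnet, !Ref PublicSubnetB]    # normally use private subnets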

Outputs:
  LoadBalancerDNS:
    Description: DNS name of the ALB
    Value: !GetAtt ALB.DNSName
  RDSEndpoint:
    Description: Endpoint for the MySQL DB
    Value: !GetAtt MySQLInstance.Endpoint.Address

```

In production, extend this template with **private subnets** for databases, **IAM instance profiles**, **Secrets Manager** for credentials, **Auto Scaling policies** based on CloudWatch alarms, and **Security Groups** restricting traffic between tiers.

## Summary

- Start with a **single-box prototype** to validate your data model before distributing complexity.
- Scale incrementally through six stages: CDN → Load Balancer → Caching/Read Replicas → Auto Scaling → Sharding/NoSQL → Microservices.
- Apply **horizontal scaling**, **cache-aside**, and **sharding** patterns while maintaining stateless application servers.
- Secure the architecture with **IAM**, **Security Groups**, and **TLS**, and monitor with **CloudWatch** and **X-Ray**.
- Use **Infrastructure as Code** (CloudFormation or Terraform) to version control your scaling configuration.

## Frequently Asked Questions

### What is the first AWS service I should add when scaling beyond a single server?

According to the [`solutions/system_design/scaling_aws/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/scaling_aws/README.md) guide, the first service to add is **Amazon CloudFront** (CDN). Separating static assets and serving them from edge locations reduces origin server load and improves global latency before you invest in complex compute scaling.

### When should I migrate from Amazon RDS to DynamoDB?

Move write-heavy data to **DynamoDB** when you reach Stage 5 (Users+++++) and your MySQL master cannot keep up with write throughput despite read replicas and caching. DynamoDB partitions data automatically and handles massive concurrent writes with single-digit millisecond latency, though you must accept limited query flexibility and reads that are eventually consistent by default (strongly consistent reads are available per request at a higher read cost).

### How do I handle user sessions when scaling horizontally with Auto Scaling?

Store session state outside the EC2 instances using **ElastiCache (Redis)** or a persistent database like **DynamoDB**. The Auto Scaling Group replaces instances dynamically, so local disk storage is ephemeral. A centralized session store ensures users remain authenticated regardless of which server handles their request.
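The effect of a centralized session store can be sketched with two application servers that share one store. This is an illustration only: `SessionStore` is an in-memory stand-in for ElastiCache or DynamoDB, and all class and method names are hypothetical.

```python
import secrets


class SessionStore:
    """In-memory stand-in for an external store such as Redis or DynamoDB."""

    def __init__(self):
        self._sessions = {}

    def create(self, user_id):
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = {"user_id": user_id}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppServer:
    """Stateless server: keeps no session data of its own."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions

    def login(self, user_id):
        return self.sessions.create(user_id)

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return None if session is None else session["user_id"]
```

A session created through one server resolves on any other server that shares the store, so instances can be added or replaced without logging users out.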

### What is the difference between read replicas and sharding?

**Read replicas** create copies of your entire database to offload read traffic, but all writes still hit the single master node. **Sharding** partitions the data itself across multiple independent databases (e.g., users A-M on Node 1, N-Z on Node 2), allowing writes to scale horizontally but requiring application logic to route queries to the correct shard.
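The A-M / N-Z example corresponds to range-based sharding, which can be sketched with a sorted list of range boundaries. The node names and the choice of the first letter of a username as the shard key are assumptions made for illustration.

```python
import bisect

# Inclusive upper bound of each range, paired with the node that owns it.
RANGE_UPPER_BOUNDS = ["m", "z"]
NODES = ["node-1", "node-2"]


def node_for_username(username: str) -> str:
    """Route a username to the node that owns its first letter."""
    first = username[:1].lower()
    if not ("a" <= first <= "z"):
        raise ValueError("username must start with an ASCII letter")
    # Find the first range whose upper bound is >= the letter.
    return NODES[bisect.bisect_left(RANGE_UPPER_BOUNDS, first)]
```

Range-based routing keeps related keys together, which helps range scans, but it can produce hot shards when keys are unevenly distributed.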