How to Scale a System to Millions of Users on AWS: A Step-by-Step Architectural Guide

Scale a system to millions of users on AWS by starting with a single-server baseline and incrementally adding CloudFront, load balancers, ElastiCache, read replicas, Auto Scaling groups, and DynamoDB while applying horizontal scaling and sharding patterns.

The donnemartin/system-design-primer repository provides a comprehensive, interview-tested roadmap for evolving web architectures from prototypes to high-traffic production systems. According to the scaling guide in solutions/system_design/scaling_aws/README.md, successful scaling requires a disciplined, stage-by-stage approach rather than premature optimization.

Start with a Single-Server Baseline

Before introducing complexity, establish a single-box baseline consisting of a web server running behind a public IP with DNS, and a single MySQL database instance storing all data. This minimal setup in solutions/system_design/scaling_aws/README.md allows you to validate core data models and user flows before distributing the architecture. Resist the urge to start with microservices; bottlenecks are easier to identify when they exist in one place.

The Six Stages of Scaling on AWS

The repository outlines six incremental scaling stages, each introducing specific AWS services to address emerging bottlenecks.

Stage 1: Users+ — Add a CDN

When traffic first grows, separate static assets from dynamic content. Deploy Amazon CloudFront to cache images, CSS, and JavaScript at edge locations. This reduces latency for global users and offloads request processing from your origin server.
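
One way to make the static/dynamic split concrete is to decide, per request path, which responses an edge cache is allowed to keep. The sketch below is illustrative only: the function name `cache_control_for`, the extension list, and the max-age value are assumptions chosen for the example, not settings taken from the repository or CloudFront defaults.

```python
from pathlib import PurePosixPath

# Hypothetical set of file extensions treated as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff2"}


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a request path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Shareable and long-lived: an edge location can answer without the origin.
        return "public, max-age=86400"
    # Dynamic responses are never cached in this sketch.
    return "no-store"
```

With headers like these, a CDN can keep serving `/static/app.js` from the edge while `/api/users/42` always reaches the origin.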

Stage 2: Users++ — Load Balancing and Database Separation

Introduce Elastic Load Balancing, for example an Application Load Balancer (ALB), to distribute incoming traffic across multiple EC2 instances. Split the monolith into distinct web and application layers, and migrate the database to Amazon RDS (MySQL) running on a dedicated instance. This stage improves availability and keeps web server CPU from contending with database I/O.
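
To show what "distribute incoming traffic" means at its simplest, here is a minimal round-robin sketch that skips backends marked unhealthy. It is a teaching model, not how a managed AWS load balancer is implemented; the class name `RoundRobinBalancer` and the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across backends, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A failed health check removes the backend from rotation.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

Because an unhealthy instance is simply skipped, losing one server reduces capacity instead of causing an outage, which is the availability gain this stage is after.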

Stage 3: Users+++ — Caching and Read Replicas

Database read pressure becomes the bottleneck. Implement ElastiCache (Redis or Memcached) using the cache-aside pattern to serve hot data without hitting the disk. Add RDS Read Replicas to offload read traffic from the primary database. Note that replication lag introduces eventual consistency, which your application must tolerate.
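
The cache-aside flow can be sketched in a few lines. This example uses an in-memory dictionary as a stand-in for ElastiCache and a caller-supplied `load_from_db` function as a stand-in for the database query; both names, and the 60-second TTL, are assumptions made for illustration.

```python
import time


class CacheAside:
    """Check the cache first; on a miss, read the database and fill the cache."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit
        value = self._load(key)  # miss or expired entry: go to the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Call after a write so the next read reloads fresh data.
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached value can get, while `invalidate` covers writes the application itself performs. Choosing between the two, or combining them, is where the invalidation complexity comes from.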

Stage 4: Users++++ — Auto Scaling and Cost Optimization

Replace static EC2 counts with Auto Scaling Groups that adjust capacity based on CloudWatch metrics such as CPU utilization or request count. Use Spot Instances for non-critical workloads to reduce compute costs by up to 90% compared with On-Demand pricing. This stage requires stateless application servers: session data must move to ElastiCache or a database, not local disk.
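
The idea of adjusting capacity from a metric can be illustrated with a simplified proportional rule: scale the instance count by how far the observed metric is from its target, then clamp to the group's limits. This is a sketch for intuition, not the algorithm AWS Auto Scaling uses internally, and the function name `desired_capacity` is hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Scale the instance count in proportion to metric_value / target_value."""
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    proposed = math.ceil(current * metric_value / target_value)
    # Never go below the floor or above the ceiling configured for the group.
    return max(min_size, min(max_size, proposed))
```

For example, four instances averaging 75% CPU against a 50% target would grow to six, while the `min_size` floor keeps a quiet system from shrinking to a single point of failure.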

Stage 5: Users+++++ — Sharding and NoSQL

When write throughput exceeds single-node database limits, partition the data across multiple database instances. Sharding splits one dataset by a key such as user ID or geographic region, while federation splits databases by function (for example, users in one database and products in another). Migrate high-write, low-latency data (like user posts or activity logs) to DynamoDB, a fully managed NoSQL store with auto-scaling throughput. Store immutable objects in Amazon S3, optionally with S3 Intelligent-Tiering to optimize storage costs.
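
Routing by user ID can be sketched as a stable hash followed by a modulo. The shard list and connection strings below are hypothetical placeholders, and the use of SHA-256 is an assumption; any hash that gives the same result on every server would do (Python's built-in `hash` does not, because string hashing is randomized per process).

```python
import hashlib

# Hypothetical connection strings, one per shard.
SHARDS = ["mysql://shard-0", "mysql://shard-1", "mysql://shard-2", "mysql://shard-3"]


def shard_for(user_id: int, shard_count: int) -> int:
    """Map a user ID to a shard index with a hash that is stable across processes."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(user_id: int) -> str:
    """Return the connection string of the shard that owns this user."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

Plain modulo routing is easy to reason about, but changing the shard count remaps most keys, so resharding means moving data. That operational cost is part of the trade-off this stage introduces.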

Stage 6: Microservices and Async Processing

At massive scale, decompose the monolith into microservices using Amazon ECS or Amazon EKS. Decouple components with Amazon SQS for reliable queueing and Amazon SNS for pub/sub messaging. Implement service discovery with AWS Cloud Map to enable dynamic routing between services. Use Amazon Kinesis for real-time data streaming and Lambda for event-driven processing.
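
The decoupling that a queue provides can be modeled locally with the standard library. In this sketch, `queue.Queue` stands in for Amazon SQS and a thread stands in for a worker such as a Lambda function; the function names and the `failed` list are hypothetical, and a real queue would retry or dead-letter failed messages rather than collect them in memory.

```python
import queue
import threading


def run_worker(jobs, handle, failed):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            return
        try:
            handle(job)
        except Exception:
            # One bad message must not stop the worker.
            failed.append(job)


def process_async(items, handle):
    """Enqueue items, let a background worker handle them, return the failures."""
    jobs = queue.Queue()
    failed = []
    worker = threading.Thread(target=run_worker, args=(jobs, handle, failed))
    worker.start()
    for item in items:
        jobs.put(item)  # the producer does not wait for processing
    jobs.put(None)  # sentinel: no more work
    worker.join()
    return failed
```

The producer only pays the cost of enqueueing, so a slow or failing consumer does not block the request path.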

Core Design Patterns and Trade-offs

The solutions/system_design/scaling_aws/README.md file references these fundamental patterns under its Additional talking points section:

  • Horizontal Scaling: Add more EC2 instances behind a load balancer to improve throughput and fault tolerance. Requires stateless services or shared external state stores.
  • Read Replicas: Asynchronously replicate MySQL writes to secondary nodes. Offloads read traffic but introduces replication lag that may affect consistency-sensitive queries.
  • Cache-Aside: Application code checks the cache first, then falls back to the database on a miss and stores the result in the cache. Reduces database hits but adds cache invalidation complexity.
  • Sharding: Partition data across multiple database instances by a key like user_id. Enables near-linear write scaling but increases operational complexity for cross-shard queries.
  • NoSQL (DynamoDB): Key-value storage with automatic partitioning. Handles massive write throughput and geographic distribution, but offers limited query flexibility compared to relational models.
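
The read-replica trade-off above can be sketched as a small router: writes go to the primary, reads rotate across replicas, and a user who has just written is pinned to the primary for a short window so they can read their own writes. The class name `ReplicaRouter` and the one-second `pin_seconds` default are assumptions; a real window would be sized from measured replication lag.

```python
import itertools
import time


class ReplicaRouter:
    """Send writes to the primary and reads to replicas, except right after a write."""

    def __init__(self, primary, replicas, pin_seconds=1.0, clock=time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # user_id -> time of that user's last write

    def for_write(self, user_id):
        self._last_write[user_id] = self._clock()
        return self.primary

    def for_read(self, user_id):
        last = self._last_write.get(user_id)
        recently_wrote = last is not None and self._clock() - last < self._pin_seconds
        if self._replicas is None or recently_wrote:
            # Replicas may not have this user's latest write yet.
            return self.primary
        return next(self._replicas)
```

Pinning trades a little extra primary load for consistency on the queries that need it, while every other read still lands on a replica.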

Security, Monitoring, and Cost Controls

Production scaling requires operational rigor beyond pure architecture. Secure your tiers using IAM roles for EC2 instances, Security Groups for network segmentation, TLS termination at the load balancer, and AWS WAF to filter malicious web requests, with AWS Shield for DDoS protection. Monitor system health with CloudWatch metrics, trace requests with X-Ray, and enforce compliance with AWS Config. Optimize costs by purchasing Reserved Instances for baseline capacity, using Spot Instances for fault-tolerant batch jobs, and applying S3 Intelligent-Tiering to automatically move infrequently accessed objects to cheaper access tiers.

Reference Architecture Diagram

The architecture through Stage Users+++++, together with the asynchronous components from Stage 6, can be visualized as a data flow from client to persistence layer. The Mermaid diagram below represents the complete stack:

graph LR
    subgraph Client
        Browser[Browser / Mobile App]
    end
    subgraph DNS
        Route53[Amazon Route 53]
    end
    subgraph Edge
        CloudFront[Amazon CloudFront CDN]
    end
    subgraph LoadBalancing
        ALB[Application Load Balancer]
    end
    subgraph Compute
        EC2["Auto-Scaling EC2 (Web / API)"]
        ECS["Amazon ECS (Micro-services)"]
    end
    subgraph Cache
        Redis[ElastiCache Redis]
    end
    subgraph Datastore
        RDS["Amazon RDS (MySQL) – Master"]
        Replicas[Read Replicas]
        Dynamo["DynamoDB (User-Posts)"]
        S3["Amazon S3 (Objects)"]
    end
    subgraph Async
        SQS[Amazon SQS Queue]
        Lambda[Lambda Workers]
    end

    Browser -->|HTTPS| Route53 --> CloudFront --> ALB --> EC2 --> Redis
    EC2 -->|Read| Replicas
    EC2 -->|Write| RDS
    EC2 -->|Write| Dynamo
    EC2 -->|Publish| S3
    EC2 -->|Enqueue| SQS --> Lambda --> Dynamo

Infrastructure as Code Template

To deploy the Stage 2 (Users++) architecture programmatically, use the following AWS CloudFormation template. It provisions a VPC, Application Load Balancer, Auto Scaling Group, and RDS MySQL instance:

AWSTemplateFormatVersion: '2010-09-09'
Description: Basic scaling architecture for millions of users (Stage Users++)

Resources:
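  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true

  # An internet-facing ALB requires an internet gateway attached to the VPC.
  InternetGateway:
    Type: AWS::EC2::InternetGateway

  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  # The ALB and the DB subnet group each need subnets in at least two AZs.
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true   # instances need outbound access for yum

  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true

  PublicSubnetRoute:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable

  PublicSubnetBRoute:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref PublicRouteTable

  ALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: GatewayAttachment
    Properties:
      Subnets: [!Ref PublicSubnet, !Ref PublicSubnetB]
      SecurityGroups: []      # add SGs later
      Scheme: internet-facing
      Type: application

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VPC
      Port: 80
      Protocol: HTTP
      HealthCheckPath: /
      TargetType: instance

  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ALB
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup

  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0c02fb55956c7d316   # Amazon Linux 2 (AMI IDs are region-specific; update as needed)
        InstanceType: t3.medium
        SecurityGroupIds: []            # add SGs
        # Launch templates expect user data to be Base64-encoded.
        UserData:
          Fn::Base64: |
            #!/bin/bash
            yum install -y httpd
            systemctl enable httpd
            systemctl start httpd

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier: [!Ref PublicSubnet, !Ref PublicSubnetB]
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        # CloudFormation does not accept $Latest here; use the version attribute.
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 2
      TargetGroupARNs: [!Ref TargetGroup]

  MySQLInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: 20
      MasterUsername: admin
      MasterUserPassword: ChangeMe123!   # replace with Secrets Manager in prod
      VPCSecurityGroups: []             # add SGs
      DBSubnetGroupName: !Ref DBSubnetGroup

  DBSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet group for RDS
      SubnetIds: [!Ref PublicSubnet, !Ref PublicSubnetB]    # normally use private subnets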

Outputs:
  LoadBalancerDNS:
    Description: DNS name of the ALB
    Value: !GetAtt ALB.DNSName
  RDSEndpoint:
    Description: Endpoint for the MySQL DB
    Value: !GetAtt MySQLInstance.Endpoint.Address

In production, extend this template with private subnets for databases, IAM instance profiles, Secrets Manager for credentials, Auto Scaling policies based on CloudWatch alarms, and Security Groups restricting traffic between tiers.

Summary

  • Start with a single-box prototype to validate your data model before distributing complexity.
  • Scale incrementally through six stages: CDN → Load Balancer → Caching/Read Replicas → Auto Scaling → Sharding/NoSQL → Microservices.
  • Apply horizontal scaling, cache-aside, and sharding patterns while maintaining stateless application servers.
  • Secure the architecture with IAM, Security Groups, and TLS, and monitor with CloudWatch and X-Ray.
  • Use Infrastructure as Code (CloudFormation or Terraform) to version control your scaling configuration.

Frequently Asked Questions

What is the first AWS service I should add when scaling beyond a single server?

According to the solutions/system_design/scaling_aws/README.md guide, the first service to add is Amazon CloudFront (CDN). Separating static assets and serving them from edge locations reduces origin server load and improves global latency before you invest in complex compute scaling.

When should I migrate from Amazon RDS to DynamoDB?

Migrate to DynamoDB when you reach Stage 5 (Users+++++) and your MySQL master cannot handle the write throughput despite read replicas and caching. DynamoDB's automatic partitioning handles massive concurrent writes with single-digit millisecond latency, though you must accept limited query flexibility and reads that are eventually consistent by default.

How do I handle user sessions when scaling horizontally with Auto Scaling?

Store session state outside the EC2 instances using ElastiCache (Redis) or a persistent database like DynamoDB. The Auto Scaling Group replaces instances dynamically, so local disk storage is ephemeral. A centralized session store ensures users remain authenticated regardless of which server handles their request.
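
A minimal sketch of this arrangement: two application servers share one session store, so a token issued by one is valid on the other. The in-memory `SessionStore` stands in for ElastiCache or DynamoDB, and both class names are hypothetical.

```python
import secrets


class SessionStore:
    """Stand-in for a shared store such as ElastiCache or DynamoDB."""

    def __init__(self):
        self._sessions = {}  # token -> user_id

    def create(self, user_id):
        token = secrets.token_urlsafe(32)
        self._sessions[token] = user_id
        return token

    def lookup(self, token):
        return self._sessions.get(token)


class AppServer:
    """Keeps no session state of its own, so any instance can serve any request."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user_id):
        return self.store.create(user_id)

    def whoami(self, token):
        return self.store.lookup(token)
```

Because the servers hold nothing locally, the Auto Scaling Group can terminate or add instances without logging anyone out.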

What is the difference between read replicas and sharding?

Read replicas create copies of your entire database to offload read traffic, but all writes still hit the single master node. Sharding partitions the data itself across multiple independent databases (e.g., users A-M on Node 1, N-Z on Node 2), allowing writes to scale horizontally but requiring application logic to route queries to the correct shard.
