How to Scale a System to Millions of Users on AWS: A Step-by-Step Architectural Guide
Scale a system to millions of users on AWS by starting with a single-server baseline and incrementally adding CloudFront, load balancers, ElastiCache, read replicas, Auto Scaling groups, and DynamoDB while applying horizontal scaling and sharding patterns.
The donnemartin/system-design-primer repository provides a comprehensive, interview-tested roadmap for evolving web architectures from prototypes to high-traffic production systems. According to the scaling guide in solutions/system_design/scaling_aws/README.md, successful scaling requires a disciplined, stage-by-stage approach rather than premature optimization.
Start with a Single-Server Baseline
Before introducing complexity, establish a single-box baseline consisting of a web server running behind a public IP with DNS, and a single MySQL database instance storing all data. This minimal setup in solutions/system_design/scaling_aws/README.md allows you to validate core data models and user flows before distributing the architecture. Resist the urge to start with microservices; bottlenecks are easier to identify when they exist in one place.
The Six Stages of Scaling on AWS
The repository outlines six incremental scaling stages, each introducing specific AWS services to address emerging bottlenecks.
Stage 1: Users+ — Add a CDN
When traffic first grows, separate static assets from dynamic content. Deploy Amazon CloudFront to cache images, CSS, and JavaScript at edge locations. This reduces latency for global users and offloads request processing from your origin server.
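One way to make the static/dynamic split concrete is to have the origin label responses with `Cache-Control` headers, which a CDN such as CloudFront can use (within its configured TTL bounds) to decide how long to keep an object at the edge. The sketch below is illustrative only: the extension list, the one-day lifetime, and the function name `cache_control_for` are assumptions, not values taken from the guide.

```python
from pathlib import PurePosixPath

# Hypothetical policy: which file extensions count as static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff2"}
# Illustrative lifetime of one day; tune this to your release cadence.
STATIC_MAX_AGE_SECONDS = 86400


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for the given request path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        # Static assets may be cached by the CDN and by browsers.
        return f"public, max-age={STATIC_MAX_AGE_SECONDS}"
    # Dynamic responses should always be served by the origin.
    return "private, no-store"
```

For example, `cache_control_for("/static/app.js")` yields a cacheable header, while `cache_control_for("/api/users/42")` marks the response as uncacheable.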
Stage 2: Users++ — Load Balancing and Database Separation
Introduce Elastic Load Balancing, typically an Application Load Balancer (ALB) for HTTP and HTTPS traffic, to distribute incoming requests across multiple EC2 instances. Split the monolith into distinct web and application layers, and migrate the database to Amazon RDS (MySQL) running on a dedicated instance. This stage improves availability and prevents the web server CPU from contending with database I/O.
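To illustrate what the load balancer does conceptually, here is a minimal round-robin sketch that skips backends marked unhealthy. It is a toy model, not how the managed service is implemented; the class name `RoundRobinBalancer` and the backend labels are hypothetical.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips unhealthy backends."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # index of the next backend to try

    def mark_unhealthy(self, backend):
        # In a real deployment, a failed health check would trigger this.
        self._healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, rotating through the pool."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```

Because requests may land on any instance, this model only works well when the instances behind it hold no per-user state.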
Stage 3: Users+++ — Caching and Read Replicas
Database read pressure becomes the bottleneck. Implement ElastiCache (Redis or Memcached) using the cache-aside pattern to serve hot data without hitting the disk. Add RDS Read Replicas to offload read traffic from the primary database. Note that replication lag introduces eventual consistency, which your application must tolerate.
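The cache-aside flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: `TTLCache` is an in-memory stand-in for ElastiCache, a plain dictionary stands in for the database, and the names `UserRepository`, `get_user`, and `update_user` are hypothetical.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis or Memcached."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


class UserRepository:
    """Cache-aside: check the cache first, fall back to the database on a miss."""

    def __init__(self, cache, database, ttl_seconds=60):
        self._cache = cache
        self._database = database  # a dict standing in for MySQL
        self._ttl = ttl_seconds
        self.db_reads = 0  # counter kept only to make the effect visible

    def get_user(self, user_id):
        key = f"user:{user_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1
        row = self._database.get(user_id)
        if row is not None:
            self._cache.set(key, row, self._ttl)
        return row

    def update_user(self, user_id, row):
        # Write to the database, then invalidate so the next read repopulates.
        self._database[user_id] = row
        self._cache.delete(f"user:{user_id}")
```

Deleting the key on write, rather than updating it, is one common invalidation choice; the TTL bounds how long a stale entry can survive if an invalidation is missed.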
Stage 4: Users++++ — Auto Scaling and Cost Optimization
Replace static EC2 counts with Auto Scaling Groups that adjust capacity based on CloudWatch metrics like CPU utilization or request count. Use Spot Instances for non-critical, interruption-tolerant workloads to reduce compute costs by up to 90% compared with On-Demand pricing. This stage requires stateless application servers: session data must move to ElastiCache or a database, not local disk.
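To get a feel for metric-driven scaling, the sketch below applies a simplified proportional rule: scale the fleet by the ratio of the observed metric to its target, then clamp to the group's bounds. This is an approximation for intuition only and is not claimed to be the exact algorithm AWS uses; the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_size, max_size):
    """Estimate the instance count that brings the metric back to its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # If 4 instances run at 75% CPU against a 50% target, about 6 are needed.
    proposed = math.ceil(current_instances * metric_value / target_value)
    # Never go below MinSize or above MaxSize.
    return max(min_size, min(max_size, proposed))
```

With a minimum of 2 and a maximum of 10 instances, a fleet of 4 at 75% average CPU and a 50% target would grow to 6, while a quiet fleet shrinks no further than the minimum.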
Stage 5: Users+++++ — Sharding and NoSQL
When write throughput exceeds single-node database limits, implement sharding or federation to partition data across multiple database instances by user ID or geographic region. Migrate high-write, low-latency data (like user posts or activity logs) to DynamoDB, a fully managed NoSQL store with auto-scaling throughput. Store immutable objects in Amazon S3, optionally with S3 Intelligent-Tiering to optimize storage costs.
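Routing by user ID can be sketched with a stable hash, so that every application server maps the same user to the same shard. This is a minimal example under stated assumptions: the function name `shard_for` and the choice of SHA-256 are illustrative, not prescribed by the guide.

```python
import hashlib


def shard_for(user_id: int, shard_count: int) -> int:
    """Map a user ID to a shard index in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Use a stable hash: Python's built-in hash() for strings is randomized
    # per process, so different servers could disagree on the shard.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A plain modulo scheme like this remaps most keys when `shard_count` changes, so resharding needs a planned data migration.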
Stage 6: Microservices and Async Processing
At massive scale, decompose the monolith into microservices using Amazon ECS or Amazon EKS. Decouple components with Amazon SQS for reliable queueing and Amazon SNS for pub/sub messaging. Implement service discovery with AWS Cloud Map to enable dynamic routing between services. Use Amazon Kinesis for real-time data streaming and Lambda for event-driven processing.
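The decoupling idea can be sketched with a producer that enqueues work and a worker that drains it independently. Here Python's standard `queue.Queue` stands in for SQS and a thread stands in for a Lambda or worker service; the message shape and the upper-casing "processing" step are placeholders.

```python
import queue
import threading


def worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # Placeholder for real processing (resize an image, send an email, ...).
        results.append(job["text"].upper())
        jobs.task_done()


def process_async(messages):
    """Enqueue messages and let a background worker handle them."""
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for message in messages:
        jobs.put(message)  # the producer returns immediately after enqueueing
    jobs.put(None)  # sentinel: tell the worker to stop
    jobs.join()
    thread.join()
    return results
```

Unlike this in-process stand-in, standard SQS queues deliver at least once and do not guarantee ordering, so real consumers should be idempotent.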
Core Design Patterns and Trade-offs
The solutions/system_design/scaling_aws/README.md file references these fundamental patterns under its Additional talking points section:
- Horizontal Scaling: Add more EC2 instances behind a load balancer to improve throughput and fault tolerance. Requires stateless services or shared external state stores.
- Read Replicas: Asynchronously replicate MySQL writes to secondary nodes. Offloads read traffic but introduces replication lag that may affect consistency-sensitive queries.
- Cache-Aside: Application code checks the cache first, then falls back to the database on miss. Reduces database hits but adds cache invalidation complexity.
- Sharding: Partition data across multiple database instances by a key like user_id. Enables near-linear write scaling but increases operational complexity for cross-shard queries.
- NoSQL (DynamoDB): Key-value storage with automatic partitioning. Handles massive write throughput and geographic distribution, but offers limited query flexibility compared to relational models.
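The read-replica trade-off listed above can be sketched as a small router: writes go to the primary, reads rotate across replicas, and a user who has just written is briefly pinned to the primary so they do not read stale data. The pinning window is one possible mitigation for replication lag, not something the guide prescribes; `ReadWriteRouter` and its parameters are hypothetical.

```python
import itertools
import time


class ReadWriteRouter:
    """Route writes to the primary and reads to replicas, tolerating lag."""

    def __init__(self, primary, replicas, pin_seconds=1.0, clock=time.monotonic):
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._pin_seconds = pin_seconds  # assumed upper bound on replication lag
        self._clock = clock
        self._last_write = {}  # user_id -> time of that user's latest write

    def for_write(self, user_id):
        self._last_write[user_id] = self._clock()
        return self._primary

    def for_read(self, user_id):
        last = self._last_write.get(user_id)
        recently_wrote = last is not None and self._clock() - last < self._pin_seconds
        if recently_wrote or self._replicas is None:
            # Read-your-writes: serve from the primary until replicas catch up.
            return self._primary
        return next(self._replicas)
```

In a multi-server deployment the last-write timestamps would need to live in a shared store or a cookie rather than in process memory.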
Security, Monitoring, and Cost Controls
Production scaling requires operational rigor beyond pure architecture. Secure your tiers using IAM roles for EC2 instances, Security Groups for network segmentation, TLS termination at the load balancer, AWS WAF to filter malicious HTTP requests, and AWS Shield for DDoS protection. Monitor system health with CloudWatch metrics, trace requests with X-Ray, and enforce compliance with AWS Config. Optimize costs by purchasing Reserved Instances for baseline capacity, using Spot Instances for fault-tolerant batch jobs, and applying S3 Intelligent-Tiering to automatically move infrequently accessed objects to cheaper storage classes.
Reference Architecture Diagram
The end-state architecture, which combines the components introduced across the stages above, can be visualized as a data flow from client to persistence layer. Below is the Mermaid diagram representing the complete stack:
graph LR
    subgraph Client
        Browser["Browser / Mobile App"]
    end
    subgraph DNS
        Route53["Amazon Route 53"]
    end
    subgraph Edge
        CloudFront["Amazon CloudFront CDN"]
    end
    subgraph LoadBalancing
        ALB["Application Load Balancer"]
    end
    subgraph Compute
        EC2["Auto-Scaling EC2 (Web / API)"]
        ECS["Amazon ECS (Micro-services)"]
    end
    subgraph Cache
        Redis["ElastiCache Redis"]
    end
    subgraph Datastore
        RDS["Amazon RDS (MySQL) – Master"]
        Replicas["Read Replicas"]
        Dynamo["DynamoDB (User-Posts)"]
        S3["Amazon S3 (Objects)"]
    end
    subgraph Async
        SQS["Amazon SQS Queue"]
        Lambda["Lambda Workers"]
    end
    Browser -->|HTTPS| Route53 --> CloudFront --> ALB --> EC2 --> Redis
    EC2 -->|Read| Replicas
    EC2 -->|Write| RDS
    EC2 -->|Write| Dynamo
    EC2 -->|Publish| S3
    EC2 -->|Enqueue| SQS --> Lambda --> Dynamo
Infrastructure as Code Template
To deploy the Stage 2 (Users++) architecture programmatically, use the following AWS CloudFormation template. It provisions a VPC, Application Load Balancer, Auto Scaling Group, and RDS MySQL instance:
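AWSTemplateFormatVersion: '2010-09-09'
Description: Basic scaling architecture for millions of users (Stage Users++)
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  # An internet-facing ALB requires an internet gateway attached to the VPC.
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true # lets instances reach the package repositories
  # The ALB and the DB subnet group each need subnets in two Availability Zones.
  PublicSubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable
  PublicSubnetBRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetB
      RouteTableId: !Ref PublicRouteTable
  ALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: GatewayAttachment
    Properties:
      Subnets: [!Ref PublicSubnet, !Ref PublicSubnetB]
      SecurityGroups: [] # add SGs later
      Scheme: internet-facing
      Type: application
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VPC
      Port: 80
      Protocol: HTTP
      HealthCheckPath: /
      TargetType: instance
  Listener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ALB
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0c02fb55956c7d316 # Amazon Linux 2; AMI IDs are region-specific (update as needed)
        InstanceType: t3.medium
        SecurityGroupIds: [] # add SGs
        # Launch template user data must be base64-encoded.
        UserData: !Base64 |
          #!/bin/bash
          yum install -y httpd
          systemctl enable httpd
          systemctl start httpd
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: PublicRoute
    Properties:
      VPCZoneIdentifier: [!Ref PublicSubnet, !Ref PublicSubnetB]
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        # CloudFormation does not accept '$Latest' here; reference the version number.
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 2
      TargetGroupARNs: [!Ref TargetGroup]
  MySQLInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: 20
      MasterUsername: admin
      MasterUserPassword: ChangeMe123! # replace with Secrets Manager in prod
      PubliclyAccessible: false # keep the database off the public internet
      VPCSecurityGroups: [] # add SGs
      DBSubnetGroupName: !Ref DBSubnetGroup
  DBSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet group for RDS
      SubnetIds: [!Ref PublicSubnet, !Ref PublicSubnetB] # normally use private subnets
Outputs:
  LoadBalancerDNS:
    Description: DNS name of the ALB
    Value: !GetAtt ALB.DNSName
  RDSEndpoint:
    Description: Endpoint for the MySQL DB
    Value: !GetAtt MySQLInstance.Endpoint.Address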
In production, extend this template with private subnets for databases, IAM instance profiles, Secrets Manager for credentials, Auto Scaling policies based on CloudWatch alarms, and Security Groups restricting traffic between tiers.
Summary
- Start with a single-box prototype to validate your data model before distributing complexity.
- Scale incrementally through six stages: CDN → Load Balancer → Caching/Read Replicas → Auto Scaling → Sharding/NoSQL → Microservices.
- Apply horizontal scaling, cache-aside, and sharding patterns while maintaining stateless application servers.
- Secure the architecture with IAM, Security Groups, and TLS, and monitor with CloudWatch and X-Ray.
- Use Infrastructure as Code (CloudFormation or Terraform) to version control your scaling configuration.
Frequently Asked Questions
What is the first AWS service I should add when scaling beyond a single server?
According to the solutions/system_design/scaling_aws/README.md guide, the first service to add is Amazon CloudFront (CDN). Separating static assets and serving them from edge locations reduces origin server load and improves global latency before you invest in complex compute scaling.
When should I migrate from Amazon RDS to DynamoDB?
Migrate to DynamoDB when you reach Stage 5 (Users+++++) and your MySQL master cannot handle the write throughput despite read replicas and caching. DynamoDB's auto-scaling partitions handle massive concurrent writes with single-digit millisecond latency, though you must accept limited query flexibility and reads that are eventually consistent by default.
How do I handle user sessions when scaling horizontally with Auto Scaling?
Store session state outside the EC2 instances using ElastiCache (Redis) or a persistent database like DynamoDB. The Auto Scaling Group replaces instances dynamically, so local disk storage is ephemeral. A centralized session store ensures users remain authenticated regardless of which server handles their request.
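A centralized session store can be sketched as follows. `SessionStore` is an in-memory stand-in for ElastiCache or DynamoDB, and `handle_request` and the server names are hypothetical; the point is that any server holding a reference to the shared store can resolve the same token.

```python
import secrets
import time


class SessionStore:
    """Shared session store; in production this would be Redis or DynamoDB."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self._sessions = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self, user_id):
        token = secrets.token_urlsafe(32)  # unguessable session identifier
        self._sessions[token] = (user_id, self._clock() + self._ttl)
        return token

    def get_user(self, token):
        entry = self._sessions.get(token)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._sessions[token]  # expired sessions are dropped
            return None
        return user_id


def handle_request(server_name, store, token):
    """Any stateless server can authenticate the request via the shared store."""
    user = store.get_user(token)
    if user is None:
        return f"{server_name}: please log in"
    return f"{server_name}: hello {user}"
```

A token created while talking to one server is recognized by another, which is what allows the Auto Scaling Group to replace instances without logging users out.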
What is the difference between read replicas and sharding?
Read replicas create copies of your entire database to offload read traffic, but all writes still hit the single master node. Sharding partitions the data itself across multiple independent databases (e.g., users A-M on Node 1, N-Z on Node 2), allowing writes to scale horizontally but requiring application logic to route queries to the correct shard.