How to Design a Key-Value Store for High Performance: Architecture and Implementation Guide
Designing a high-performance key-value store requires combining an in-memory LRU cache for sub-millisecond latency with consistent hashing for horizontal scaling, backed by a durable persistence layer using write-ahead logging and replication. This architectural pattern, as detailed in the donnemartin/system-design-primer repository, enables systems to handle billions of queries per month while maintaining availability and fault tolerance.
To design a key-value store for high performance, you must balance latency, throughput, and consistency. The system-design-primer repository provides concrete implementations and design patterns showing how to build such systems using an LRU cache backed by distributed storage.
Core Architecture Components
A production-ready key-value store separates concerns across multiple layers. According to the query-cache solution in solutions/system_design/query_cache/README.md, the high-level flow follows: Client → Load Balancer → API Servers → Memory Cache ↔ Persistent Store.
In-Memory Cache Layer
The cache layer provides O(1) reads and writes using a hash table combined with a doubly-linked list to maintain LRU (Least Recently Used) order. This data structure, implemented in the repository's Python examples, tracks access patterns to ensure hot keys remain in memory while cold data gets evicted efficiently.
When memory limits are reached, the system must evict stale entries. The LRU policy removes the least recently accessed item, keeping the working set in fast memory. This is critical for achieving sub-millisecond read latency on cache hits.
Persistent Storage Backend
The durable layer handles cache misses and serves as the source of truth. As described in README.md, this layer requires write-ahead logging (WAL) for crash recovery and replica sets for high availability. Primary-secondary or multi-master replication patterns ensure data survives node failures.
Implementing the LRU Cache
The repository provides a concrete Python implementation in solutions/system_design/query_cache/README.md demonstrating the core data structures. The design uses three main classes: Node for entries, LinkedList for LRU ordering, and Cache for the hash table interface.
class Node:
def __init__(self, key, value):
self.key = key
self.value = value
self.prev = None
self.next = None
class LinkedList:
"""Doubly linked list to maintain LRU order."""
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
if node is self.head:
return
# Detach
if node.prev:
node.prev.next = node.next
if node.next:
node.next.prev = node.prev
if node is self.tail:
self.tail = node.prev
# Insert at front
node.prev = None
node.next = self.head
if self.head:
self.head.prev = node
self.head = node
if self.tail is None:
self.tail = node
def append_to_front(self, node):
node.prev = None
node.next = self.head
if self.head:
self.head.prev = node
self.head = node
if self.tail is None:
self.tail = node
def remove_from_tail(self):
if not self.tail:
return
old_tail = self.tail
if old_tail.prev:
self.tail = old_tail.prev
self.tail.next = None
else:
self.head = self.tail = None
return old_tail
class Cache:
"""Fixed-size LRU cache."""
def __init__(self, max_size):
self.max_size = max_size
self.size = 0
self.table = {} # key → Node
self.lru = LinkedList()
def get(self, key):
node = self.table.get(key)
if not node:
return None
self.lru.move_to_front(node)
return node.value
def set(self, key, value):
node = self.table.get(key)
if node:
node.value = value
self.lru.move_to_front(node)
return
if self.size == self.max_size:
evicted = self.lru.remove_from_tail()
if evicted:
self.table.pop(evicted.key)
self.size -= 1
new_node = Node(key, value)
self.lru.append_to_front(new_node)
self.table[key] = new_node
self.size += 1
This implementation stores key-value pairs in a hash table (self.table) for O(1) access while maintaining a doubly-linked list to track usage order. When the cache reaches max_size, remove_from_tail() evicts the oldest entry in constant time.
Scaling with Consistent Hashing
To handle datasets larger than single-node memory, you must shard data across multiple cache instances. The README.md section on sharding recommends consistent hashing to distribute keys across nodes.
Consistent hashing minimizes reorganization when nodes are added or removed. Instead of modulo-based hashing that remaps most keys, consistent hashing maps both nodes and keys to a ring, allowing only adjacent keys to move when topology changes. This prevents hot-spotting and enables horizontal scalability.
Three strategies exist for expanding the memory cache to many machines:
- Independent caches per node — Simplest implementation but results in low hit-rates due to duplicated cold data across nodes.
- Full replication — High hit-rate since every node holds all data, but inefficient memory usage limits total dataset size.
- Sharded cache — Best trade-off where each node holds a distinct key-range, using consistent hashing to locate the correct node for any given key.
Consistency Patterns and Replication
The README.md file describes several patterns for synchronizing the cache with persistent storage.
Cache-aside (or lazy loading) remains the most common pattern for read-heavy workloads. The application checks the cache first; on miss, it queries the datastore, populates the cache, and returns the result. This prevents unnecessary cache writes and handles stale data gracefully.
For write strategies, you can implement write-through (synchronous update to cache and store) or write-behind (asynchronous batch updates to the store). Most high-performance key-value stores accept eventual consistency, allowing updates to propagate within bounded time rather than requiring immediate synchronization across all replicas.
Replication strategies from README.md include:
- Primary-secondary replication — One node handles writes, replicas serve reads, automatic failover on primary failure.
- Multi-master replication — Multiple nodes accept writes, requiring conflict resolution but offering higher write availability.
Production Deployment Considerations
Deploying a high-performance key-value store requires operational safeguards. According to the system design primer, implement back-pressure to throttle client requests when the cache or datastore approaches saturation, preventing cascading failures.
Monitoring metrics must track cache hit-rate, latency percentiles, eviction counts, and node resource utilization. Auto-scaling groups should add or remove cache nodes based on CPU and memory thresholds, while maintaining session affinity where required.
Summary
- Combine hash tables with doubly-linked lists to achieve O(1) LRU cache operations as shown in
solutions/system_design/query_cache/README.md. - Use consistent hashing to shard data across nodes, enabling horizontal scaling without massive key remapping.
- Implement cache-aside pattern for efficient read-heavy workloads, falling back to durable storage on misses.
- Deploy write-ahead logging and replication in the persistence layer to ensure durability and high availability.
- Monitor hit-rates and latency to trigger auto-scaling and maintain sub-millisecond performance under load.
Frequently Asked Questions
What data structure provides O(1) lookups in a high-performance key-value store?
A hash table provides constant-time lookups for key existence and value retrieval. When combined with a doubly-linked list to track access order, as implemented in the Cache class from solutions/system_design/query_cache/README.md, you achieve O(1) performance for both lookups and LRU evictions.
How does consistent hashing help scale a key-value store?
Consistent hashing maps both cache nodes and data keys to a circular space or "ring." When adding or removing nodes, only the keys adjacent to the changed node need remapping, rather than rehashing the entire dataset. This minimizes disruption during scaling events and prevents the "thundering herd" problem during cache rebuilds.
What is the difference between cache-aside and write-through patterns?
Cache-aside loads data into the cache only on read misses, keeping the cache filled with actually requested data. Write-through synchronously writes to both cache and persistent storage on every update, ensuring consistency but adding latency. Cache-aside suits read-heavy workloads with eventual consistency requirements, while write-through fits scenarios requiring strict consistency between cache and store.
How do you handle cache eviction in a memory-constrained environment?
Implement an LRU (Least Recently Used) eviction policy using a combination of a hash table and doubly-linked list. When the cache reaches capacity, remove the tail node (oldest accessed item) in O(1) time and delete its hash table entry. This ensures frequently accessed "hot" keys remain in fast memory while rarely used "cold" data gets discarded efficiently.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →