# How to Design a Key-Value Store for High Performance: Architecture and Implementation Guide

> Learn to design a high-performance key-value store using in-memory caching, consistent hashing, and durable persistence. Explore essential architecture and implementation details for massive scalability and fault tolerance.

- Repository: [Donne Martin/system-design-primer](https://github.com/donnemartin/system-design-primer)
- Tags: how-to-guide
- Published: 2026-02-24

---

**Designing a high-performance key-value store requires combining an in-memory LRU cache for sub-millisecond latency with consistent hashing for horizontal scaling, backed by a durable persistence layer using write-ahead logging and replication.** This architectural pattern, as detailed in the `donnemartin/system-design-primer` repository, enables systems to handle billions of queries per month while maintaining availability and fault tolerance.

To design a key-value store for high performance, you must balance latency, throughput, and consistency. The `system-design-primer` repository provides concrete implementations and design patterns showing how to build such systems using an LRU cache backed by distributed storage.

## Core Architecture Components

A production-ready key-value store separates concerns across multiple layers. According to the query-cache solution in [`solutions/system_design/query_cache/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/query_cache/README.md), the high-level flow follows: **Client → Load Balancer → API Servers → Memory Cache ↔ Persistent Store**.

### In-Memory Cache Layer

The cache layer provides **O(1) reads and writes** using a hash table combined with a doubly-linked list to maintain LRU (Least Recently Used) order. This data structure, implemented in the repository's Python examples, tracks access patterns to ensure hot keys remain in memory while cold data gets evicted efficiently.

When memory limits are reached, the system must evict stale entries. The LRU policy removes the least recently accessed item, keeping the working set in fast memory. This is critical for achieving sub-millisecond read latency on cache hits.

### Persistent Storage Backend

The durable layer handles cache misses and serves as the source of truth. As described in [`README.md`](https://github.com/donnemartin/system-design-primer/blob/main/README.md), this layer requires **write-ahead logging (WAL)** for crash recovery and **replica sets** for high availability. Primary-secondary or multi-master replication patterns ensure data survives node failures.

## Implementing the LRU Cache

The repository provides a concrete Python implementation in [`solutions/system_design/query_cache/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/query_cache/README.md) demonstrating the core data structures. The design uses three main classes: `Node` for entries, `LinkedList` for LRU ordering, and `Cache` for the hash table interface.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LinkedList:
    """Doubly linked list to maintain LRU order."""
    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        if node is self.head:
            return
        # Detach

        if node.prev:
            node.prev.next = node.next
        if node.next:
            node.next.prev = node.prev
        if node is self.tail:
            self.tail = node.prev
        # Insert at front

        node.prev = None
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def append_to_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def remove_from_tail(self):
        if not self.tail:
            return
        old_tail = self.tail
        if old_tail.prev:
            self.tail = old_tail.prev
            self.tail.next = None
        else:
            self.head = self.tail = None
        return old_tail


class Cache:
    """Fixed-size LRU cache."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.size = 0
        self.table = {}          # key → Node

        self.lru = LinkedList()

    def get(self, key):
        node = self.table.get(key)
        if not node:
            return None
        self.lru.move_to_front(node)
        return node.value

    def set(self, key, value):
        node = self.table.get(key)
        if node:
            node.value = value
            self.lru.move_to_front(node)
            return
        if self.size == self.max_size:
            evicted = self.lru.remove_from_tail()
            if evicted:
                self.table.pop(evicted.key)
                self.size -= 1
        new_node = Node(key, value)
        self.lru.append_to_front(new_node)
        self.table[key] = new_node
        self.size += 1

```

This implementation stores key-value pairs in a hash table (`self.table`) for O(1) access while maintaining a doubly-linked list to track usage order. When the cache reaches `max_size`, `remove_from_tail()` evicts the oldest entry in constant time.

## Scaling with Consistent Hashing

To handle datasets larger than single-node memory, you must shard data across multiple cache instances. The [`README.md`](https://github.com/donnemartin/system-design-primer/blob/main/README.md) section on sharding recommends **consistent hashing** to distribute keys across nodes.

Consistent hashing minimizes reorganization when nodes are added or removed. Instead of modulo-based hashing that remaps most keys, consistent hashing maps both nodes and keys to a ring, allowing only adjacent keys to move when topology changes. This prevents hot-spotting and enables horizontal scalability.

Three strategies exist for expanding the memory cache to many machines:

1. **Independent caches per node** — Simplest implementation but results in low hit-rates due to duplicated cold data across nodes.
2. **Full replication** — High hit-rate since every node holds all data, but inefficient memory usage limits total dataset size.
3. **Sharded cache** — Best trade-off where each node holds a distinct key-range, using consistent hashing to locate the correct node for any given key.

## Consistency Patterns and Replication

The [`README.md`](https://github.com/donnemartin/system-design-primer/blob/main/README.md) file describes several patterns for synchronizing the cache with persistent storage.

**Cache-aside** (or lazy loading) remains the most common pattern for read-heavy workloads. The application checks the cache first; on miss, it queries the datastore, populates the cache, and returns the result. This prevents unnecessary cache writes and handles stale data gracefully.

For write strategies, you can implement **write-through** (synchronous update to cache and store) or **write-behind** (asynchronous batch updates to the store). Most high-performance key-value stores accept **eventual consistency**, allowing updates to propagate within bounded time rather than requiring immediate synchronization across all replicas.

Replication strategies from [`README.md`](https://github.com/donnemartin/system-design-primer/blob/main/README.md) include:
- **Primary-secondary replication** — One node handles writes, replicas serve reads, automatic failover on primary failure.
- **Multi-master replication** — Multiple nodes accept writes, requiring conflict resolution but offering higher write availability.

## Production Deployment Considerations

Deploying a high-performance key-value store requires operational safeguards. According to the system design primer, implement **back-pressure** to throttle client requests when the cache or datastore approaches saturation, preventing cascading failures.

Monitoring metrics must track cache hit-rate, latency percentiles, eviction counts, and node resource utilization. Auto-scaling groups should add or remove cache nodes based on CPU and memory thresholds, while maintaining session affinity where required.

## Summary

- **Combine hash tables with doubly-linked lists** to achieve O(1) LRU cache operations as shown in [`solutions/system_design/query_cache/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/query_cache/README.md).
- **Use consistent hashing** to shard data across nodes, enabling horizontal scaling without massive key remapping.
- **Implement cache-aside pattern** for efficient read-heavy workloads, falling back to durable storage on misses.
- **Deploy write-ahead logging and replication** in the persistence layer to ensure durability and high availability.
- **Monitor hit-rates and latency** to trigger auto-scaling and maintain sub-millisecond performance under load.

## Frequently Asked Questions

### What data structure provides O(1) lookups in a high-performance key-value store?

A **hash table** provides constant-time lookups for key existence and value retrieval. When combined with a **doubly-linked list** to track access order, as implemented in the `Cache` class from [`solutions/system_design/query_cache/README.md`](https://github.com/donnemartin/system-design-primer/blob/main/solutions/system_design/query_cache/README.md), you achieve O(1) performance for both lookups and LRU evictions.

### How does consistent hashing help scale a key-value store?

**Consistent hashing** maps both cache nodes and data keys to a circular space or "ring." When adding or removing nodes, only the keys adjacent to the changed node need remapping, rather than rehashing the entire dataset. This minimizes disruption during scaling events and prevents the "thundering herd" problem during cache rebuilds.

### What is the difference between cache-aside and write-through patterns?

**Cache-aside** loads data into the cache only on read misses, keeping the cache filled with actually requested data. **Write-through** synchronously writes to both cache and persistent storage on every update, ensuring consistency but adding latency. Cache-aside suits read-heavy workloads with eventual consistency requirements, while write-through fits scenarios requiring strict consistency between cache and store.

### How do you handle cache eviction in a memory-constrained environment?

Implement an **LRU (Least Recently Used)** eviction policy using a combination of a hash table and doubly-linked list. When the cache reaches capacity, remove the tail node (oldest accessed item) in O(1) time and delete its hash table entry. This ensures frequently accessed "hot" keys remain in fast memory while rarely used "cold" data gets discarded efficiently.