# How Turso’s Page Cache and Pager System Optimize Memory Usage

> Discover how Turso's PageCache and Pager system optimize memory usage with SIEVE eviction, WAL spills, and buffer pooling for a small, predictable working set.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: performance
- Published: 2026-06-23

---

**Turso optimizes memory usage through a bounded PageCache implementing the SIEVE eviction algorithm, coupled with a Pager that coordinates dirty-page tracking, asynchronous WAL spills, and buffer pooling to keep the working set small and predictable.**

Turso, the open-source edge database maintained at `tursodatabase/turso`, manages SQLite-compatible database pages through a specialized two-layer architecture. The **PageCache** serves as a fixed-capacity, self-evicting in-memory store, while the **Pager** acts as the coordination layer between the cache, disk I/O, and the Write-Ahead Log (WAL). This design ensures that only frequently accessed data occupies RAM, while stale pages are evicted and dirty data is spilled to persistent storage before memory pressure becomes critical.

## The PageCache: Bounded Storage with SIEVE Eviction

The `PageCache` in [`core/storage/page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/page_cache.rs) provides a bounded, thread-safe storage layer that automatically manages its own size. Unlike unbounded caches that risk uncontrolled memory growth, Turso’s implementation uses the **SIEVE** (clock-hand) eviction algorithm to maintain a strict page-count limit while maximizing hit rates for hot data.

### Fixed Capacity and Configuration

The cache initializes with a hard page limit defined by `DEFAULT_PAGE_CACHE_SIZE_IN_PAGES`. In [`core/storage/page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/page_cache.rs) (lines 14-18), the default constructor creates a cache with this fixed capacity:

```rust
pub fn default() -> Self { 
    PageCache::new(DEFAULT_PAGE_CACHE_SIZE_IN_PAGES) 
}

```

For WebAssembly builds, this defaults to a smaller value appropriate for browser environments, while native builds use a larger default (typically a few thousand pages, or roughly 8 MiB assuming 4 KiB pages). This bounded approach ensures the database never consumes more memory than explicitly configured, regardless of workload size.

### SIEVE Clock-Hand Eviction Algorithm

Turso implements the SIEVE algorithm rather than traditional LRU to achieve better concurrency characteristics and simpler eviction logic. Each cache entry maintains a 3-bit `ref_bit` used by the clock hand.

The eviction logic resides in `evict_one` (lines 1028-1060 in [`core/storage/page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/page_cache.rs)). The algorithm maintains a circular intrusive linked list (`intrusive_collections::LinkedList`) where a "clock hand" sweeps continuously. When the hand visits an entry, it decrements the `ref_bit`. Only entries whose `ref_bit` equals `CLEAR` and are marked evictable are removed. On insertion, new entries are placed *after* the clock hand, making them Most-Recently-Used (MRU), which protects them from immediate eviction.

### Evictable Page Tracking

To avoid scanning the entire map when checking if eviction is possible, the cache maintains an `evictable_count` counter. This lightweight atomic counter updates during critical state transitions:

- **Insertion**: Incremented in `_insert` (lines 274-277)
- **Deletion**: Decrement in `_delete` (lines 330-334)  
- **Dirty transitions**: Modified in `notify_page_dirty` and `notify_page_spilled` (lines 510-525)

When `evictable_count` reaches zero, the cache knows it cannot free space without spilling dirty pages first, triggering coordination with the Pager.

### Spill Threshold Management

The cache supports a configurable spill threshold (`DEFAULT_SPILL_THRESHOLD_PERCENT`). When the ratio of dirty pages to total capacity exceeds this percentage, `needs_spill()` (lines 2124-2145) returns true, signaling the Pager to flush data. The method `collect_spillable_pages` (lines 4445-4470) then gathers dirty pages for asynchronous writing to the WAL, allowing the cache to reclaim memory without blocking foreground operations.

## The Pager: Orchestrating Cache, I/O, and Persistence

The `Pager` in [`core/storage/pager.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/pager.rs) serves as the primary interface between the database engine and storage. It manages the lifecycle of `Page` objects, coordinates disk I/O, and ensures that dirty data eventually reaches the WAL while optimizing for concurrent access.

### Non-Blocking Page Retrieval and Read Deduplication

When the engine requests a page, `Pager::read_page_nonblock` first queries the `PageCache`. On a cache miss, rather than immediately blocking, the pager checks `pending_reads` (line 636), a `RwLock`-protected map that tracks in-flight asynchronous read operations.

As noted in the source comments (lines 635-640), this mechanism allows multiple statements requesting the same uncached page to share a single I/O operation. The first caller initiates the read; subsequent callers await the same future, eliminating redundant disk seeks under concurrent load.

### Dirty-Page Tracking via Bitmap

Durability requires precise tracking of modified pages. When application code calls `Page::set_dirty` (lines 998-1003 in [`pager.rs`](https://github.com/tursodatabase/turso/blob/main/pager.rs)), the pager increments a `RoaringBitmap` named `dirty_pages` (line 1444). This compressed bitmap stores page numbers in sorted order, matching SQLite’s WAL expectations and enabling efficient sequential writes during commit operations.

### Asynchronous Spill Coordination

The Pager owns a `SpillState` (line 1269) that runs an asynchronous state machine for cache flushing. When `PageCache::needs_spill` returns true, the pager schedules a spill operation:

1. Collect candidate pages via `collect_spillable_pages`
2. Write them to the WAL asynchronously  
3. Mark each spilled page via `Page::set_spilled`, transitioning them to evictable status

This pipeline ensures that dirty data is persisted while freeing cache capacity for new pages, all without blocking SQL execution threads.

### Buffer Pool Reuse

To minimize heap allocations during I/O operations, the Pager maintains a `BufferPool` (line 632 in [`pager.rs`](https://github.com/tursodatabase/turso/blob/main/pager.rs)). Temporary buffers for reading large B-tree cells or other transient data are sourced from this pool rather than allocated per-operation. The pool reuses fixed-size buffers, keeping garbage collection pressure low during high-throughput workloads.

### Page Pinning for Cursor Stability

Cursor operations require stable page references even under cache pressure. The Pager implements **pinning** to prevent eviction of actively used pages. When a cursor accesses a page, it calls `Page::pin` (lines 666-669), incrementing a reference counter. The `evictable` function (line 2020) checks this counter before allowing eviction. Only when `unpin` is called does the page become eligible for the SIEVE algorithm again, guaranteeing that long-running scans do not encounter unexpected page faults.

## The Complete Memory Optimization Flow

The interaction between PageCache and Pager follows a strict lifecycle to maintain memory bounds:

1. **Access**: The Pager requests a page via `PageCache::get`. On hit, the cache bumps the reference bit (`entry.bump_ref` on line 889) and returns a `PageRef`. On miss, the Pager initiates an async read tracked in `pending_reads`.

2. **Eviction**: When insertion requires space, `make_room_for` triggers `evict_one`. The SIEVE clock hand sweeps, giving second chances to recently-accessed pages before evicting clean or spilled entries.

3. **Dirty Page Handling**: Modified pages call `set_dirty`, which clears the spilled flag and notifies the cache via `notify_page_dirty`, decrementing `evictable_count`. These pages cannot be evicted until persisted.

4. **Spilling**: When dirty pages exceed the spill threshold, the Pager’s `CacheFlushState` writes them to the WAL, then marks them spilled via `set_spilled`. This increments `evictable_count`, making them eligible for eviction.

5. **Checkpoint**: After sufficient WAL data accumulates, the Pager triggers a checkpoint to move data back to the main database file, freeing WAL space and allowing cache entries to be reclaimed or re-used.

## Practical Implementation Examples

### Creating a Pager with a Custom-Sized Cache

Configure the cache size during Pager initialization to control memory footprints for specific workloads:

```rust
use std::sync::Arc;
use turso::storage::pager::Pager;
use turso::storage::buffer_pool::BufferPool;

// 1 MiB cache → 256 pages (4 KiB each)
let cache = Arc::new(turso::storage::page_cache::PageCache::new(256));
let pager = Pager {
    db_file: Arc::new(db_storage),
    wal: Some(Arc::new(wal)),
    page_cache: Arc::new(turso::sync::RwLock::new(cache)),
    buffer_pool: Arc::new(BufferPool::new()),
    // ... remaining fields
};

```

### Handling Cache Hits and Misses

Detect cache residency to optimize application logic:

```rust
let key = turso::storage::page_cache::PageCacheKey::new(42);
let mut cache_guard = pager.page_cache.write();

match cache_guard.get(&key) {
    Ok(Some(page)) => {
        // Reference bit bumped automatically in get()
        process_cached_page(page);
    }
    _ => {
        // Schedules async read, deduplicates concurrent requests
        let page = pager.read_page_nonblock(42).await?;
        process_fresh_page(page);
    }
}

```

### Managing Dirty Pages and Spills

Explicitly mark modifications to trigger the spill pipeline:

```rust
let page = cache_guard.get(&key)?.unwrap();
page.set_dirty();                      // Marks PAGE_DIRTY
pager.notify_page_dirty(key);          // Updates evictable_count

// When threshold exceeded, pager automatically:
// 1. Calls cache.collect_spillable_pages(max_pages)
// 2. Writes to WAL
// 3. Calls page.set_spilled() to mark evictable

```

### Pinning Pages During Iteration

Prevent eviction during cursor operations:

```rust
let cursor = pager.open_cursor(root_page)?;
let page = cursor.current_page()?;
page.pin();                            // Increment pin counter

// Safe to use page across await points or long computations
traverse_tree(&page)?;
page.unpin();                          // Now eligible for eviction

```

## Summary

- **Bounded Cache**: The `PageCache` enforces strict memory limits via `DEFAULT_PAGE_CACHE_SIZE_IN_PAGES`, preventing unbounded growth.
- **SIEVE Eviction**: Uses a clock-hand algorithm with 3-bit reference counters for efficient O(1) eviction decisions.
- **Async Spilling**: The Pager coordinates dirty-page writes to the WAL through `SpillState`, converting dirty pages to evictable status without blocking queries.
- **Read Deduplication**: `pending_reads` eliminates duplicate I/O for concurrent page requests.
- **Resource Reuse**: `BufferPool` and pinning mechanisms minimize allocations and ensure cursor stability under memory pressure.

## Frequently Asked Questions

### How does Turso’s page cache differ from a standard LRU implementation?

Turso uses the **SIEVE** algorithm rather than traditional LRU. According to the source code in [`core/storage/page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/page_cache.rs) (lines 1028-1060), SIEVE maintains a circular clock hand that sweeps entries, decrementing 3-bit reference counters. This approach requires less metadata per entry than LRU timestamps and handles concurrent access patterns more efficiently, while the intrusive linked list design minimizes pointer chasing during eviction.

### What happens when the page cache reaches its configured capacity?

When full, the cache invokes `make_room_for`, which triggers `evict_one`. The SIEVE algorithm scans the clock hand until finding an entry with a cleared reference bit that is not dirty and not pinned. If `evictable_count` is zero (meaning all pages are dirty), the Pager’s `needs_spill` logic initiates an asynchronous flush to the WAL, converting dirty pages to spilled status so they can be evicted.

### How does the Pager prevent duplicate disk I/O when multiple queries request the same uncached page?

The Pager maintains a `RwLock`-protected `FxHashMap` called `pending_reads` (lines 635-640 in [`pager.rs`](https://github.com/tursodatabase/turso/blob/main/pager.rs)). When `read_page_nonblock` encounters a cache miss, it checks this map. If an async read is already in-flight for that page number, the function returns the existing future rather than issuing a new I/O request. This ensures that under concurrent load, the disk sees only one physical read per page.

### Why must dirty pages be spilled to the WAL before they can be evicted from the cache?

Durability guarantees require that committed data survive process crashes. The `PageCache` cannot evict dirty pages (those modified since the last flush) because they contain uncommitted changes. The Pager must first write them to the WAL via the asynchronous spill pipeline, then call `set_spilled` to mark them clean. This transition updates the `evictable_count` (lines 510-525 in [`page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/page_cache.rs)), allowing the SIEVE algorithm to reclaim that memory safely.