# How Turso’s B‑Tree Storage Engine Manages Pages and Indexes

> Discover how Turso's B-tree storage engine manages pages and indexes with tiered headers, overflow chains, and automatic balancing for efficient data organization. Learn more today.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: internals
- Published: 2026-06-23

---

**Turso implements a SQLite‑compatible B‑tree storage engine that organizes data into fixed‑size pages with tiered header layouts, variable‑length cells, and automatic balancing via state machines, storing oversized payloads in linked overflow chains.**

Turso persists relational data using a custom B‑tree implementation that maintains binary compatibility with SQLite on‑disk formats. The engine spans two primary modules—[`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) for high‑level navigation and [`core/storage/sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/sqlite3_ondisk.rs) for serialization—managing pages, indexes, and overflow chains through deterministic state machines.

## Page Structure and Header Layout

Every page begins with a fixed‑size header followed by a cell pointer array and a cell content area that grows leftwards as cells are added.

### Header Fields and Offsets

Interior pages use a 12‑byte header, while leaf pages use 8 bytes. The header fields are defined by constants parsed in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs):

- `BTREE_PAGE_TYPE` – distinguishes between table and index pages
- `BTREE_FIRST_FREEBLOCK` – offset to the first unused space block within the cell area
- `BTREE_CELL_COUNT` – total number of cells stored on the page
- `BTREE_CELL_CONTENT_AREA` – pointer to the start of the payload region
- `BTREE_FRAGMENTED_BYTES_COUNT` – isolated unusable bytes inside the cell area
- `BTREE_RIGHTMOST_PTR` – right‑most child page pointer stored separately from cells

Turso caches the usable space of each page (`usable_space_cached`) inside the cursor to avoid recalculating `page_size – reserved_space` for every operation.

### Cell Organization

The cell pointer array sits immediately after the header, containing offsets to each cell in the content area. As cells are inserted, the content area expands leftwards from the end of the page toward the pointer array.

## Cell Layout and Payload Management

Cells are the atomic units of storage, with distinct layouts for table rows and index entries defined in [`core/storage/sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/sqlite3_ondisk.rs).

### Table vs Index Cells

```rust
pub struct TableLeafCell {
    pub rowid: i64,
    pub payload: &'static [u8],
    pub payload_size: u64,
    pub first_overflow_page: Option<u32>,
}

pub struct IndexLeafCell {
    pub payload: &'static [u8],
    pub payload_size: u64,
    pub first_overflow_page: Option<u32>,
}

```

`TableLeafCell` stores the rowid and serialized record, while `IndexLeafCell` stores only the indexed key payload. Both variants track overflow via `first_overflow_page`.

### Overflow Handling

When a payload exceeds the local size threshold, the excess spills into a chain of overflow pages. The `process_overflow_read` function in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) traverses this chain by reading 4‑byte big‑endian next‑page pointers until reaching a zero terminator. The `read_btree_cell` function deserializes raw page bytes into a `BTreeCell` enum, extracting the initial overflow reference when present.

## Index Organization and Navigation

Indexes are stored as separate B‑Trees with dedicated page types:

```rust
pub enum PageType {
    IndexInterior = 2,
    TableInterior = 5,
    IndexLeaf = 10,
    TableLeaf = 13,
}

```

### Interior vs Leaf Pages

**IndexInterior** pages contain a left‑child page pointer followed by the indexed key payload. **IndexLeaf** pages contain only the key payload without child pointers.

During navigation, the cursor performs binary search on the cell pointer array using `BinarySearchState` structs. When traversing interior nodes, the cursor descends to the child page indicated by the selected cell’s left‑child pointer, tracking `going_upwards` state to determine whether divider cells must be revisited.

## Balancing and Page Splits

When insertion fills a leaf beyond capacity, Turso triggers a balance operation to redistribute cells and maintain B‑tree invariants.

### Balance State Machine

The `BalanceState` struct manages reusable buffers and tracks the current `BalanceSubState`. The algorithm examines up to three sibling pages (limited by `MAX_SIBLING_PAGES_TO_BALANCE = 5`) and redistributes cells across 1‑5 pages depending on total cell count. The engine enforces a maximum tree depth of 20 levels via `BTCURSOR_MAX_DEPTH = 20` in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs).

### Pointer Maintenance

Balancing updates the `rightmost_pointer` field within `BalanceInfo` to preserve key ordering across parent‑child relationships. The operation is atomic with respect to the page stack, ensuring crash consistency.

## Cursor Architecture and Page Stack

The `BTreeCursor` struct serves as the primary interface for tree traversal and modification.

### Cursor Structure

Defined in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs), the cursor maintains:

```rust
pub struct BTreeCursor {
    pub pager: Arc<Pager>,
    usable_space_cached: usize,
    root_page: i64,
    stack: PageStack,
    balance_state: BalanceState,
    overflow_state: OverflowState,
    // ... state machines for seek, insert, delete
}

```

The `PageStack` stores `BTreeNodeState` entries for each tree level, caching the current cell index and total cell count to enable fast upward traversal without re‑reading pages.

### State Machine Navigation

The cursor uses specialized state machines for each operation:

- `CursorSeekState` – manages binary search and descent
- `WriteState::Insert` – handles cell insertion with overflow detection
- `BalanceSubState` – coordinates page redistribution during splits

The `CursorValidState` flag indicates whether the current position remains valid after balancing or MVCC writes, triggering repositioning when necessary.

## Overflow Page Chain Mechanics

For payloads exceeding local capacity, Turso allocates overflow pages forming a singly linked list. Each overflow page stores a 4‑byte big‑endian pointer to the next page at its start, with the final page containing zero. The `read_payload` routine extracts the first overflow page number from the last four bytes of the cell’s local storage, and `process_overflow_read` assembles the complete payload by walking this chain.

## Summary

- **Page headers** in Turso use 8–12 byte fixed layouts with fields for cell counts, freeblocks, and right‑most pointers, defined in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) and [`core/storage/sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/sqlite3_ondisk.rs).
- **Cells** vary by type (`TableLeafCell` vs `IndexLeafCell`) and store rowids, payloads, and optional overflow page references.
- **Indexes** are separate B‑Trees with `IndexInterior` and `IndexLeaf` page types, traversed via binary search and child pointers.
- **Balancing** employs a `BalanceState` machine that redistributes cells across up to five sibling pages while updating right‑most pointers to maintain order.
- **Cursors** use a `PageStack` to cache node states, enabling efficient traversal without redundant I/O, and handle overflow chains through 4‑byte big‑endian linked lists.

## Frequently Asked Questions

### How does Turso handle large payloads that exceed page size?

Turso stores the local portion of the payload in the cell and chains the remainder across **overflow pages**. Each overflow page begins with a 4‑byte big‑endian pointer to the next page, and the cell records the first overflow page number in `first_overflow_page`. The `process_overflow_read` function in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) traverses this chain to reconstruct the full payload when reading.

### What is the maximum depth of a B‑tree in Turso?

The engine enforces a maximum depth of **20 levels** via the `BTCURSOR_MAX_DEPTH` constant in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs). Exceeding this depth triggers a corruption error, indicating an unbalanced or malformed tree structure that violates the storage engine's integrity constraints.

### How does Turso maintain balance when inserting into a full page?

When insertion causes overflow, Turso enters the `BalanceState` machine and selects up to three sibling pages (configurable via `MAX_SIBLING_PAGES_TO_BALANCE`). It redistributes cells across 1‑5 pages, updates the parent’s `rightmost_pointer`, and adjusts the cursor’s page stack to reflect new page boundaries, all while maintaining SQLite‑compatible on‑disk formats.

### What distinguishes table pages from index pages in the storage engine?

Table pages (`TableInterior` and `TableLeaf`) store rowids and full row payloads, while index pages (`IndexInterior` and `IndexLeaf`) store indexed keys. Interior pages in both cases contain left‑child pointers, but table leaf cells include a rowid field that index leaf cells omit. These distinctions are encoded in the `PageType` enum and parsed by `read_btree_cell` in [`core/storage/sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/sqlite3_ondisk.rs).