How Turso’s B‑Tree Storage Engine Manages Pages and Indexes

Turso implements a SQLite‑compatible B‑tree storage engine that organizes data into fixed‑size pages with tiered header layouts, variable‑length cells, and automatic balancing via state machines, storing oversized payloads in linked overflow chains.

Turso persists relational data using a custom B‑tree implementation that maintains binary compatibility with SQLite on‑disk formats. The engine spans two primary modules—core/storage/btree.rs for high‑level navigation and core/storage/sqlite3_ondisk.rs for serialization—managing pages, indexes, and overflow chains through deterministic state machines.

Page Structure and Header Layout

Every page begins with a fixed‑size header followed by a cell pointer array and a cell content area that grows leftwards as cells are added.

Header Fields and Offsets

Interior pages use a 12‑byte header, while leaf pages use 8 bytes. The header fields are defined by constants parsed in core/storage/btree.rs:

  • BTREE_PAGE_TYPE – distinguishes between table and index pages
  • BTREE_FIRST_FREEBLOCK – offset to the first unused space block within the cell area
  • BTREE_CELL_COUNT – total number of cells stored on the page
  • BTREE_CELL_CONTENT_AREA – pointer to the start of the payload region
  • BTREE_FRAGMENTED_BYTES_COUNT – isolated unusable bytes inside the cell area
  • BTREE_RIGHTMOST_PTR – right‑most child page pointer stored separately from cells

Turso caches the usable space of each page (usable_space_cached) inside the cursor to avoid recalculating page_size – reserved_space for every operation.

Cell Organization

The cell pointer array sits immediately after the header, containing offsets to each cell in the content area. As cells are inserted, the content area expands leftwards from the end of the page toward the pointer array.

Cell Layout and Payload Management

Cells are the atomic units of storage, with distinct layouts for table rows and index entries defined in core/storage/sqlite3_ondisk.rs.

Table vs Index Cells

pub struct TableLeafCell {
    pub rowid: i64,
    pub payload: &'static [u8],
    pub payload_size: u64,
    pub first_overflow_page: Option<u32>,
}

pub struct IndexLeafCell {
    pub payload: &'static [u8],
    pub payload_size: u64,
    pub first_overflow_page: Option<u32>,
}

TableLeafCell stores the rowid and serialized record, while IndexLeafCell stores only the indexed key payload. Both variants track overflow via first_overflow_page.

Overflow Handling

When a payload exceeds the local size threshold, the excess spills into a chain of overflow pages. The process_overflow_read function in core/storage/btree.rs traverses this chain by reading 4‑byte big‑endian next‑page pointers until reaching a zero terminator. The read_btree_cell function deserializes raw page bytes into a BTreeCell enum, extracting the initial overflow reference when present.

Index Organization and Navigation

Indexes are stored as separate B‑Trees with dedicated page types:

pub enum PageType {
    IndexInterior = 2,
    TableInterior = 5,
    IndexLeaf = 10,
    TableLeaf = 13,
}

Interior vs Leaf Pages

IndexInterior pages contain a left‑child page pointer followed by the indexed key payload. IndexLeaf pages contain only the key payload without child pointers.

During navigation, the cursor performs binary search on the cell pointer array using BinarySearchState structs. When traversing interior nodes, the cursor descends to the child page indicated by the selected cell’s left‑child pointer, tracking going_upwards state to determine whether divider cells must be revisited.

Balancing and Page Splits

When insertion fills a leaf beyond capacity, Turso triggers a balance operation to redistribute cells and maintain B‑tree invariants.

Balance State Machine

The BalanceState struct manages reusable buffers and tracks the current BalanceSubState. The algorithm examines up to three sibling pages (limited by MAX_SIBLING_PAGES_TO_BALANCE = 5) and redistributes cells across 1‑5 pages depending on total cell count. The engine enforces a maximum tree depth of 20 levels via BTCURSOR_MAX_DEPTH = 20 in core/storage/btree.rs.

Pointer Maintenance

Balancing updates the rightmost_pointer field within BalanceInfo to preserve key ordering across parent‑child relationships. The operation is atomic with respect to the page stack, ensuring crash consistency.

Cursor Architecture and Page Stack

The BTreeCursor struct serves as the primary interface for tree traversal and modification.

Cursor Structure

Defined in core/storage/btree.rs, the cursor maintains:

pub struct BTreeCursor {
    pub pager: Arc<Pager>,
    usable_space_cached: usize,
    root_page: i64,
    stack: PageStack,
    balance_state: BalanceState,
    overflow_state: OverflowState,
    // ... state machines for seek, insert, delete
}

The PageStack stores BTreeNodeState entries for each tree level, caching the current cell index and total cell count to enable fast upward traversal without re‑reading pages.

State Machine Navigation

The cursor uses specialized state machines for each operation:

  • CursorSeekState – manages binary search and descent
  • WriteState::Insert – handles cell insertion with overflow detection
  • BalanceSubState – coordinates page redistribution during splits

The CursorValidState flag indicates whether the current position remains valid after balancing or MVCC writes, triggering repositioning when necessary.

Overflow Page Chain Mechanics

For payloads exceeding local capacity, Turso allocates overflow pages forming a singly linked list. Each overflow page stores a 4‑byte big‑endian pointer to the next page at its start, with the final page containing zero. The read_payload routine extracts the first overflow page number from the last four bytes of the cell’s local storage, and process_overflow_read assembles the complete payload by walking this chain.

Summary

  • Page headers in Turso use 8–12 byte fixed layouts with fields for cell counts, freeblocks, and right‑most pointers, defined in core/storage/btree.rs and core/storage/sqlite3_ondisk.rs.
  • Cells vary by type (TableLeafCell vs IndexLeafCell) and store rowids, payloads, and optional overflow page references.
  • Indexes are separate B‑Trees with IndexInterior and IndexLeaf page types, traversed via binary search and child pointers.
  • Balancing employs a BalanceState machine that redistributes cells across up to five sibling pages while updating right‑most pointers to maintain order.
  • Cursors use a PageStack to cache node states, enabling efficient traversal without redundant I/O, and handle overflow chains through 4‑byte big‑endian linked lists.

Frequently Asked Questions

How does Turso handle large payloads that exceed page size?

Turso stores the local portion of the payload in the cell and chains the remainder across overflow pages. Each overflow page begins with a 4‑byte big‑endian pointer to the next page, and the cell records the first overflow page number in first_overflow_page. The process_overflow_read function in core/storage/btree.rs traverses this chain to reconstruct the full payload when reading.

What is the maximum depth of a B‑tree in Turso?

The engine enforces a maximum depth of 20 levels via the BTCURSOR_MAX_DEPTH constant in core/storage/btree.rs. Exceeding this depth triggers a corruption error, indicating an unbalanced or malformed tree structure that violates the storage engine's integrity constraints.

How does Turso maintain balance when inserting into a full page?

When insertion causes overflow, Turso enters the BalanceState machine and selects up to three sibling pages (configurable via MAX_SIBLING_PAGES_TO_BALANCE). It redistributes cells across 1‑5 pages, updates the parent’s rightmost_pointer, and adjusts the cursor’s page stack to reflect new page boundaries, all while maintaining SQLite‑compatible on‑disk formats.

What distinguishes table pages from index pages in the storage engine?

Table pages (TableInterior and TableLeaf) store rowids and full row payloads, while index pages (IndexInterior and IndexLeaf) store indexed keys. Interior pages in both cases contain left‑child pointers, but table leaf cells include a rowid field that index leaf cells omit. These distinctions are encoded in the PageType enum and parsed by read_btree_cell in core/storage/sqlite3_ondisk.rs.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →