# How Turso's B-Tree Storage Engine Stores and Retrieves Data

> Discover how Turso's B-tree storage engine efficiently stores and retrieves data in fixed-size pages within a single SQLite-compatible file, optimizing navigation and record access.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: internals
- Published: 2026-06-22

---

**Turso stores all tables and indexes in fixed-size pages within a single SQLite-compatible file, using interior B-tree nodes for navigation and leaf nodes for row payload, with overflow pages chained for large records.**

Turso is an edge-native database built on libSQL, the open-source fork of SQLite maintained by Turso (tursodatabase/turso). The **Turso B-tree storage engine** persists data in a single file split into fixed-size pages, implementing a traditional B-tree structure where interior nodes point to child pages and leaf nodes contain actual row data. This design maintains byte-for-byte compatibility with SQLite's on-disk format while integrating Turso-specific async I/O and MVCC support.

## Page Layout and On-Disk Structure

All B-tree pages in Turso begin with a standardized header that defines the page type and cell distribution. In [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs), the `offset` module defines the byte positions for each header field:

```rust
pub mod offset {
    pub const BTREE_PAGE_TYPE: usize = 0;          // u8
    pub const BTREE_FIRST_FREEBLOCK: usize = 1;    // u16
    pub const BTREE_CELL_COUNT: usize = 3;         // u16
    pub const BTREE_CELL_CONTENT_AREA: usize = 5; // u16
    pub const BTREE_FRAGMENTED_BYTES_COUNT: usize = 7; // u8
    pub const BTREE_RIGHTMOST_PTR: usize = 8;      // u32
}

```

The header layout follows this structure: a 1-byte page type identifier, a 2-byte offset to the first freeblock, a 2-byte cell count, a 2-byte pointer to the cell content area start, a 1-byte fragmented bytes count, and a 4-byte rightmost pointer for interior nodes.

Each page is either an **interior node**—which stores child-page pointers and optional index keys—or a **leaf node** containing the actual row payload. The usable space per page is calculated as `page_size – reserved_space`, as defined in `DatabaseHeader::usable_space` within [`sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite3_ondisk.rs) (lines 69-71).

## Cell Structure and Payload Management

Data within pages is organized into **cells**, represented by the `BTreeCell` enum in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) (lines 724-796):

```rust
pub enum BTreeCell {
    TableInteriorCell(TableInteriorCell),
    TableLeafCell(TableLeafCell),
    IndexInteriorCell(IndexInteriorCell),
    IndexLeafCell(IndexLeafCell),
}

```

Table cells contain a 64-bit row ID (`i64`), while index cells store the indexed key bytes. For leaf cells, payloads may be **local** (fitting entirely within the page) or **overflow** (spanning multiple pages).

The function `payload_overflows` in [`sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite3_ondisk.rs) calculates the local storage threshold based on page type and usable page size. When a payload exceeds this limit, the engine stores the initial portion locally and writes the remainder to a chain of overflow pages. The last four bytes of the cell header store the page number of the first overflow page, creating a linked list that `process_overflow_read` (lines 1011-1148) traverses to reconstruct the complete record.

## Reading Data: The BTreeCursor Implementation

All data retrieval flows through `BTreeCursor`, which maintains a **stack** of pages representing the path from root to current leaf. The cursor implements a seek operation (lines 5565-5605) that performs the following steps:

1. **Binary search on interior pages** using `InteriorPageBinarySearchState` (lines 5530-5538) to locate the correct child pointer.
2. **Descent** via `cell_interior_read_left_child_page` to navigate to the target child page.
3. **Leaf traversal** to extract the record payload from `TableLeafCell` or `IndexLeafCell`.
4. **Overflow resolution** through `process_overflow_read` when `first_overflow_page` is present, following the chain until the full payload is assembled.

The cursor's `stack: PageStack` structure stores, for each level, the page reference, current cell index, and total cell count. This allows the engine to resume traversal after I/O spills without re-reading cached pages. The `read_btree_cell` function (lines 814-887) parses raw page buffers according to page type, decoding varint-encoded payload sizes and determining overflow status.

## Writing Data and Page Balancing

Insertion and updates utilize a `WriteState` state machine (lines 217-222) that handles payload allocation and overflow page creation. When writing:

- The engine allocates payload space or rewrites existing cells.
- If the payload exceeds local limits, it creates overflow pages and links them via the four-byte pointer field.
- When pages become over-full, the engine triggers **balancing** operations (`BalanceState`, lines 236-305) that redistribute cells across up to three sibling pages to maintain B-tree invariants.

The write path ensures that the retrieval logic remains unchanged—overflow chains are transparent to readers, with `BTreeCursor` automatically following links during `seek` operations.

## Pager Architecture and Data Flow

The storage engine relies on the `Pager` abstraction in [`core/storage/pager.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/pager.rs) for page allocation and the [`page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/page_cache.rs) module for in-memory caching. When storing data:

- `Pager::allocate_page` creates a `PageRef` containing a page ID and buffer, inserting it into the page cache.
- `BTreeCursor::insert` constructs `BTreeCell` instances and writes payloads, including overflow page generation when necessary.
- Page headers (type, cell count, rightmost pointer) are updated via `PageContent` methods using the `offset` constants.

During retrieval, `BTreeCursor::seek` performs binary search through interior nodes, descends to leaf pages, and returns `ImmutableRecord` instances (lines 2628-2630) to the query engine. The same traversal logic applies to index lookups, range scans, and deletions—always starting at the root, searching interior nodes, and reading leaf payloads with automatic overflow handling.

```rust
// Open a Turso connection
let db = turso::Database::open("my.db")?;

// Insert a row (payload may overflow)
db.execute("INSERT INTO users(name, data) VALUES (?, ?)",
           [&"alice", &large_blob])?;

// Retrieve a row – cursor walks the B-tree and stitches overflow pages
let row: (i64, Vec<u8>) = db.query_row(
    "SELECT rowid, data FROM users WHERE name = ?", [&"alice"],
)?;

```

## Summary

- Turso stores data in fixed-size pages within a single SQLite-compatible file, using interior nodes for navigation and leaf nodes for payload storage.
- The page header format is defined in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) with specific byte offsets for type, cell count, and fragmentation metadata.
- **Cells** are represented by the `BTreeCell` enum, with leaf cells supporting local storage or overflow chains linked via four-byte page pointers.
- **BTreeCursor** handles all reads via binary search on interior pages and stack-based traversal, automatically following overflow chains through `process_overflow_read`.
- Writes use the `WriteState` machine and trigger `BalanceState` operations when redistributing cells across sibling pages.
- The `Pager` and [`page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/page_cache.rs) components manage page allocation and caching, ensuring efficient I/O for B-tree operations.

## Frequently Asked Questions

### How does Turso handle large payloads that exceed page size?

When a payload exceeds the local size limit calculated by `payload_overflows`, Turso stores the initial portion in the leaf cell and writes the remainder to a chain of **overflow pages**. The cell header stores the first overflow page number in its last four bytes. During retrieval, `process_overflow_read` follows this linked list page-by-page, concatenating fragments until the complete record is reconstructed.

### What is the difference between interior and leaf pages in Turso's B-tree?

**Interior pages** contain navigation pointers (4-byte child page numbers) and optional index keys, guiding the `BTreeCursor` toward target data via binary search. **Leaf pages** store the actual row payloads or index entries. Interior pages use the rightmost pointer field in the header for the final child pointer, while leaf pages contain the `BTreeCell` variants holding row IDs and user data.

### How does the BTreeCursor maintain its position during traversal?

The cursor maintains a `PageStack` that records the path from root to current leaf, storing each level's page reference, current cell index, and total cell count. This stack-based approach allows the engine to resume traversal after asynchronous I/O operations without re-reading pages already present in the [`page_cache.rs`](https://github.com/tursodatabase/turso/blob/main/page_cache.rs) cache.

### Is Turso's B-tree implementation compatible with standard SQLite files?

Yes, Turso maintains byte-for-byte compatibility with SQLite's on-disk format. The page header definitions in [`core/storage/btree.rs`](https://github.com/tursodatabase/turso/blob/main/core/storage/btree.rs) and format logic in [`sqlite3_ondisk.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite3_ondisk.rs) follow SQLite specifications, allowing Turso to read standard SQLite files while adding asynchronous I/O and MVCC capabilities through the `Pager` abstraction.