# How the Turso Parser Transforms SQL Statements into VDBE Bytecode: A Complete Technical Guide

> Discover how the Turso parser transforms SQL statements into VDBE bytecode. Learn about the three-stage pipeline: lexing and parsing into an AST, execution planning, and VDBE instruction emission.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: deep-dive
- Published: 2026-06-23

---

**The Turso parser transforms SQL statements into VDBE bytecode through a three-stage pipeline that first lexes and parses source text into an AST, then plans the execution strategy, and finally emits low-level VDBE instructions via the `emit_program` function.**

The Turso database engine compiles each query through a pipeline that converts SQL text into bytecode for its virtual machine. Tracing that pipeline shows how `tursodatabase/turso` keeps SQLite compatibility while supporting extensions such as MVCC. This walkthrough follows the path from a raw SQL string to running bytecode, source file by source file.

## The Three-Stage Compilation Pipeline

Turso follows the same architectural pattern as SQLite, separating concerns into distinct phases:

1. **Lexing and Parsing** – Source SQL becomes an abstract syntax tree (AST)
2. **Planning** – The AST is analyzed and rewritten into a logical **Plan** describing operations (scans, joins, aggregates)
3. **Emission** – The Plan is walked to generate **VDBE bytecode** instructions executed by the virtual machine

This separation ensures that the parser remains focused on syntax validation while the optimizer and emitter handle execution details.

## Stage 1: Lexing and Parsing the SQL Text

The entry point for SQL compilation lives in [`sqlite/parser/src/parser.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite/parser/src/parser.rs), which implements a recursive-descent parser consuming tokens from the lexer.

**Initializing the Parser:**
The `Parser::new` function creates a lexer and initializes parser state, accepting a byte slice of SQL source text.

**Dispatching Statements:**
`Parser::next_cmd` consumes leading semicolons and determines which statement type to parse (`SELECT`, `INSERT`, `CREATE`, etc.), returning a `Cmd` enum. This function delegates to `Parser::parse_stmt`, which dispatches to concrete statement parsers like `parse_select`, `parse_insert`, or `parse_create_stmt`.
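The dispatch step can be pictured with a small standalone sketch. It is not the Turso implementation: `StmtKind` and `next_stmt_kind` are hypothetical names, and the sketch classifies a statement by its first keyword on a plain string rather than on lexer tokens.

```rust
/// Hypothetical statement kinds; the real `Cmd` and statement types are richer.
#[derive(Debug, PartialEq)]
enum StmtKind {
    Select,
    Insert,
    Create,
    Other(String),
}

/// Skip leading semicolons and whitespace, then classify the next statement
/// by its first keyword. Returns `None` when no keyword follows.
fn next_stmt_kind(sql: &str) -> Option<StmtKind> {
    let rest = sql.trim_start_matches(|c: char| c == ';' || c.is_whitespace());
    let keyword = rest
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .collect::<String>()
        .to_ascii_uppercase();
    match keyword.as_str() {
        "" => None,
        "SELECT" => Some(StmtKind::Select),
        "INSERT" => Some(StmtKind::Insert),
        "CREATE" => Some(StmtKind::Create),
        other => Some(StmtKind::Other(other.to_string())),
    }
}
```

Calling `next_stmt_kind(";; SELECT 1")` yields `Some(StmtKind::Select)`, while an input holding only semicolons yields `None`. A real parser would hand control to a dedicated statement parser at each match arm instead of returning a tag.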

For `SELECT` statements, `parse_select` (lines 150-225 in [`parser.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite/parser/src/parser.rs)) builds a `Select` AST node, handling `FROM`, `WHERE`, `GROUP BY`, and window clauses. The parser uses a large `TokenType` enum and implements "fallback-ID" logic: a keyword such as `OVER` is treated as an identifier unless the surrounding syntax requires it to act as a keyword.
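The fallback-ID idea can be sketched in a few lines. The `Token` enum, the `classify` function, and both keyword lists below are hypothetical and chosen only for illustration; the real parser works on `TokenType` values and decides from grammar context rather than from a boolean flag.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(&'static str),
    Id(String),
}

/// Illustrative keyword set, not the parser's real list.
const KEYWORDS: [&str; 4] = ["SELECT", "FROM", "OVER", "KEY"];
/// Illustrative subset of keywords allowed to fall back to identifiers.
const FALLBACK: [&str; 2] = ["OVER", "KEY"];

/// Classify one word. `identifier_expected` stands in for the grammar
/// context: when the parser needs an identifier at this position, a
/// fallback-capable keyword is returned as an identifier instead.
fn classify(word: &str, identifier_expected: bool) -> Token {
    let upper = word.to_ascii_uppercase();
    let keyword = KEYWORDS.iter().copied().find(|k| *k == upper.as_str());
    match keyword {
        Some(k) if identifier_expected && FALLBACK.contains(&k) => {
            Token::Id(word.to_string())
        }
        Some(k) => Token::Keyword(k),
        None => Token::Id(word.to_string()),
    }
}
```

With this sketch, `classify("over", true)` produces an identifier, whereas `classify("select", true)` stays a keyword because `SELECT` is not in the fallback list.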

The resulting AST lives in the `turso_parser::ast` namespace and contains richly-typed nodes representing the query structure, but **no bytecode** exists at this stage.

## Stage 2: Planning—From AST to Logical Plan

Once parsing completes, the AST enters the planning phase in [`core/translate/planner.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/planner.rs). This stage decides *what* operations to perform without specifying *how* to execute them on the VDBE.

**Name Resolution and Collection:**
The planner first uses a `Resolver` to map identifiers to tables, views, CTEs, and virtual tables. Functions like `collect_from_clause_table_refs` (lines 66-115) walk the `Select` AST to discover which tables appear in the `FROM` clause.
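As a rough illustration of collecting and resolving table references, the sketch below walks a simplified `FROM` clause. The `FromItem` and `Select` types, `collect_table_refs`, and `resolve` are hypothetical stand-ins; they are not the `turso_parser::ast` types or the planner's actual functions.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified FROM-clause item.
enum FromItem {
    Table(String),
    Subquery(Box<Select>),
}

/// Hypothetical, simplified SELECT node holding only its FROM clause.
struct Select {
    from: Vec<FromItem>,
}

/// Walk the FROM clause, descending into subqueries, and record each
/// distinct table name in first-seen order.
fn collect_table_refs(select: &Select, out: &mut Vec<String>) {
    for item in &select.from {
        match item {
            FromItem::Table(name) => {
                if !out.contains(name) {
                    out.push(name.clone());
                }
            }
            FromItem::Subquery(inner) => collect_table_refs(inner, out),
        }
    }
}

/// Check every collected name against a set of known tables.
fn resolve(names: &[String], schema: &HashSet<String>) -> Result<(), String> {
    match names.iter().find(|n| !schema.contains(*n)) {
        Some(missing) => Err(format!("no such table: {}", missing)),
        None => Ok(()),
    }
}
```

Separating collection from resolution keeps the AST walk free of schema lookups, which is one plausible reason to structure a planner this way.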

**Building the Plan:**
`prepare_select_plan` (lines 165-210) analyzes the AST, determines join order, and constructs a value of the `Plan` enum defined in [`core/translate/plan.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/plan.rs). The planner returns variants such as `Plan::Select`, `Plan::CompoundSelect`, or `Plan::Update` that describe logical operations.

**Optimization Passes:**
Before emission, the optimizer modules in `core/translate/optimizer/` rewrite the plan. Rewrites include predicate push-down, subquery flattening, and choosing between hash joins and nested-loop joins.
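Predicate push-down, one of the rewrites named above, can be shown on a toy plan. The `Predicate` and `Scan` types and the `push_down` function are hypothetical; the sketch only demonstrates the idea of attaching single-table predicates to the scan of that table so rows are filtered as early as possible.

```rust
/// Hypothetical predicate: the tables it mentions plus its SQL text.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    tables: Vec<String>,
    text: String,
}

/// Hypothetical scan node carrying the filters evaluated during the scan.
#[derive(Debug, PartialEq)]
struct Scan {
    table: String,
    filters: Vec<String>,
}

/// Move every single-table predicate onto the scan of that table.
/// Predicates that span several tables, or name a table with no scan,
/// are returned so the caller can evaluate them at the join level.
fn push_down(scans: &mut [Scan], predicates: Vec<Predicate>) -> Vec<Predicate> {
    let mut remaining = Vec::new();
    for pred in predicates {
        let target = match pred.tables.as_slice() {
            [only] => scans.iter_mut().find(|s| &s.table == only),
            _ => None,
        };
        match target {
            Some(scan) => scan.filters.push(pred.text),
            None => remaining.push(pred),
        }
    }
    remaining
}
```

For a join of `users` and `orders`, a predicate such as `users.age > 30` would move onto the `users` scan, while `users.id = orders.user_id` would stay behind as a join condition.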

**CTE Handling:**
When common table expressions are referenced, `plan_cte` (lines 334-389) creates fresh sub-plans with unique internal IDs, ensuring each CTE reference receives its own cursor space during execution.
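A minimal sketch of the "fresh sub-plan per reference" idea follows. `CteDef`, `SubPlan`, `IdAllocator`, and `plan_cte_reference` are hypothetical names used for illustration; the actual `plan_cte` works on planner types not shown here.

```rust
/// Hypothetical CTE definition: a name plus the text of its body.
struct CteDef {
    name: String,
    body: String,
}

/// Hypothetical sub-plan created for one reference to a CTE.
#[derive(Debug)]
struct SubPlan {
    internal_id: usize,
    cte_name: String,
    body: String,
}

/// Hands out internal IDs that are never reused.
#[derive(Default)]
struct IdAllocator {
    next: usize,
}

impl IdAllocator {
    fn fresh(&mut self) -> usize {
        let id = self.next;
        self.next += 1;
        id
    }
}

/// Build a new sub-plan for each reference instead of sharing one,
/// so that every reference can later own separate cursors.
fn plan_cte_reference(cte: &CteDef, ids: &mut IdAllocator) -> SubPlan {
    SubPlan {
        internal_id: ids.fresh(),
        cte_name: cte.name.clone(),
        body: cte.body.clone(),
    }
}
```

Two references to the same CTE produce two `SubPlan` values with the same `cte_name` but different `internal_id` values, which is what lets a self-join over a CTE scan both sides independently.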

## Stage 3: Emitting VDBE Bytecode

The final transformation occurs in [`core/translate/emitter/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/mod.rs), where the logical Plan becomes concrete VDBE instructions.

**The Entry Point:**
`emit_program` receives a `Plan`, a `ProgramBuilder` (the bytecode container), and a `Resolver`. It pattern-matches on the plan variant and forwards to specialized emitters:

```rust
pub fn emit_program(..., plan: Plan, ...) -> Result<()> {
    match plan {
        Plan::Select(p) => emit_program_for_select(...),
        Plan::Delete(p) => emit_program_for_delete(...),
        Plan::Update(p) => emit_program_for_update(...),
        Plan::CompoundSelect { .. } => emit_program_for_compound_select(...),
    }
}
```

**Select Statement Emission:**
`emit_program_for_select` in [`core/translate/emitter/select.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/select.rs) walks the logical plan and generates specific VDBE instructions:

- **Cursor Allocation:** Opens table cursors via `ProgramBuilder::emit_insn(Insn::OpenRead { … })`
- **Loop Generation:** Creates execution loops using `LoopLabels::new(program)` paired with `OP_Next` and `OP_Goto` for iteration
- **Column Access:** Emits `OP_Column` to extract values from the current row
- **Filtering:** Generates comparison and conditional-jump instructions that skip rows failing the `WHERE` clause
- **Aggregation:** Allocates registers, emits `OP_AggStep` for accumulation, and `OP_AggFinal` for final results
- **Result Output:** Concludes with `OP_ResultRow` to return data to the caller
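To make the shape of the emitted loop concrete, here is a self-contained toy. It borrows opcode names from SQLite (`Rewind`, `Column`, `Le`, `ResultRow`, `Next`, `Halt`), but the `Insn` enum, its operand layout, `emit_filtered_scan`, and `run` are simplified assumptions, not Turso's real types. It compiles a query shaped like `SELECT id, age FROM users WHERE age > 30` over rows of integers and then interprets the result.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Insn {
    Integer { value: i64, dest: usize },
    Rewind { on_empty: usize },                   // jump if the table has no rows
    Column { col: usize, dest: usize },           // copy a column into a register
    Le { lhs: usize, rhs: usize, target: usize }, // jump if r[lhs] <= r[rhs]
    ResultRow { start: usize, count: usize },
    Next { loop_start: usize },                   // advance; jump back if rows remain
    Halt,
}

/// Emit a scan that outputs `out_cols` for rows where `filter_col > threshold`.
/// Registers: r0 = threshold, r1 = filter value, r2.. = output columns.
fn emit_filtered_scan(filter_col: usize, threshold: i64, out_cols: &[usize]) -> Vec<Insn> {
    let mut prog = vec![Insn::Integer { value: threshold, dest: 0 }];
    let rewind_pc = prog.len();
    prog.push(Insn::Rewind { on_empty: 0 }); // target patched below
    let loop_start = prog.len();
    prog.push(Insn::Column { col: filter_col, dest: 1 });
    let skip_pc = prog.len();
    prog.push(Insn::Le { lhs: 1, rhs: 0, target: 0 }); // target patched below
    for (i, &col) in out_cols.iter().enumerate() {
        prog.push(Insn::Column { col, dest: 2 + i });
    }
    prog.push(Insn::ResultRow { start: 2, count: out_cols.len() });
    let next_pc = prog.len();
    prog.push(Insn::Next { loop_start });
    let halt_pc = prog.len();
    prog.push(Insn::Halt);
    // Resolve the forward jumps now that their destinations are known.
    prog[rewind_pc] = Insn::Rewind { on_empty: halt_pc };
    prog[skip_pc] = Insn::Le { lhs: 1, rhs: 0, target: next_pc };
    prog
}

/// Tiny interpreter with one implicit cursor over `table`.
fn run(prog: &[Insn], table: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut regs = vec![0i64; 16]; // enough registers for this sketch
    let (mut pc, mut row) = (0usize, 0usize);
    let mut results = Vec::new();
    loop {
        match &prog[pc] {
            Insn::Integer { value, dest } => { regs[*dest] = *value; pc += 1; }
            Insn::Rewind { on_empty } => {
                row = 0;
                pc = if table.is_empty() { *on_empty } else { pc + 1 };
            }
            Insn::Column { col, dest } => { regs[*dest] = table[row][*col]; pc += 1; }
            Insn::Le { lhs, rhs, target } => {
                pc = if regs[*lhs] <= regs[*rhs] { *target } else { pc + 1 };
            }
            Insn::ResultRow { start, count } => {
                results.push(regs[*start..*start + *count].to_vec());
                pc += 1;
            }
            Insn::Next { loop_start } => {
                row += 1;
                pc = if row < table.len() { *loop_start } else { pc + 1 };
            }
            Insn::Halt => return results,
        }
    }
}
```

Note that the `WHERE age > 30` test is emitted as its negation (`Le` jumps past the output when the row fails), which is the usual way a compare-and-jump instruction set expresses a filter.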

**ProgramBuilder and Instruction Encoding:**
`ProgramBuilder` (defined in [`core/vdbe/builder.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/builder.rs)) manages register allocation, cursor IDs, and label resolution. It wraps a `Vec<Insn>` containing the final bytecode consumed by the VDBE interpreter in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs).
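The bookkeeping described above can be approximated with a small builder. `ToyBuilder`, `ToyInsn`, `JumpTarget`, and every method name here are hypothetical; the sketch shows one common way to hand out registers and to resolve forward jumps through labels, and should not be read as the real `ProgramBuilder` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JumpTarget {
    Label(usize),  // not yet resolved
    Offset(usize), // index into the instruction vector
}

#[derive(Debug, Clone, PartialEq)]
enum ToyInsn {
    Goto { target: JumpTarget },
    Noop,
    Halt,
}

#[derive(Default)]
struct ToyBuilder {
    insns: Vec<ToyInsn>,
    next_register: usize,
    labels: Vec<Option<usize>>, // label id -> bound offset, if any
}

impl ToyBuilder {
    /// Reserve the next free register.
    fn alloc_register(&mut self) -> usize {
        let reg = self.next_register;
        self.next_register += 1;
        reg
    }

    /// Create a label whose position is not known yet.
    fn allocate_label(&mut self) -> usize {
        self.labels.push(None);
        self.labels.len() - 1
    }

    /// Append an instruction and return its offset.
    fn emit(&mut self, insn: ToyInsn) -> usize {
        self.insns.push(insn);
        self.insns.len() - 1
    }

    /// Bind a label to the offset of the next instruction to be emitted.
    fn bind_label_here(&mut self, label: usize) {
        self.labels[label] = Some(self.insns.len());
    }

    /// Replace every label reference with its bound offset.
    fn finish(mut self) -> Result<Vec<ToyInsn>, String> {
        for insn in &mut self.insns {
            if let ToyInsn::Goto { target } = insn {
                if let JumpTarget::Label(id) = *target {
                    let offset = self.labels[id]
                        .ok_or_else(|| format!("label {} was never bound", id))?;
                    *target = JumpTarget::Offset(offset);
                }
            }
        }
        Ok(self.insns)
    }
}
```

Labels let an emitter write a jump before it knows where the jump lands, for example the exit of a loop whose body has not been generated yet. Resolving them in a final pass keeps the emitter code free of manual offset arithmetic.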

## Complete Example: From SQL to Bytecode

The following Rust sketch outlines the full pipeline. It is illustrative rather than compilable: module paths and signatures are simplified, arguments shown as `...` are elided, and the `resolver` and `connection` values come from schema setup that is omitted here.

```rust
use turso_parser::Parser;
use turso::translate::{emit_program, Resolver, ProgramBuilder};
use turso::Connection;

// 1. Parse the SQL into an AST
let sql = b"SELECT name, age FROM users WHERE age > 30";
let mut parser = Parser::new(sql);
let stmt = parser.next().unwrap().unwrap();   // → ast::Stmt

// 2. Initialize `resolver` and `connection` (schema setup omitted)

// 3. Build the logical plan from the AST
let plan = turso::translate::planner::prepare_select_plan(
    stmt, &resolver, ...
)?;

// 4. Emit VDBE bytecode into the program builder
let mut program = ProgramBuilder::new();
emit_program(&connection, &resolver, &mut program, plan, |_| ())?;

// 5. Execute the bytecode via the VDBE interpreter
let results = connection.execute(&program)?;
```

The critical transformation occurs at `emit_program`, which converts the high-level `Plan` into register-based VDBE instructions ready for execution.

## Summary

- **Turso's parser** ([`sqlite/parser/src/parser.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite/parser/src/parser.rs)) performs lexical analysis and recursive-descent parsing to produce a type-safe AST, never emitting bytecode directly.
- **The planner** ([`core/translate/planner.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/planner.rs)) transforms AST nodes into logical `Plan` structures, resolving names, optimizing join orders, and handling CTEs.
- **The emitter** ([`core/translate/emitter/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/mod.rs)) matches on `Plan` variants to generate register-based VDBE instructions like `OP_Column`, `OP_AggStep`, and `OP_ResultRow`.
- **ProgramBuilder** ([`core/vdbe/builder.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/builder.rs)) abstracts register and cursor management during the emission phase.
- This three-stage separation mirrors SQLite's architecture, ensuring compatibility while allowing Turso-specific extensions in the planning and execution layers.

## Frequently Asked Questions

### Does the Turso parser generate VDBE bytecode directly from SQL text?

No. The parser exclusively produces an abstract syntax tree (AST) defined in [`sqlite/parser/src/ast.rs`](https://github.com/tursodatabase/turso/blob/main/sqlite/parser/src/ast.rs). Bytecode generation happens only after the planning phase, when the `emit_program` function in [`core/translate/emitter/mod.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/mod.rs) translates the logical `Plan` into VDBE instructions. This separation allows the parser to focus on syntax validation while the optimizer improves execution strategy before any bytecode is created.

### What is the role of the ProgramBuilder in bytecode emission?

`ProgramBuilder` (located in [`core/vdbe/builder.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/builder.rs)) serves as a construction helper for VDBE programs. It manages register allocation, assigns cursor IDs, resolves jump labels, and maintains the `Vec<Insn>` that becomes the final bytecode. The emitter modules call methods on `ProgramBuilder` to append instructions like `OP_OpenRead` or `OP_Next` without manually tracking register numbers.

### How does Turso handle complex statements like JOINs during bytecode generation?

The planner analyzes JOINs in [`core/translate/planner.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/planner.rs), determining optimal join order and strategies (hash vs. nested-loop). During emission, `emit_program_for_select` in [`core/translate/emitter/select.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/select.rs) opens separate cursors for each table in the join, generates nested loop constructs using `LoopLabels`, and emits comparison instructions to evaluate join conditions. The VDBE executes these as coordinated cursor movements through multiple table sources.

### Where does query optimization occur in the Turso compilation pipeline?

Optimization occurs during the planning stage, after AST construction but before bytecode emission. The `core/translate/optimizer/` directory contains passes that rewrite the logical `Plan`—for example, pushing predicates down to reduce scanned rows, flattening subqueries into joins, and selecting appropriate join algorithms. These transformations ensure the emitter produces efficient bytecode sequences without redundant operations.