How the Turso Parser Transforms SQL Statements into VDBE Bytecode: A Complete Technical Guide

The Turso parser transforms SQL statements into VDBE bytecode through a three-stage pipeline that first lexes and parses source text into an AST, then plans the execution strategy, and finally emits low-level VDBE instructions via the emit_program function.

The Turso database engine processes queries through a sophisticated compilation pipeline that converts human-readable SQL into executable virtual machine code. Understanding how the Turso parser transforms SQL statements into VDBE bytecode reveals the architectural decisions in tursodatabase/turso that maintain SQLite compatibility while enabling modern extensions like MVCC. This walkthrough traces the exact path from raw SQL strings to running bytecode using the actual source files.

The Three-Stage Compilation Pipeline

Turso follows the same architectural pattern as SQLite, separating concerns into distinct phases:

  1. Lexing and Parsing – Source SQL becomes an abstract syntax tree (AST)
  2. Planning – The AST is analyzed and rewritten into a logical Plan describing operations (scans, joins, aggregates)
  3. Emission – The Plan is walked to generate VDBE bytecode instructions executed by the virtual machine

This separation ensures that the parser remains focused on syntax validation while the optimizer and emitter handle execution details.
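To make the hand-off between the three stages concrete, here is a minimal toy sketch in Rust. It is not Turso's code: the Ast, Plan, and Op types and the parse, plan, and emit functions are hypothetical stand-ins, the "parser" accepts only the shape SELECT a, b FROM t, and the emitted program omits the row loop entirely.

```rust
// Toy model of the parse -> plan -> emit hand-off.
// All names are hypothetical and chosen for illustration only.

#[derive(Debug, PartialEq)]
struct Ast {
    table: String,
    columns: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
}

#[derive(Debug, PartialEq)]
enum Op {
    OpenRead(String),
    Column(String),
    ResultRow(usize),
    Halt,
}

/// Stage 1 (toy): accepts only `SELECT a, b FROM t`.
fn parse(sql: &str) -> Result<Ast, String> {
    let rest = sql.trim().strip_prefix("SELECT ").ok_or("expected SELECT")?;
    let (cols, table) = rest.split_once(" FROM ").ok_or("expected FROM")?;
    let columns = cols.split(',').map(|c| c.trim().to_string()).collect();
    Ok(Ast { table: table.trim().to_string(), columns })
}

/// Stage 2 (toy): the only available strategy is a full table scan.
fn plan(ast: Ast) -> Plan {
    Plan::Scan { table: ast.table, columns: ast.columns }
}

/// Stage 3 (toy): walk the plan and append instructions (no loop emitted).
fn emit(plan: &Plan) -> Vec<Op> {
    match plan {
        Plan::Scan { table, columns } => {
            let mut ops = vec![Op::OpenRead(table.clone())];
            ops.extend(columns.iter().cloned().map(Op::Column));
            ops.push(Op::ResultRow(columns.len()));
            ops.push(Op::Halt);
            ops
        }
    }
}
```

Each stage consumes only the previous stage's output type, which is the property the pipeline relies on: the parser never sees an Op, and the emitter never sees SQL text.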

Stage 1: Lexing and Parsing the SQL Text

The entry point for SQL compilation lives in sqlite/parser/src/parser.rs, which implements a recursive-descent parser consuming tokens from the lexer.

Initializing the Parser: The Parser::new function creates a lexer and initializes parser state, accepting a byte slice of SQL source text.

Dispatching Statements: Parser::next_cmd consumes leading semicolons and determines which statement type to parse (SELECT, INSERT, CREATE, etc.), returning a Cmd enum. This function delegates to Parser::parse_stmt, which dispatches to concrete statement parsers like parse_select, parse_insert, or parse_create_stmt.

For SELECT statements, parse_select (lines 150-225 in parser.rs) builds a Select AST node, handling FROM, WHERE, GROUP BY, and window clauses. The parser uses a large TokenType enum and implements "fallback-ID" logic: a keyword such as OVER is treated as an ordinary identifier unless the surrounding syntax requires its keyword meaning.
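The fallback idea can be sketched in a few lines. The following is a toy model rather than the parser's actual implementation: ToyToken, fallback, and accept are hypothetical names, and the set of "expected" tokens stands in for whatever the real parser knows about its current position in the grammar.

```rust
// Toy model of keyword-to-identifier fallback.
// ToyToken, fallback, and accept are hypothetical names for illustration.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyToken {
    Id,
    Over,
    Select,
}

impl ToyToken {
    /// Keywords that may be reinterpreted as identifiers in this toy.
    fn fallback(self) -> Option<ToyToken> {
        match self {
            ToyToken::Over => Some(ToyToken::Id),
            _ => None,
        }
    }
}

/// Accept `tok` as-is if the current parser position expects it;
/// otherwise retry with its fallback token, if it has one.
fn accept(tok: ToyToken, expected: &[ToyToken]) -> Option<ToyToken> {
    if expected.contains(&tok) {
        return Some(tok);
    }
    tok.fallback().filter(|fb| expected.contains(fb))
}
```

In this model, OVER stays a keyword where a window clause may start, becomes an identifier where only a name is allowed, and a keyword without a fallback (SELECT here) is rejected in identifier position.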

The resulting AST lives in the turso_parser::ast namespace and contains richly-typed nodes representing the query structure, but no bytecode exists at this stage.

Stage 2: Planning—From AST to Logical Plan

Once parsing completes, the AST enters the planning phase in core/translate/planner.rs. This stage decides what operations to perform without specifying how to execute them on the VDBE.

Name Resolution and Collection: The planner first uses a Resolver to map identifiers to tables, views, CTEs, and virtual tables. Functions like collect_from_clause_table_refs (lines 66-115) walk the Select AST to discover which tables appear in the FROM clause.

Building the Plan: prepare_select_plan (lines 165-210) analyzes the AST, determines join order, and constructs the Plan enum defined in core/translate/plan.rs. This returns variants like Plan::Select, Plan::CompoundSelect, or Plan::Update that describe logical operations.

Optimization Passes: Before emission, the optimizer modules in core/translate/optimizer/ rewrite the plan. This includes predicate push-down, join flattening, and choosing between hash-joins versus nested-loop strategies.
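One way to picture predicate push-down in a nested-loop join is to ask, for each WHERE term, how deep into the loop nest execution must go before every table the term mentions has a current row. The sketch below computes that depth for a fixed join order. It is an illustrative assumption, not the optimizer's code: eval_depth is a hypothetical helper, and tables are identified by plain indices.

```rust
// Toy illustration of predicate placement for a fixed join order.
// eval_depth is a hypothetical helper, not part of Turso's optimizer.

/// `join_order[d]` is the table scanned by the loop at depth `d`
/// (0 = outermost). Returns the shallowest depth at which a term that
/// references `term_tables` can be evaluated, or None if the term
/// mentions a table that is not part of the join.
fn eval_depth(join_order: &[usize], term_tables: &[usize]) -> Option<usize> {
    term_tables
        .iter()
        .map(|t| join_order.iter().position(|j| j == t))
        .try_fold(0usize, |depth, pos| pos.map(|p| depth.max(p)))
}
```

A term that touches only the outermost table gets depth 0 and filters rows before any inner loop runs, which is where the saving comes from. A term that references no tables also gets depth 0 in this sketch.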

CTE Handling: When common table expressions are referenced, plan_cte (lines 334-389) creates fresh sub-plans with unique internal IDs, ensuring each CTE reference receives its own cursor space during execution.

Stage 3: Emitting VDBE Bytecode

The final transformation occurs in core/translate/emitter/mod.rs, where the logical Plan becomes concrete VDBE instructions.

The Entry Point: emit_program receives a Plan, a ProgramBuilder (the bytecode container), and a Resolver. It pattern-matches on the plan variant and forwards to specialized emitters:

pub fn emit_program(..., plan: Plan, ...) -> Result<()> {
    match plan {
        Plan::Select(p) => emit_program_for_select(...),
        Plan::Delete(p) => emit_program_for_delete(...),
        Plan::Update(p) => emit_program_for_update(...),
        Plan::CompoundSelect { .. } => emit_program_for_compound_select(...),
    }
}

Select Statement Emission: emit_program_for_select in core/translate/emitter/select.rs walks the logical plan and generates specific VDBE instructions:

  • Cursor Allocation: Opens table cursors via ProgramBuilder::emit_insn(Insn::OpenRead { … })
  • Loop Generation: Creates execution loops using LoopLabels::new(program) paired with OP_Next and OP_Goto for iteration
  • Column Access: Emits OP_Column to extract values from the current row
  • Filtering: Emits comparison and conditional-jump instructions (for example OP_Le or OP_IfNot) that implement WHERE clauses by skipping rows that fail the predicate
  • Aggregation: Allocates registers, emits OP_AggStep for accumulation, and OP_AggFinal for final results
  • Result Output: Concludes with OP_ResultRow to return data to the caller
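The steps above can be pictured with a toy emitter for a single-table scan with a "column > constant" filter. This is a simplified sketch under stated assumptions: the Insn enum below is a hypothetical miniature, not Turso's Insn type; jump targets are computed by hand instead of through labels; and real programs also contain setup instructions that are left out here.

```rust
// Toy emitter for: SELECT <out_cols> FROM t WHERE <filter_col> > <threshold>
// The Insn enum and emit_filtered_scan are hypothetical, for illustration.

#[derive(Debug, Clone, PartialEq)]
enum Insn {
    OpenRead { cursor: usize },
    Integer { value: i64, dest: usize },
    /// Position on the first row, or jump to `if_empty` when there is none.
    Rewind { cursor: usize, if_empty: usize },
    Column { cursor: usize, column: usize, dest: usize },
    /// Jump to `target` when reg[lhs] <= reg[rhs] (the row fails `>`).
    Le { lhs: usize, rhs: usize, target: usize },
    ResultRow { start: usize, count: usize },
    /// Advance the cursor and jump back to `loop_start` if a row exists.
    Next { cursor: usize, loop_start: usize },
    Halt,
}

fn emit_filtered_scan(filter_col: usize, threshold: i64, out_cols: &[usize]) -> Vec<Insn> {
    let n = out_cols.len();
    let loop_start = 3; // first instruction of the loop body
    let next_pc = 6 + n; // address of Next
    let halt_pc = 7 + n; // address of Halt
    let mut prog = vec![
        Insn::OpenRead { cursor: 0 },
        Insn::Integer { value: threshold, dest: 1 }, // constant hoisted out of the loop
        Insn::Rewind { cursor: 0, if_empty: halt_pc },
        Insn::Column { cursor: 0, column: filter_col, dest: 2 },
        Insn::Le { lhs: 2, rhs: 1, target: next_pc }, // skip rows failing the WHERE
    ];
    for (i, &col) in out_cols.iter().enumerate() {
        prog.push(Insn::Column { cursor: 0, column: col, dest: 3 + i });
    }
    prog.push(Insn::ResultRow { start: 3, count: n });
    prog.push(Insn::Next { cursor: 0, loop_start });
    prog.push(Insn::Halt);
    prog
}
```

Note that the WHERE test is emitted as its negation: the comparison jumps past ResultRow when the row does not qualify, so qualifying rows simply fall through to the output instructions.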

ProgramBuilder and Instruction Encoding: ProgramBuilder (defined in core/vdbe/builder.rs) manages register allocation, cursor IDs, and label resolution. It wraps a Vec<Insn> containing the final bytecode consumed by the VDBE interpreter in core/vdbe/execute.rs.
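The register and label bookkeeping described here follows a common pattern: hand out registers from a counter, let jumps refer to symbolic labels while code is still being appended, and patch the labels to concrete addresses at the end. The sketch below shows that pattern in miniature. ToyBuilder, ToyInsn, and all method names are hypothetical and are not claimed to match ProgramBuilder's real interface.

```rust
// Toy builder showing register allocation and label back-patching.
// ToyBuilder, ToyInsn, and Target are hypothetical names for illustration.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Target {
    Label(usize), // symbolic, not yet bound to an address
    Pc(usize),    // concrete instruction index
}

#[derive(Debug, Clone, PartialEq)]
enum ToyInsn {
    Goto(Target),
    Noop,
    Halt,
}

#[derive(Default)]
struct ToyBuilder {
    insns: Vec<ToyInsn>,
    labels: Vec<Option<usize>>,
    next_reg: usize,
}

impl ToyBuilder {
    /// Hand out the next unused register number.
    fn alloc_register(&mut self) -> usize {
        let r = self.next_reg;
        self.next_reg += 1;
        r
    }

    /// Create a label that can be jumped to before its address is known.
    fn allocate_label(&mut self) -> usize {
        self.labels.push(None);
        self.labels.len() - 1
    }

    /// Append an instruction and return its address.
    fn emit(&mut self, insn: ToyInsn) -> usize {
        self.insns.push(insn);
        self.insns.len() - 1
    }

    /// Bind `label` to the address of the next instruction to be emitted.
    fn resolve_label(&mut self, label: usize) {
        self.labels[label] = Some(self.insns.len());
    }

    /// Replace every symbolic jump target with its concrete address.
    fn build(mut self) -> Result<Vec<ToyInsn>, String> {
        for insn in self.insns.iter_mut() {
            if let ToyInsn::Goto(Target::Label(l)) = *insn {
                let pc = self.labels[l].ok_or_else(|| format!("label {l} never resolved"))?;
                *insn = ToyInsn::Goto(Target::Pc(pc));
            }
        }
        Ok(self.insns)
    }
}
```

Deferring resolution until build is what lets an emitter write a forward jump (for example, "skip to the end of the loop") before the code at the destination exists.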

Complete Example: From SQL to Bytecode

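This Rust sketch outlines the full pipeline using Turso's internal APIs. It is simplified: argument lists are elided with "...", resolver and connection setup is omitted, and exact module paths and signatures may differ from the current source, so it will not compile as written: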

use turso_parser::Parser;
use turso::translate::{emit_program, Resolver, ProgramBuilder};
use std::sync::Arc;
use turso::Connection;

// 1. Parse the SQL into an AST
let sql = b"SELECT name, age FROM users WHERE age > 30";
let mut parser = Parser::new(sql);
let stmt = parser.next().unwrap().unwrap();   // → ast::Stmt

// 2. Initialize resolver and connection (schema setup omitted)

// 3. Build the logical plan from the AST
let plan = turso::translate::planner::prepare_select_plan(
    stmt, &resolver, ...
)?;

// 4. Emit VDBE bytecode
let mut program = ProgramBuilder::new();
emit_program(&connection, &resolver, &mut program, plan, |_| ())?;

// 5. Execute the bytecode via the VDBE interpreter
let results = connection.execute(&program)?;

The critical transformation occurs at emit_program, which converts the high-level Plan into register-based VDBE instructions ready for execution.

Summary

  • Turso's parser (sqlite/parser/src/parser.rs) performs lexical analysis and recursive-descent parsing to produce a type-safe AST, never emitting bytecode directly.
  • The planner (core/translate/planner.rs) transforms AST nodes into logical Plan structures, resolving names, optimizing join orders, and handling CTEs.
  • The emitter (core/translate/emitter/mod.rs) matches on Plan variants to generate VDBE instructions such as OP_Column, OP_AggStep, and OP_ResultRow. These target the virtual machine, not any particular hardware architecture.
  • ProgramBuilder (core/vdbe/builder.rs) abstracts register and cursor management during the emission phase.
  • This three-stage separation mirrors SQLite's architecture, ensuring compatibility while allowing Turso-specific extensions in the planning and execution layers.

Frequently Asked Questions

Does the Turso parser generate VDBE bytecode directly from SQL text?

No. The parser exclusively produces an abstract syntax tree (AST) defined in sqlite/parser/src/ast.rs. Bytecode generation happens only after the planning phase, when the emit_program function in core/translate/emitter/mod.rs translates the logical Plan into VDBE instructions. This separation allows the parser to focus on syntax validation while the optimizer improves execution strategy before any bytecode is created.

What is the role of the ProgramBuilder in bytecode emission?

ProgramBuilder (located in core/vdbe/builder.rs) serves as a construction helper for VDBE programs. It manages register allocation, assigns cursor IDs, resolves jump labels, and maintains the Vec<Insn> that becomes the final bytecode. The emitter modules call methods on ProgramBuilder to append instructions like OP_OpenRead or OP_Next without manually tracking register numbers.

How does Turso handle complex statements like JOINs during bytecode generation?

The planner analyzes JOINs in core/translate/planner.rs, determining optimal join order and strategies (hash vs. nested-loop). During emission, emit_program_for_select in core/translate/emitter/select.rs opens separate cursors for each table in the join, generates nested loop constructs using LoopLabels, and emits comparison instructions to evaluate join conditions. The VDBE executes these as coordinated cursor movements through multiple table sources.

Where does query optimization occur in the Turso compilation pipeline?

Optimization occurs during the planning stage, after AST construction but before bytecode emission. The core/translate/optimizer/ directory contains passes that rewrite the logical Plan—for example, pushing predicates down to reduce scanned rows, flattening subqueries into joins, and selecting appropriate join algorithms. These transformations ensure the emitter produces efficient bytecode sequences without redundant operations.
