How Change Data Capture (CDC) Works in Turso: A Deep Dive into the PRAGMA-Driven Architecture

Turso implements Change Data Capture (CDC) through a PRAGMA-driven mechanism that records data-modifying operations into a system-managed change log, supporting multiple capture modes from primary-key tracking to full before/after image serialization.

Turso, the open-source, SQLite-compatible database written in Rust, provides a built-in Change Data Capture (CDC) mechanism that lets applications audit every data-modifying operation without external tooling. The implementation uses a SQLite-style PRAGMA to activate CDC at the connection level, generating an ordered stream of changes that can power event-driven architectures, data replication, and compliance auditing.

Enabling CDC with PRAGMA capture_data_changes_conn

CDC activation in Turso centers on the capture_data_changes_conn PRAGMA, defined in sync/engine/src/database_tape.rs as CDC_PRAGMA_NAME. When executed, this PRAGMA initializes the CDC infrastructure by creating two system tables: turso_cdc (the default change log) and turso_cdc_version (schema version tracking).

The PRAGMA accepts multiple modes that determine the granularity of captured data:

  • off – Disables CDC for the connection.
  • id – Captures only the primary key (id) of changed rows.
  • before – Stores the complete before image of each changed row as a serialized blob.
  • after – Stores the complete after image of each changed row as a serialized blob.
  • full – Captures both before and after images plus a column-wise delta in the updates field.
  • id,custom_cdc – Captures only primary keys but writes to a user-defined table instead of turso_cdc.
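To make the mode syntax concrete, here is a minimal sketch of how a value such as 'full' or 'id,custom_cdc' could be split into a capture mode and a target table. The names CdcMode, DEFAULT_CDC_TABLE, and parse_cdc_pragma are hypothetical and chosen for illustration only; Turso's actual parser in core/translate/pragma.rs may be structured differently.

```rust
/// Capture granularity, mirroring the modes listed above.
/// (Hypothetical type; not taken from the Turso source.)
#[derive(Debug, Clone, PartialEq, Eq)]
enum CdcMode {
    Off,
    Id,
    Before,
    After,
    Full,
}

/// Default change-log table named in the article.
const DEFAULT_CDC_TABLE: &str = "turso_cdc";

/// Splits "mode[,table]" into a mode and a target table name.
/// Assumption: a missing or empty table name falls back to turso_cdc.
fn parse_cdc_pragma(value: &str) -> Result<(CdcMode, String), String> {
    let mut parts = value.splitn(2, ',');
    let mode = match parts.next().unwrap_or("").trim() {
        "off" => CdcMode::Off,
        "id" => CdcMode::Id,
        "before" => CdcMode::Before,
        "after" => CdcMode::After,
        "full" => CdcMode::Full,
        other => return Err(format!("unknown CDC mode: {}", other)),
    };
    let table = match parts.next().map(str::trim) {
        Some(name) if !name.is_empty() => name.to_string(),
        _ => DEFAULT_CDC_TABLE.to_string(),
    };
    Ok((mode, table))
}
```

Under these assumptions, parse_cdc_pragma("id,custom_cdc") yields the Id mode paired with the table name custom_cdc, while parse_cdc_pragma("full") falls back to turso_cdc.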

Internal Architecture and Code Flow

Turso's CDC implementation spans the query translation layer and the virtual database engine (VDBE). Capture logic is injected when a statement is compiled to bytecode instead of being added through runtime interception, which keeps the cost confined to statements that actually need capture.

Connection State Management

In core/connection.rs, each connection maintains an RwLock<Option<CaptureDataChangesInfo>> that stores the active CDC configuration. The method Connection::set_capture_data_changes_info() activates CDC by populating this lock with a struct encoding the selected mode and target table name. During query execution, Connection::get_capture_data_changes_info() checks this state to determine if change emission is required.
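The per-connection state described above can be sketched as follows. Only the names Connection, CaptureDataChangesInfo, set_capture_data_changes_info, and get_capture_data_changes_info come from the article; the field names, field types, and method signatures are assumptions made for illustration.

```rust
use std::sync::RwLock;

/// Active CDC configuration for one connection.
/// Assumption: both fields are plain strings; the real struct may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CaptureDataChangesInfo {
    mode: String,  // e.g. "full"
    table: String, // e.g. "turso_cdc" or a custom table name
}

/// Simplified stand-in for the connection type in core/connection.rs.
#[derive(Default)]
struct Connection {
    // None means CDC is off for this connection.
    capture_data_changes: RwLock<Option<CaptureDataChangesInfo>>,
}

impl Connection {
    /// Stores (or clears, with None) the CDC configuration.
    fn set_capture_data_changes_info(&self, info: Option<CaptureDataChangesInfo>) {
        *self.capture_data_changes.write().unwrap() = info;
    }

    /// Returns a copy of the current configuration, if any.
    fn get_capture_data_changes_info(&self) -> Option<CaptureDataChangesInfo> {
        self.capture_data_changes.read().unwrap().clone()
    }
}
```

Wrapping the Option in an RwLock lets many readers check the configuration concurrently while a PRAGMA call takes the write lock briefly to change it.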

PRAGMA Parsing and Activation

The core/translate/pragma.rs module handles parsing of capture_data_changes_conn. When the PRAGMA is invoked, Turso creates the CaptureDataChangesInfo struct and initializes the CDC tables. If turso_cdc_version already exists, Turso validates the stored version against CDC_VERSION_CURRENT (currently v2). The system refuses to enable CDC if an incompatible older schema is detected, preventing data corruption during version upgrades.
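The version gate can be pictured with a small sketch. CDC_VERSION_CURRENT is named in the article; the function check_cdc_version, the integer representation of the version, and the choice to reject every mismatch (not only older versions) are assumptions for illustration.

```rust
/// Current CDC schema version according to the article (v2).
/// Assumption: modeled here as a plain integer.
const CDC_VERSION_CURRENT: u32 = 2;

/// Decides whether CDC may be enabled given the version stored in
/// turso_cdc_version (None if the version table has no entry yet).
/// Returns the version to use, or an error describing the mismatch.
fn check_cdc_version(stored: Option<u32>) -> Result<u32, String> {
    match stored {
        // Fresh database: the current version would be recorded.
        None => Ok(CDC_VERSION_CURRENT),
        // Existing table already uses the current schema.
        Some(v) if v == CDC_VERSION_CURRENT => Ok(v),
        // Any other version is refused in this sketch.
        Some(v) => Err(format!(
            "CDC table uses schema v{}, expected v{}",
            v, CDC_VERSION_CURRENT
        )),
    }
}
```

With this shape, a database carrying a v1 change log produces an error instead of having v2 rows appended to a table with the older layout.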

Query Compilation and Bytecode Injection

During query planning in core/translate/* (specifically insert.rs, update.rs, delete.rs, and schema.rs), the planner injects CDC-specific virtual registers when data modification is detected. These registers expose the capture mode through helper functions like has_before(), has_after(), and has_updates(), allowing the bytecode generator to conditionally emit instructions for serializing row images only when the mode requires them.
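A rough model of those mode helpers shows how they gate the optional parts of a change row. The helper names has_before, has_after, and has_updates come from the article; the CdcMode enum, the mapping from mode to helper result, and optional_columns are a hypothetical reading of the mode descriptions, not the actual planner code.

```rust
/// Capture modes as described in the article (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CdcMode {
    Off,
    Id,
    Before,
    After,
    Full,
}

impl CdcMode {
    /// Before image is needed in "before" and "full" modes.
    fn has_before(self) -> bool {
        matches!(self, CdcMode::Before | CdcMode::Full)
    }

    /// After image is needed in "after" and "full" modes.
    fn has_after(self) -> bool {
        matches!(self, CdcMode::After | CdcMode::Full)
    }

    /// Column-wise deltas are only recorded in "full" mode.
    fn has_updates(self) -> bool {
        matches!(self, CdcMode::Full)
    }
}

/// Lists the optional CDC columns a statement would have to populate.
/// A code generator could emit serialization instructions only for these.
fn optional_columns(mode: CdcMode) -> Vec<&'static str> {
    let mut columns = Vec::new();
    if mode.has_before() {
        columns.push("before");
    }
    if mode.has_after() {
        columns.push("after");
    }
    if mode.has_updates() {
        columns.push("updates");
    }
    columns
}
```

In id mode the list is empty, so under this model no row image would be serialized at all.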

Runtime Execution and Change Emission

The write to the CDC table happens in core/vdbe/execute.rs. When the VDBE executes a modifying statement and get_capture_data_changes_info() reports an active configuration, the generated bytecode writes a structured row to the CDC table with these columns:

  • change_id – Auto-incremented identifier for ordering.
  • change_time – Logical timestamp of the operation.
  • change_type – Integer code: 1 (insert), 0 (update), -1 (delete), or 2 (commit).
  • table_name and id – Target table and primary key.
  • before and after – Serialized row images (present based on mode).
  • updates – Binary encoding of column deltas (only in full mode).

A commit row (change_type = 2) is automatically emitted after each statement in auto-commit mode, or once at transaction end for explicit transactions.
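On the consumer side, the change_type codes and commit rows can be used to regroup the log into transactions. The sketch below uses the integer codes listed above; ChangeType and group_by_commit are hypothetical names, and the decision to drop changes that are not yet followed by a commit row is an assumption about how a reader might treat an in-flight transaction.

```rust
/// Decoded form of the change_type column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChangeType {
    Insert,
    Update,
    Delete,
    Commit,
}

impl ChangeType {
    /// Maps the integer codes from the CDC table to variants.
    fn from_code(code: i64) -> Option<Self> {
        match code {
            1 => Some(ChangeType::Insert),
            0 => Some(ChangeType::Update),
            -1 => Some(ChangeType::Delete),
            2 => Some(ChangeType::Commit),
            _ => None,
        }
    }
}

/// Splits change_type codes (ordered by change_id) into one group per
/// committed transaction. Each commit row closes the current group.
/// Unknown codes are skipped; trailing changes without a commit row
/// are treated as uncommitted and left out of the result.
fn group_by_commit(codes: &[i64]) -> Vec<Vec<ChangeType>> {
    let mut groups = Vec::new();
    let mut current = Vec::new();
    for &code in codes {
        match ChangeType::from_code(code) {
            Some(ChangeType::Commit) => groups.push(std::mem::take(&mut current)),
            Some(change) => current.push(change),
            None => {}
        }
    }
    groups
}
```

For the code sequence 1, 2, 1, 0, -1, 2, 1 this produces two groups: a single insert, then an insert, update, and delete that were committed together. The final insert is omitted because no commit row follows it.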

CDC Table Schema and Version Management

The turso_cdc table stores serialized change data with a flexible schema that accommodates all capture modes. The accompanying turso_cdc_version table tracks the CDC schema version (currently v2) using the tuple (table_name, version). This versioning strategy allows Turso to maintain backward compatibility while evolving the CDC format, though the system currently enforces v2 and rejects activation attempts against legacy schemas.

Custom CDC tables follow the identical schema structure but reside in user-specified tables (e.g., custom_cdc), enabling application-specific partitioning of change logs while using the same underlying capture mechanism.

Practical Implementation Examples

The following Rust example demonstrates enabling CDC in full mode and consuming the change stream:

// Enable CDC in "full" mode (captures before/after images and updates)
conn.execute("PRAGMA capture_data_changes_conn('full')").unwrap();

// Create a table and insert rows; CDC rows are generated automatically
conn.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, y)").unwrap();
conn.execute("INSERT INTO t VALUES (1, 2), (3, 4)").unwrap();

// Query the CDC stream
let cdc_rows = limbo_exec_rows(&conn, "SELECT * FROM turso_cdc");

// Switch to a custom CDC table that stores only primary keys
conn.execute("PRAGMA capture_data_changes_conn('id,custom_cdc')").unwrap();
conn.execute("INSERT INTO t VALUES (5, 5)").unwrap();
let custom = limbo_exec_rows(&conn, "SELECT * FROM custom_cdc");

These patterns are validated in tests/integration/functions/test_cdc.rs, which covers all CDC modes, custom table configurations, and version upgrade scenarios.

Summary

  • PRAGMA-driven activation – CDC is enabled per-connection via capture_data_changes_conn, creating turso_cdc and turso_cdc_version tables automatically.
  • Flexible capture modes – Options range from id-only tracking to full mode capturing before/after images and column deltas.
  • Compile-time injection – The query planner in core/translate/* injects CDC bytecode only when necessary, minimizing runtime overhead.
  • Version-safe upgrades – CDC schema versioning in turso_cdc_version prevents activation against incompatible legacy tables.
  • Custom table support – The id,custom_table syntax allows directing CDC output to application-specific tables while maintaining identical capture semantics.

Frequently Asked Questions

What PRAGMA command enables Change Data Capture in Turso?

Execute PRAGMA capture_data_changes_conn('mode') where mode is one of off, id, before, after, full, or id,custom_table. This PRAGMA, defined in the source as CDC_PRAGMA_NAME, initializes the CDC infrastructure and begins capturing changes for the current connection.

What data does Turso's CDC capture in 'full' mode?

In full mode, Turso captures the complete before and after images of each changed row as serialized blobs, plus an updates field containing a binary encoding of column-wise deltas. This mode provides the most comprehensive audit trail but requires more storage than the id or single-image modes.

How does Turso handle version upgrades for CDC tables?

Turso stores the CDC schema version (currently v2) in the turso_cdc_version table. When enabling CDC, the system checks this version against CDC_VERSION_CURRENT. If an older version is detected, Turso refuses to enable CDC to prevent format incompatibilities, requiring manual migration or table recreation.

Can I specify a custom table name for CDC logs in Turso?

Yes. Append your custom table name to the mode parameter using the syntax PRAGMA capture_data_changes_conn('id,my_custom_table'). Turso will create my_custom_table with the same schema as turso_cdc and direct all CDC output there, allowing isolated change streams for different applications or audit scopes.
