How Change Data Capture (CDC) Works in Turso: A Deep Dive into the PRAGMA-Driven Architecture
Turso implements Change Data Capture (CDC) through a PRAGMA-driven mechanism that records data-modifying operations into a system-managed change log, supporting multiple capture modes from primary-key tracking to full before/after image serialization.
Turso, the open-source, SQLite-compatible database aimed at edge computing, provides a built-in Change Data Capture (CDC) mechanism that lets applications record every data-modifying operation without external tooling. CDC is activated per connection through a PRAGMA, and the resulting ordered change log can feed event-driven architectures, data replication, and compliance auditing.
Enabling CDC with PRAGMA capture_data_changes_conn
CDC activation in Turso centers on the capture_data_changes_conn PRAGMA, defined in sync/engine/src/database_tape.rs as CDC_PRAGMA_NAME. When executed, this PRAGMA initializes the CDC infrastructure by creating two system tables: turso_cdc (the default change log) and turso_cdc_version (schema version tracking).
The PRAGMA accepts multiple modes that determine the granularity of captured data:
- off: Disables CDC for the connection.
- id: Captures only the primary key (id) of changed rows.
- before: Stores the complete before image of each changed row as a serialized blob.
- after: Stores the complete after image of each changed row as a serialized blob.
- full: Captures both before and after images plus a column-wise delta in the updates field.
- id,custom_cdc: Captures only primary keys but writes to a user-defined table instead of turso_cdc.
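To make the argument format concrete, here is a minimal sketch of how a mode string such as full or id,custom_cdc could be split into a mode and a target table. The names CdcMode, DEFAULT_CDC_TABLE, and parse_cdc_arg are hypothetical and are not taken from the Turso source. The sketch also assumes that any mode may be followed by a table name, while the article only shows this for id.

```rust
/// Hypothetical model of the capture modes listed above (not Turso's real type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CdcMode {
    Off,
    Id,
    Before,
    After,
    Full,
}

/// Default change-log table named in the article.
pub const DEFAULT_CDC_TABLE: &str = "turso_cdc";

/// Parse a PRAGMA argument such as "full" or "id,custom_cdc" into the mode
/// and the name of the table that would receive the change rows.
pub fn parse_cdc_arg(arg: &str) -> Result<(CdcMode, String), String> {
    // Split at the first comma: left side is the mode, right side (if any)
    // is the custom table name.
    let (mode_str, table) = match arg.split_once(',') {
        Some((m, t)) => (m.trim(), t.trim()),
        None => (arg.trim(), DEFAULT_CDC_TABLE),
    };
    let mode = match mode_str {
        "off" => CdcMode::Off,
        "id" => CdcMode::Id,
        "before" => CdcMode::Before,
        "after" => CdcMode::After,
        "full" => CdcMode::Full,
        other => return Err(format!("unknown CDC mode: {}", other)),
    };
    if table.is_empty() {
        return Err("empty CDC table name".to_string());
    }
    Ok((mode, table.to_string()))
}
```

With this sketch, parse_cdc_arg("id,custom_cdc") yields the Id mode with custom_cdc as the target, while a bare "full" falls back to turso_cdc.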
Internal Architecture and Code Flow
Turso's CDC implementation spans the query translation layer through to the virtual database engine (VDBE), ensuring minimal overhead by injecting capture logic during compilation rather than runtime interception.
Connection State Management
In core/connection.rs, each connection maintains an RwLock<Option<CaptureDataChangesInfo>> that stores the active CDC configuration. The method Connection::set_capture_data_changes_info() activates CDC by populating this lock with a struct encoding the selected mode and target table name. During query execution, Connection::get_capture_data_changes_info() checks this state to determine if change emission is required.
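The locking pattern described above can be sketched with the standard library alone. The struct fields and method signatures below are assumptions made for illustration; only the type and method names come from the article, and the real definitions in core/connection.rs may differ.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the struct named in the article.
/// The real fields may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CaptureDataChangesInfo {
    pub mode: String,  // e.g. "full"
    pub table: String, // e.g. "turso_cdc"
}

/// Minimal connection that holds only the CDC state.
pub struct Connection {
    capture_data_changes: RwLock<Option<CaptureDataChangesInfo>>,
}

impl Connection {
    /// A new connection starts with CDC disabled.
    pub fn new() -> Self {
        Connection {
            capture_data_changes: RwLock::new(None),
        }
    }

    /// Store the active CDC configuration, or clear it with `None`.
    pub fn set_capture_data_changes_info(&self, info: Option<CaptureDataChangesInfo>) {
        *self.capture_data_changes.write().unwrap() = info;
    }

    /// Return a copy of the current configuration; `None` means CDC is off.
    pub fn get_capture_data_changes_info(&self) -> Option<CaptureDataChangesInfo> {
        self.capture_data_changes.read().unwrap().clone()
    }
}
```

Returning a clone keeps the read lock short-lived, so a statement being compiled never holds the lock while another PRAGMA call tries to change the mode.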
PRAGMA Parsing and Activation
The core/translate/pragma.rs module handles parsing of capture_data_changes_conn. When the PRAGMA is invoked, Turso creates the CaptureDataChangesInfo struct and initializes the CDC tables. If turso_cdc_version already exists, Turso validates the stored version against CDC_VERSION_CURRENT (currently v2). The system refuses to enable CDC if an incompatible older schema is detected, preventing data corruption during version upgrades.
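The version gate can be pictured as a small decision function. This is a sketch under stated assumptions: the numeric representation of the version, the VersionCheck enum, and check_cdc_version are hypothetical, and the sketch rejects any mismatch, whereas the article only describes rejection of older schemas.

```rust
/// Current CDC schema version per the article (v2).
/// Representing it as an integer is an assumption.
pub const CDC_VERSION_CURRENT: u32 = 2;

/// Possible outcomes of the version check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    /// No version row exists yet, so the tables would be created fresh.
    Initialize,
    /// The stored version matches, so CDC can be enabled.
    Compatible,
    /// The stored version differs, so activation is refused.
    Incompatible { found: u32 },
}

/// Decide what to do given the version read from turso_cdc_version, if any.
pub fn check_cdc_version(stored: Option<u32>) -> VersionCheck {
    match stored {
        None => VersionCheck::Initialize,
        Some(v) if v == CDC_VERSION_CURRENT => VersionCheck::Compatible,
        Some(v) => VersionCheck::Incompatible { found: v },
    }
}
```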
Query Compilation and Bytecode Injection
During query planning in core/translate/* (specifically insert.rs, update.rs, delete.rs, and schema.rs), the planner injects CDC-specific virtual registers when data modification is detected. These registers expose the capture mode through helper functions like has_before(), has_after(), and has_updates(), allowing the bytecode generator to conditionally emit instructions for serializing row images only when the mode requires them.
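A rough illustration of how mode helpers can gate code generation follows. The helper names has_before, has_after, and has_updates come from the article; the CaptureMode enum, the planned_steps function, and the step names are hypothetical placeholders, not real VDBE instructions.

```rust
/// Hypothetical capture-mode enum covering the modes that emit change rows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptureMode {
    Id,
    Before,
    After,
    Full,
}

impl CaptureMode {
    /// True when the before image must be serialized.
    pub fn has_before(self) -> bool {
        matches!(self, CaptureMode::Before | CaptureMode::Full)
    }

    /// True when the after image must be serialized.
    pub fn has_after(self) -> bool {
        matches!(self, CaptureMode::After | CaptureMode::Full)
    }

    /// True when the column-wise delta must be encoded.
    pub fn has_updates(self) -> bool {
        matches!(self, CaptureMode::Full)
    }
}

/// List the (placeholder) steps a code generator would emit for one
/// modified row under the given mode.
pub fn planned_steps(mode: CaptureMode) -> Vec<&'static str> {
    // The primary key is recorded in every mode.
    let mut steps = vec!["write_id"];
    if mode.has_before() {
        steps.push("serialize_before");
    }
    if mode.has_after() {
        steps.push("serialize_after");
    }
    if mode.has_updates() {
        steps.push("encode_updates");
    }
    steps
}
```

Because the checks run while the statement is compiled, a connection in id mode ends up with a program that contains no serialization steps at all.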
Runtime Execution and Change Emission
The actual writing to CDC tables occurs in core/vdbe/execute.rs. When the VDBE executes a modifying statement, it checks get_capture_data_changes_info(). If active, the generated bytecode writes a structured row to the CDC table containing:
- change_id: Auto-incremented identifier for ordering.
- change_time: Logical timestamp of the operation.
- change_type: Integer code, one of 1 (insert), 0 (update), -1 (delete), or 2 (commit).
- table_name and id: Target table and primary key.
- before and after: Serialized row images (present based on mode).
- updates: Binary encoding of column deltas (only in full mode).
A commit row (change_type = 2) is automatically emitted after each statement in auto-commit mode, or once at transaction end for explicit transactions.
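A consumer of the change log can use the commit rows as transaction boundaries. The sketch below decodes the change_type codes listed above and groups change_id values by commit marker. ChangeType and group_by_commit are hypothetical helpers, and the input is a simplified list of (change_id, change_type) pairs rather than real CDC rows.

```rust
/// Decoded form of the change_type column (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Insert,
    Update,
    Delete,
    Commit,
}

impl ChangeType {
    /// Map the integer codes from the article to variants.
    pub fn from_code(code: i64) -> Option<ChangeType> {
        match code {
            1 => Some(ChangeType::Insert),
            0 => Some(ChangeType::Update),
            -1 => Some(ChangeType::Delete),
            2 => Some(ChangeType::Commit),
            _ => None,
        }
    }
}

/// Split (change_id, change_type code) pairs, assumed to be ordered by
/// change_id, into committed groups. Rows that follow the last commit
/// marker are returned separately as pending.
pub fn group_by_commit(rows: &[(i64, i64)]) -> (Vec<Vec<i64>>, Vec<i64>) {
    let mut committed = Vec::new();
    let mut current = Vec::new();
    for &(change_id, code) in rows {
        match ChangeType::from_code(code) {
            // A commit row closes the current group.
            Some(ChangeType::Commit) => committed.push(std::mem::take(&mut current)),
            // Any data change joins the open group.
            Some(_) => current.push(change_id),
            // Unknown codes are skipped in this sketch.
            None => {}
        }
    }
    (committed, current)
}
```

Holding back the pending group until its commit row arrives keeps a downstream replica from applying half of an explicit transaction.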
CDC Table Schema and Version Management
The turso_cdc table stores serialized change data with a flexible schema that accommodates all capture modes. The accompanying turso_cdc_version table tracks the CDC schema version (currently v2) using the tuple (table_name, version). This versioning strategy allows Turso to maintain backward compatibility while evolving the CDC format, though the system currently enforces v2 and rejects activation attempts against legacy schemas.
Custom CDC tables follow the identical schema structure but reside in user-specified tables (e.g., custom_cdc), enabling application-specific partitioning of change logs while using the same underlying capture mechanism.
Practical Implementation Examples
The following Rust example demonstrates enabling CDC in full mode and consuming the change stream:
// Enable CDC in "full" mode (captures before/after images and column updates)
conn.execute("PRAGMA capture_data_changes_conn('full')").unwrap();
// Create a table and insert rows; CDC rows are generated automatically
conn.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, y)").unwrap();
conn.execute("INSERT INTO t VALUES (1, 2), (3, 4)").unwrap();
// Query the CDC stream
let cdc_rows = limbo_exec_rows(&conn, "SELECT * FROM turso_cdc");
// Switch to a custom CDC table that stores only primary keys
conn.execute("PRAGMA capture_data_changes_conn('id,custom_cdc')").unwrap();
conn.execute("INSERT INTO t VALUES (5, 5)").unwrap();
let custom = limbo_exec_rows(&conn, "SELECT * FROM custom_cdc");
These patterns are validated in tests/integration/functions/test_cdc.rs, which covers all CDC modes, custom table configurations, and version upgrade scenarios.
Summary
- PRAGMA-driven activation: CDC is enabled per connection via capture_data_changes_conn, creating the turso_cdc and turso_cdc_version tables automatically.
- Flexible capture modes: Options range from id-only tracking to full mode capturing before/after images and column deltas.
- Compile-time injection: The query planner in core/translate/* injects CDC bytecode only when necessary, minimizing runtime overhead.
- Version-safe upgrades: CDC schema versioning in turso_cdc_version prevents activation against incompatible legacy tables.
- Custom table support: The id,custom_table syntax allows directing CDC output to application-specific tables while maintaining identical capture semantics.
Frequently Asked Questions
What PRAGMA command enables Change Data Capture in Turso?
Execute PRAGMA capture_data_changes_conn('mode') where mode is one of off, id, before, after, full, or id,custom_table. This PRAGMA, defined in the source as CDC_PRAGMA_NAME, initializes the CDC infrastructure and begins capturing changes for the current connection.
What data does Turso's CDC capture in 'full' mode?
In full mode, Turso captures the complete before and after images of each changed row as serialized blobs, plus an updates field containing a binary encoding of column-wise deltas. This mode provides the most comprehensive audit trail but requires the most storage space compared to id or single-image modes.
How does Turso handle version upgrades for CDC tables?
Turso stores the CDC schema version (currently v2) in the turso_cdc_version table. When enabling CDC, the system checks this version against CDC_VERSION_CURRENT. If an older version is detected, Turso refuses to enable CDC to prevent format incompatibilities, requiring manual migration or table recreation.
Can I specify a custom table name for CDC logs in Turso?
Yes. Append your custom table name to the mode parameter using the syntax PRAGMA capture_data_changes_conn('id,my_custom_table'). Turso will create my_custom_table with the same schema as turso_cdc and direct all CDC output there, allowing isolated change streams for different applications or audit scopes.