Turso Change Data Capture (CDC): How It Tracks Database Changes in Real-Time
Turso's CDC feature tracks real-time database changes by using the capture_data_changes_conn PRAGMA to instrument the VDBE (virtual machine) execution layer, writing INSERT, UPDATE, and DELETE operations into a dedicated turso_cdc table with versioned schemas and transaction ID grouping.
Turso implements Change Data Capture by coupling a special PRAGMA with internal VDBE hooks that intercept data-modifying operations at the virtual-machine level. According to the tursodatabase/turso source code, this architecture ensures that every row mutation is persisted atomically with the transaction that produced it, enabling downstream replication and real-time analytics pipelines.
Enabling CDC with the capture_data_changes_conn PRAGMA
Activation and Schema Creation
CDC activation begins when a client executes:
PRAGMA capture_data_changes_conn('full')
This triggers the op_init_cdc_version function in core/vdbe/execute.rs (lines 12640–12674). The routine first checks for existing CDC infrastructure and creates two system tables if missing:
turso_cdc– Stores the actual change recordsturso_cdc_version– Tracks which schema version (V1 or V2) the CDC table uses
The pragma supports multiple capture modes including full, id, before, after, and off. The requested mode is parsed into a CaptureDataChangesInfo struct and stored in the connection via Connection::set_capture_data_changes_info (defined in core/connection.rs, lines 380–384). CDC activation is deferred until the surrounding program commits successfully, ensuring that rolled-back transactions leave no trace in the CDC log.
Version Selection and Legacy Compatibility
Turso maintains two CDC schemas:
- V1 – Eight columns with no transaction ID support
- V2 – Nine columns including
change_txn_idfor MVCC-aware grouping
When initializing CDC, Turso inspects turso_cdc_version to determine the existing schema. If a legacy V1 table exists, the connection automatically pins to V1 regardless of the requested version (see version selection logic in core/vdbe/execute.rs, lines 12774–12802). New databases default to V2, which includes the change_txn_id column essential for transaction boundary detection.
CDC Table Schema and Row Structure
V2 Schema Layout
A standard V2 CDC row contains the following columns created by op_init_cdc_version in core/vdbe/execute.rs (lines 12755–12760):
| Column | Description |
|---|---|
change_id |
Primary key (auto-increment or MVCC-assigned) |
change_time |
Unix-epoch timestamp of the change |
change_txn_id |
Transaction ID for grouping (V2 only) |
change_type |
Operation type: 1=INSERT, 2=UPDATE, 3=DELETE |
table_name |
Name of the affected table |
id |
Primary-key value(s) of the modified row |
before |
BLOB containing the row image before the change (if mode includes before) |
after |
BLOB containing the row image after the change (if mode includes after) |
updates |
BLOB with JSON-encoded column changes (if mode includes full) |
The before, after, and updates columns are populated based on the capture mode specified in the PRAGMA. In full mode, all three are populated for UPDATE operations, providing complete before-and-after snapshots.
Capturing DML Operations in Real-Time
During statement compilation, the translator inspects program.capture_data_changes_info() to determine whether CDC emission is required. The system instruments three primary DML paths:
INSERT Path
In core/translate/insert.rs (lines 1077, 1271–1326), the compiler checks the capture mode to decide whether to include after images and/or updates. For each INSERT, it generates code that writes a CDC row with change_type=1 via the CDC_PRAGMA_NAME cursor.
UPDATE Path
The UPDATE implementation in core/translate/update.rs (line 178) evaluates the capture mode to determine whether to record before, after, and/or updates fields. This allows fine-grained tracking of column-level changes without the overhead of full row serialization when only specific fields are needed.
DELETE Path
For DELETE operations, core/translate/emitter/delete.rs (line 763) captures the before image when the mode requests it, writing a row with change_type=3 to preserve the final state of the deleted record.
Transaction Grouping and MVCC Integration
When MVCC is enabled, Turso maintains a per-connection cdc_transaction_id initialized to -1. All CDC rows generated within a single transaction share the same transaction ID, written to the change_txn_id column in V2 schemas. This enables downstream consumers—such as the Turso Cloud sync service—to batch changes atomically and maintain transactional consistency across distributed systems.
The transaction ID assignment occurs during VDBE execution, ensuring that CDC records inherit the same transactional boundaries as the underlying data changes.
Querying and Consuming CDC Data
Direct SQL Access
Applications query CDC changes directly through standard SQL:
SELECT change_id,
change_time,
change_type,
table_name,
id,
json_extract(after, '$.name') AS new_name,
json_extract(before, '$.name') AS old_name
FROM turso_cdc
WHERE table_name = 'users'
ORDER BY change_id;
This returns a chronological log of modifications with optional JSON extraction of column values from the serialized row images.
Sync Engine Integration
Turso's sync engine consumes CDC through the DatabaseTape abstraction. Before replaying changes, the engine re-issues the CDC pragma using the CDC_PRAGMA_NAME constant (capture_data_changes_conn) as shown in sync/engine/src/database_tape.rs (lines 150–180). This ensures the connection captures changes during sync operations, allowing the engine to stream modifications between edge instances and the cloud backend.
Disabling CDC
To stop capturing changes without dropping historical data:
PRAGMA capture_data_changes_conn('off')
This clears the connection's capture_data_changes field, preventing subsequent DML statements from emitting CDC rows while preserving existing tables for later re-enablement (handled in core/vdbe/execute.rs, lines 12661–12669).
Summary
- Turso CDC activates via
PRAGMA capture_data_changes_conn, creating versioned tablesturso_cdcandturso_cdc_versionthroughop_init_cdc_versionincore/vdbe/execute.rs. - Schema versioning supports V1 (8 columns) and V2 (9 columns with transaction IDs), with automatic backward compatibility for legacy tables.
- DML instrumentation occurs at the VDBE layer in
core/translate/insert.rs,update.rs, andemitter/delete.rs, emitting rows based on capture mode (full,before,after,id). - Transaction grouping uses
change_txn_idin V2 schemas to maintain atomic boundaries for downstream replication. - Consumption supports direct SQL queries and programmatic access via the sync engine's
DatabaseTapeabstraction.
Frequently Asked Questions
What capture modes does Turso CDC support?
Turso supports full, id, before, after, and off modes. The full mode captures complete before-and-after row images plus a JSON diff, while id records only primary keys. The before and after modes allow selective capture of pre-change or post-change snapshots, respectively.
Where does Turso store CDC data?
CDC data persists in a system table named turso_cdc, created automatically in the database file when CDC is first enabled. A companion table turso_cdc_version tracks whether the schema is V1 or V2, with V2 adding the change_txn_id column for transaction grouping.
How does Turso CDC handle rolled-back transactions?
CDC emission is deferred until the transaction commits successfully. If a transaction rolls back, the VDBE never writes the pending CDC rows, ensuring the change log contains only committed data and maintains consistency with the actual database state.
Can I use Turso CDC with the sync engine?
Yes. The sync engine explicitly enables CDC using the capture_data_changes_conn pragma in sync/engine/src/database_tape.rs (lines 150–180). This allows the engine to capture local changes for upload to Turso Cloud and to apply remote changes while tracking them for local observers.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →