Turso Change Data Capture (CDC): How It Tracks Database Changes in Real-Time

Turso's CDC feature tracks real-time database changes by using the capture_data_changes_conn PRAGMA to instrument the VDBE (virtual machine) execution layer, writing INSERT, UPDATE, and DELETE operations into a dedicated turso_cdc table with versioned schemas and transaction ID grouping.

Turso implements Change Data Capture by coupling a special PRAGMA with internal VDBE hooks that intercept data-modifying operations at the virtual-machine level. According to the tursodatabase/turso source code, this architecture ensures that every row mutation is persisted atomically with the transaction that produced it, enabling downstream replication and real-time analytics pipelines.

Enabling CDC with the capture_data_changes_conn PRAGMA

Activation and Schema Creation

CDC activation begins when a client executes:

PRAGMA capture_data_changes_conn('full')

This triggers the op_init_cdc_version function in core/vdbe/execute.rs (lines 12640–12674). The routine first checks for existing CDC infrastructure and creates two system tables if missing:

  • turso_cdc – Stores the actual change records
  • turso_cdc_version – Tracks which schema version (V1 or V2) the CDC table uses

The pragma supports multiple capture modes including full, id, before, after, and off. The requested mode is parsed into a CaptureDataChangesInfo struct and stored in the connection via Connection::set_capture_data_changes_info (defined in core/connection.rs, lines 380–384). CDC activation is deferred until the surrounding program commits successfully, ensuring that rolled-back transactions leave no trace in the CDC log.

Version Selection and Legacy Compatibility

Turso maintains two CDC schemas:

  • V1 – Eight columns with no transaction ID support
  • V2 – Nine columns including change_txn_id for MVCC-aware grouping

When initializing CDC, Turso inspects turso_cdc_version to determine the existing schema. If a legacy V1 table exists, the connection automatically pins to V1 regardless of the requested version (see version selection logic in core/vdbe/execute.rs, lines 12774–12802). New databases default to V2, which includes the change_txn_id column essential for transaction boundary detection.

CDC Table Schema and Row Structure

V2 Schema Layout

A standard V2 CDC row contains the following columns created by op_init_cdc_version in core/vdbe/execute.rs (lines 12755–12760):

Column Description
change_id Primary key (auto-increment or MVCC-assigned)
change_time Unix-epoch timestamp of the change
change_txn_id Transaction ID for grouping (V2 only)
change_type Operation type: 1=INSERT, 2=UPDATE, 3=DELETE
table_name Name of the affected table
id Primary-key value(s) of the modified row
before BLOB containing the row image before the change (if mode includes before)
after BLOB containing the row image after the change (if mode includes after)
updates BLOB with JSON-encoded column changes (if mode includes full)

The before, after, and updates columns are populated based on the capture mode specified in the PRAGMA. In full mode, all three are populated for UPDATE operations, providing complete before-and-after snapshots.

Capturing DML Operations in Real-Time

During statement compilation, the translator inspects program.capture_data_changes_info() to determine whether CDC emission is required. The system instruments three primary DML paths:

INSERT Path

In core/translate/insert.rs (lines 1077, 1271–1326), the compiler checks the capture mode to decide whether to include after images and/or updates. For each INSERT, it generates code that writes a CDC row with change_type=1 via the CDC_PRAGMA_NAME cursor.

UPDATE Path

The UPDATE implementation in core/translate/update.rs (line 178) evaluates the capture mode to determine whether to record before, after, and/or updates fields. This allows fine-grained tracking of column-level changes without the overhead of full row serialization when only specific fields are needed.

DELETE Path

For DELETE operations, core/translate/emitter/delete.rs (line 763) captures the before image when the mode requests it, writing a row with change_type=3 to preserve the final state of the deleted record.

Transaction Grouping and MVCC Integration

When MVCC is enabled, Turso maintains a per-connection cdc_transaction_id initialized to -1. All CDC rows generated within a single transaction share the same transaction ID, written to the change_txn_id column in V2 schemas. This enables downstream consumers—such as the Turso Cloud sync service—to batch changes atomically and maintain transactional consistency across distributed systems.

The transaction ID assignment occurs during VDBE execution, ensuring that CDC records inherit the same transactional boundaries as the underlying data changes.

Querying and Consuming CDC Data

Direct SQL Access

Applications query CDC changes directly through standard SQL:

SELECT change_id, 
       change_time, 
       change_type, 
       table_name, 
       id,
       json_extract(after, '$.name') AS new_name,
       json_extract(before, '$.name') AS old_name
FROM turso_cdc
WHERE table_name = 'users'
ORDER BY change_id;

This returns a chronological log of modifications with optional JSON extraction of column values from the serialized row images.

Sync Engine Integration

Turso's sync engine consumes CDC through the DatabaseTape abstraction. Before replaying changes, the engine re-issues the CDC pragma using the CDC_PRAGMA_NAME constant (capture_data_changes_conn) as shown in sync/engine/src/database_tape.rs (lines 150–180). This ensures the connection captures changes during sync operations, allowing the engine to stream modifications between edge instances and the cloud backend.

Disabling CDC

To stop capturing changes without dropping historical data:

PRAGMA capture_data_changes_conn('off')

This clears the connection's capture_data_changes field, preventing subsequent DML statements from emitting CDC rows while preserving existing tables for later re-enablement (handled in core/vdbe/execute.rs, lines 12661–12669).

Summary

  • Turso CDC activates via PRAGMA capture_data_changes_conn, creating versioned tables turso_cdc and turso_cdc_version through op_init_cdc_version in core/vdbe/execute.rs.
  • Schema versioning supports V1 (8 columns) and V2 (9 columns with transaction IDs), with automatic backward compatibility for legacy tables.
  • DML instrumentation occurs at the VDBE layer in core/translate/insert.rs, update.rs, and emitter/delete.rs, emitting rows based on capture mode (full, before, after, id).
  • Transaction grouping uses change_txn_id in V2 schemas to maintain atomic boundaries for downstream replication.
  • Consumption supports direct SQL queries and programmatic access via the sync engine's DatabaseTape abstraction.

Frequently Asked Questions

What capture modes does Turso CDC support?

Turso supports full, id, before, after, and off modes. The full mode captures complete before-and-after row images plus a JSON diff, while id records only primary keys. The before and after modes allow selective capture of pre-change or post-change snapshots, respectively.

Where does Turso store CDC data?

CDC data persists in a system table named turso_cdc, created automatically in the database file when CDC is first enabled. A companion table turso_cdc_version tracks whether the schema is V1 or V2, with V2 adding the change_txn_id column for transaction grouping.

How does Turso CDC handle rolled-back transactions?

CDC emission is deferred until the transaction commits successfully. If a transaction rolls back, the VDBE never writes the pending CDC rows, ensuring the change log contains only committed data and maintains consistency with the actual database state.

Can I use Turso CDC with the sync engine?

Yes. The sync engine explicitly enables CDC using the capture_data_changes_conn pragma in sync/engine/src/database_tape.rs (lines 150–180). This allows the engine to capture local changes for upload to Turso Cloud and to apply remote changes while tracking them for local observers.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →