# Turso Change Data Capture (CDC): How It Tracks Database Changes in Real-Time

> Discover how Turso Change Data Capture (CDC) tracks real-time database changes. Learn how it instruments VDBE to log INSERT, UPDATE, and DELETE operations.

- Repository: [Turso Database/turso](https://github.com/tursodatabase/turso)
- Tags: internals
- Published: 2026-06-23

---

**Turso's CDC feature tracks real-time database changes by using the `capture_data_changes_conn` PRAGMA to instrument the VDBE (virtual machine) execution layer, writing INSERT, UPDATE, and DELETE operations into a dedicated `turso_cdc` table with versioned schemas and transaction ID grouping.**

Turso implements Change Data Capture by coupling a special PRAGMA with internal VDBE hooks that intercept data-modifying operations at the virtual-machine level. According to the `tursodatabase/turso` source code, this architecture ensures that every row mutation is persisted atomically with the transaction that produced it, enabling downstream replication and real-time analytics pipelines.

## Enabling CDC with the capture_data_changes_conn PRAGMA

### Activation and Schema Creation

CDC activation begins when a client executes:

```sql
PRAGMA capture_data_changes_conn('full')

```

This triggers the `op_init_cdc_version` function in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs) (lines 12640–12674). The routine first checks for existing CDC infrastructure and creates two system tables if missing:

- **`turso_cdc`** – Stores the actual change records
- **`turso_cdc_version`** – Tracks which schema version (V1 or V2) the CDC table uses

The pragma supports multiple capture modes including `full`, `id`, `before`, `after`, and `off`. The requested mode is parsed into a `CaptureDataChangesInfo` struct and stored in the connection via `Connection::set_capture_data_changes_info` (defined in [`core/connection.rs`](https://github.com/tursodatabase/turso/blob/main/core/connection.rs), lines 380–384). CDC activation is deferred until the surrounding program commits successfully, ensuring that rolled-back transactions leave no trace in the CDC log.

### Version Selection and Legacy Compatibility

Turso maintains two CDC schemas:

- **V1** – Eight columns with no transaction ID support
- **V2** – Nine columns including `change_txn_id` for MVCC-aware grouping

When initializing CDC, Turso inspects `turso_cdc_version` to determine the existing schema. If a legacy V1 table exists, the connection automatically pins to V1 regardless of the requested version (see version selection logic in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs), lines 12774–12802). New databases default to V2, which includes the `change_txn_id` column essential for transaction boundary detection.

## CDC Table Schema and Row Structure

### V2 Schema Layout

A standard V2 CDC row contains the following columns created by `op_init_cdc_version` in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs) (lines 12755–12760):

| Column | Description |
|--------|-------------|
| `change_id` | Primary key (auto-increment or MVCC-assigned) |
| `change_time` | Unix-epoch timestamp of the change |
| `change_txn_id` | Transaction ID for grouping (V2 only) |
| `change_type` | Operation type: `1=INSERT`, `2=UPDATE`, `3=DELETE` |
| `table_name` | Name of the affected table |
| `id` | Primary-key value(s) of the modified row |
| `before` | BLOB containing the row image before the change (if mode includes `before`) |
| `after` | BLOB containing the row image after the change (if mode includes `after`) |
| `updates` | BLOB with JSON-encoded column changes (if mode includes `full`) |

The `before`, `after`, and `updates` columns are populated based on the capture mode specified in the PRAGMA. In `full` mode, all three are populated for UPDATE operations, providing complete before-and-after snapshots.

## Capturing DML Operations in Real-Time

During statement compilation, the translator inspects `program.capture_data_changes_info()` to determine whether CDC emission is required. The system instruments three primary DML paths:

### INSERT Path

In [`core/translate/insert.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/insert.rs) (lines 1077, 1271–1326), the compiler checks the capture mode to decide whether to include `after` images and/or `updates`. For each INSERT, it generates code that writes a CDC row with `change_type=1` via the `CDC_PRAGMA_NAME` cursor.

### UPDATE Path

The UPDATE implementation in [`core/translate/update.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/update.rs) (line 178) evaluates the capture mode to determine whether to record `before`, `after`, and/or `updates` fields. This allows fine-grained tracking of column-level changes without the overhead of full row serialization when only specific fields are needed.

### DELETE Path

For DELETE operations, [`core/translate/emitter/delete.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/emitter/delete.rs) (line 763) captures the `before` image when the mode requests it, writing a row with `change_type=3` to preserve the final state of the deleted record.

## Transaction Grouping and MVCC Integration

When MVCC is enabled, Turso maintains a per-connection `cdc_transaction_id` initialized to `-1`. All CDC rows generated within a single transaction share the same transaction ID, written to the `change_txn_id` column in V2 schemas. This enables downstream consumers—such as the Turso Cloud sync service—to batch changes atomically and maintain transactional consistency across distributed systems.

The transaction ID assignment occurs during VDBE execution, ensuring that CDC records inherit the same transactional boundaries as the underlying data changes.

## Querying and Consuming CDC Data

### Direct SQL Access

Applications query CDC changes directly through standard SQL:

```sql
SELECT change_id, 
       change_time, 
       change_type, 
       table_name, 
       id,
       json_extract(after, '$.name') AS new_name,
       json_extract(before, '$.name') AS old_name
FROM turso_cdc
WHERE table_name = 'users'
ORDER BY change_id;

```

This returns a chronological log of modifications with optional JSON extraction of column values from the serialized row images.

### Sync Engine Integration

Turso's sync engine consumes CDC through the `DatabaseTape` abstraction. Before replaying changes, the engine re-issues the CDC pragma using the `CDC_PRAGMA_NAME` constant (`capture_data_changes_conn`) as shown in [`sync/engine/src/database_tape.rs`](https://github.com/tursodatabase/turso/blob/main/sync/engine/src/database_tape.rs) (lines 150–180). This ensures the connection captures changes during sync operations, allowing the engine to stream modifications between edge instances and the cloud backend.

## Disabling CDC

To stop capturing changes without dropping historical data:

```sql
PRAGMA capture_data_changes_conn('off')

```

This clears the connection's `capture_data_changes` field, preventing subsequent DML statements from emitting CDC rows while preserving existing tables for later re-enablement (handled in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs), lines 12661–12669).

## Summary

- **Turso CDC** activates via `PRAGMA capture_data_changes_conn`, creating versioned tables `turso_cdc` and `turso_cdc_version` through `op_init_cdc_version` in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs).
- **Schema versioning** supports V1 (8 columns) and V2 (9 columns with transaction IDs), with automatic backward compatibility for legacy tables.
- **DML instrumentation** occurs at the VDBE layer in [`core/translate/insert.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/insert.rs), [`update.rs`](https://github.com/tursodatabase/turso/blob/main/update.rs), and [`emitter/delete.rs`](https://github.com/tursodatabase/turso/blob/main/emitter/delete.rs), emitting rows based on capture mode (`full`, `before`, `after`, `id`).
- **Transaction grouping** uses `change_txn_id` in V2 schemas to maintain atomic boundaries for downstream replication.
- **Consumption** supports direct SQL queries and programmatic access via the sync engine's `DatabaseTape` abstraction.

## Frequently Asked Questions

### What capture modes does Turso CDC support?

Turso supports `full`, `id`, `before`, `after`, and `off` modes. The `full` mode captures complete before-and-after row images plus a JSON diff, while `id` records only primary keys. The `before` and `after` modes allow selective capture of pre-change or post-change snapshots, respectively.

### Where does Turso store CDC data?

CDC data persists in a system table named `turso_cdc`, created automatically in the database file when CDC is first enabled. A companion table `turso_cdc_version` tracks whether the schema is V1 or V2, with V2 adding the `change_txn_id` column for transaction grouping.

### How does Turso CDC handle rolled-back transactions?

CDC emission is deferred until the transaction commits successfully. If a transaction rolls back, the VDBE never writes the pending CDC rows, ensuring the change log contains only committed data and maintains consistency with the actual database state.

### Can I use Turso CDC with the sync engine?

Yes. The sync engine explicitly enables CDC using the `capture_data_changes_conn` pragma in [`sync/engine/src/database_tape.rs`](https://github.com/tursodatabase/turso/blob/main/sync/engine/src/database_tape.rs) (lines 150–180). This allows the engine to capture local changes for upload to Turso Cloud and to apply remote changes while tracking them for local observers.