# How Change Data Capture (CDC) Works in Turso: A Deep Dive into the PRAGMA-Driven Architecture

> Discover how Turso's PRAGMA-driven architecture implements Change Data Capture (CDC), recording database modifications at several levels of detail, from primary keys only to full before/after row images.

- Repository: [tursodatabase/turso](https://github.com/tursodatabase/turso)
- Tags: deep-dive
- Published: 2026-06-22

---

**Turso implements Change Data Capture (CDC) through a PRAGMA-driven mechanism that records data-modifying operations into a system-managed change log, supporting multiple capture modes from primary-key tracking to full before/after image serialization.**

Turso, the open-source SQLite-compatible database engine, provides a built-in Change Data Capture (CDC) mechanism that allows applications to audit every data-modifying operation without external tooling. The implementation uses the PRAGMA interface to activate CDC at the connection level, producing an ordered log of changes that can power event-driven architectures, data replication, and compliance auditing.

## Enabling CDC with PRAGMA capture_data_changes_conn

CDC activation in Turso centers on the `capture_data_changes_conn` PRAGMA, defined in [`sync/engine/src/database_tape.rs`](https://github.com/tursodatabase/turso/blob/main/sync/engine/src/database_tape.rs) as `CDC_PRAGMA_NAME`. When executed, this PRAGMA initializes the CDC infrastructure by creating two system tables: `turso_cdc` (the default change log) and `turso_cdc_version` (schema version tracking).

The PRAGMA accepts multiple modes that determine the granularity of captured data:

- **`off`** – Disables CDC for the connection.
- **`id`** – Captures only the primary key (`id`) of changed rows.
- **`before`** – Stores the complete *before* image of each changed row as a serialized blob.
- **`after`** – Stores the complete *after* image of each changed row as a serialized blob.
- **`full`** – Captures both `before` and `after` images plus a column-wise delta in the `updates` field.
- **`id,custom_cdc`** – Captures only primary keys but writes to a user-defined table (here named `custom_cdc`) instead of `turso_cdc`.
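
The mode strings above can be modeled as a small parser. The following is a minimal sketch, not Turso's actual code: `CaptureMode`, `CdcSetting`, and `parse_cdc_setting` are hypothetical names, and the sketch assumes that an optional table name may follow any mode after a comma, with `turso_cdc` as the default target.

```rust
/// Hypothetical model of the capture modes listed above (not Turso's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaptureMode {
    Off,
    Id,
    Before,
    After,
    Full,
}

/// A parsed PRAGMA argument: the mode plus the table that receives change rows.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CdcSetting {
    pub mode: CaptureMode,
    pub table: String,
}

/// Default change-log table named in the article.
pub const DEFAULT_CDC_TABLE: &str = "turso_cdc";

/// Parses strings such as "full" or "id,custom_cdc".
/// Assumption: an optional table name follows the mode after a comma.
pub fn parse_cdc_setting(arg: &str) -> Result<CdcSetting, String> {
    let (mode_str, table) = match arg.split_once(',') {
        Some((m, t)) => (m.trim(), t.trim()),
        None => (arg.trim(), DEFAULT_CDC_TABLE),
    };
    let mode = match mode_str {
        "off" => CaptureMode::Off,
        "id" => CaptureMode::Id,
        "before" => CaptureMode::Before,
        "after" => CaptureMode::After,
        "full" => CaptureMode::Full,
        other => return Err(format!("unknown CDC mode: {other}")),
    };
    if table.is_empty() {
        return Err("empty CDC table name".to_string());
    }
    Ok(CdcSetting {
        mode,
        table: table.to_string(),
    })
}
```

Under these assumptions, `parse_cdc_setting("id,custom_cdc")` yields the `Id` mode with `custom_cdc` as the target table, while an unrecognized mode string produces an error instead of silently disabling capture.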

## Internal Architecture and Code Flow

Turso's CDC implementation reaches from the query translation layer down to the virtual database engine (VDBE). Capture logic is injected while a statement is compiled rather than by intercepting operations at runtime, which keeps overhead low when CDC is disabled or running in a lightweight mode.

### Connection State Management

In [`core/connection.rs`](https://github.com/tursodatabase/turso/blob/main/core/connection.rs), each connection maintains an `RwLock<Option<CaptureDataChangesInfo>>` that stores the active CDC configuration. The method `Connection::set_capture_data_changes_info()` activates CDC by populating this lock with a struct encoding the selected mode and target table name. During query execution, `Connection::get_capture_data_changes_info()` checks this state to determine if change emission is required.
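
The per-connection state described above can be sketched as follows. This is a simplified illustration under stated assumptions: the method names come from the text, but their signatures and the fields of `CaptureDataChangesInfo` are guesses made for the example, and the real `Connection` holds far more state.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the struct held by the connection.
/// The field names and types are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CaptureDataChangesInfo {
    pub mode: String,  // e.g. "full"
    pub table: String, // e.g. "turso_cdc"
}

/// Simplified connection that holds only the CDC configuration.
#[derive(Default)]
pub struct Connection {
    capture_data_changes: RwLock<Option<CaptureDataChangesInfo>>,
}

impl Connection {
    /// Stores a new CDC configuration, or clears it when `None` is passed.
    pub fn set_capture_data_changes_info(&self, info: Option<CaptureDataChangesInfo>) {
        *self.capture_data_changes.write().unwrap() = info;
    }

    /// Returns a copy of the active configuration, if CDC is enabled.
    pub fn get_capture_data_changes_info(&self) -> Option<CaptureDataChangesInfo> {
        self.capture_data_changes.read().unwrap().clone()
    }
}
```

Wrapping the configuration in `Option` makes "CDC disabled" the default state, and the `RwLock` lets many readers check the setting concurrently while the rare PRAGMA call takes the write lock.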

### PRAGMA Parsing and Activation

The [`core/translate/pragma.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/pragma.rs) module handles parsing of `capture_data_changes_conn`. When the PRAGMA is invoked, Turso creates the `CaptureDataChangesInfo` struct and initializes the CDC tables. If `turso_cdc_version` already exists, Turso validates the stored version against `CDC_VERSION_CURRENT` (currently `v2`). The system refuses to enable CDC if an incompatible older schema is detected, preventing data corruption during version upgrades.
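
The version gate can be pictured as a three-way decision. The sketch below is illustrative: `CDC_VERSION_CURRENT` and its value `v2` come from the text, while `VersionCheck` and `check_cdc_version` are hypothetical names, and the assumption that a missing version row means "create fresh tables" is inferred from the description rather than confirmed.

```rust
/// Version string the article names as current.
pub const CDC_VERSION_CURRENT: &str = "v2";

/// Outcome of comparing the stored version with the current one.
#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    /// No version row exists yet, so the CDC tables can be created fresh.
    Fresh,
    /// The stored version matches the current one.
    Compatible,
    /// The stored version differs, so activation is refused.
    Incompatible { found: String },
}

/// Decides whether CDC may be enabled given the stored version, if any.
pub fn check_cdc_version(stored: Option<&str>) -> VersionCheck {
    match stored {
        None => VersionCheck::Fresh,
        Some(v) if v == CDC_VERSION_CURRENT => VersionCheck::Compatible,
        Some(v) => VersionCheck::Incompatible {
            found: v.to_string(),
        },
    }
}
```

Returning the offending version in the `Incompatible` case gives the caller enough detail to report which schema needs to be migrated or recreated.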

### Query Compilation and Bytecode Injection

During query planning in `core/translate/*` (specifically [`insert.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/insert.rs), [`update.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/update.rs), [`delete.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/delete.rs), and [`schema.rs`](https://github.com/tursodatabase/turso/blob/main/core/translate/schema.rs)), the planner allocates CDC-specific virtual registers when a statement modifies data. The capture mode is exposed through helper functions such as `has_before()`, `has_after()`, and `has_updates()`, so the bytecode generator emits row-image serialization instructions only when the mode requires them.
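
To make the conditional emission concrete, here is a minimal sketch of how such predicates could drive code generation. The predicate names `has_before`, `has_after`, and `has_updates` come from the text; the `CdcMode` enum, the `capture_steps` function, and the step labels are hypothetical and stand in for real bytecode instructions.

```rust
/// Hypothetical capture-mode enum; only the predicate names come from the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CdcMode {
    Id,
    Before,
    After,
    Full,
}

impl CdcMode {
    /// True when the mode stores the row image as it was before the change.
    pub fn has_before(self) -> bool {
        matches!(self, CdcMode::Before | CdcMode::Full)
    }

    /// True when the mode stores the row image as it is after the change.
    pub fn has_after(self) -> bool {
        matches!(self, CdcMode::After | CdcMode::Full)
    }

    /// True when the mode stores the column-wise delta.
    pub fn has_updates(self) -> bool {
        matches!(self, CdcMode::Full)
    }
}

/// Lists the steps a code generator would emit for one row change.
/// The step labels are placeholders, not real VDBE opcodes.
pub fn capture_steps(mode: CdcMode) -> Vec<&'static str> {
    let mut steps = vec!["write_id"];
    if mode.has_before() {
        steps.push("serialize_before");
    }
    if mode.has_after() {
        steps.push("serialize_after");
    }
    if mode.has_updates() {
        steps.push("serialize_updates");
    }
    steps
}
```

In this model the `Id` mode emits a single step, while `Full` emits all four, which mirrors why lighter modes cost less per modified row.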

### Runtime Execution and Change Emission

The actual writing to CDC tables occurs in [`core/vdbe/execute.rs`](https://github.com/tursodatabase/turso/blob/main/core/vdbe/execute.rs). When the VDBE executes a modifying statement, it checks `get_capture_data_changes_info()`. If active, the generated bytecode writes a structured row to the CDC table containing:

- **`change_id`** – Auto-incremented identifier for ordering.
- **`change_time`** – Logical timestamp of the operation.
- **`change_type`** – Integer code: `1` (insert), `0` (update), `-1` (delete), or `2` (commit).
- **`table_name`** and **`id`** – Target table and primary key.
- **`before`** and **`after`** – Serialized row images (present based on mode).
- **`updates`** – Binary encoding of column deltas (only in `full` mode).

A commit row (`change_type = 2`) is automatically emitted after each statement in auto-commit mode, or once at transaction end for explicit transactions.
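
A consumer can use the commit rows to reassemble transactions from the change log. The sketch below assumes the integer codes listed above and represents each log row as a `(change_id, change_type)` pair; `ChangeType` and `group_by_commit` are hypothetical names introduced for illustration, not part of Turso's API.

```rust
/// Change kinds, using the integer codes listed in the article.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Insert,
    Update,
    Delete,
    Commit,
}

impl ChangeType {
    /// Decodes the `change_type` column; unknown codes yield `None`.
    pub fn from_code(code: i64) -> Option<Self> {
        match code {
            1 => Some(ChangeType::Insert),
            0 => Some(ChangeType::Update),
            -1 => Some(ChangeType::Delete),
            2 => Some(ChangeType::Commit),
            _ => None,
        }
    }
}

/// Groups `(change_id, change_type)` rows into committed transactions.
/// Rows seen after the last commit row are returned separately as pending.
pub fn group_by_commit(rows: &[(i64, i64)]) -> (Vec<Vec<i64>>, Vec<i64>) {
    let mut committed = Vec::new();
    let mut pending = Vec::new();
    for &(change_id, code) in rows {
        match ChangeType::from_code(code) {
            // A commit row closes the current group.
            Some(ChangeType::Commit) => committed.push(std::mem::take(&mut pending)),
            // Any data change joins the group that is still open.
            Some(_) => pending.push(change_id),
            // Unknown codes are skipped in this sketch.
            None => {}
        }
    }
    (committed, pending)
}
```

Keeping uncommitted rows in a separate `pending` list lets a downstream consumer apply only complete transactions and wait for the next commit row before processing the rest.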

## CDC Table Schema and Version Management

The `turso_cdc` table stores serialized change data with a flexible schema that accommodates all capture modes. The accompanying `turso_cdc_version` table tracks the CDC schema version (currently `v2`) using the tuple `(table_name, version)`. This versioning strategy allows Turso to maintain backward compatibility while evolving the CDC format, though the system currently enforces `v2` and rejects activation attempts against legacy schemas.

Custom CDC tables follow the identical schema structure but reside in user-specified tables (e.g., `custom_cdc`), enabling application-specific partitioning of change logs while using the same underlying capture mechanism.

## Practical Implementation Examples

The following Rust example demonstrates enabling CDC in `full` mode and consuming the change stream:

```rust
// Assumes `conn` is an open Turso connection and `limbo_exec_rows` is a
// helper that runs a query and collects the resulting rows.

// Enable CDC in "full" mode (captures before/after images and updates)
conn.execute("PRAGMA capture_data_changes_conn('full')").unwrap();

// Create a table and insert rows; CDC rows are generated automatically
conn.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, y)").unwrap();
conn.execute("INSERT INTO t VALUES (1, 2), (3, 4)").unwrap();

// Query the CDC stream from the default change log
let cdc_rows = limbo_exec_rows(&conn, "SELECT * FROM turso_cdc");

// Switch to a custom CDC table that stores only primary keys
conn.execute("PRAGMA capture_data_changes_conn('id,custom_cdc')").unwrap();
conn.execute("INSERT INTO t VALUES (5, 5)").unwrap();

// Changes made after the switch land in the custom table
let custom = limbo_exec_rows(&conn, "SELECT * FROM custom_cdc");
```

These patterns are validated in [`tests/integration/functions/test_cdc.rs`](https://github.com/tursodatabase/turso/blob/main/tests/integration/functions/test_cdc.rs), which covers all CDC modes, custom table configurations, and version upgrade scenarios.

## Summary

- **PRAGMA-driven activation** – CDC is enabled per-connection via `capture_data_changes_conn`, creating `turso_cdc` and `turso_cdc_version` tables automatically.
- **Flexible capture modes** – Options range from `id`-only tracking to `full` mode capturing before/after images and column deltas.
- **Compile-time injection** – The query planner in `core/translate/*` injects CDC bytecode only when necessary, minimizing runtime overhead.
- **Version-safe upgrades** – CDC schema versioning in `turso_cdc_version` prevents activation against incompatible legacy tables.
- **Custom table support** – The `id,custom_table` syntax allows directing CDC output to application-specific tables while maintaining identical capture semantics.

## Frequently Asked Questions

### What PRAGMA command enables Change Data Capture in Turso?

Execute `PRAGMA capture_data_changes_conn('mode')` where mode is one of `off`, `id`, `before`, `after`, `full`, or `id,custom_table`. This PRAGMA, defined in the source as `CDC_PRAGMA_NAME`, initializes the CDC infrastructure and begins capturing changes for the current connection.

### What data does Turso's CDC capture in 'full' mode?

In `full` mode, Turso captures the complete before and after images of each changed row as serialized blobs, plus an `updates` field containing a binary encoding of column-wise deltas. This mode provides the most comprehensive audit trail but uses more storage than the `id` mode or the single-image `before` and `after` modes.

### How does Turso handle version upgrades for CDC tables?

Turso stores the CDC schema version (currently `v2`) in the `turso_cdc_version` table. When enabling CDC, the system checks this version against `CDC_VERSION_CURRENT`. If an older version is detected, Turso refuses to enable CDC to prevent format incompatibilities, requiring manual migration or table recreation.

### Can I specify a custom table name for CDC logs in Turso?

Yes. Append your custom table name to the mode parameter using the syntax `PRAGMA capture_data_changes_conn('id,my_custom_table')`. Turso will create `my_custom_table` with the same schema as `turso_cdc` and direct all CDC output there, allowing isolated change streams for different applications or audit scopes.