Architecture of Turso's Sync Engine for Turso Cloud: A Deep Dive into the Rust Implementation
Turso's sync engine is a modular Rust implementation that provides full-duplex, conflict-free synchronization between local SQLite databases and Turso Cloud through a layered architecture separating orchestration, operations, and I/O concerns.
The sync engine lives in the sync/ workspace of the tursodatabase/turso repository and serves as the replication backbone for Turso Cloud instances. Written in pure Rust, it coordinates the bidirectional flow of Write-Ahead Log (WAL) frames between embedded SQLite instances and remote Turso Cloud databases, ensuring offline-first capabilities with seamless cloud synchronization.
Sync Engine Architecture Overview
The architecture of Turso's sync engine deliberately separates concerns into three orthogonal layers to maximize testability and language interoperability:
- Engine Core: Orchestrates the sync lifecycle including bootstrap, pull, push, and checkpoint operations via
DatabaseSyncEngineinsync/engine/src/database_sync_engine.rs. - Operations Layer: Implements low-level primitives for HTTP API communication, WAL frame manipulation, and CDC log management in
sync/engine/src/database_sync_operations.rs. - I/O Abstraction: Provides an injectable backend for HTTP and filesystem operations through
SyncEngineIoinsync/sdk-kit/src/sync_engine_io.rs, enabling testability and multi-language bindings.
This separation allows the same core logic to power the Rust CLI, JavaScript SDK, and Python bindings without modification to the synchronization algorithms.
Core Components and Source Files
DatabaseSyncEngine: The Orchestrator
Located in sync/engine/src/database_sync_engine.rs, the DatabaseSyncEngine struct serves as the central control plane. It holds the I/O implementation (Arc<dyn turso_core::IO>), sync-specific I/O wrappers (SyncEngineIoStats<IO>), and file paths for the main database, WAL, revert database, and metadata files.
Key lifecycle methods include:
create_db(): Entry point used by the CLI and SDK bindings that reads the local metadata file (<db>-info), bootstraps empty databases, and initializesDatabaseStorage.pull_changes_from_remote(): Executeswait_changes_from_remote()followed byapply_changes_from_remote(), performing passive checkpoints and frame reconciliation.push_changes_to_remote(): Streams local CDC rows in batches viasend_push_batch(), posting to the HRANA/batchendpoint.checkpoint(): Consolidates the revert database by copying pages newer than the last checkpoint watermark and truncating the main WAL.
Configuration is handled through DatabaseSyncEngineOpts (lines 39-78), which specifies remote URL, encryption keys, batch sizes, and protocol versions.
SyncOperationCtx and Low-Level Primitives
The SyncOperationCtx struct in sync/engine/src/database_sync_operations.rs (lines 20-28) bundles execution context including the coroutine (coro: &Coro<Ctx>), I/O stats collector, and remote connection details. It provides the http convenience method (lines 45-62) that automatically injects the x-turso-encryption-key header for authenticated requests.
This file implements the heavy lifting for synchronization:
wal_pull_to_file_v1()(lines 293-376): Downloads remote WAL frames as protobuf-encoded pages.apply_changes_from_remote(): Re-applies remote frames viawal_apply_from_file()and replays local changes that occurred after the last pull.wal_push_http(): Streams local WAL frames to the/push-walendpoint.
I/O Abstraction Layer
The SyncEngineIo trait in sync/sdk-kit/src/sync_engine_io.rs decouples the engine from direct network access. The concrete implementation SyncEngineIoQueue<TBytes> queues requests (SyncEngineIoRequest) and completions (SyncEngineIoCompletion<TBytes>), allowing language bindings to handle actual HTTP and filesystem operations.
This design makes the engine testable via stubbed queues and language-agnostic—JavaScript bindings in bindings/javascript/sync/src/lib.rs pop items from this queue and forward them to the host environment's native networking stacks.
Types and Metadata Management
sync/engine/src/types.rs defines the data structures flowing between layers:
DatabaseMetadata: JSON-serialized state stored in<db>-infofiles, tracking client UID, last synced revision, and configuration.DatabasePullRevision: SupportsLegacy(generation + frame number) andV1(revision string) protocol variants.SyncEngineStats: Atomic counters for CDC operations, WAL sizes, and network bytes.
High-Level Sync Control Flow
The synchronization process follows a deterministic five-stage pipeline:
-
Database Creation:
DatabaseSyncEngine::create_db()opens or creates the local database, initializes CDC tracking viamain_tape, and loads persisted metadata from the JSON metadata file. -
Bootstrap: For empty databases,
bootstrap_db()builds aSyncOperationCtxand callsdb_bootstrap_http()→wal_pull_to_file()to download the complete remote state as protobuf-encoded pages. -
Pull Cycle:
pull_changes_from_remote()waits for remote changes viawait_changes_from_remote(), then executesapply_changes_from_remote()which:- Runs
checkpoint_passive()to commit synced frames - Rolls back uncommitted local frames
- Re-applies remote frames via
wal_apply_from_file() - Replays local changes made during the sync window
- Runs
-
Push Cycle:
push_changes_to_remote()usesDatabaseReplayGeneratorto translate CDC rows into parameterized SQL statements, applies optional transformations viaapply_transformation()(lines 1198-1218), and posts batches to the remote/batchendpoint. -
Checkpoint: The
checkpoint()method copies newer pages from the main WAL to the revert database, ensuring the revert DB reflects a consistent rollback state, then truncates the main WAL accordingly.
WAL Handling and Replication Protocol
The engine operates directly on SQLite WAL frames, respecting WAL_FRAME_HEADER and WAL_FRAME_SIZE boundaries defined in database_sync_operations.rs.
Remote Frame Application: During pull operations, wal_pull_to_file() writes raw page bytes to temporary files before wal_apply_from_file() feeds them to a DatabaseWalSession. Frame boundaries are strictly validated through assertions on WAL_FRAME_SIZE.
Local Frame Capture: The push mechanism reads frames from wal_session::WalSession and streams them via HTTP POST to /push-wal, ensuring atomic batch delivery.
Conflict Resolution: The engine maintains a revert database—a shadow copy of the main database updated during checkpoints. If synchronization conflicts arise, the revert DB provides a consistent rollback point without corrupting the main database file.
Practical Implementation Examples
Initializing the Sync Engine in Rust
use turso_sync_engine::database_sync_engine::{DatabaseSyncEngine, DatabaseSyncEngineOpts};
use turso_core::IO;
use std::sync::Arc;
async fn start_sync(io: Arc<dyn IO>, db_path: &str) -> turso_sync_engine::Result<()> {
let opts = DatabaseSyncEngineOpts {
remote_url: Some("https://my.turso.cloud".into()),
client_name: "my-client".into(),
tables_ignore: vec!["sqlite_sequence".into()],
use_transform: false,
wal_pull_batch_size: 1000,
long_poll_timeout: Some(std::time::Duration::from_secs(30)),
protocol_version_hint: turso_sync_engine::types::DatabaseSyncEngineProtocolVersion::V1,
bootstrap_if_empty: true,
reserved_bytes: 0,
db_opts: Default::default(),
partial_sync_opts: None,
remote_encryption_key: None,
push_operations_threshold: None,
pull_bytes_threshold: None,
};
let sync_io = sync_sdk_kit::SyncEngineIoQueue::<Vec<u8>>::new();
let engine = DatabaseSyncEngine::create_db(&Coro::default(), io, sync_io, db_path, opts).await?;
engine.sync(&Coro::default()).await
}
Pulling Changes via JavaScript Bindings
import { SyncEngine } from "@tursodatabase/sync";
const engine = await SyncEngine.create({
dbPath: "/tmp/my.db",
remoteUrl: "https://my.turso.cloud",
clientName: "js-client",
protocolVersion: "v1",
bootstrapIfEmpty: true,
});
await engine.pullChanges();
The binding in bindings/javascript/sync/src/lib.rs forwards these calls to DatabaseSyncEngine::pull_changes_from_remote().
Implementing Custom Transformations
use turso_sync_engine::types::{DatabaseRowTransformResult, DatabaseTapeRowChange};
use turso_sync_engine::database_sync_operations::apply_transformation;
async fn filter_logs(
ctx: &SyncOperationCtx<'_, impl SyncEngineIo, MyCtx>,
changes: &[DatabaseTapeRowChange],
generator: &DatabaseReplayGenerator,
) -> turso_sync_engine::Result<Vec<DatabaseRowTransformResult>> {
let mut mutations = Vec::new();
for change in changes {
let info = generator.replay_info(ctx.coro, change).await?;
mutations.push(generator.create_mutation(&info, change)?);
}
let completion = ctx.io.transform(mutations)?;
let transformed = wait_all_results(ctx.coro, &completion, None).await?;
Ok(transformed
.into_iter()
.enumerate()
.map(|(idx, res)| {
if let DatabaseRowTransformResult::Keep = res {
if changes[idx].table_name == "logs"
&& matches!(changes[idx].change, DatabaseTapeRowChangeType::Insert { .. })
{
DatabaseRowTransformResult::Skip
} else {
DatabaseRowTransformResult::Keep
}
} else {
res
}
})
.collect())
}
Enable transformations by setting use_transform: true in DatabaseSyncEngineOpts.
Manual Checkpoint Operations
engine.checkpoint(&Coro::default()).await?;
This executes checkpoint_passive() and rebuilds the revert database from pages newer than the last checkpoint watermark (lines 876-917 in database_sync_engine.rs).
Summary
- Turso's sync engine uses a three-layer architecture (orchestration, operations, I/O) to synchronize SQLite databases with Turso Cloud while maintaining strict separation of concerns.
- Core files:
database_sync_engine.rscontrols the lifecycle,database_sync_operations.rshandles WAL frames and HTTP communication, andsync_engine_io.rsabstracts I/O for multi-language support. - Protocol support: The engine supports both legacy and V1 revision protocols, tracking state via JSON metadata files (
<db>-info) andDatabaseMetadatastructs. - WAL-based replication streams protobuf-encoded frames with strict boundary validation and maintains a revert database for atomic rollback safety.
- Language bindings leverage the
SyncEngineIoQueueabstraction to expose Rust core functionality to JavaScript, Python, and other environments without duplicating synchronization logic.
Frequently Asked Questions
How does Turso's sync engine handle conflict resolution?
The engine implements a last-write-wins strategy with deterministic conflict resolution. During the pull cycle in apply_changes_from_remote(), local uncommitted frames are rolled back before remote frames are applied. The revert database maintains a consistent pre-sync state via checkpoint() operations, allowing applications to roll back to a known good configuration if conflicts require manual intervention.
What is the role of the revert database in Turso sync?
The revert database serves as a shadow checkpoint of the main database. During checkpoint() operations in database_sync_engine.rs (lines 876-917), pages newer than the last watermark are copied from the main WAL to the revert database. This creates a consistent snapshot that can be restored without corrupting the main database file, providing atomic rollback capabilities during synchronization failures.
How does the I/O abstraction layer enable multi-language support?
The SyncEngineIo trait in sync/sdk-kit/src/sync_engine_io.rs defines a queue-based interface where requests (SyncEngineIoRequest) and completions (SyncEngineIoCompletion) are queued rather than executed directly. Language bindings in bindings/javascript/sync/src/lib.rs pop these queue items and execute them using the host environment's native networking and filesystem APIs, allowing the Rust core to remain platform-agnostic while supporting JavaScript, Python, and other languages.
What protocol does Turso use to synchronize WAL frames?
Turso uses a protobuf-based protocol over HTTP/HTTPS. The engine downloads remote frames via endpoints like /pull-updates and uploads local changes via /push-wal and the HRANA /batch endpoint. Frame data is encoded as PageData protobuf messages in wal_pull_to_file_v1() (lines 293-376), with the protocol version (Legacy or V1) determined by DatabaseSyncEngineOpts.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →