How zvec's Write-Ahead Log (WAL) Ensures Data Durability
zvec guarantees durability by appending every mutation to a crash-recoverable WAL with CRC checksums before applying changes to in-memory structures, ensuring no committed data is lost even after system crashes.
The Alibaba zvec vector database engine relies on a robust write-ahead log (WAL) mechanism to provide strong durability guarantees for every document mutation. Before any INSERT, UPDATE, UPSERT, or DELETE operation becomes visible to readers, zvec serializes the change to an append-only WAL file protected by CRC32C checksums. This design ensures that even if the system crashes immediately after a write, the committed data can be fully recovered during the next startup.
Core WAL Architecture and File Structure
WAL File Creation and Path Management
Every segment in zvec maintains its own dedicated WAL file. When a segment initializes its write-ahead log, SegmentImpl::open_wal_file() invokes WalFile::Create to instantiate a LocalWalFile. The file path is constructed via FileHelper::MakeWalPath, generating a file named *.wal in the segment's directory.
This separation ensures that WAL files are never overwritten in-place—only new records are appended—providing a linear history of mutations that can be replayed sequentially.
Record Format and CRC Protection
Each mutation is wrapped in a WalRecord structure before hitting the disk. In LocalWalFile::append, the system constructs a record containing three fields: length (payload size), CRC32C checksum, and payload (the serialized document bytes).
The CRC is computed using ailego::Crc32c::Hash over the payload data. This checksum serves as a corruption detector during recovery: if bit rot or a partial write occurs, CRC validation fails and the recovery process stops before applying the damaged record, so corrupted data never reaches the segment.
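The record layout described above can be sketched as follows. This is a minimal illustration, not zvec's implementation: the `crc32c` function is a bitwise software stand-in for ailego::Crc32c::Hash (whose exact signature is not shown in this article), and `encode_record`, the 4-byte field widths, and host byte order are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Bitwise software CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
// Stand-in for ailego::Crc32c::Hash; slow but easy to verify by hand.
inline uint32_t crc32c(const std::string &data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the dropped bit was 1.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical layout: [u32 length][u32 crc][payload bytes], host byte order.
inline std::string encode_record(const std::string &payload) {
  assert(payload.size() <= std::numeric_limits<uint32_t>::max());
  const uint32_t length = static_cast<uint32_t>(payload.size());
  const uint32_t crc = crc32c(payload);

  std::string record(sizeof(length) + sizeof(crc), '\0');
  std::memcpy(&record[0], &length, sizeof(length));
  std::memcpy(&record[sizeof(length)], &crc, sizeof(crc));
  record += payload;
  return record;
}
```

Because the checksum covers only the payload in this sketch, a damaged length field is caught indirectly: the reader pulls the wrong number of bytes and the CRC comparison fails.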
Atomic Append and Thread Safety
Concurrent mutations to the same segment are serialized through a mutex guard. Inside LocalWalFile::append, the actual disk write occurs within a std::lock_guard<std::mutex> scope that protects the write_record method.
This locking ensures that multiple threads appending to the same WAL file cannot interleave their records, maintaining the sequential integrity required for deterministic crash recovery. From the perspective of concurrent writers, each append is atomic: a record (length + CRC + payload) is written as one contiguous unit. A crash in the middle of a write can still leave a partial record on disk, but the CRC check rejects it during recovery.
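The effect of the lock can be demonstrated with a small in-memory sketch. The `SerializedLog` class and the helper functions below are hypothetical; a `std::string` stands in for the WAL file, and the append deliberately writes one byte at a time so that records would interleave if the guard were removed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory log; the std::string stands in for the WAL file.
class SerializedLog {
 public:
  void append(const std::string &record) {
    std::lock_guard<std::mutex> guard(mutex_);
    // Byte-by-byte on purpose: without the guard, records could interleave.
    for (char c : record) buffer_.push_back(c);
    buffer_.push_back('\n');
  }

  std::string snapshot() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return buffer_;
  }

 private:
  mutable std::mutex mutex_;
  std::string buffer_;
};

// Starts `threads` writers that each append `per_thread` records.
inline std::string run_writers(int threads, int per_thread) {
  assert(threads > 0 && per_thread > 0);
  SerializedLog log;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&log, t, per_thread] {
      const std::string record = "writer-" + std::to_string(t) + "-payload";
      for (int i = 0; i < per_thread; ++i) log.append(record);
    });
  }
  for (auto &worker : pool) worker.join();
  return log.snapshot();
}

// Counts newline-terminated lines that exactly match some writer's record.
inline std::size_t count_intact_records(const std::string &log, int threads) {
  std::size_t intact = 0;
  std::size_t start = 0;
  while (start < log.size()) {
    const std::size_t end = log.find('\n', start);
    if (end == std::string::npos) break;
    const std::string line = log.substr(start, end - start);
    for (int t = 0; t < threads; ++t) {
      if (line == "writer-" + std::to_string(t) + "-payload") {
        ++intact;
        break;
      }
    }
    start = end + 1;
  }
  return intact;
}
```

With the guard in place, every line in the snapshot matches one writer's record exactly, regardless of how the threads are scheduled.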
Flush Policies and OS-Level Persistence
Durability is configurable through WalOptions.max_docs_wal_flush. This parameter controls how many documents can be appended before the system forces an fsync operation via file_.flush().
When the append count reaches the configured threshold, LocalWalFile::append automatically triggers a flush, forcing the operating system to push buffered data to the storage device. Users can also explicitly call wal_file_->flush() for synchronous durability on critical writes. A flush turns buffered records into durable storage, so flushed writes survive a power loss. Records appended since the last flush may still sit in buffers, which means a higher threshold trades a wider potential loss window for better throughput.
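The count-based policy can be sketched in a few lines. The `CountingFlushWriter` class and `count_forced_flushes` helper are hypothetical names for this example, and a `std::ostream` stands in for the file. Note that `std::ostream::flush` only hands data to the layer below; reaching stable storage on a real file additionally requires an OS-level sync such as POSIX `fsync`, which this sketch does not perform.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical writer that flushes after every `max_docs_before_flush` appends.
class CountingFlushWriter {
 public:
  CountingFlushWriter(std::ostream &sink, uint32_t max_docs_before_flush)
      : sink_(sink), threshold_(max_docs_before_flush) {
    assert(threshold_ > 0);
  }

  void append(const std::string &payload) {
    sink_.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    // Force a flush once enough documents have accumulated.
    if (++pending_ >= threshold_) flush();
  }

  void flush() {
    sink_.flush();  // stream-level flush only; no fsync in this sketch
    pending_ = 0;
    ++flush_count_;
  }

  uint32_t pending() const { return pending_; }
  uint32_t flush_count() const { return flush_count_; }

 private:
  std::ostream &sink_;
  uint32_t threshold_;
  uint32_t pending_ = 0;      // documents appended since the last flush
  uint32_t flush_count_ = 0;  // number of flushes performed so far
};

// Appends `docs` payloads with the given threshold and reports forced flushes.
inline uint32_t count_forced_flushes(uint32_t docs, uint32_t threshold) {
  std::ostringstream sink;
  CountingFlushWriter writer(sink, threshold);
  for (uint32_t i = 0; i < docs; ++i) writer.append("doc");
  return writer.flush_count();
}
```

A threshold of 1 flushes on every append, while a large threshold leaves up to `threshold - 1` documents unflushed at any moment.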
Crash Recovery and Replay Mechanism
Recovery Initialization
When a segment restarts after a crash, SegmentImpl::recover re-opens the existing WAL file via SegmentImpl::open_wal_file. If the file exists, the system prepares it for sequential reading using prepare_for_read, which validates the WalHeader (containing version information) to ensure format compatibility.
Sequential Replay
The recovery process iterates through the WAL using wal_file_->next(), which reads each WalRecord, verifies its CRC, and returns the payload. For each valid record, the system deserializes the document using Doc::deserialize and re-applies the operation (INSERT, UPDATE, UPSERT, or DELETE) to the in-memory structures exactly as during normal processing.
If a corrupted record is encountered—indicated by a CRC mismatch or incomplete read—the next() method returns an empty string, causing the replay loop to break. All preceding valid records have already been applied, ensuring that only complete, verified mutations are recovered.
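The stop-at-first-bad-record behavior can be sketched as a loop over an in-memory log. Everything here is an assumption made for illustration: the `[u32 length][u32 crc][payload]` layout, the `crc32c_of`, `make_record`, and `replay` names, and the choice to return the recovered payloads as a vector rather than signalling the end with an empty string as `next()` is described to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Bitwise software CRC32C; stand-in for the engine's checksum routine.
inline uint32_t crc32c_of(const std::string &data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical layout: [u32 length][u32 crc][payload], host byte order.
inline std::string make_record(const std::string &payload) {
  assert(payload.size() <= std::numeric_limits<uint32_t>::max());
  const uint32_t length = static_cast<uint32_t>(payload.size());
  const uint32_t crc = crc32c_of(payload);
  std::string record(2 * sizeof(uint32_t), '\0');
  std::memcpy(&record[0], &length, sizeof(length));
  std::memcpy(&record[sizeof(length)], &crc, sizeof(crc));
  return record + payload;
}

// Returns the payloads of all valid leading records. Replay stops at the
// first truncated or corrupted record; nothing after it is returned.
inline std::vector<std::string> replay(const std::string &log) {
  std::vector<std::string> applied;
  const std::size_t header = 2 * sizeof(uint32_t);
  std::size_t offset = 0;
  while (log.size() - offset >= header) {
    uint32_t length = 0;
    uint32_t stored_crc = 0;
    std::memcpy(&length, log.data() + offset, sizeof(length));
    std::memcpy(&stored_crc, log.data() + offset + sizeof(length),
                sizeof(stored_crc));
    offset += header;
    if (log.size() - offset < length) break;  // incomplete payload
    std::string payload = log.substr(offset, length);
    if (crc32c_of(payload) != stored_crc) break;  // corrupted payload
    applied.push_back(std::move(payload));
    offset += length;
  }
  return applied;
}
```

Both a flipped bit in the last payload and a log cut short mid-record yield the same result in this sketch: the earlier records are recovered and the damaged tail is ignored.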
WAL Cleanup
Once a segment has been fully persisted to immutable block files (vector and scalar data flushed to disk), the WAL is no longer needed. LocalWalFile::remove deletes the *.wal file, freeing storage space while maintaining the durable state in the compacted segment files.
Code Examples
Low-Level WAL Creation and Usage
#include <string>

#include "db/index/storage/wal/wal_file.h"
#include "db/common/file_helper.h"

using namespace zvec;

void low_level_wal_demo() {
  // Build a WAL file path: <collection_dir>/<seg_id>/<block_id>.wal
  std::string wal_path = FileHelper::MakeWalPath("./data", 0, 0);
  WalFilePtr wal = WalFile::Create(wal_path);

  WalOptions opt;
  opt.create_new = true;         // fresh WAL
  opt.max_docs_wal_flush = 100;  // flush after every 100 docs
  wal->open(opt);

  // Append a few dummy payloads
  for (int i = 0; i < 5; ++i) {
    std::string payload = "doc-" + std::to_string(i);
    wal->append(payload);  // serialized under a mutex, CRC-protected
  }

  wal->flush();  // force OS to persist data
  wal->close();  // safe to delete later
}
High-Level Document Insertion
#include <cstdint>
#include <string>
#include <vector>

#include "zvec/db/collection.h"

void insert_demo(zvec::Collection &coll) {
  // Build a document: the same object that will be serialized
  zvec::Doc doc;
  doc.set_pk("user_123");
  doc.set("age", int32_t(27));
  doc.set("vector", std::vector<float>{0.1f, 0.2f, 0.3f});

  // The Insert call will:
  //   1. Append the serialized Doc to the WAL
  //   2. Persist it in the forward store and scalar/vector indexes
  //   3. Flush according to the WAL's `max_docs_wal_flush` policy
  auto status = coll.Insert(doc);
  if (!status.ok()) {
    LOG_ERROR("Insert failed: %s", status.message().c_str());
  }
}
Simulating Crash Recovery
// 1️⃣ Write some docs
insert_demo(collection); // WAL is appended and optionally flushed
// 2️⃣ Simulate a crash → process terminates, no explicit close
// (the OS may have flushed some data, the rest stays in the file buffer)
// 3️⃣ On next start, open the collection again:
// – SegmentImpl::open_wal_file() re‑opens the same *.wal file
// – SegmentImpl::recover() reads the WAL and re‑applies all records
// The collection will contain the documents whose records reached the WAL
// intact before the crash. A partially written record is never applied:
// its CRC validation fails and `next()` returns an empty string.
Key Source Files
| File | Role | GitHub Link |
|---|---|---|
| src/db/index/storage/wal/wal_file.h | Abstract WAL interface (WalFile, WalOptions) | wal_file.h |
| src/db/index/storage/wal/wal_file.cc | Factory creating a LocalWalFile and high-level helpers | wal_file.cc |
| src/db/index/storage/wal/local_wal_file.h | Concrete on-disk WAL implementation (LocalWalFile) | local_wal_file.h |
| src/db/index/storage/wal/local_wal_file.cc | Record formatting, CRC, thread-safe append, read-back, flush, delete | local_wal_file.cc |
| src/db/index/segment/segment.cc | Segment core: calls append_wal, opens WAL, recovers from it | segment.cc |
| src/db/common/file_helper.h | Helper for constructing WAL file paths (MakeWalPath) and other storage files | file_helper.h |
| tests/db/index/storage/wal_file_test.cc | Unit tests that verify durability under various flush policies and multithreaded writes | wal_file_test.cc |
Summary
- Append-only design: zvec's WAL in LocalWalFile never overwrites existing data, ensuring a linear history of mutations that can be replayed deterministically after crashes.
- CRC32C protection: Every record carries a checksum computed via ailego::Crc32c::Hash, allowing the recovery process to detect corruption and halt replay before applying damaged data.
- Mutex-serialized writes: Concurrent appends are protected by std::lock_guard<std::mutex> in LocalWalFile::append, preventing record interleaving and ensuring atomicity.
- Configurable flush policies: The max_docs_wal_flush option in WalOptions controls how often file_.flush() forces OS buffers to disk, balancing durability guarantees against write throughput.
- Automatic recovery: On segment startup, SegmentImpl::recover replays the WAL sequentially, reconstructing the exact pre-crash state by re-applying every valid record to in-memory structures.
Frequently Asked Questions
How does zvec prevent data loss if the system crashes during a write?
zvec prevents data loss by following a strict write-ahead protocol where every mutation is serialized to the WAL in LocalWalFile::append before modifying in-memory indexes. If a crash occurs, the recovery process in SegmentImpl::recover replays all valid WAL records on the next startup, restoring the exact state that existed before the failure. Corrupted or partial records are detected via CRC32C checksums and excluded from replay.
What is the role of CRC32C checksums in zvec's WAL?
CRC32C checksums serve as integrity guards against bit-rot and partial writes. When LocalWalFile::append writes a record, it computes a checksum using ailego::Crc32c::Hash and stores it alongside the payload length. During recovery, wal_file_->next() validates this CRC before returning the record; if validation fails, the replay loop terminates, ensuring that only verified data is applied to the segment.
How does zvec balance durability and performance in WAL flushing?
zvec exposes a tunable max_docs_wal_flush parameter in WalOptions that controls the flush frequency. After every N documents appended (where N equals max_docs_wal_flush), LocalWalFile::append automatically invokes file_.flush(), forcing the OS to persist buffered data to the storage device. Users requiring synchronous durability can set this to 1 for immediate flushing, while high-throughput scenarios can increase the threshold to amortize fsync costs.
Can multiple threads write to the same WAL concurrently?
Yes, multiple threads can safely append to the same WAL file concurrently. LocalWalFile::append protects the critical section with a std::lock_guard<std::mutex>, ensuring that record writes are serialized and that the length-prefix + CRC + payload structure remains contiguous and uncorrupted by concurrent access. This mutex-based serialization guarantees that the WAL remains append-only and thread-safe without requiring external synchronization from callers.