Managing Document Versioning and Deletion with Alibaba Zvec's Write-Ahead Log

Alibaba Zvec uses a Write-Ahead Log (WAL) to guarantee the durability of writes and to coordinate document versioning. Operations are first persisted to a temporary WAL file and are then merged atomically into segment storage during flush operations.

Managing document versioning and deletion through the WAL is essential for building durable vector search applications on Zvec. The engine's WAL architecture separates the cost of durability from heavyweight indexing while enabling fast crash recovery and versioned deletion tracking. This article examines the core components, lifecycle stages, and implementation details of Zvec's WAL system, based on the source code in the alibaba/zvec repository.

Understanding Zvec's Write-Ahead Log Architecture

The WAL implementation in Zvec centers on an abstract interface with concrete local file backing, providing a clear separation between storage logic and the physical medium.

Core WAL Components

The architecture is built from three primary pieces under src/db/index/storage/wal/:

  • WalFile – The abstract base class defining the interface for create, open, append, read, flush, and remove operations.
  • LocalWalFile – The concrete implementation in src/db/index/storage/wal/local_wal_file.h and local_wal_file.cc that writes to local filesystem files, computes CRC32C checksums for each record, and supports batch flushing.
  • WalFile::Create / CreateAndOpen – Factory helpers defined in src/db/index/storage/wal/wal_file.cc used by the segment layer to instantiate WAL instances.

File Naming Conventions

Zvec generates WAL file paths using FileHelper::MakeFilePath from src/db/common/file_helper.h. The naming convention follows the pattern:

${prefix_path}/data.wal.${segment_id}

This helper also generates paths for delete snapshots using the DELETE_FILE enum, ensuring consistent file organization across the storage layer.
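
The naming pattern can be sketched in a few lines. The function below is a hypothetical stand-in for FileHelper::MakeFilePath, not its real signature; only the data.wal.${segment_id} pattern comes from the text above, and the separator handling is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for FileHelper::MakeFilePath. Only the
// "data.wal.<segment_id>" pattern is taken from the article.
std::string make_wal_path(const std::string& prefix_path, uint32_t segment_id) {
  assert(!prefix_path.empty());
  std::string path = prefix_path;
  if (path.back() != '/') path += '/';  // assumption: avoid a doubled separator
  return path + "data.wal." + std::to_string(segment_id);
}
```

Under these assumptions, make_wal_path("./", 0) yields ./data.wal.0, so every segment gets a WAL file that can be located from its ID alone.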

Document Versioning and Deletion Workflow

Zvec coordinates document deletions through a versioned bitmap system that integrates tightly with the WAL lifecycle.

The DeleteStore Bitmap

Deleted document IDs are tracked in memory using a roaring bitmap implemented in src/db/index/common/delete_store.h. When a deletion occurs:

  1. The document ID is added to the in-memory bitmap.
  2. The deletion is appended to the WAL for durability.
  3. The DeleteStore marks itself as modified since the last flush.
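
The three steps above can be sketched as follows. This is a minimal illustration rather than Zvec's DeleteStore: a std::set stands in for the roaring bitmap, a vector of strings stands in for the WAL, and the "DEL:" record format and the on_flushed() hook are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: std::set replaces the roaring bitmap and a
// std::vector<std::string> replaces the WAL.
class DeleteStoreSketch {
 public:
  // Marks doc_id as deleted. Returns false if it was already deleted.
  bool mark_deleted(uint64_t doc_id, std::vector<std::string>& wal) {
    if (!deleted_.insert(doc_id).second) return false;  // 1. in-memory set
    wal.push_back("DEL:" + std::to_string(doc_id));     // 2. log for durability
    modified_ = true;                                   // 3. remember the change
    return true;
  }

  bool is_deleted(uint64_t doc_id) const { return deleted_.count(doc_id) > 0; }
  bool modified_since_last_flush() const { return modified_; }

  // Called after a snapshot has been written for the pending changes.
  void on_flushed() {
    assert(modified_ && "a snapshot is only written when something changed");
    modified_ = false;
  }

 private:
  std::set<uint64_t> deleted_;
  bool modified_ = false;
};
```

The modified flag is what lets a later flush skip writing a snapshot when no deletions happened since the previous one.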

VersionManager and Snapshot Suffixes

The VersionManager in src/db/index/segment/segment.cc tracks the current delete-snapshot suffix. During SegmentImpl::flush:

  • If delete_store_->modified_since_last_flush() returns true, Zvec writes a new delete-snapshot file using FileID::DELETE_FILE with a new suffix.
  • The version manager records this suffix, ensuring that readers can locate the correct deletion state.
  • The old snapshot file is removed after the segment commit completes, maintaining only the latest deletion version.
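
The ordering in this list (write the new snapshot, record its suffix, then remove the old file) can be sketched as below. The struct, the del.<suffix> file name, and the monotonically increasing suffix are assumptions for illustration; Zvec's VersionManager and file layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical sketch of delete-snapshot rotation.
struct SnapshotVersion {
  std::string dir;
  uint32_t suffix = 0;  // suffix of the current snapshot, 0 = none written yet

  std::string path_for(uint32_t s) const {
    assert(!dir.empty());
    return dir + "/del." + std::to_string(s);
  }

  // Writes a new snapshot, records its suffix, then removes the old file.
  bool rotate(const std::string& bitmap_bytes) {
    const uint32_t next = suffix + 1;
    {
      std::ofstream out(path_for(next), std::ios::binary | std::ios::trunc);
      if (!out) return false;
      out << bitmap_bytes;
      out.flush();
      if (!out) return false;  // keep the old version if the write failed
    }
    const uint32_t old = suffix;
    suffix = next;  // readers now resolve the new snapshot
    if (old != 0) std::remove(path_for(old).c_str());  // cleanup comes last
    return true;
  }
};
```

Removing the old file only after the new suffix is recorded means that a failure at any earlier point still leaves one complete snapshot on disk.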

WAL Lifecycle from Creation to Cleanup

The WAL progresses through five distinct stages that ensure atomicity and crash recovery.

1. WAL Creation and Path Generation

When a segment opens for writing, Zvec constructs the WAL path:

std::string wal_path = FileHelper::MakeFilePath("./", FileID::WAL_FILE, 0);
WalFilePtr wal = WalFile::Create(wal_path);

The FileHelper utility ensures consistent naming across the storage engine.

2. Appending Records with CRC32C Validation

Each document or deletion marker is serialized into a WalRecord containing:

  • Payload length
  • CRC32C checksum
  • Binary payload

The LocalWalFile::append method writes each record together with its checksum, so corrupted or partially written records can be detected during recovery.
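
To make the record layout concrete, here is a minimal framing sketch. The field order follows the list above; the 4-byte little-endian widths and the function names are assumptions, not Zvec's on-disk format. The checksum is a plain bitwise CRC32C (Castagnoli polynomial), chosen for readability rather than speed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
uint32_t crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Appends v as 4 little-endian bytes.
void put_u32(std::string& out, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
  }
}

// Frames one payload as [length][crc32c][payload].
// Assumption: both header fields are 4-byte little-endian integers.
std::string encode_record(const std::string& payload) {
  assert(payload.size() <= 0xFFFFFFFFu);
  std::string out;
  put_u32(out, static_cast<uint32_t>(payload.size()));
  put_u32(out, crc32c(payload));
  out += payload;
  return out;
}
```

As a sanity check, crc32c("123456789") evaluates to 0xE3069283, the standard check value for CRC-32C. Storing the length first lets a reader know how many payload bytes to expect before it verifies the checksum.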

3. Batch Flushing and Durability

Zvec supports configurable auto-flush behavior through the max_docs_wal_flush option:

WalOptions opt;
opt.max_docs_wal_flush = 10;  // Flush after every 10 documents
wal->open(opt);

When the threshold is reached, LocalWalFile automatically calls file_.flush(), ensuring data reaches stable storage without requiring an explicit flush call for every document.
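
The threshold behavior can be sketched with a small in-memory model. The class below is hypothetical: the pending and durable vectors stand in for buffered and flushed data, and only the max_docs_wal_flush semantics (flush every N appends, 0 meaning no auto-flush) are taken from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of threshold-based flushing.
class BatchingLog {
 public:
  explicit BatchingLog(uint32_t max_docs_wal_flush)
      : threshold_(max_docs_wal_flush) {}

  void append(const std::string& record) {
    pending_.push_back(record);
    // A threshold of 0 disables auto-flush; the caller must call flush().
    if (threshold_ != 0 && pending_.size() >= threshold_) flush();
  }

  // Moves every buffered record into the "durable" set.
  void flush() {
    durable_.insert(durable_.end(), pending_.begin(), pending_.end());
    pending_.clear();
    ++flush_count_;
    assert(pending_.empty());
  }

  std::size_t pending_count() const { return pending_.size(); }
  std::size_t durable_count() const { return durable_.size(); }
  uint32_t flush_count() const { return flush_count_; }

 private:
  uint32_t threshold_;
  uint32_t flush_count_ = 0;
  std::vector<std::string> pending_;  // appended but not yet flushed
  std::vector<std::string> durable_;  // flushed
};
```

In this model, whatever sits in the pending buffer at the moment of a crash is lost, which is why a smaller threshold trades throughput for a smaller loss window.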

4. Recovery and Validation

During segment startup, if a WAL file exists from a previous crash, Zvec executes the recovery path:

wal->prepare_for_read();
std::string rec;
while (!(rec = wal->next()).empty()) {
    // Verify CRC and rebuild in-memory structures
}

The next() method iterates through records, validates CRC32C checksums, and returns valid payloads. Corrupted records trigger recovery failures, preventing partial data ingestion.
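
A replay loop with checksum validation might look like the sketch below. It assumes a [length][crc32c][payload] layout with 4-byte little-endian header fields, and it reads "preventing partial data ingestion" as all-or-nothing: records are handed to the caller only if the whole log validates. Both the layout and that policy are assumptions, as are the ReplayStatus and replay names. Unlike a loop that stops on an empty string, the status code lets the caller tell a clean end of log from a truncated or corrupted one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
uint32_t crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Reads 4 little-endian bytes starting at pos.
uint32_t get_u32(const std::string& buf, std::size_t pos) {
  assert(pos + 4 <= buf.size());
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<unsigned char>(buf[pos + i])) << (8 * i);
  }
  return v;
}

enum class ReplayStatus { kComplete, kTruncated, kCorrupt };

// Replays every record in `log`. `out` is filled only when the whole log
// validates, so a damaged log never yields a partial result.
ReplayStatus replay(const std::string& log, std::vector<std::string>& out) {
  std::vector<std::string> records;
  std::size_t pos = 0;
  while (pos < log.size()) {
    if (log.size() - pos < 8) return ReplayStatus::kTruncated;  // torn header
    const uint32_t len = get_u32(log, pos);
    const uint32_t expected = get_u32(log, pos + 4);
    if (log.size() - pos - 8 < len) return ReplayStatus::kTruncated;  // torn payload
    std::string payload = log.substr(pos + 8, len);
    if (crc32c(payload) != expected) return ReplayStatus::kCorrupt;
    records.push_back(std::move(payload));
    pos += 8 + static_cast<std::size_t>(len);
  }
  out = std::move(records);
  return ReplayStatus::kComplete;
}
```

A real engine may instead accept the valid prefix of a log whose final record is torn; which policy Zvec applies should be confirmed in local_wal_file.cc.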

5. Commit and Cleanup

After a successful segment flush in SegmentImpl::flush, Zvec commits the WAL:

wal->remove();

This deletes the WAL file, indicating that all data has been safely merged into the segment's permanent storage. If a delete-snapshot was written, the old snapshot file is also removed via FileHelper::RemoveFile(delete_store_path), ensuring that only current version data consumes disk space.

Practical Implementation Example

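Below is an end-to-end example that exercises the WAL API through a write batch, a recovery pass, and cleanup. It covers the WAL side only; delete snapshots are handled by the segment layer described earlier.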

#include <iostream>   // std::cout, used in the recovery loop
#include <string>

#include "db/index/storage/wal/wal_file.h"
#include "db/common/file_helper.h"

using namespace zvec;

int main() {
  // 1️⃣ Build WAL path for segment 0 in the current directory.
  std::string wal_path = FileHelper::MakeFilePath("./", FileID::WAL_FILE, 0);

  // 2️⃣ Create a WAL instance.
  WalFilePtr wal = WalFile::Create(wal_path);
  if (!wal) return -1;

  // 3️⃣ Open the WAL for a new write batch.
  WalOptions opt;
  opt.create_new = true;           // fail if file already exists
  opt.max_docs_wal_flush = 10;     // flush after each 10 docs (0 = no auto‑flush)
  if (wal->open(opt) != 0) return -1;

  // 4️⃣ Append a few records (e.g. serialized JSON docs).
  wal->append(std::string("{\"id\":1,\"text\":\"hello\"}"));
  wal->append(std::string("{\"id\":2,\"text\":\"world\"}"));
  // … more appends …

  // 5️⃣ Ensure all data is persisted.
  wal->flush();

  // 6️⃣ Close the WAL after writing.
  wal->close();

  // ---------- Recovery path ----------
  // 7️⃣ Re‑open the same WAL for read‑only access.
  WalOptions read_opt;
  read_opt.create_new = false;     // open the existing file instead of creating one
  if (wal->open(read_opt) != 0) return -1;
  wal->prepare_for_read();

  // 8️⃣ Iterate over records.
  std::string rec;
  while (!(rec = wal->next()).empty()) {
    // Process the recovered document.
    std::cout << "Recovered: " << rec << std::endl;
  }

  // 9️⃣ Clean up the WAL file when the segment is committed.
  wal->remove();
}

This example illustrates the complete lifecycle: path generation using FileHelper::MakeFilePath, record appending with automatic CRC32C computation, configurable auto-flush via max_docs_wal_flush, crash recovery with prepare_for_read and next(), and final cleanup with remove().

Key Source Files Reference

  • src/db/index/storage/wal/wal_file.h – Abstract WalFile interface defining create, open, append, read, flush, and remove operations.
  • src/db/index/storage/wal/wal_file.cc – Factory implementation for WalFile::Create and CreateAndOpen.
  • src/db/index/storage/wal/local_wal_file.h – Declaration of the LocalWalFile concrete implementation.
  • src/db/index/storage/wal/local_wal_file.cc – Full implementation including record format, CRC32C checksums, batch flushing, and file removal.
  • src/db/common/file_helper.h – FileHelper::MakeFilePath utility for generating WAL and delete-snapshot file names.
  • src/db/index/segment/segment.cc – SegmentImpl::flush logic handling delete-snapshot versioning and WAL cleanup.
  • src/db/index/common/delete_store.h – DeleteStore roaring bitmap implementation for tracking deleted document IDs.

Summary

  • Alibaba Zvec implements a robust Write-Ahead Log system in src/db/index/storage/wal/ to guarantee durability before indexing.
  • The WalFile abstract interface and LocalWalFile concrete implementation handle record appending with CRC32C validation and configurable auto-flush via max_docs_wal_flush.
  • Document deletion is managed through a versioned DeleteStore bitmap; during SegmentImpl::flush, a new delete snapshot is written only when modifications exist, and the old snapshot is removed once the segment commit completes.
  • Recovery iterates WAL records using prepare_for_read() and next(), validating checksums before rebuilding in-memory state.
  • Cleanup occurs after successful segment flush when wal->remove() deletes the WAL file, confirming all data is merged into permanent storage.

Frequently Asked Questions

How does Zvec ensure data durability during crashes?

Zvec ensures durability by writing all document inserts and deletion markers to a Write-Ahead Log before they are indexed. The LocalWalFile implementation computes CRC32C checksums for every record and supports automatic flushing via the max_docs_wal_flush option. If a crash occurs, the segment recovery process reads the existing WAL file, validates each record's checksum using prepare_for_read() and next(), and rebuilds the in-memory index state before accepting new writes.

What happens to the WAL file after a segment flush?

After SegmentImpl::flush successfully writes pending documents and any modified delete snapshots to permanent storage, the system calls wal_file_->remove() to delete the WAL file. This removal signals that all data has been safely merged into the segment's permanent files and that the durability guarantee is no longer needed. If a delete snapshot was written during the flush, the old snapshot file is also removed via FileHelper::RemoveFile, ensuring only the current version consumes disk space.

How does versioning work for deleted documents?

Deleted documents are tracked in a roaring bitmap managed by the DeleteStore class. When deletions occur, the bitmap is updated in memory and the operation is logged to the WAL. During SegmentImpl::flush, the system checks delete_store_->modified_since_last_flush(); if true, it generates a new snapshot suffix via the VersionManager, writes the bitmap to a new delete-snapshot file using FileHelper::MakeFilePath with FileID::DELETE_FILE, and updates the version metadata. Readers use the recorded suffix to locate the current deletion state, and the previous snapshot file is removed once the segment commit completes, so only the latest deletion version is kept on disk.

Can I configure the auto-flush behavior?

Yes, the auto-flush behavior is controlled through the WalOptions structure passed to wal->open(). Specifically, the max_docs_wal_flush parameter determines how many documents can be appended before LocalWalFile automatically calls file_.flush() to force data to stable storage. Setting this to 0 disables auto-flush, requiring explicit wal->flush() calls. Higher values batch more documents for better throughput, at the cost of a larger window of unflushed records that could be lost in a crash.
