# Zvec Storage Backends: A Complete Guide to Index Storage Options in Alibaba Zvec

> Explore Zvec storage backends: memory-mapped files, pure memory, buffer pools, file-based I/O, and read-only MMAP. Choose the best option for your latency and deployment needs.

- Repository: [Alibaba/zvec](https://github.com/alibaba/zvec)
- Tags: deep-dive
- Published: 2026-02-16

---

**Zvec provides five distinct storage backends (memory-mapped files, pure memory, buffer pools, file-based I/O, and read-only MMAP wrappers), each optimized for specific latency, mutability, and deployment constraints.**

The Alibaba zvec library abstracts physical index location through the `IndexStorage` interface, allowing developers to choose from multiple zvec storage backends based on workload requirements. Whether you need low-latency production serving or fast in-memory testing, selecting the right backend is critical for performance.

## Understanding the IndexStorage Interface

At the core of zvec's storage layer is the `IndexStorage` interface, which decouples index logic from physical storage mechanics. Concrete implementations handle the actual data placement, while the index operates on a unified API. Runtime selection occurs through `IndexFactory::CreateStorage`, which instantiates backends by name, or via the `StorageOptions::StorageType` enum defined in [`include/zvec/core/interface/index_param.h`](https://github.com/alibaba/zvec/blob/main/include/zvec/core/interface/index_param.h).
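
A small, self-contained C++ sketch shows the same decoupling pattern. The names `Storage`, `InMemoryStorage`, and `CountNonZero` are hypothetical stand-ins, not zvec's actual `IndexStorage` API. Index logic is written once against an abstract interface, and any concrete backend can be plugged in behind it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for an abstract storage interface (not zvec's real API).
class Storage {
 public:
  virtual ~Storage() = default;
  // Copies up to `len` bytes starting at `offset` into `out`; returns bytes copied.
  virtual size_t Read(size_t offset, void* out, size_t len) const = 0;
  virtual size_t Size() const = 0;
};

// One concrete backend: bytes held in a heap buffer.
class InMemoryStorage : public Storage {
 public:
  explicit InMemoryStorage(std::vector<uint8_t> bytes) : bytes_(std::move(bytes)) {}

  size_t Read(size_t offset, void* out, size_t len) const override {
    if (offset >= bytes_.size()) return 0;
    size_t n = std::min(len, bytes_.size() - offset);
    std::memcpy(out, bytes_.data() + offset, n);
    return n;
  }
  size_t Size() const override { return bytes_.size(); }

 private:
  std::vector<uint8_t> bytes_;
};

// "Index logic" that depends only on the interface, never on a concrete class.
size_t CountNonZero(const Storage& storage) {
  size_t count = 0;
  uint8_t buf[64];
  for (size_t off = 0; off < storage.Size();) {
    size_t n = storage.Read(off, buf, sizeof(buf));
    for (size_t i = 0; i < n; ++i) count += (buf[i] != 0);
    off += n;
  }
  return count;
}
```

Adding a file-backed or mmap-backed class that overrides `Read` and `Size` would require no change to `CountNonZero`. That is the property the `IndexStorage` abstraction is meant to provide.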

## Available Zvec Storage Backends

Zvec organizes storage into three enum-driven types (`kMMAP`, `kMemory`, `kBufferPool`) and two further implementations that are selected by their registered name (`FileReadStorage`, `MMapFileReadStorage`). `MMapFileReadStorage` also serves as the read-only variant of `kMMAP`. The `StorageOptions` struct controls mutability through its `is_writeable` boolean flag, which determines whether the factory instantiates a read-write or read-only variant.
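
The selection rule can be summarized as a lookup from `(StorageType, is_writeable)` to a concrete class name. The sketch below is hypothetical. It re-declares a local enum rather than using zvec's, and it reflects only the pairings this article names. It also assumes, without confirmation, that `is_writeable` does not change the `kBufferPool` choice.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the enum described above; the real definition lives in
// include/zvec/core/interface/index_param.h and may differ.
enum class StorageType { kMMAP, kMemory, kBufferPool };

// Maps (type, is_writeable) to the concrete class names this article lists.
// Returns std::nullopt for combinations the article does not name.
std::optional<std::string> ResolveStorageName(StorageType type, bool is_writeable) {
  switch (type) {
    case StorageType::kMMAP:
      return is_writeable ? "MMapFileStorage" : "MMapFileReadStorage";
    case StorageType::kMemory:
      // Only the read-only MemoryReadStorage is described in this article.
      if (!is_writeable) return "MemoryReadStorage";
      return std::nullopt;
    case StorageType::kBufferPool:
      // Assumption: the flag does not change this choice.
      return "BufferStorage";
  }
  return std::nullopt;
}
```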

## Detailed Backend Comparison

### Memory-Mapped File Storage (kMMAP)

**Concrete classes:** `MMapFileStorage` (read/write) and `MMapFileReadStorage` (read-only)  
**Implementation:** `src/core/utility/mmap_file_storage.cc` and `src/core/utility/mmap_file_read_storage.cc`

This backend utilizes `ailego::MMapFile` to map index files directly into the process address space. It supports copy-on-write semantics, page-locking, warm-up routines, and optional direct-IO bypassing the page cache.

**Use when:** Serving large, immutable or append-only indexes in production where low-latency reads are critical and data copying into user-space must be minimized.
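
The following sketch illustrates the copy-on-write semantics. It maps a file with plain POSIX `mmap` using `MAP_PRIVATE`, so writes go to private copies of the touched pages and never reach the file on disk. `CowMapping` is a hypothetical class, not `ailego::MMapFile`. It assumes a POSIX system and uses `posix_madvise(POSIX_MADV_WILLNEED)` as a stand-in for a warm-up routine. It omits page-locking and direct-IO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII copy-on-write mapping built on POSIX mmap (not ailego::MMapFile).
class CowMapping {
 public:
  explicit CowMapping(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      size_t len = static_cast<size_t>(st.st_size);
      // MAP_PRIVATE + PROT_WRITE: writes touch private page copies, never the file.
      void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<char*>(p);
        size_ = len;
        // Warm-up hint: ask the kernel to prefetch the mapped pages.
        ::posix_madvise(data_, size_, POSIX_MADV_WILLNEED);
      }
    }
    ::close(fd);  // the mapping stays valid after the descriptor is closed
  }
  ~CowMapping() {
    if (data_) ::munmap(data_, size_);
  }
  CowMapping(const CowMapping&) = delete;
  CowMapping& operator=(const CowMapping&) = delete;

  char* data() { return data_; }
  size_t size() const { return size_; }

 private:
  char* data_ = nullptr;
  size_t size_ = 0;
};
```

A purely read-only backend would map with `PROT_READ` alone. Then any stray write faults instead of silently diverging from the file.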

### In-Memory Storage (kMemory)

**Concrete class:** `MemoryReadStorage`  
**Implementation:** `src/core/utility/memory_read_storage.cc`

The index is fully materialized in a RAM-resident `IndexMemory` rope structure. After initial loading, no file I/O occurs during access, providing the lowest possible latency.

**Use when:** Working with small-to-medium indexes that fit comfortably in RAM, or when running unit tests and benchmarks that require the fastest possible in-process access without filesystem overhead.
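
The exact layout of `IndexMemory` is not described here, but the general idea of a rope-like, chunked in-memory store can be sketched as follows. `ChunkedBuffer` is hypothetical. Bytes live in fixed-size heap chunks, and appends never move existing data. Reads that straddle chunk boundaries are stitched together without any file I/O.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical chunked ("rope-like") byte store; not zvec's IndexMemory.
class ChunkedBuffer {
 public:
  explicit ChunkedBuffer(size_t chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Appends bytes, allocating a new chunk whenever the last one is full.
  void Append(const void* src, size_t len) {
    const char* p = static_cast<const char*>(src);
    while (len > 0) {
      size_t used = size_ % chunk_size_;
      if (used == 0) chunks_.emplace_back(new char[chunk_size_]);
      size_t n = std::min(len, chunk_size_ - used);
      std::memcpy(chunks_.back().get() + used, p, n);
      p += n;
      len -= n;
      size_ += n;
    }
  }

  // Copies bytes [offset, offset + len) into `out`, crossing chunk boundaries
  // as needed. Returns the number of bytes copied.
  size_t Read(size_t offset, void* out, size_t len) const {
    if (offset >= size_) return 0;
    len = std::min(len, size_ - offset);
    char* dst = static_cast<char*>(out);
    size_t done = 0;
    while (done < len) {
      size_t pos = offset + done;
      size_t in_chunk = pos % chunk_size_;
      size_t n = std::min(len - done, chunk_size_ - in_chunk);
      std::memcpy(dst + done, chunks_[pos / chunk_size_].get() + in_chunk, n);
      done += n;
    }
    return len;
  }

  size_t size() const { return size_; }

 private:
  size_t chunk_size_;
  size_t size_ = 0;
  std::vector<std::unique_ptr<char[]>> chunks_;
};
```

Fixed-size chunks avoid the large reallocation-and-copy that a single growing buffer would need. The cost is the small amount of index arithmetic in `Read`.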

### Buffer Pool Storage (kBufferPool)

**Concrete class:** `BufferStorage`  
**Implementation:** `src/core/utility/buffer_storage.cc`

This backend routes reads through the global `ailego::BufferManager`. Index segments are served from a shared buffer pool rather than direct file mappings, enabling cross-process data sharing.

**Use when:** Deploying distributed inference workers where multiple processes must share the same storage via a buffer manager, or when avoiding `mmap` overhead while keeping data off-heap.
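
The buffer-pool idea can be sketched with a small LRU cache: serve pages from a shared cache, load a page on a miss, and evict when full. `PagePool` and its loader callback are hypothetical and in-process only. The sketch does not model `ailego::BufferManager`, its eviction policy, or any cross-process sharing, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity LRU page pool (not ailego::BufferManager).
class PagePool {
 public:
  using Loader = std::function<void(size_t page_id, std::vector<char>& page)>;

  PagePool(size_t capacity, Loader loader)
      : capacity_(capacity), loader_(std::move(loader)) {
    assert(capacity_ > 0);
  }

  // Returns the cached page, loading it (and possibly evicting another) on a miss.
  const std::vector<char>& Get(size_t page_id) {
    auto it = index_.find(page_id);
    if (it != index_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recently used
      return it->second->second;
    }
    if (lru_.size() == capacity_) {
      index_.erase(lru_.back().first);  // evict the least recently used page
      lru_.pop_back();
    }
    lru_.emplace_front(page_id, std::vector<char>());
    loader_(page_id, lru_.front().second);
    ++loads_;
    index_[page_id] = lru_.begin();
    return lru_.front().second;
  }

  size_t loads() const { return loads_; }

 private:
  using Entry = std::pair<size_t, std::vector<char>>;
  size_t capacity_;
  Loader loader_;
  std::list<Entry> lru_;
  std::unordered_map<size_t, std::list<Entry>::iterator> index_;
  size_t loads_ = 0;
};
```

In a real deployment the loader would read a page from disk. Here it only fills a buffer, so the hit and miss behavior can be observed through `loads()`.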

### File-Based Storage (FileReadStorage)

**Concrete class:** `FileReadStorage`  
**Implementation:** `src/core/utility/file_read_storage.cc`

Selected by name rather than enum, this implementation performs pure file-based I/O with optional direct-IO, page-locking, and memory warm-up. It can fall back to internal `mmap` for individual segment reads when beneficial.

**Use when:** Reading indexes from filesystems where `mmap` is unavailable or undesirable, or when explicit read calls are preferred over memory mapping for read-only workloads.
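
For contrast with memory mapping, this sketch serves each read with an explicit `pread` call at a file offset, which is the style of access `FileReadStorage` is described as using. `PreadFile` is a hypothetical POSIX-only class. It omits direct-IO (which typically imposes buffer and offset alignment requirements), page-locking, warm-up, and the mmap fallback.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical read-only file backend using explicit pread() calls
// instead of a memory mapping (not zvec's FileReadStorage).
class PreadFile {
 public:
  explicit PreadFile(const std::string& path) : fd_(::open(path.c_str(), O_RDONLY)) {}
  ~PreadFile() {
    if (fd_ >= 0) ::close(fd_);
  }
  PreadFile(const PreadFile&) = delete;
  PreadFile& operator=(const PreadFile&) = delete;

  bool ok() const { return fd_ >= 0; }

  // Reads up to `len` bytes at `offset` into `out`. Loops because pread may
  // return fewer bytes than requested. Returns bytes read, or -1 on error.
  ssize_t ReadAt(size_t offset, void* out, size_t len) const {
    char* dst = static_cast<char*>(out);
    size_t done = 0;
    while (done < len) {
      ssize_t n = ::pread(fd_, dst + done, len - done, static_cast<off_t>(offset + done));
      if (n < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal: retry
        return -1;
      }
      if (n == 0) break;  // end of file
      done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
  }

 private:
  int fd_;
};
```

Because `pread` takes the offset explicitly and does not move a shared file position, several threads can read through one descriptor without coordinating seeks.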

### Read-Only MMAP Wrapper (MMapFileReadStorage)

**Concrete class:** `MMapFileReadStorage`  
**Implementation:** `src/core/utility/mmap_file_read_storage.cc`

A minimal wrapper selected explicitly by name, providing a lightweight read-only interface around `MMapFile` without buffer manager integration or additional caching layers.

**Use when:** Inspecting an index without modifying it, where a plain memory mapping of the file is sufficient and read-only tools need the lowest possible memory footprint.
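
A minimal read-only mapping wrapper can be sketched in a few lines of POSIX C++. `ReadOnlyMap` is hypothetical, not zvec's `MMapFileReadStorage`. It maps the file with `PROT_READ`, exposes only a `std::string_view`, and adds no caching layer, so memory use is limited to the pages the kernel actually loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical minimal read-only view over a file (not zvec's MMapFileReadStorage).
class ReadOnlyMap {
 public:
  explicit ReadOnlyMap(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      size_t len = static_cast<size_t>(st.st_size);
      void* p = ::mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const char*>(p);
        size_ = len;
      }
    }
    ::close(fd);  // the mapping outlives the descriptor
  }
  ~ReadOnlyMap() {
    if (data_) ::munmap(const_cast<char*>(data_), size_);
  }
  ReadOnlyMap(const ReadOnlyMap&) = delete;
  ReadOnlyMap& operator=(const ReadOnlyMap&) = delete;

  // Only const access is exposed: callers get a view, never a writable pointer.
  std::string_view view() const { return {data_, size_}; }

 private:
  const char* data_ = nullptr;
  size_t size_ = 0;
};
```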

## How to Select and Configure Storage in Code

All backends register with the factory using the `INDEX_FACTORY_REGISTER_STORAGE` macro defined in [`include/zvec/core/framework/index_factory.h`](https://github.com/alibaba/zvec/blob/main/include/zvec/core/framework/index_factory.h). At runtime, instantiate storage by enum via `StorageOptions` or by name using factory helpers:

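```cpp
// Headers as laid out in the zvec repository (assuming include/ is on the include path).
#include <zvec/core/interface/index_param.h>
#include <zvec/core/framework/index_factory.h>

// Create by enum (recommended for most use cases).
// Assumes StorageOptions can be brace-initialized as {type, is_writeable}.
zvec::core::StorageOptions opt{
    zvec::core::StorageOptions::StorageType::kMMAP,
    true  // is_writeable: true -> MMapFileStorage, false -> MMapFileReadStorage
};
// `index` is assumed to be an index instance created elsewhere.
index->Open("my_index", opt);

// Create by name for specialized implementations.
auto storage = zvec::core::IndexFactory::CreateStorage("FileReadStorage");

// Runtime introspection.
bool available = zvec::core::IndexFactory::HasStorage("BufferStorage");
auto all_backends = zvec::core::IndexFactory::AllStorages();
```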

The `is_writeable` flag determines whether the factory instantiates mutable implementations (`MMapFileStorage`) or their read-only counterparts (`MMapFileReadStorage`, `MemoryReadStorage`).
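
Registration macros such as `INDEX_FACTORY_REGISTER_STORAGE` generally rely on static initializers that insert a creator function into a name-keyed registry before `main` runs. The following sketch shows that general pattern. `StorageRegistry`, `SKETCH_REGISTER_STORAGE`, and the demo classes are hypothetical, and zvec's actual macro and factory may differ in detail.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for this sketch only.
struct StorageBase {
  virtual ~StorageBase() = default;
  virtual std::string Name() const = 0;
};

// Name -> creator registry. The function-local static avoids the static
// initialization order problem across translation units.
class StorageRegistry {
 public:
  using Creator = std::function<std::unique_ptr<StorageBase>()>;

  static std::map<std::string, Creator>& Map() {
    static std::map<std::string, Creator> m;
    return m;
  }
  static bool Register(const std::string& name, Creator c) {
    return Map().emplace(name, std::move(c)).second;
  }
  static std::unique_ptr<StorageBase> Create(const std::string& name) {
    auto it = Map().find(name);
    if (it == Map().end()) return nullptr;
    return it->second();
  }
  static bool Has(const std::string& name) { return Map().count(name) != 0; }
  static std::vector<std::string> All() {
    std::vector<std::string> names;
    for (const auto& kv : Map()) names.push_back(kv.first);
    return names;
  }
};

// Hypothetical macro: a namespace-scope static whose initializer registers
// the class under its own name when the program starts.
#define SKETCH_REGISTER_STORAGE(Class)                               \
  static const bool Class##_registered = StorageRegistry::Register( \
      #Class, [] { return std::unique_ptr<StorageBase>(new Class()); })

struct DemoFileStorage : StorageBase {
  std::string Name() const override { return "DemoFileStorage"; }
};
struct DemoMemoryStorage : StorageBase {
  std::string Name() const override { return "DemoMemoryStorage"; }
};

SKETCH_REGISTER_STORAGE(DemoFileStorage);
SKETCH_REGISTER_STORAGE(DemoMemoryStorage);
```

One general caveat of this pattern applies when registrations live in a static library. The linker may discard object files that nothing references, so their static initializers never run. Such builds commonly need whole-archive linking or an explicit reference to each backend.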

## Summary

- **kMMAP** (`MMapFileStorage`): Best for large, production-grade indexes requiring low-latency reads through memory-mapped files.
- **kMemory** (`MemoryReadStorage`): Ideal for small indexes and testing scenarios demanding maximum in-process speed.
- **kBufferPool** (`BufferStorage`): Suited for distributed environments sharing storage via `ailego::BufferManager`.
- **FileReadStorage**: Use when explicit file I/O is required or the underlying filesystem lacks mmap support.
- **MMapFileReadStorage**: Lightweight read-only wrapper for inspection tools needing minimal memory overhead.

## Frequently Asked Questions

### When should I choose kMMAP over kMemory for my zvec index?

Choose **kMMAP** when your index is too large to fit comfortably in RAM, or when you need file-backed memory mapping for production serving with minimal copy overhead. Use **kMemory** only when the entire index fits in memory and you require the absolute lowest latency for testing or small-scale deployments.
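
This rule of thumb can be written down as a tiny helper. `SuggestBackend` is hypothetical and not part of zvec. The `headroom` factor, which by default allows the index to use at most half of the RAM budget, is an assumed safety margin, not a documented threshold.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical heuristic encoding the rule of thumb above; not a zvec API.
// `headroom` is an assumed safety factor (0.5 = index may use half the budget).
std::string SuggestBackend(uint64_t index_bytes, uint64_t ram_budget_bytes,
                           double headroom = 0.5) {
  const double usable = static_cast<double>(ram_budget_bytes) * headroom;
  return static_cast<double>(index_bytes) <= usable ? "kMemory" : "kMMAP";
}
```

A production version would also account for other memory consumers in the process, such as query buffers and caches. That is why the sketch leaves `headroom` as a parameter rather than hard-coding it.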

### Can I switch between storage backends after creating a zvec index?

No. The storage backend is fixed when the index is opened, via `StorageOptions` or `IndexFactory::CreateStorage`. You can reopen the same index file with a different backend (assuming the format supports it), but you cannot switch backends on an index instance that is already open.

### What is the difference between FileReadStorage and MMapFileStorage?

**FileReadStorage** performs explicit read system calls with optional direct-IO and can fall back to temporary mmap for individual segments, making it suitable for filesystems where memory mapping is unavailable or undesirable. **MMapFileStorage** relies primarily on `ailego::MMapFile` for zero-copy access and supports copy-on-write semantics, which typically makes it faster for random access but ties it to mmap support.

### How do I check which zvec storage backends are available in my build?

Call `zvec::core::IndexFactory::AllStorages()` to retrieve a list of all registered storage names, or use `zvec::core::IndexFactory::HasStorage("StorageName")` to check for a specific implementation. All backends register automatically via the `INDEX_FACTORY_REGISTER_STORAGE` macro when the library is linked.