Creating and Optimizing Vector Indexes in Alibaba ZVec: A Complete Guide

February 16, 2026 | alibaba/zvec

Alibaba ZVec creates optimized vector indexes through a three-stage pipeline: configuration via IndexParam, building via local_builder.cc and IndexStreamer, and querying via IndexStreamer with query parameters serialized by IndexFactory. It supports Flat, HNSW, and IVF structures, with quantization and multi-threading as the main optimizations.

ZVec is Alibaba's high-performance vector database designed for billion-scale approximate nearest neighbor (ANN) search. Whether you are building a recommendation engine or a semantic search pipeline, creating and optimizing vector indexes in Alibaba ZVec requires understanding its C++ core architecture and the flexible YAML-based configuration system.

Understanding the ZVec Index Architecture

ZVec implements a modular index architecture defined in src/include/zvec/core/interface/index_param.h. The system supports three primary index types enumerated in the IndexType enum (lines 56–64): kFlat for exact brute-force search, kHNSW for high-recall approximate search, and kIVF for inverted file indexes suitable for billion-scale datasets.
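To make the "exact brute-force" behavior of a flat index concrete, here is a minimal pure-Python sketch. It is not ZVec code: the names flat_search and l2sq are hypothetical, and the only assumption is that a flat index scores every stored vector against the query and keeps the closest ones.

```python
import heapq


def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, topk):
    """Exact search: score every stored vector and keep the topk closest.

    `vectors` maps an integer id to a list of floats (hypothetical layout).
    Returns (distance, id) pairs ordered from nearest to farthest.
    """
    scored = ((l2sq(vec, query), vid) for vid, vec in vectors.items())
    return heapq.nsmallest(topk, scored)


store = {1: [0.0, 0.0], 2: [1.0, 1.0], 3: [5.0, 5.0]}
hits = flat_search(store, [0.9, 0.9], topk=2)
print(hits)
```

The cost grows linearly with the number of stored vectors, which is why approximate structures such as HNSW and IVF are preferred at large scale.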

Core Components and File Paths

The index lifecycle is managed by three core components:

IndexParam (src/include/zvec/core/interface/index_param.h): Defines index type, metric, quantizer, and runtime options.
IndexFactory (src/core/interface/index_factory.cc): Instantiates concrete index classes via IndexFactory::CreateAndInitIndex (lines 41–47).
IndexStreamer (src/include/zvec/core/framework/index_streamer.h): Abstract runner that streams vectors into the index and handles dumping to storage via open, flush, and close methods.

Creating Vector Indexes in Alibaba ZVec

Index creation follows a three-stage pipeline: configuration, building, and querying.

Stage 1: Configuration with IndexParam

Index creation begins with a YAML or JSON configuration file that populates the IndexParam structure. Key parameters include:

BuilderClass: Specifies the streamer implementation (HnswStreamer, FlatStreamer, or IvfStreamer).
MetricName: Distance metric (L2sq, Cosine, InnerProduct).
ConverterName: Quantization method (Int8, PQ, OPQ).
ThreadCount: Builder parallelism (defaults to hardware concurrency).
NeedTrain: Boolean flag indicating whether a quantizer requires a separate training phase using TrainerIndexPath.
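The three MetricName values correspond to standard formulas, sketched below in plain Python for reference. The function names are illustrative only, and whether ZVec reports cosine as a similarity or as a distance (1 minus similarity) is not stated here, so treat that detail as an assumption to verify.

```python
import math


def l2sq(a, b):
    """Squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product; larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the two vectors after length normalization."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner_product(a, b) / norm


a, b, c = [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]
print(l2sq(a, b), inner_product(a, c), cosine_similarity(a, c))
```

Because cosine ignores vector length, a and c above count as identical in direction even though their L2sq distance is nonzero.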

Stage 2: Building the Index with Local Builder

The C++ entry point tools/core/local_builder.cc orchestrates the build process. The core logic creates the index via the factory, initializes storage, and streams vectors through a multi-threaded pipeline:

// Parse YAML into ailego::Params
bool prepare_params(YAML::Node &&config_params, ailego::Params &params);

// Create and initialize index via factory
Index::Pointer index = core_interface::IndexFactory::CreateAndInitIndex(param);

// Initialize MMapFileStorage for zero-copy reads
auto storage = IndexFactory::CreateStorage("MMapFileStorage");
storage->open(dump_path, true);
auto dumper = IndexFactory::CreateDumper(storage);

// Create and configure streamer
IndexStreamer::Pointer streamer = IndexFactory::CreateStreamer("HnswStreamer");
streamer->init(meta, builder_params);
streamer->open(storage);

// Execute multi-threaded build loop
do_build_sparse_by_streamer(streamer, thread_count);
streamer->flush(check_point);
streamer->close();

The do_build_sparse_by_streamer function (lines 252–260) distributes vector IDs across a thread pool, optionally applies a reformer, and feeds each vector to streamer->add_impl.
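The pattern described above (partition vector IDs across workers, optionally transform each vector, then add it to the streamer) can be sketched in Python as follows. ToyStreamer and build are hypothetical stand-ins rather than ZVec classes, and the strided partitioning is an assumption; the actual C++ function may split work differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStreamer:
    """Hypothetical stand-in for a streamer: stores vectors in a dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self.vectors = {}

    def add(self, vid, vec):
        # Guard the shared container so concurrent adds stay consistent.
        with self._lock:
            self.vectors[vid] = vec


def build(streamer, dataset, thread_count, reformer=None):
    """Feed every vector in `dataset` to `streamer` using `thread_count` workers."""

    def worker(worker_index):
        # Strided partition: worker k handles ids k, k+T, k+2T, ...
        for vid in range(worker_index, len(dataset), thread_count):
            vec = dataset[vid]
            if reformer is not None:
                vec = reformer(vec)
            streamer.add(vid, vec)

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(worker, range(thread_count)))


dataset = [[float(i), float(i) + 1.0] for i in range(100)]
streamer = ToyStreamer()
build(streamer, dataset, thread_count=4, reformer=lambda v: [x * 0.5 for x in v])
print(len(streamer.vectors))
```

Python threads share the interpreter lock, so this sketch shows the work distribution only, not the speed-up a native thread pool would deliver.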

Stage 3: Querying with IndexStreamer

Once built, the index is loaded via IndexStreamer::open and queried using parameters serialized through IndexFactory::QueryParamSerializeToJson (lines 141–150). The Python wrapper in python/zvec/zvec.py provides high-level access to this functionality.

Optimizing Vector Indexes in Alibaba ZVec

Optimization in ZVec targets memory efficiency, build speed, and query latency through five primary mechanisms.

Quantization Strategies

Set ConverterName in the builder config to enable quantization. The QuantizerParam struct in src/include/zvec/core/interface/index_param.h (lines 86–124) supports:

Int8: 8-bit integer quantization, which cuts vector storage by roughly 75% relative to 32-bit floats (1 byte per dimension instead of 4).
PQ: Product Quantization for high compression ratios.
OPQ: Optimized Product Quantization with rotation preprocessing.

When NeedTrain is true, the builder executes a training phase using TrainerIndexPath to learn quantization codebooks before the main build.
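As a rough illustration of where the Int8 savings come from, the sketch below applies symmetric scalar quantization with one scale per vector. This is a generic technique chosen for illustration, not necessarily the scheme ZVec's Int8 converter uses, and the function names are hypothetical.

```python
from array import array


def quantize_int8(vec):
    """Map floats to integer codes in [-127, 127] using a single scale."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = array("b", (int(round(x / scale)) for x in vec))
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


vec = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)

# Compare storage: 4 bytes per float32 component vs 1 byte per int8 code.
float32_bytes = array("f", vec).itemsize * len(vec)
int8_bytes = codes.itemsize * len(codes)
print(float32_bytes, int8_bytes, restored)
```

The reconstruction error per component is bounded by about half the scale, which is the accuracy cost paid for the smaller footprint.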

Parallelism and Threading

ZVec leverages ailego::ThreadPool (src/include/zvec/ailego/parallel/thread_pool.h) to parallelize vector ingestion. The ThreadCount parameter in YAML controls the number of builder threads; speed-up is typically close to linear until memory bandwidth saturates. The do_build_sparse_by_streamer function automatically partitions work across the pool.

Reformers and Transformations

A reformer preprocesses vectors before indexing. Configure via BuilderCommon.ReformerName in YAML. The reformer is instantiated via IndexFactory::CreateReformer(meta.reformer_name()) (local_builder.cc lines 38–44). Common options include PCA for dimensionality reduction and OPQ for rotation optimization.

Storage Layout Optimization

ZVec uses MMapFileStorage by default for zero-copy reads during querying. For ultra-low latency scenarios, enable kBufferPool in StorageOptions (index_param.h lines 37–44) to keep hot index pages resident in memory instead of relying on memory-mapped files.
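The idea behind memory-mapped storage can be shown with Python's standard mmap module: the file is mapped into the address space and only the bytes of the requested vector are touched. This is a generic sketch, not the MMapFileStorage implementation; the file layout (packed little-endian float32 values) and the names DIM, write_vectors, and read_vector are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # illustrative vector dimension


def write_vectors(path, vectors):
    """Store vectors back to back as little-endian float32 values."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{DIM}f", *vec))


def read_vector(path, index):
    """Map the file and decode only the requested vector's bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * DIM * 4  # 4 bytes per float32 component
            return list(struct.unpack_from(f"<{DIM}f", mm, offset))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors.bin")
    write_vectors(path, [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]])
    second = read_vector(path, 1)

print(second)
```

The operating system pages data in on demand, so a cold read may still hit disk; a buffer pool trades extra memory for avoiding that first-touch latency.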

Search-Time Tuning

For HNSW indexes, the ef_search parameter in HNSWQueryParam controls the recall/latency trade-off. This is serialized via IndexFactory::QueryParamSerializeToJson and exposed in Python as the ef_search argument to collection.search(). Higher values improve recall at the cost of increased query latency.
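To show why a larger ef_search can improve recall, the sketch below runs a best-first search over a single graph layer with a bounded candidate list, in the style of the published HNSW search routine. It is a toy written for this guide, not ZVec's implementation, and the graph, vectors, and function names are hypothetical.

```python
import heapq


def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_layer(graph, vectors, query, entry, ef):
    """Best-first search that keeps at most `ef` results while exploring."""
    d0 = l2sq(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    results = [(-d0, entry)]    # max-heap via negated distance
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break  # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2sq(vectors[nb], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-nd, node) for nd, node in results)


# Node 3 is the true nearest neighbor but is reachable only through node 2,
# which looks unpromising from the entry point.
vectors = {0: [0.0], 1: [3.0], 2: [-4.0], 3: [5.0]}
graph = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
narrow = search_layer(graph, vectors, [5.0], entry=0, ef=1)
wide = search_layer(graph, vectors, [5.0], entry=0, ef=3)
print(narrow[0], wide[0])
```

With ef=1 the search stops at node 1, a local minimum, while ef=3 keeps the detour through node 2 alive long enough to reach node 3. The wider search also evaluates more distances, which is the latency side of the trade-off.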

Working with ZVec in Python

The Python binding in python/zvec/zvec.py mirrors the C++ pipeline:

import zvec
from zvec.model import Collection, CollectionSchema, FieldSchema, DataType

# Initialize the engine

zvec.init(log_type=zvec.LogType.CONSOLE, log_level=zvec.LogLevel.INFO)

# Create collection with schema

schema = CollectionSchema(
    name="my_vectors",
    fields=[
        FieldSchema("id", DataType.INT64, nullable=False),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dimension=128)
    ],
)
collection = zvec.create_and_open("./my_collection", schema)

# Insert vectors

ids = [1, 2, 3]
vectors = [[0.1]*128, [0.2]*128, [0.3]*128]
collection.insert(ids, {"vec": vectors})

# Rebuild with custom HNSW parameters

collection.rebuild_index(
    index_type=zvec.IndexType.HNSW,
    metric=zvec.MetricType.COSINE,
    ef_construction=200,
    quantizer="Int8",
)

# Search with tuned parameters

results = collection.search(
    vectors=[[0.15]*128],
    topk=5,
    ef_search=100,
    metric=zvec.MetricType.COSINE,
)
print(results)

End-to-End Example: Building and Querying an HNSW Index

Step 1: Prepare the build configuration

Create build.yaml:

BuilderCommon:
  BuilderClass: HnswStreamer
  BuildFile: ./data/vecs/train.vecs
  IndexPath: ./data/vecs/train.index
  DumpPath: ./data/vecs/train.dump.index
  ConverterName: Int8
  MetricName: Cosine
  ThreadCount: 8
  NeedTrain: true
  TrainFile: ./data/vecs/train.vecs
  TrainerIndexPath: ./data/vecs/train.trainer.index
BuilderParams:
  proxima.hnsw.builder.thread_count: !!int 8
  proxima.hnsw.builder.ef_construction: !!int 200

Step 2: Execute the C++ builder

./build/bin/local_build_original build.yaml

Step 3: Query from Python

import zvec
zvec.init()
coll = zvec.open("./data/vecs/train.index")
hits = coll.search([[0.01]*128], topk=10, ef_search=150, metric=zvec.MetricType.COSINE)
print(hits)

Summary

ZVec supports Flat, HNSW, and IVF index types through the IndexFactory::CreateAndInitIndex method in src/core/interface/index_factory.cc.
Index creation follows a three-stage pipeline: Configuration (IndexParam), Building (local_builder.cc and IndexStreamer), and Querying (IndexStreamer::open).
Optimization strategies include quantization (Int8, PQ, OPQ), multi-threading via ailego::ThreadPool, reformers for vector transformation, and storage layout tuning (MMapFileStorage vs kBufferPool).
The Python API in python/zvec/zvec.py exposes rebuild_index and search methods with parameters like ef_construction and ef_search for fine-grained control.

Frequently Asked Questions

What index types does Alibaba ZVec support?

ZVec supports three primary index types defined in the IndexType enum within src/include/zvec/core/interface/index_param.h (lines 56–64): Flat (kFlat) for exact brute-force search, HNSW (kHNSW) for graph-based, low-latency approximate nearest neighbor search, and IVF (kIVF) for inverted file indexes optimized for billion-scale datasets. The IndexFactory::CreateAndInitIndex method in src/core/interface/index_factory.cc instantiates the appropriate implementation class based on the configuration.

How do I enable quantization when creating a vector index in ZVec?

Enable quantization by setting the ConverterName field in your YAML configuration to Int8, PQ, or OPQ. This parameter maps to the QuantizerParam struct defined in src/include/zvec/core/interface/index_param.h (lines 86–124). When NeedTrain is set to true, the builder executes a training phase using TrainerIndexPath to learn quantization codebooks before the main build process begins in local_builder.cc.

What is the difference between ef_construction and ef_search in ZVec HNSW indexes?

ef_construction controls the quality of the HNSW graph during the build phase and is specified in BuilderParams within your YAML configuration (e.g., proxima.hnsw.builder.ef_construction: 200). Higher values make the builder consider more candidate neighbors per insertion, which yields a better-connected graph and higher recall at the cost of slower builds. Conversely, ef_search is a query-time parameter in HNSWQueryParam that determines the size of the dynamic candidate list during search; it is serialized via IndexFactory::QueryParamSerializeToJson (lines 141–150) and exposed in Python as the ef_search argument to collection.search(), allowing per-query tuning of the recall-latency trade-off.

How does ZVec handle multi-threading during index construction?

ZVec leverages ailego::ThreadPool defined in src/include/zvec/ailego/parallel/thread_pool.h to parallelize vector ingestion. The do_build_sparse_by_streamer function in tools/core/local_builder.cc (lines 252–260) distributes vector IDs across the thread pool, where each thread optionally applies a reformer and feeds vectors to streamer->add_impl. The ThreadCount parameter in the YAML configuration controls the degree of parallelism, with linear speed-up typically observed until memory bandwidth becomes the bottleneck.
