Creating and Optimizing Vector Indexes in Alibaba ZVec: A Complete Guide
Alibaba ZVec creates optimized vector indexes through a three-stage pipeline: configuration via IndexParam, building via local_builder.cc and IndexStreamer, and querying via IndexStreamer with query parameters serialized by IndexFactory. It supports Flat, HNSW, and IVF structures, with quantization and multi-threading as the main optimizations.
ZVec is Alibaba's high-performance vector database designed for billion-scale approximate nearest neighbor (ANN) search. Whether you are building a recommendation engine or a semantic search pipeline, creating and optimizing vector indexes in Alibaba ZVec requires understanding its C++ core architecture and the flexible YAML-based configuration system.
Understanding the ZVec Index Architecture
ZVec implements a modular index architecture defined in src/include/zvec/core/interface/index_param.h. The system supports three primary index types enumerated in the IndexType enum (lines 56–64): kFlat for exact brute-force search, kHNSW for high-recall approximate search, and kIVF for inverted file indexes suitable for billion-scale datasets.
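To make the kFlat end of that spectrum concrete, here is a minimal Python sketch of exact brute-force search. It is not ZVec code: the names `l2sq` and `flat_search` are hypothetical, and the only point is that a flat index scores every stored vector, which is why approximate structures such as HNSW and IVF exist for large datasets.

```python
import heapq


def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, topk):
    """Exact search: score every stored vector and keep the topk closest.

    Returns (distance, index) pairs sorted by ascending distance.
    Hypothetical helper for illustration, not part of ZVec.
    """
    scored = ((l2sq(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(topk, scored)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    for distance, idx in flat_search(data, [0.9, 0.1], topk=2):
        print(idx, distance)
```

The cost grows linearly with the number of stored vectors, so this approach gives perfect recall but does not scale the way the approximate index types are meant to.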
Core Components and File Paths
The index lifecycle is managed by three core components:
- IndexParam (src/include/zvec/core/interface/index_param.h): Defines index type, metric, quantizer, and runtime options.
- IndexFactory (src/core/interface/index_factory.cc): Instantiates concrete index classes via IndexFactory::CreateAndInitIndex (lines 41–47).
- IndexStreamer (src/include/zvec/core/framework/index_streamer.h): Abstract runner that streams vectors into the index and handles dumping to storage via the open, flush, and close methods.
Creating Vector Indexes in Alibaba ZVec
Index creation follows a three-stage pipeline: configuration, building, and querying.
Stage 1: Configuration with IndexParam
Index creation begins with a YAML or JSON configuration file that populates the IndexParam structure. Key parameters include:
- BuilderClass: Specifies the streamer implementation (HnswStreamer, FlatStreamer, or IvfStreamer).
- MetricName: Distance metric (L2sq, Cosine, InnerProduct).
- ConverterName: Quantization method (Int8, PQ, OPQ).
- ThreadCount: Builder parallelism (defaults to hardware concurrency).
- NeedTrain: Boolean flag indicating whether a quantizer requires a separate training phase using TrainerIndexPath.
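Because a typo in one of these fields only surfaces once the builder runs, it can help to sanity-check a configuration first. The sketch below is a hypothetical pre-flight check in Python, not part of ZVec. The allowed values are copied from the list above; treating BuilderClass as required and requiring TrainerIndexPath when NeedTrain is set are assumptions made for illustration.

```python
import os

# Allowed values taken from the parameter list in this article.
ALLOWED = {
    "BuilderClass": {"HnswStreamer", "FlatStreamer", "IvfStreamer"},
    "MetricName": {"L2sq", "Cosine", "InnerProduct"},
    "ConverterName": {"Int8", "PQ", "OPQ"},
}


def validate_builder_config(config):
    """Return a list of problems; an empty list means the config looks sane.

    Hypothetical helper: the real builder may accept more values or apply
    different rules than the ones assumed here.
    """
    problems = []
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    if "BuilderClass" not in config:
        problems.append("BuilderClass is required")  # assumption
    # Mirror the documented default: fall back to hardware concurrency.
    threads = config.get("ThreadCount", os.cpu_count() or 1)
    if type(threads) is not int or threads < 1:
        problems.append("ThreadCount must be a positive integer")
    if config.get("NeedTrain") and not config.get("TrainerIndexPath"):
        problems.append("NeedTrain is true but TrainerIndexPath is missing")
    return problems


if __name__ == "__main__":
    print(validate_builder_config({"BuilderClass": "HnswStreamer", "MetricName": "Cosine"}))
```

The function takes a plain dictionary, so it works with whatever YAML or JSON loader is used to read the configuration file.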
Stage 2: Building the Index with Local Builder
The C++ entry point tools/core/local_builder.cc orchestrates the build process. The core logic creates the index via the factory, initializes storage, and streams vectors through a multi-threaded pipeline:
// Parse YAML into ailego::Params
bool prepare_params(YAML::Node &&config_params, ailego::Params &params);
// Create and initialize index via factory
Index::Pointer index = core_interface::IndexFactory::CreateAndInitIndex(param);
// Initialize MMapFileStorage for zero-copy reads
auto storage = IndexFactory::CreateStorage("MMapFileStorage");
storage->open(dump_path, true);
auto dumper = IndexFactory::CreateDumper(storage);
// Create and configure streamer
IndexStreamer::Pointer streamer = IndexFactory::CreateStreamer("HnswStreamer");
streamer->init(meta, builder_params);
streamer->open(storage);
// Execute multi-threaded build loop
do_build_sparse_by_streamer(streamer, thread_count);
streamer->flush(check_point);
streamer->close();
The do_build_sparse_by_streamer function (lines 252–260) distributes vector IDs across a thread pool, optionally applies a reformer, and feeds each vector to streamer->add_impl.
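The pattern described here (partition IDs across workers, optionally transform, then add) can be sketched in Python. This is an illustration of the idea only: `ToyStreamer` and `build_with_threads` are hypothetical names, and the strided partitioning is an assumption, since the article does not say how the C++ code splits the ID range.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStreamer:
    """Hypothetical stand-in for a streamer; it only records what it receives."""

    def __init__(self):
        self._lock = threading.Lock()
        self.added = {}

    def add(self, vec_id, vector):
        # A real index would insert into its graph or lists here.
        with self._lock:
            self.added[vec_id] = vector


def build_with_threads(streamer, vectors, thread_count, reformer=None):
    """Feed every vector to the streamer using thread_count workers.

    Worker i handles IDs i, i + thread_count, i + 2 * thread_count, ...
    (an assumed partitioning scheme chosen for simplicity).
    """

    def worker(worker_index):
        for vec_id in range(worker_index, len(vectors), thread_count):
            vec = vectors[vec_id]
            if reformer is not None:
                vec = reformer(vec)  # optional preprocessing step
            streamer.add(vec_id, vec)

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # list() waits for all workers and re-raises any worker exception.
        list(pool.map(worker, range(thread_count)))


if __name__ == "__main__":
    streamer = ToyStreamer()
    build_with_threads(streamer, [[float(i)] for i in range(10)], thread_count=3)
    print(sorted(streamer.added))
```

Each vector ID is visited by exactly one worker, so no coordination is needed beyond whatever locking the streamer does internally.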
Stage 3: Querying with IndexStreamer
Once built, the index is loaded via IndexStreamer::open and queried using parameters serialized through IndexFactory::QueryParamSerializeToJson (lines 141–150). The Python wrapper in python/zvec/zvec.py provides high-level access to this functionality.
Optimizing Vector Indexes in Alibaba ZVec
Optimization in ZVec targets memory efficiency, build speed, and query latency through five primary mechanisms.
Quantization Strategies
Set ConverterName in the builder config to enable quantization. The QuantizerParam struct in src/include/zvec/core/interface/index_param.h (lines 86–124) supports:
- Int8: 8-bit integer quantization, reducing the vector memory footprint by 75% relative to 32-bit floats.
- PQ: Product Quantization for high compression ratios.
- OPQ: Optimized Product Quantization with rotation preprocessing.
When NeedTrain is true, the builder executes a training phase using TrainerIndexPath to learn quantization codebooks before the main build.
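The 8-bit case is simple enough to sketch directly. The Python below is a minimal scalar quantizer, not ZVec's implementation: the function names are hypothetical, "training" is reduced to learning a global minimum and scale, and codes are stored as unsigned bytes for simplicity. It shows where the 75% saving comes from (one byte per dimension instead of four) and what a training phase has to learn before vectors can be encoded.

```python
def train_int8(vectors):
    """Learn the value range from sample vectors (a stand-in for training)."""
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant data
    return lo, scale


def quantize(vec, lo, scale):
    """Map each float to one of 256 levels, clamped to the trained range."""
    return bytes(min(255, max(0, round((x - lo) / scale))) for x in vec)


def dequantize(codes, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + code * scale for code in codes]


def memory_saving(dim):
    """Fraction of vector storage saved versus 4-byte floats."""
    return 1.0 - (dim * 1) / (dim * 4)


if __name__ == "__main__":
    sample = [[0.0, 0.5, 1.0], [0.25, 0.75, 1.0]]
    lo, scale = train_int8(sample)
    codes = quantize(sample[0], lo, scale)
    print(list(codes), dequantize(codes, lo, scale), memory_saving(128))
```

The reconstruction error per dimension is bounded by half a quantization step, which is the accuracy cost paid for the smaller footprint. PQ and OPQ learn codebooks rather than a single range, so their training phase is correspondingly heavier.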
Parallelism and Threading
ZVec leverages ailego::ThreadPool (src/include/zvec/ailego/parallel/thread_pool.h) to parallelize vector ingestion. The ThreadCount parameter in YAML controls parallelism, with near-linear speed-up typically observed until memory bandwidth saturates. The do_build_sparse_by_streamer function automatically partitions work across the pool.
Reformers and Transformations
A reformer preprocesses vectors before indexing. Configure via BuilderCommon.ReformerName in YAML. The reformer is instantiated via IndexFactory::CreateReformer(meta.reformer_name()) (local_builder.cc lines 38–44). Common options include PCA for dimensionality reduction and OPQ for rotation optimization.
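As an illustration of what a preprocessing step of this kind can look like, the Python sketch below normalizes vectors to unit length. Unit normalization is not one of the reformers named above; it is used here as an assumption because it is short and its effect is easy to verify: after normalization, the inner product of two vectors equals their cosine similarity. All function names are hypothetical.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def l2_normalize(vec):
    """Example preprocessing step: scale a vector to unit length."""
    norm = math.sqrt(inner_product(vec, vec))
    if norm == 0.0:
        return list(vec)  # leave the zero vector unchanged
    return [x / norm for x in vec]


if __name__ == "__main__":
    a, b = [3.0, 4.0], [1.0, 0.0]
    print(cosine(a, b), inner_product(l2_normalize(a), l2_normalize(b)))
```

A transformation applied at build time generally has to be applied to query vectors as well, otherwise the two live in different spaces.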
Storage Layout Optimization
ZVec uses MMapFileStorage by default for zero-copy reads during querying. For ultra-low latency scenarios, enable kBufferPool in StorageOptions (index_param.h lines 37–44) to keep hot index pages in memory rather than mapped files.
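The idea behind memory-mapped storage can be sketched with Python's standard `mmap` module. This is not ZVec's storage format: the flat little-endian float32 layout and the helper names are assumptions for illustration. The point is that a mapped file lets a reader jump straight to one vector by offset, with the operating system paging in only the parts that are touched.

```python
import mmap
import os
import struct
import tempfile


def write_vectors(path, vectors):
    """Store vectors back to back as little-endian float32 (assumed layout)."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(struct.pack(f"<{len(vec)}f", *vec))


def read_vector_mmap(path, index, dim):
    """Read one vector by byte offset from a read-only memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            offset = index * dim * 4  # 4 bytes per float32
            # unpack_from reads directly from the mapping at that offset.
            return list(struct.unpack_from(f"<{dim}f", mapped, offset))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.bin")
        write_vectors(path, [[1.0, 2.0], [0.5, 0.25], [4.0, 8.0]])
        print(read_vector_mmap(path, 1, dim=2))
```

Converting to Python floats does copy the values, so this sketch only demonstrates offset-based access, not a true zero-copy read path. A buffer pool takes the opposite approach and keeps frequently used pages resident in memory explicitly.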
Search-Time Tuning
For HNSW indexes, the ef_search parameter in HNSWQueryParam controls the recall/latency trade-off. This is serialized via IndexFactory::QueryParamSerializeToJson and exposed in Python as the ef_search argument to collection.search(). Higher values improve recall at the cost of increased query latency.
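Choosing a value is easier with a measurement than with a guess. The Python sketch below computes recall against exact results for several candidate settings. It does not call ZVec: `search_fn` is a hypothetical callable standing in for whatever search call is in use, and `fake_search` only simulates an index whose results improve as ef_search grows.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sweep_ef_search(search_fn, queries, ground_truth, ef_values):
    """Return {ef_search: mean recall} over all queries.

    search_fn(query, ef_search) must return a list of result IDs;
    ground_truth[i] holds the exact neighbor IDs for queries[i].
    """
    report = {}
    for ef in ef_values:
        recalls = [
            recall_at_k(search_fn(query, ef), exact)
            for query, exact in zip(queries, ground_truth)
        ]
        report[ef] = sum(recalls) / len(recalls)
    return report


def fake_search(query, ef_search):
    """Simulated index: finds one more true neighbor per 10 units of ef_search."""
    exact = [1, 2, 3, 4]
    keep = min(len(exact), ef_search // 10)
    return exact[:keep] + [-1] * (len(exact) - keep)


if __name__ == "__main__":
    print(sweep_ef_search(fake_search, [[0.0]], [[1, 2, 3, 4]], [20, 40]))
```

In practice the ground truth would come from an exact (Flat) search over a sample of queries, and the smallest ef_search that meets the recall target is usually the one to keep.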
Working with ZVec in Python
The Python binding in python/zvec/zvec.py mirrors the C++ pipeline:
import zvec
from zvec.model import Collection, CollectionSchema, FieldSchema, DataType
# Initialize the engine
zvec.init(log_type=zvec.LogType.CONSOLE, log_level=zvec.LogLevel.INFO)
# Create collection with schema
schema = CollectionSchema(
    name="my_vectors",
    fields=[
        FieldSchema("id", DataType.INT64, nullable=False),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dimension=128),
    ],
)
collection = zvec.create_and_open("./my_collection", schema)
# Insert vectors
ids = [1, 2, 3]
vectors = [[0.1]*128, [0.2]*128, [0.3]*128]
collection.insert(ids, {"vec": vectors})
# Rebuild with custom HNSW parameters
collection.rebuild_index(
    index_type=zvec.IndexType.HNSW,
    metric=zvec.MetricType.COSINE,
    ef_construction=200,
    nlist=4096,  # conventionally an IVF setting (cluster count), not an HNSW graph parameter
    quantizer="Int8",
)
# Search with tuned parameters
results = collection.search(
    vectors=[[0.15]*128],
    topk=5,
    ef_search=100,
    metric=zvec.MetricType.COSINE,
)
print(results)
End-to-End Example: Building and Querying an HNSW Index
Step 1: Prepare the build configuration
Create build.yaml:
BuilderCommon:
  BuilderClass: HnswStreamer
  BuildFile: ./data/vecs/train.vecs
  IndexPath: ./data/vecs/train.index
  DumpPath: ./data/vecs/train.dump.index
  ConverterName: Int8
  MetricName: Cosine
  ThreadCount: 8
  NeedTrain: true
  TrainFile: ./data/vecs/train.vecs
  TrainerIndexPath: ./data/vecs/train.trainer.index
BuilderParams:
  proxima.hnsw.builder.thread_count: !!int 8
  proxima.hnsw.builder.ef_construction: !!int 200
Step 2: Execute the C++ builder
./build/bin/local_build_original build.yaml
Step 3: Query from Python
import zvec
zvec.init()
coll = zvec.open("./data/vecs/train.index")
hits = coll.search([[0.01]*128], topk=10, ef_search=150, metric=zvec.MetricType.COSINE)
print(hits)
Summary
- ZVec supports Flat, HNSW, and IVF index types through the IndexFactory::CreateAndInitIndex method in src/core/interface/index_factory.cc.
- Index creation follows a three-stage pipeline: Configuration (IndexParam), Building (local_builder.cc and IndexStreamer), and Querying (IndexStreamer::open).
- Optimization strategies include quantization (Int8, PQ, OPQ), multi-threading via ailego::ThreadPool, reformers for vector transformation, and storage layout tuning (MMapFileStorage vs kBufferPool).
- The Python API in python/zvec/zvec.py exposes rebuild_index and search methods with parameters like ef_construction and ef_search for fine-grained control.
Frequently Asked Questions
What index types does Alibaba ZVec support?
ZVec supports three primary index types defined in the IndexType enum within src/include/zvec/core/interface/index_param.h (lines 56–64): Flat (kFlat) for exact brute-force search, HNSW (kHNSW) for graph-based approximate nearest neighbor search with sub-millisecond latency, and IVF (kIVF) for inverted file indexes optimized for billion-scale datasets. The IndexFactory::CreateAndInitIndex method in src/core/interface/index_factory.cc instantiates the appropriate implementation class based on the configuration.
How do I enable quantization when creating a vector index in ZVec?
Enable quantization by setting the ConverterName field in your YAML configuration to Int8, PQ, or OPQ. This parameter maps to the QuantizerParam struct defined in src/include/zvec/core/interface/index_param.h (lines 86–124). When NeedTrain is set to true, the builder executes a training phase using TrainerIndexPath to learn quantization codebooks before the main build process begins in local_builder.cc.
What is the difference between ef_construction and ef_search in ZVec HNSW indexes?
ef_construction controls the quality of the HNSW graph during the build phase, specified in BuilderParams within your YAML configuration (e.g., proxima.hnsw.builder.ef_construction: 200). Higher values create denser graphs with better recall but slower build times. Conversely, ef_search is a query-time parameter in HNSWQueryParam that determines the size of the dynamic candidate list during search; it is serialized via IndexFactory::QueryParamSerializeToJson (lines 141–150) and exposed in Python as the ef_search argument to collection.search(), allowing per-query tuning of the recall-latency trade-off.
How does ZVec handle multi-threading during index construction?
ZVec leverages ailego::ThreadPool defined in src/include/zvec/ailego/parallel/thread_pool.h to parallelize vector ingestion. The do_build_sparse_by_streamer function in tools/core/local_builder.cc (lines 252–260) distributes vector IDs across the thread pool, where each thread optionally applies a reformer and feeds vectors to streamer->add_impl. The ThreadCount parameter in the YAML configuration controls the degree of parallelism, with linear speed-up typically observed until memory bandwidth becomes the bottleneck.