# How ZVec Integrates with Proxima for Index Creation: A Deep Dive into the VectorColumnIndexer Architecture

> Discover how ZVec integrates with Proxima for index creation. Learn about the VectorColumnIndexer architecture, schema conversion, and full index lifecycle management.

- Repository: [Alibaba/zvec](https://github.com/alibaba/zvec)
- Tags: deep-dive
- Published: 2026-02-16

---

**ZVec delegates all heavyweight vector indexing operations to Alibaba's Proxima engine through the `VectorColumnIndexer` class, which converts ZVec schema definitions into Proxima-specific parameters and manages the full index lifecycle from creation to query execution.**

ZVec is Alibaba's high-performance vector database, and it relies on the Proxima engine for core similarity search. Tracing index creation through the code shows how ZVec's user-facing APIs connect to Proxima's underlying C++ indexing infrastructure. Through this integration, ZVec supports multiple index types (HNSW, IVF, and FLAT) behind a single interface for developers.

## Understanding the ZVec-Proxima Integration Architecture

### The VectorColumnIndexer Wrapper Class

At the heart of the integration lies the `VectorColumnIndexer` class, declared in [`src/db/index/column/vector_column/vector_column_indexer.h`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/vector_column_indexer.h) and implemented in `src/db/index/column/vector_column/vector_column_indexer.cc`. This class serves as the primary abstraction layer that encapsulates Proxima-specific logic while exposing a generic interface to ZVec's storage engine.

The constructor takes the index file path, the field schema, and an engine name that defaults to `"proxima"`:

```cpp
VectorColumnIndexer(const std::string &index_file_path,
                    const FieldSchema &field_schema,
                    const std::string &engine_name = "proxima");
```

### Schema-to-Engine Translation Layer

ZVec maintains its own schema definitions in `src/db/proto/zvec.proto`, where vector fields specify index types through `FlatIndexParams`, `HnswIndexParams`, or `IVFIndexParams`. The translation between these ZVec-specific types and Proxima's native parameters occurs in [`src/db/index/column/vector_column/engine_helper.hpp`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/engine_helper.hpp) via the `ProximaEngineHelper::convert_to_engine_index_param()` method.
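
The per-index-type dispatch described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not ZVec's code: the classes are stand-ins for the protobuf messages, the output is a plain dictionary rather than a Proxima `BaseIndexParam`, and the IVF field `n_list` is a hypothetical name that does not come from `zvec.proto`.

```python
from dataclasses import dataclass


@dataclass
class FlatIndexParams:
    """Stand-in for the FLAT parameters (no tunables modeled here)."""


@dataclass
class HnswIndexParams:
    """Stand-in carrying the two HNSW values named in the article."""
    m: int = 16
    ef_construction: int = 200


@dataclass
class IVFIndexParams:
    """Stand-in for IVF; `n_list` is a hypothetical field name."""
    n_list: int = 1024


def to_engine_index_param(params):
    """Translate a schema-level params object into an engine-side dict."""
    if isinstance(params, HnswIndexParams):
        return {
            "index_type": "hnsw",
            "m": params.m,
            "ef_construction": params.ef_construction,
        }
    if isinstance(params, IVFIndexParams):
        return {"index_type": "ivf", "n_list": params.n_list}
    if isinstance(params, FlatIndexParams):
        return {"index_type": "flat"}
    # Rejecting unknown types is a choice made for this sketch.
    raise TypeError(f"unsupported index params: {type(params).__name__}")


engine_param = to_engine_index_param(HnswIndexParams(m=32, ef_construction=400))
print(engine_param)
```

Keeping the dispatch in one function means the rest of the indexer never needs to know which index type it is holding, which is the role the article attributes to the helper.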

## Step-by-Step Index Creation Flow in ZVec

### 1. Schema Definition and FieldSchema Conversion

When defining a collection, each vector field receives a `FieldSchema` containing dimension, data type, and index parameters. The supported index types are enumerated in the protobuf definitions, with comments at lines 53-57 of `src/db/proto/zvec.proto` indicating Proxima backend support for FLAT, HNSW, and IVF indexes.

### 2. Initializing the VectorColumnIndexer

Upon collection creation, ZVec instantiates `VectorColumnIndexer` with the target file path and field schema. The constructor stores the schema and determines whether the vector representation is sparse, setting internal flags for subsequent Proxima operations.

### 3. Parameter Conversion via ProximaEngineHelper

The `Open()` method checks the `engine_name_` (defaulting to "proxima") and forwards to `CreateProximaIndex()`. This method invokes `ProximaEngineHelper::convert_to_engine_index_param()` to build a Proxima `BaseIndexParam` through several critical mappings:

- **Metric Types**: Converts ZVec `MetricType::L2` to Proxima `core_interface::MetricType::kL2sq`, with similar translations for Inner Product and Cosine similarity.
- **Quantization**: Maps ZVec `QuantizeType::FP16` to Proxima `core_interface::QuantizerType::kFP16`.
- **Index-Specific Parameters**: Extracts HNSW-specific values like `M` and `ef_construction` from `HnswIndexParams` and injects them into the Proxima parameter structure.
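
The metric and quantizer mappings listed above behave like lookup tables. The following Python sketch models them that way. Only `kL2sq` and `kFP16` are named in the source; the other engine-side names, and the choice to raise on unmapped values, are assumptions made for illustration.

```python
from enum import Enum


class MetricType(Enum):
    """Schema-side stand-in for ZVec's MetricType."""
    L2 = "L2"
    IP = "IP"
    COSINE = "Cosine"


class QuantizeType(Enum):
    """Schema-side stand-in for ZVec's QuantizeType."""
    FP16 = "FP16"


# Engine-side names. "kL2sq" and "kFP16" appear in the article;
# "kInnerProduct" and "kCosine" are assumed names for this sketch.
METRIC_MAP = {
    MetricType.L2: "kL2sq",
    MetricType.IP: "kInnerProduct",
    MetricType.COSINE: "kCosine",
}
QUANTIZER_MAP = {
    QuantizeType.FP16: "kFP16",
}


def convert(table, value):
    """Look up an engine-side name, failing loudly on unmapped values."""
    try:
        return table[value]
    except KeyError:
        raise ValueError(f"no engine mapping for {value!r}") from None


print(convert(METRIC_MAP, MetricType.L2))
print(convert(QUANTIZER_MAP, QuantizeType.FP16))
```

Mapping L2 to a squared-L2 metric does not change result ordering, because the square root is monotonic on non-negative values; it only skips the square root during distance computation.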

### 4. Creating and Opening the Proxima Index

With converted parameters, ZVec calls the Proxima factory:

```cpp
index = core_interface::IndexFactory::CreateAndInitIndex(*index_param);
```

The resulting `core_interface::Index::Pointer` is stored in `VectorColumnIndexer::index`. ZVec then opens the physical index file using the storage mode specified in `ReadOptions` (MMAP or buffer pool):

```cpp
index->Open(this->index_file_path(),
            {storage_type, read_options.create_new, read_options.read_only});
```

This creates or loads the on-disk Proxima index file at the specified path.
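
How the three values passed to `Open()` might be derived from the read options can be modeled in a few lines of Python. This is a sketch under assumptions: the `use_mmap` flag is taken from the C++ usage example in this article, the rule "mmap if `use_mmap`, otherwise buffer pool" is inferred rather than confirmed, and the conflict check is added for this sketch only.

```python
from dataclasses import dataclass
from enum import Enum


class StorageType(Enum):
    """The two storage modes the article mentions."""
    MMAP = "mmap"
    BUFFER_POOL = "buffer_pool"


@dataclass(frozen=True)
class ReadOptions:
    """Python stand-in for the C++ ReadOptions struct."""
    use_mmap: bool = True
    create_new: bool = False
    read_only: bool = False


def build_open_args(ro):
    """Return (storage_type, create_new, read_only) for the open call."""
    # Sanity check added for this sketch; not taken from the ZVec source.
    if ro.create_new and ro.read_only:
        raise ValueError("cannot create a new index in read-only mode")
    storage = StorageType.MMAP if ro.use_mmap else StorageType.BUFFER_POOL
    return (storage, ro.create_new, ro.read_only)


print(build_open_args(ReadOptions(use_mmap=True, create_new=True)))
```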

### 5. Insert and Search Operations

Once opened, the index handles insertions and queries by forwarding to Proxima after converting vectors and query parameters via `convert_to_engine_vector` and `convert_to_engine_query_param`. The results are wrapped back into ZVec types (`VectorIndexResults`) for return to the client.
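
The convert, forward, and wrap pattern described above can be shown with a toy adapter. Everything here is hypothetical: `ToyEngine` is a brute-force stand-in for Proxima, `IndexerSketch` only mirrors the shape of `VectorColumnIndexer`, and `VectorIndexResult` is a simplified single-hit version of the result type the article names.

```python
from dataclasses import dataclass


class ToyEngine:
    """Brute-force stand-in for the engine; ranks by squared L2 distance."""

    def __init__(self):
        self._rows = {}  # key -> tuple of floats

    def add(self, key, values):
        self._rows[key] = values

    def search(self, query, topk):
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), key)
            for key, vec in self._rows.items()
        ]
        return sorted(scored)[:topk]  # smallest distance first


@dataclass
class VectorIndexResult:
    """Simplified result wrapper returned to the caller."""
    key: int
    score: float


class IndexerSketch:
    """Adapter: converts inputs, forwards to the engine, wraps outputs."""

    def __init__(self, dimension):
        self._dimension = dimension
        self._engine = ToyEngine()

    def _to_engine_vector(self, vector):
        # Plays the role of a conversion step before the engine call.
        if len(vector) != self._dimension:
            raise ValueError(f"expected {self._dimension} values, got {len(vector)}")
        return tuple(float(x) for x in vector)

    def insert(self, vector, key):
        self._engine.add(key, self._to_engine_vector(vector))

    def search(self, vector, topk):
        hits = self._engine.search(self._to_engine_vector(vector), topk)
        return [VectorIndexResult(key=k, score=s) for s, k in hits]


indexer = IndexerSketch(dimension=2)
indexer.insert([0.0, 0.0], key=1)
indexer.insert([1.0, 1.0], key=2)
indexer.insert([3.0, 4.0], key=3)
results = indexer.search([0.9, 0.9], topk=2)
print([r.key for r in results])  # prints [2, 1]
```

The caller only ever sees `VectorIndexResult` objects, so the engine behind the adapter could change without touching calling code.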

## Code Examples: Using ZVec with Proxima

### Python API: High-Level Index Creation

The Python interface abstracts the entire Proxima integration. When you call `zvec.create_and_open()`, ZVec internally constructs the `VectorColumnIndexer` and initializes the Proxima backend:

```python
import zvec

# Define collection schema with HNSW index backed by Proxima
schema = zvec.CollectionSchema(
    name="demo",
    vectors=zvec.VectorSchema(
        name="embedding",
        dtype=zvec.DataType.VECTOR_FP32,
        dim=128,
        index=zvec.HnswIndexParams(m=16, ef_construction=200)
    ),
)

# Create the collection (under the hood VectorColumnIndexer is built)
coll = zvec.create_and_open(path="./demo_db", schema=schema)

# Insert vectors
coll.insert([
    zvec.Doc(id="doc1", vectors={"embedding": [0.1]*128}),
    zvec.Doc(id="doc2", vectors={"embedding": [0.2]*128}),
])

# Search (translated to a Proxima HNSW query)
results = coll.query(
    zvec.VectorQuery("embedding", vector=[0.15]*128),
    topk=5
)
print(results)
```

### C++ Integration: Manual Indexer Construction

For direct C++ usage, you can instantiate `VectorColumnIndexer` manually to control the Proxima integration:

```cpp
#include <memory>  // std::make_shared

#include "vector_column_indexer.h"
#include "vector_column_params.h"
#include "zvec/db/schema.h"

int main() {
  // Build FieldSchema with HNSW parameters
  zvec::FieldSchema field_schema;
  field_schema.set_name("embedding");
  field_schema.set_data_type(zvec::DataType::VECTOR_FP32);
  field_schema.set_dimension(128);
  field_schema.set_is_sparse(false);
  auto hnsw_params = std::make_shared<zvec::HnswIndexParams>();
  hnsw_params->set_m(16);
  hnsw_params->set_ef_construction(200);
  field_schema.set_index_params(hnsw_params);

  // Create the indexer (engine_name defaults to "proxima")
  auto indexer = std::make_shared<zvec::VectorColumnIndexer>(
      "./hnsw.idx", field_schema);

  // Open → Proxima index creation
  zvec::vector_column_params::ReadOptions ro{};
  ro.use_mmap = true;    // memory-map the index file
  ro.create_new = true;  // create the file rather than load an existing one
  indexer->Open(ro);

  // Insert a vector; the elided components stand for the rest of the
  // 128 values required by the dimension declared above
  zvec::vector_column_params::VectorData vec{
      zvec::vector_column_params::DenseVector{{0.1f, 0.2f, /*...*/}}
  };
  indexer->Insert(vec, 1);

  // Search for the 5 nearest neighbors of the same vector
  zvec::vector_column_params::QueryParams qp;
  qp.topk = 5;
  auto res = indexer->Search(vec, qp);
  return 0;
}
```

## Key Source Files and Implementation Details

The integration between ZVec and Proxima is implemented across several critical source files:

- **[`src/db/index/column/vector_column/vector_column_indexer.h`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/vector_column_indexer.h)** – Declares the `VectorColumnIndexer` class that wraps Proxima functionality and provides the interface used by ZVec's storage engine.

- **`src/db/index/column/vector_column/vector_column_indexer.cc`** – Implements the core lifecycle methods including `Open()`, `CreateProximaIndex()`, `Insert()`, and `Search()`, handling the delegation to Proxima's C++ API.

- **[`src/db/index/column/vector_column/engine_helper.hpp`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/engine_helper.hpp)** – Contains `ProximaEngineHelper` with conversion utilities like `convert_to_engine_index_param()` to map ZVec `FieldSchema` objects to Proxima `BaseIndexParam` structures, `convert_to_engine_vector()` for data format conversion, and `convert_to_engine_query_param()` for query translation.

- **`src/db/proto/zvec.proto`** – Defines the protocol buffer schemas including `IndexType` enum (FLAT, HNSW, IVF) and index parameter structures that determine which Proxima index implementation is instantiated, with specific comments at lines 53-57 documenting Proxima support.

- **[`README.md`](https://github.com/alibaba/zvec/blob/main/README.md)** – Provides high-level documentation describing ZVec's architecture and its dependency on Proxima for vector similarity search.

These files together illustrate the full integration path from user-level schema definitions to Proxima index creation and query execution.

## Summary

- **ZVec delegates vector indexing to Proxima** through the `VectorColumnIndexer` wrapper class, abstracting the complexity of Proxima's C++ API while exposing a unified interface to ZVec's storage layer.
- **Type conversion is handled by `ProximaEngineHelper`** in [`engine_helper.hpp`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/engine_helper.hpp), mapping ZVec schema parameters to Proxima's `BaseIndexParam`, metric types, and quantization settings.
- **Index creation follows a strict lifecycle**: schema definition → indexer construction → parameter conversion → `IndexFactory::CreateAndInitIndex()` → physical file opening via `index->Open()`.
- **Supported index types** include FLAT, HNSW, and IVF, declared in the protobuf schema and backed by corresponding Proxima implementations, with metric type and quantization settings passed through the conversion layer.
- **Both Python and C++ APIs** ultimately trigger the same Proxima integration layer, so indexing behaves the same way regardless of which language binding creates the index.

## Frequently Asked Questions

### What index types does ZVec support through Proxima?

ZVec supports **FLAT (brute-force), HNSW (hierarchical navigable small world), and IVF (inverted file)** index types through its Proxima integration. These are enumerated in the protobuf definitions in `src/db/proto/zvec.proto`, where comments at lines 53-57 note Proxima backend support, and each type maps to a corresponding Proxima engine implementation via the `ProximaEngineHelper` conversion layer.

### How does ZVec handle metric type conversion for Proxima indexes?

ZVec handles metric type conversion through the `ProximaEngineHelper::convert_to_engine_index_param()` method in [`engine_helper.hpp`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/engine_helper.hpp). This utility maps ZVec's internal `MetricType` enumerations, such as `MetricType::L2`, `MetricType::IP`, and `MetricType::Cosine`, to Proxima's `core_interface::MetricType` equivalents including `kL2sq`, so that the underlying Proxima index computes distances with the intended metric.

### Can ZVec use vector search engines other than Proxima?

While the `VectorColumnIndexer` constructor accepts an `engine_name` parameter that defaults to `"proxima"`, the current implementation in `vector_column_indexer.cc` specifically checks for this engine name and forwards to `CreateProximaIndex()`. The architecture theoretically supports pluggable engines through this abstraction, but the codebase currently implements full index creation and query functionality only for the Proxima backend.
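
The engine-name check described in this answer can be modeled as a small factory registry. This is a hypothetical Python sketch: the registry, the function names, and the error raised for an unknown engine are all assumptions, since the article only states that the `"proxima"` name is checked and forwarded to `CreateProximaIndex()`.

```python
def create_proxima_index(schema):
    """Stand-in for the Proxima creation path; returns a description only."""
    return {"engine": "proxima", "schema": schema}


# Only one backend is registered, mirroring the single implemented engine.
ENGINE_FACTORIES = {
    "proxima": create_proxima_index,
}


def open_index(schema, engine_name="proxima"):
    """Dispatch on the engine name; unknown names are rejected in this sketch."""
    factory = ENGINE_FACTORIES.get(engine_name)
    if factory is None:
        raise ValueError(f"unsupported engine: {engine_name!r}")
    return factory(schema)


print(open_index({"dimension": 128})["engine"])  # prints proxima
```

Under this model, supporting another backend would mean registering one more factory, which is the kind of pluggability the `engine_name` parameter suggests.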

### What is the role of engine_helper.hpp in the ZVec-Proxima integration?

The [`engine_helper.hpp`](https://github.com/alibaba/zvec/blob/main/src/db/index/column/vector_column/engine_helper.hpp) file contains the `ProximaEngineHelper` class, which serves as the **type translation bridge** between ZVec and Proxima. It provides static conversion utilities including `convert_to_engine_index_param()` for mapping schema parameters, `convert_to_engine_vector()` for data format translation, and `convert_to_engine_query_param()` for query translation, ensuring type-safe interoperability between ZVec's domain objects and Proxima's C++ API.