# ZVec Collection Schema and Index Parameter Configurations: A Complete Guide

> Master zvec collection schema and index parameter configurations, including `FieldSchema`, `VectorSchema`, `InvertIndexParam`, `HnswIndexParam`, and `IVFIndexParam`, for optimized search.

- Repository: [Alibaba/zvec](https://github.com/alibaba/zvec)
- Tags: how-to-guide
- Published: 2026-02-16

---

**ZVec collection schemas define scalar fields via `FieldSchema` and vector fields via `VectorSchema`, while index parameters such as `InvertIndexParam`, `HnswIndexParam`, and `IVFIndexParam` control how each field is indexed for optimized search performance.**

ZVec, Alibaba’s high-performance vector database, organizes data into collections governed by structured schemas. Understanding **zvec collection schema and index parameter configurations** is essential for defining data types, enabling efficient similarity search, and optimizing query performance across both scalar and vector fields.

## Understanding ZVec Collection Schema Structure

The `CollectionSchema` class serves as the top-level container that defines the structure of a ZVec collection. According to the source code in [`python/zvec/model/schema/collection_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/collection_schema.py), this class holds the collection name, a list of scalar `FieldSchema` objects, and a list of `VectorSchema` objects. It automatically validates the uniqueness of field names across both scalar and vector fields to prevent naming collisions.
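
The following is a minimal pure-Python sketch of the same uniqueness check. `ToyField` and `ToyCollectionSchema` are hypothetical stand-ins written for illustration, not zvec classes. The sketch assumes only that scalar and vector field names share one namespace, as described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyField:
    """Hypothetical stand-in for a FieldSchema or VectorSchema (name only)."""
    name: str


@dataclass
class ToyCollectionSchema:
    """Hypothetical mirror of the uniqueness rule; not zvec's CollectionSchema."""
    name: str
    fields: List[ToyField] = field(default_factory=list)
    vectors: List[ToyField] = field(default_factory=list)

    def __post_init__(self) -> None:
        seen = set()
        # Scalar and vector fields share a single namespace, so check them together.
        for f in [*self.fields, *self.vectors]:
            if f.name in seen:
                raise ValueError(f"duplicate field name: {f.name!r}")
            seen.add(f.name)
```

Passing `doc_id` in both `fields` and `vectors` raises `ValueError`. That is the kind of collision the real constructor is described as rejecting.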

### FieldSchema for Scalar Fields

Scalar fields are defined using the `FieldSchema` class, located in [`python/zvec/model/schema/field_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/field_schema.py). Each scalar field description includes:

- **Name**: The column identifier
- **DataType**: The scalar data type (e.g., `INT64`, `STRING`)
- **Nullable**: Boolean indicating whether null values are permitted
- **Index Parameter**: Optional `InvertIndexParam` for enabling inverted index optimizations

The `InvertIndexParam` supports range queries and wildcard search on scalar columns, with options like `enable_range_optimization` and `enable_extended_wildcard`.
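
The sketch below is a generic illustration of why an inverted index helps with these query shapes. It is not zvec's implementation:

- Postings map each distinct value to its row ids.
- A sorted key list turns a range predicate into two binary searches.
- A wildcard is matched against distinct values rather than against every row.

`ToyInvertedIndex` is hypothetical. The sketch does not show how `enable_range_optimization` and `enable_extended_wildcard` work inside zvec.

```python
import bisect
import fnmatch
from collections import defaultdict
from typing import Dict, List, Set


class ToyInvertedIndex:
    """Value -> row-id postings for one scalar column (all values of one type)."""

    def __init__(self) -> None:
        self.postings: Dict[object, Set[int]] = defaultdict(set)
        self.sorted_keys: List[object] = []  # distinct values, kept sorted for range scans

    def add(self, row_id: int, value) -> None:
        if value not in self.postings:
            bisect.insort(self.sorted_keys, value)
        self.postings[value].add(row_id)

    def equal(self, value) -> Set[int]:
        return set(self.postings.get(value, ()))

    def range_query(self, low, high) -> Set[int]:
        """Row ids whose value v satisfies low <= v <= high."""
        start = bisect.bisect_left(self.sorted_keys, low)
        stop = bisect.bisect_right(self.sorted_keys, high)
        rows: Set[int] = set()
        for key in self.sorted_keys[start:stop]:
            rows |= self.postings[key]
        return rows

    def wildcard(self, pattern: str) -> Set[int]:
        """Row ids whose string value matches a shell-style pattern such as 'doc_*'."""
        rows: Set[int] = set()
        for key in self.sorted_keys:  # naive linear scan over distinct values
            if isinstance(key, str) and fnmatch.fnmatchcase(key, pattern):
                rows |= self.postings[key]
        return rows
```

For a column holding `[5, 12, 7, 12, 30]` in rows 0 to 4, `range_query(6, 12)` returns `{1, 2, 3}`.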

### VectorSchema for Vector Fields

Vector fields use the `VectorSchema` class (also in [`python/zvec/model/schema/field_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/field_schema.py)) to define high-dimensional embeddings. Each vector schema specifies:

- **Name**: The vector column identifier
- **DataType**: Vector data type (e.g., `VECTOR_FP32`)
- **Dimension**: The vector dimensionality (e.g., 128, 768)
- **Index Parameter**: Required vector index configuration (`FlatIndexParam`, `HnswIndexParam`, or `IVFIndexParam`)
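
`VECTOR_FP32` stores each component as a 4-byte float, so a 768-dimensional embedding occupies 3,072 bytes before any index overhead. The sketch below shows the kind of pre-insert check a client might run against a schema's dimension. `ToyVectorSpec` and `to_fp32_bytes` are hypothetical helpers. The little-endian byte layout is an assumption, not zvec's on-disk format.

```python
import math
import struct
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToyVectorSpec:
    """Hypothetical stand-in for the name/dimension part of a VectorSchema."""
    name: str
    dimension: int


def to_fp32_bytes(spec: ToyVectorSpec, values: Sequence[float]) -> bytes:
    """Validate an embedding against its spec and pack it as little-endian FP32."""
    if len(values) != spec.dimension:
        raise ValueError(
            f"{spec.name}: expected {spec.dimension} values, got {len(values)}"
        )
    if not all(math.isfinite(v) for v in values):
        raise ValueError(f"{spec.name}: embedding contains NaN or infinity")
    try:
        # 4 bytes per component; struct rejects values outside the FP32 range.
        return struct.pack(f"<{spec.dimension}f", *values)
    except OverflowError as exc:
        raise ValueError(f"{spec.name}: value out of FP32 range") from exc
```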

### Building a Collection Schema in Python

Here is a complete example demonstrating how to construct a collection schema with both scalar and vector fields:

```python
from zvec import CollectionSchema, FieldSchema, VectorSchema
from zvec.typing import DataType, MetricType
from zvec.model.param import InvertIndexParam, HnswIndexParam

# Define scalar field with inverted index for range queries
doc_id_field = FieldSchema(
    name="doc_id",
    data_type=DataType.INT64,
    nullable=False,
    index_param=InvertIndexParam(enable_range_optimization=True),
)

# Define vector field with HNSW index for approximate search
embedding_field = VectorSchema(
    name="embedding",
    data_type=DataType.VECTOR_FP32,
    dimension=128,
    index_param=HnswIndexParam(
        m=16,
        ef_construction=200,
        metric_type=MetricType.COSINE,
    ),
)

# Assemble the collection schema
schema = CollectionSchema(
    name="document_collection",
    fields=[doc_id_field],
    vectors=[embedding_field],
)
```

The constructor automatically validates that field names are unique across both `fields` and `vectors`, and that index parameters match their respective field types.

## Index Parameter Configurations in ZVec

ZVec defines two families of index parameters, scalar and vector. Both are implemented in C++ in [`src/include/zvec/db/index_params.h`](https://github.com/alibaba/zvec/blob/main/src/include/zvec/db/index_params.h) and exposed to Python through `zvec.model.param`. Between them they provide four parameter classes (`InvertIndexParam`, `FlatIndexParam`, `HnswIndexParam`, and `IVFIndexParam`), which determine how data is indexed and queried.

### Scalar Index Parameters

For scalar fields, ZVec provides the `InvertIndexParam` class:

- **Purpose**: Optimizes range queries and wildcard searches on non-vector columns
- **Key Parameters**:
  - `enable_range_optimization`: Boolean to enable optimized range filtering
  - `enable_extended_wildcard`: Boolean to support advanced wildcard patterns

### Vector Index Parameters

ZVec supports three vector index strategies, each with distinct performance characteristics:

**`FlatIndexParam`**
- **Purpose**: Brute-force exact search with 100% recall
- **Best for**: Small datasets or when exact results are mandatory
- **Parameters**: `metric_type` (IP, L2, COSINE)
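
Flat search is simple enough to write out in full. This sketch scores every stored vector against the query and keeps the best `k`, which is why its cost grows linearly with collection size. The metric strings `"IP"`, `"L2"`, and `"COSINE"` stand in for `MetricType` values. L2 is negated so that a higher score is always better. This illustrates exact search in general and is not zvec's kernel.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def score(metric: str, q: Sequence[float], v: Sequence[float]) -> float:
    """Similarity where higher is always better (L2 is returned negated)."""
    dot = sum(a * b for a, b in zip(q, v))
    if metric == "IP":
        return dot
    if metric == "L2":
        return -sum((a - b) ** 2 for a, b in zip(q, v))  # negated squared distance
    if metric == "COSINE":
        nq = math.sqrt(sum(a * a for a in q))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nq * nv) if nq and nv else 0.0
    raise ValueError(f"unknown metric {metric!r}")


def flat_search(
    metric: str, query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact top-k: score every stored vector and keep the k best (id, score) pairs."""
    scored = ((score(metric, query, v), i) for i, v in enumerate(vectors))
    return [(i, s) for s, i in heapq.nlargest(k, scored)]
```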

**`HnswIndexParam`**
- **Purpose**: Approximate nearest neighbor search using hierarchical navigable small world graphs
- **Best for**: High-recall, low-latency applications
- **Key Parameters**:
  - `m`: Number of bi-directional links for each node (typically 8-32)
  - `ef_construction`: Size of dynamic candidate list during construction (higher = better quality)
  - `metric_type`: Distance metric
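
The `m` parameter also shapes the layer hierarchy. In the original HNSW paper (Malkov and Yashunin), each inserted node draws its top layer as `floor(-ln(U) * mL)`, where `U` is uniform on (0, 1] and the recommended normalization is `mL = 1 / ln(m)`. Under that rule roughly `1/m` of the nodes reach layer 1. The sketch below simulates the rule. Whether zvec uses exactly this normalization is an assumption.

```python
import math
import random
from collections import Counter


def sample_level(m: int, rng: random.Random) -> int:
    """Draw a node's top layer as in the HNSW paper, with mL = 1 / ln(m)."""
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is always defined
    return int(-math.log(u) * m_l)  # int() floors because the value is >= 0


def level_histogram(n: int, m: int, seed: int = 0) -> Counter:
    """Count how many of n simulated nodes land on each top layer."""
    rng = random.Random(seed)
    return Counter(sample_level(m, rng) for _ in range(n))
```

With `m=16`, about one node in sixteen gets a link list above layer 0. Larger `m` therefore means a flatter hierarchy with more links per node.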

**`IVFIndexParam`**
- **Purpose**: Inverted file index with optional SOAR acceleration for large-scale datasets
- **Best for**: Billion-scale vector search with memory efficiency
- **Key Parameters**:
  - `n_list`: Number of coarse centroids (typically about `4 * sqrt(n)` for `n` vectors)
  - `n_iters`: Number of k-means refinement iterations
  - `use_soar`: Boolean to enable SOAR acceleration
  - `metric_type`: Distance metric
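
The sketch below shows the core IVF idea in plain Python:

- k-means (Lloyd's algorithm) learns `n_list` coarse centroids over `n_iters` rounds.
- Each vector is filed into the list of its nearest centroid.
- A query scans only the closest lists.

`n_probe` is a hypothetical name for that query-time knob and is not one of the parameters listed above. SOAR is not modeled, and squared L2 is the only metric handled.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[Vector], v: Sequence[float]) -> int:
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))


def train_ivf(
    data: List[Vector], n_list: int, n_iters: int, seed: int = 0
) -> Tuple[List[Vector], List[List[int]]]:
    """Lloyd's k-means for n_list coarse centroids, then one inverted list per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_list)]
    dim = len(data[0])
    for _ in range(n_iters):
        sums = [[0.0] * dim for _ in range(n_list)]
        counts = [0] * n_list
        for v in data:
            c = nearest(centroids, v)
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += v[d]
        for c in range(n_list):
            if counts[c]:  # an empty cluster keeps its previous centroid
                centroids[c] = [s / counts[c] for s in sums[c]]
    lists: List[List[int]] = [[] for _ in range(n_list)]
    for i, v in enumerate(data):
        lists[nearest(centroids, v)].append(i)
    return centroids, lists


def ivf_search(data, centroids, lists, query, k: int, n_probe: int) -> List[int]:
    """Scan only the n_probe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(data[i], query))[:k]


def suggest_n_list(n: int) -> int:
    """The ~4 * sqrt(n) rule of thumb from the parameter list above."""
    return max(1, round(4 * math.sqrt(n)))
```

When `n_probe` equals `n_list`, the search degenerates to exact search. Lowering it trades recall for speed.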

### Creating Index Parameters in Python

Here is a comprehensive example showing all index parameter types:

```python
from zvec.model.param import (
    InvertIndexParam,
    FlatIndexParam,
    HnswIndexParam,
    IVFIndexParam,
)
from zvec.typing import MetricType

# Scalar inverted index
scalar_idx = InvertIndexParam(
    enable_range_optimization=True,
    enable_extended_wildcard=False,
)

# Flat index for exact search
flat_idx = FlatIndexParam(metric_type=MetricType.IP)

# HNSW index for high-performance ANN
hnsw_idx = HnswIndexParam(
    m=16,
    ef_construction=200,
    metric_type=MetricType.COSINE,
)

# IVF index for large-scale search
ivf_idx = IVFIndexParam(
    metric_type=MetricType.L2,
    n_list=1024,
    n_iters=10,
    use_soar=False,
)
```

## How ZVec Applies Schema and Index Configurations

ZVec enforces schema and index configurations at multiple stages of the collection lifecycle, as implemented in [`python/zvec/zvec.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/zvec.py) and the underlying C++ core.

### Collection Creation

When calling `zvec.Collection.create_and_open`, the supplied `CollectionSchema` is passed to the C++ core via `_Collection.CreateAndOpen`. The core performs the following operations:

1. Stores the schema metadata persistently
2. Builds any requested vector indexes immediately (HNSW, IVF, or Flat)
3. Registers inverted indexes for scalar fields that specify `InvertIndexParam`
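
The three steps can be modeled with a toy function. It persists the schema as JSON, then registers one index entry per vector field and one per scalar field that carries an `InvertIndexParam`. Everything here is hypothetical, including `toy_create_and_open`, the dictionary layout, and the `schema.json` file name. It only mirrors the order of operations listed above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def toy_create_and_open(path: Path, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model of the three creation steps; not zvec's implementation."""
    # Step 1: persist schema metadata so the collection can be reopened later.
    path.mkdir(parents=True, exist_ok=True)
    (path / "schema.json").write_text(json.dumps(schema, indent=2))

    indexes: Dict[str, Dict[str, Any]] = {}
    # Step 2: every vector field gets its configured index (HNSW, IVF, or Flat).
    for vec in schema["vectors"]:
        indexes[vec["name"]] = {"kind": vec["index_param"], "entries": []}
    # Step 3: scalar fields are registered only when they carry an InvertIndexParam.
    for fld in schema["fields"]:
        if fld.get("index_param") == "InvertIndexParam":
            indexes[fld["name"]] = {"kind": "InvertIndexParam", "postings": {}}
    return {"path": path, "schema": schema, "indexes": indexes}


if __name__ == "__main__":
    demo_schema = {
        "name": "document_collection",
        "fields": [
            {"name": "doc_id", "index_param": "InvertIndexParam"},
            {"name": "title", "index_param": None},  # no index requested
        ],
        "vectors": [{"name": "embedding", "dimension": 128, "index_param": "HnswIndexParam"}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        coll = toy_create_and_open(Path(tmp) / "document_collection", demo_schema)
        print(sorted(coll["indexes"]))
```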

### Dynamic Schema Modifications

ZVec supports schema evolution through the `Collection.add_column` method. When adding a new column, you can include an `index_param` argument, and ZVec will instantiate the proper index backend for that column immediately.

### Index Management

The `Collection.create_index` and `Collection.drop_index` methods allow index changes at runtime. Before invoking the underlying C++ `CreateIndex` or `DropIndex` methods, they check that the supplied `index_param` matches the field type (scalar vs. vector). The C++ layer then instantiates the corresponding parameter objects (`HnswIndexParams`, `IVFIndexParams`, `FlatIndexParams`, `InvertIndexParams`) from these Python configurations.
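
The validation step can be sketched as a small registry that checks a parameter kind against the column type before changing any state. `ToyCollection` and `check_kind` are hypothetical, and parameter classes are represented by their names as strings. Behaviors such as silently replacing an existing index are choices made for this sketch, not documented zvec semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Plain strings stand in for the real parameter classes.
VECTOR_KINDS = frozenset({"FlatIndexParam", "HnswIndexParam", "IVFIndexParam"})
SCALAR_KINDS = frozenset({"InvertIndexParam"})


def check_kind(column: str, is_vector: bool, kind: str) -> None:
    """Reject a parameter kind that does not match the column type."""
    allowed = VECTOR_KINDS if is_vector else SCALAR_KINDS
    if kind not in allowed:
        col_type = "vector" if is_vector else "scalar"
        raise TypeError(f"{kind} cannot index {col_type} column {column!r}")


@dataclass
class ToyCollection:
    """Hypothetical registry of columns and their indexes; not zvec's Collection."""
    columns: Dict[str, bool] = field(default_factory=dict)  # name -> is_vector
    indexes: Dict[str, str] = field(default_factory=dict)   # name -> parameter kind

    def add_column(self, name: str, is_vector: bool, index_kind: Optional[str] = None) -> None:
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        if index_kind is not None:
            check_kind(name, is_vector, index_kind)  # validate before mutating anything
        self.columns[name] = is_vector
        if index_kind is not None:
            self.indexes[name] = index_kind

    def create_index(self, name: str, kind: str) -> None:
        check_kind(name, self.columns[name], kind)  # KeyError if the column is unknown
        self.indexes[name] = kind  # this toy replaces any existing index

    def drop_index(self, name: str) -> None:
        if name not in self.indexes:
            raise KeyError(f"no index on column {name!r}")
        del self.indexes[name]
```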

## Key Source Files

The implementation of **zvec collection schema and index parameter configurations** spans both Python and C++ layers:

| Path | Role |
|------|------|
| [`python/zvec/model/schema/collection_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/collection_schema.py) | Collection-level schema definition and field name validation |
| [`python/zvec/model/schema/field_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/field_schema.py) | Scalar `FieldSchema` and `VectorSchema` implementations |
| [`python/zvec/model/param/__init__.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/param/__init__.py) | Python façade exposing C++ index parameter classes |
| [`src/include/zvec/db/index_params.h`](https://github.com/alibaba/zvec/blob/main/src/include/zvec/db/index_params.h) | C++ struct definitions for all index parameters |
| [`python/zvec/zvec.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/zvec.py) | High-level API methods (`create_and_open`, `add_column`, `create_index`) |
| [`python/tests/test_schema.py`](https://github.com/alibaba/zvec/blob/main/python/tests/test_schema.py) | Unit tests for schema construction validation |
| [`python/tests/test_params.py`](https://github.com/alibaba/zvec/blob/main/python/tests/test_params.py) | Unit tests for index parameter instantiation |

## Summary

- **CollectionSchema** acts as the top-level container in [`python/zvec/model/schema/collection_schema.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/schema/collection_schema.py), enforcing unique field names across scalar and vector fields.
- **FieldSchema** defines scalar columns with optional `InvertIndexParam` for range and wildcard optimization.
- **VectorSchema** defines embedding columns with required vector index parameters chosen from `FlatIndexParam`, `HnswIndexParam`, or `IVFIndexParam`.
- **Index parameters** are defined in C++ at [`src/include/zvec/db/index_params.h`](https://github.com/alibaba/zvec/blob/main/src/include/zvec/db/index_params.h) and exposed through `zvec.model.param`, controlling exact search, graph-based ANN, or inverted file indexing.
- **Lifecycle integration** occurs through `create_and_open`, `add_column`, and `create_index` methods in [`python/zvec/zvec.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/zvec.py), which validate configurations and instantiate concrete C++ index objects.

## Frequently Asked Questions

### What is the difference between FieldSchema and VectorSchema in ZVec?

`FieldSchema` defines scalar (non-vector) columns such as integers, strings, or timestamps, and optionally accepts an `InvertIndexParam` for range queries. `VectorSchema` specifically defines high-dimensional embedding columns, requires a dimensionality parameter, and mandates a vector index parameter (`FlatIndexParam`, `HnswIndexParam`, or `IVFIndexParam`) to enable similarity search.

### How do I choose between HNSW and IVF index parameters in ZVec?

Choose **`HnswIndexParam`** when you need low query latency with high recall on datasets ranging from thousands to hundreds of millions of vectors. It builds a navigable small-world graph optimized for approximate nearest neighbor search. Choose **`IVFIndexParam`** for billion-scale datasets where memory efficiency is critical. It partitions vectors around coarse centroids and searches only the partitions closest to the query. Enable the `use_soar` flag for additional acceleration on large partitions.
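
In practice the choice is usually settled by measurement. Run the same queries against a `FlatIndexParam` collection for exact ground truth and against the candidate HNSW or IVF configuration, then compare the returned id lists. The helper below computes recall@k from two such lists. It is a generic sketch and does not cover how to obtain the id lists from zvec.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbours that the approximate top-k also found."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact result list is empty")
    return len(truth.intersection(approx_ids[:k])) / len(truth)


def mean_recall(
    approx: Sequence[Sequence[int]], exact: Sequence[Sequence[int]], k: int
) -> float:
    """Average recall@k over a batch of queries (one id list per query)."""
    pairs = list(zip(approx, exact))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)
```

For example, if the exact top-3 is `[1, 2, 4]` and the approximate top-3 is `[1, 2, 3]`, recall@3 is 2/3.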

### Can I modify a collection schema after creating the collection?

Yes, ZVec supports schema evolution through the `Collection.add_column` method, which allows you to add new scalar or vector fields after initial creation. When adding a column, you can specify an `index_param` to immediately build the appropriate index backend. However, existing field definitions cannot be altered or removed; you can only add new columns or create/drop indexes on existing columns using `create_index` and `drop_index`.

### What file contains the C++ definitions for ZVec index parameters?

The C++ struct definitions for all index parameters are located in [`src/include/zvec/db/index_params.h`](https://github.com/alibaba/zvec/blob/main/src/include/zvec/db/index_params.h). This header defines `InvertIndexParam`, `FlatIndexParam`, `HnswIndexParam`, and `IVFIndexParam` structures that mirror the Python classes exposed through `zvec.model.param`. The Python layer in [`python/zvec/model/param/__init__.py`](https://github.com/alibaba/zvec/blob/main/python/zvec/model/param/__init__.py) acts as a façade that forwards configuration values to these underlying C++ implementations.