How zvec's Concurrent Roaring Bitmap Improves Performance: 5 Key Optimizations

February 16, 2026 alibaba/zvec ↗

zvec's concurrent roaring bitmap improves performance by using shared locks so that reads run concurrently, short exclusive locks to keep write contention low, and dynamic 32-bit to 64-bit upgrades to optimize memory usage.

The Alibaba zvec database engine implements a thread-safe wrapper around the Roaring bitmap library to handle high-concurrency workloads. This ConcurrentRoaringBitmap design is built to serve many concurrent queries with low latency while maintaining memory efficiency. Understanding these optimizations helps developers implement high-performance bitmap operations in multi-threaded environments.

Concurrent Read Operations with Shared Locks

Read-heavy workloads benefit from zvec's shared lock mechanism that allows multiple threads to query the bitmap simultaneously without blocking each other.

In src/db/common/concurrent_roaring_bitmap.h (lines 61-70), read-only operations such as contains, cardinality, and range_cardinality acquire a std::shared_lock<std::shared_mutex>. Readers therefore never block each other and wait only while a writer holds the exclusive lock, so high-throughput query workloads like posting-list lookups scale well with thread count.

// Multiple threads can execute this simultaneously
bool found = bitmap->contains(target_id);  // Uses shared_lock internally
uint64_t count = bitmap->cardinality();    // Shared-lock read; waits only for an active writer

Exclusive Write Operations with Minimal Contention

While reads scale horizontally, write operations maintain consistency through exclusive locking with minimal performance impact.

Mutating calls such as add, clear, and remove_range_closed, as well as storage_size_in_bytes, take a std::unique_lock<std::shared_mutex>, as implemented in src/db/common/concurrent_roaring_bitmap.h (lines 87-106). Only one writer can modify the internal roaring_bitmap_t at a time, which guarantees thread safety. Readers wait only while a writer holds the lock and resume as soon as it is released.

The write path remains short—typically just a call to the underlying Roaring API—so the exclusive window is tiny, limiting write-side stalls even under heavy contention.
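The read and write paths described above can be sketched with the standard library alone. This is a minimal illustration, not zvec's code: SharedLockedSet and run_shared_lock_demo are hypothetical names, and a std::set stands in for the Roaring bitmap so the sketch has no external dependency.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the wrapper: a std::set plays the role of the
// Roaring bitmap so the sketch builds with the standard library alone.
class SharedLockedSet {
 public:
  // Read path: any number of threads may hold the shared lock at once.
  bool contains(uint32_t id) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.count(id) != 0;
  }

  uint64_t cardinality() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.size();
  }

  // Write path: one thread at a time; readers wait until it returns.
  void add(uint32_t id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ids_.insert(id);
  }

 private:
  mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
  std::set<uint32_t> ids_;
};

// Four writers insert disjoint ranges while four readers probe concurrently.
inline uint64_t run_shared_lock_demo() {
  SharedLockedSet set;
  std::vector<std::thread> threads;
  for (uint32_t t = 0; t < 4; ++t) {
    threads.emplace_back([&set, t] {
      for (uint32_t i = 0; i < 1000; ++i) set.add(t * 10000 + i);
    });
    threads.emplace_back([&set, t] {
      for (uint32_t i = 0; i < 1000; ++i) (void)set.contains(t * 10000 + i);
    });
  }
  for (auto& th : threads) th.join();
  assert(set.contains(30999));  // last value written by the fourth writer
  return set.cardinality();
}
```

Each method holds its lock for only a single call into the underlying container, which is what keeps the exclusive window short.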

Dynamic 32-bit to 64-bit Upgrades

Memory efficiency meets flexibility through lazy bitmap expansion that starts with 32-bit storage and upgrades to 64-bit only when necessary.

ConcurrentRoaringBitmap64 begins as a 32-bit bitmap and upgrades to 64-bit automatically when a position exceeds UINT32_MAX, as defined in src/db/common/concurrent_roaring_bitmap.h (lines 35-40). The upgrade routine, implemented in lines 99-108, copies the 32-bit structure into a Roaring64Map and discards the old representation.

This optimization ensures that most workloads remain in the 32-bit regime, which uses less memory and offers faster rank and contains operations, while avoiding the cost of allocating a 64-bit bitmap for every instance.
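The lazy upgrade can be sketched as follows. This is an illustration under stated assumptions rather than zvec's implementation: LazyWideningSet and its members are hypothetical names, std::set stands in for both the 32-bit bitmap and Roaring64Map, and locking is omitted to keep the focus on the upgrade itself.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <set>

// Hypothetical sketch: std::set stands in for the 32-bit and 64-bit Roaring
// structures; class and member names are illustrative, not zvec's.
class LazyWideningSet {
 public:
  void add(uint64_t value) {
    // Upgrade only on the first value that does not fit in 32 bits.
    if (!wide_ && value > std::numeric_limits<uint32_t>::max()) {
      upgrade();
    }
    if (wide_) {
      wide_->insert(value);
    } else {
      narrow_.insert(static_cast<uint32_t>(value));
    }
  }

  bool contains(uint64_t value) const {
    if (wide_) return wide_->count(value) != 0;
    if (value > std::numeric_limits<uint32_t>::max()) return false;
    return narrow_.count(static_cast<uint32_t>(value)) != 0;
  }

  bool is_wide() const { return wide_ != nullptr; }

 private:
  // Copy the 32-bit contents into the 64-bit structure, then drop the old one.
  void upgrade() {
    assert(!wide_);
    wide_ = std::make_unique<std::set<uint64_t>>(narrow_.begin(), narrow_.end());
    narrow_.clear();
  }

  std::set<uint32_t> narrow_;                 // used until the first large value
  std::unique_ptr<std::set<uint64_t>> wide_;  // allocated lazily on upgrade
};
```

An instance that never sees a value above UINT32_MAX never allocates the 64-bit structure, which is the memory saving the section describes.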

Efficient Bulk Serialization and Range Queries

Specialized operations leverage Roaring's internal algorithms to deliver container-level range analytics and linear-time serialization.

The range_cardinality method computes the number of set bits between two IDs using Roaring's rank operation under a shared lock (src/db/common/concurrent_roaring_bitmap.h, lines 73-84). Rank works at container granularity: it sums the stored cardinalities of whole containers and inspects individual elements only in the boundary containers, so the cost depends on the number of containers M rather than on the number of set bits. This speeds up the range scans common in inverted indexes.
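The idea of answering a range count from rank queries can be sketched as below. This is a simplified illustration, not Roaring's algorithm: a sorted vector of unique IDs with binary search stands in for the container-level rank, the function names are hypothetical, and the closed range [lo, hi] is an assumption, since the exact bounds convention of zvec's range_cardinality is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// rank_of(x): number of stored IDs that are <= x. A sorted vector of unique
// IDs stands in for the bitmap; binary search stands in for Roaring's rank.
inline uint64_t rank_of(const std::vector<uint32_t>& sorted_ids, uint32_t x) {
  assert(std::is_sorted(sorted_ids.begin(), sorted_ids.end()));
  auto it = std::upper_bound(sorted_ids.begin(), sorted_ids.end(), x);
  return static_cast<uint64_t>(it - sorted_ids.begin());
}

// Count of IDs in the closed range [lo, hi], computed as a difference of two
// ranks instead of a scan over every element in the range.
inline uint64_t range_cardinality_closed(const std::vector<uint32_t>& sorted_ids,
                                         uint32_t lo, uint32_t hi) {
  if (lo > hi) return 0;
  const uint64_t up_to_hi = rank_of(sorted_ids, hi);
  const uint64_t below_lo = (lo == 0) ? 0 : rank_of(sorted_ids, lo - 1);
  return up_to_hi - below_lo;
}
```

For the IDs {1, 5, 10, 20000, 20500, 25000, 30000}, range_cardinality_closed(ids, 20000, 25000) counts 20000, 20500, and 25000.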

For persistence, roaring_bitmap_portable_serialize executes under a unique lock in src/db/common/concurrent_roaring_bitmap.cc (lines 22-30), while deserialization rebuilds the exact bitmap in one step (lines 36-46). Snapshot loading is therefore O(N), where N is the number of containers, without extra copying.

Practical Implementation Example

The following example demonstrates concurrent writers and readers operating safely on the same bitmap instance:

#include "concurrent_roaring_bitmap.h"
#include <cstdint>   // uint32_t
#include <iostream>
#include <memory>    // std::make_shared
#include <thread>
#include <vector>

using namespace zvec;

void writer(ConcurrentRoaringBitmap32::Ptr bmp, uint32_t start) {
    for (uint32_t i = 0; i < 1000; ++i) {
        bmp->add(start + i);           // exclusive unique lock, fast
    }
}

void reader(ConcurrentRoaringBitmap32::Ptr bmp, uint32_t val) {
    bool found = bmp->contains(val);   // shared lock, many threads can run together
    std::cout << val << (found ? " present" : " missing") << '\n';
}

int main() {
    auto bitmap = std::make_shared<ConcurrentRoaringBitmap32>();

    // Launch several writer threads
    std::vector<std::thread> writers;
    for (int t = 0; t < 4; ++t)
        writers.emplace_back(writer, bitmap, t * 10000);

    // Launch reader threads while writers are running
    std::vector<std::thread> readers;
    for (int i = 0; i < 8; ++i)
        readers.emplace_back(reader, bitmap, 15000 + i * 500);

    for (auto &t : writers) t.join();
    for (auto &t : readers) t.join();

    // Range query – how many IDs between 20000 and 25000?
    std::cout << "Range cardinality: "
              << bitmap->range_cardinality(20000, 25000) << '\n';
}

Key points demonstrated:

Multiple writer threads safely add values using add with an internal unique_lock.
Reader threads concurrently call contains without blocking each other using shared_lock.
range_cardinality answers bulk count queries with container-level rank operations instead of scanning each element.

Summary

Concurrent reads via std::shared_lock let high-throughput query workloads scale with thread count.
Minimal write contention through short-duration std::unique_lock windows ensures thread safety while keeping reader stalls brief.
Lazy 32-bit to 64-bit upgrades optimize memory usage and operation speed for the common case of smaller integer ranges.
Container-level range queries leverage Roaring's rank algorithm for fast analytics on inverted indexes.
Efficient serialization provides O(N) snapshot persistence with consistent locking semantics.

Frequently Asked Questions

How does zvec's concurrent roaring bitmap handle read-heavy workloads?

zvec's concurrent roaring bitmap handles read-heavy workloads by using std::shared_lock for all read operations including contains, cardinality, and range_cardinality. This shared locking mechanism, implemented in src/db/common/concurrent_roaring_bitmap.h, allows multiple threads to query the bitmap simultaneously without blocking each other, so read throughput scales with thread count as long as writers are infrequent.

What locking strategy does zvec use for write operations?

zvec uses std::unique_lock<std::shared_mutex> for all mutating operations including add, clear, and remove_range_closed. According to the implementation in src/db/common/concurrent_roaring_bitmap.h (lines 87-106), only one writer can modify the internal roaring_bitmap_t at a time. The write path is kept extremely short—typically just a call to the underlying Roaring API—minimizing contention windows while maintaining thread safety.

When does zvec upgrade from 32-bit to 64-bit roaring bitmaps?

zvec upgrades from 32-bit to 64-bit roaring bitmaps only when a value exceeds UINT32_MAX. As implemented in src/db/common/concurrent_roaring_bitmap.h (lines 35-40), the ConcurrentRoaringBitmap64 class starts as a 32-bit bitmap and performs a lazy upgrade to Roaring64Map when necessary. This optimization ensures that typical workloads benefit from the faster operations and lower memory footprint of 32-bit bitmaps while retaining 64-bit capability for edge cases.
