# How zvec's Concurrent Roaring Bitmap Improves Performance: 5 Key Optimizations

> Discover how zvec's concurrent roaring bitmap boosts performance with shared locks for concurrent reads, exclusive locks for short writes, and lazy 32-bit to 64-bit upgrades for better memory usage.

- Repository: [Alibaba/zvec](https://github.com/alibaba/zvec)
- Tags: performance
- Published: 2026-02-16

---

**zvec's concurrent roaring bitmap improves performance by using shared locks so reads can run concurrently, exclusive locks with short critical sections for writes, and lazy 32-bit to 64-bit upgrades to limit memory usage.**

The Alibaba zvec database engine implements a thread-safe wrapper around the Roaring bitmap library to handle high-concurrency workloads. This `ConcurrentRoaringBitmap` design targets many concurrent queries at low latency while keeping memory usage in check. Understanding these optimizations helps developers implement high-performance bitmap operations in multi-threaded environments.

## Concurrent Read Operations with Shared Locks

Read-heavy workloads benefit from zvec's **shared lock mechanism** that allows multiple threads to query the bitmap simultaneously without blocking each other.

In [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h) (lines 61-70), read-only operations such as `contains`, `cardinality`, and `range_cardinality` acquire a `std::shared_lock<std::shared_mutex>`. Readers therefore never block one another, which lets read-dominated workloads such as posting-list lookups scale well across threads. A shared lock is still a lock, though: readers wait while a writer holds the mutex, so this is concurrent reading rather than lock-free reading in the strict sense.

```cpp
// Multiple threads can execute this simultaneously
bool found = bitmap->contains(target_id);  // Uses shared_lock internally
uint64_t count = bitmap->cardinality();    // Also a shared-lock read; waits only for writers
```

## Exclusive Write Operations with Minimal Contention

While reads scale horizontally, write operations maintain consistency through **exclusive locking** with minimal performance impact.

Mutating calls such as `add`, `clear`, and `remove_range_closed`, along with `storage_size_in_bytes`, use `std::unique_lock<std::shared_mutex>` as implemented in [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h) (lines 87-106). Only one writer can modify the internal `roaring_bitmap_t` at a time, and readers are held off while the exclusive lock is taken, which is what guarantees they never observe a half-applied update.

The write path remains short, typically just a call to the underlying Roaring API, so the exclusive window is small and readers resume quickly even when writes are frequent.
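
The two locking modes described in this section and the previous one can be sketched with the standard library alone. This is a minimal illustration, not zvec's code: `SharedMutexSet` and `run_shared_mutex_demo` are hypothetical names, and a `std::set` stands in for the Roaring bitmap so the example compiles without external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the wrapper: std::set replaces the Roaring
// bitmap so the sketch builds with the standard library only.
class SharedMutexSet {
 public:
  // Writer path: exclusive lock, blocks readers and other writers.
  void add(uint32_t id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ids_.insert(id);
  }

  // Reader path: shared lock, many readers may hold it at once.
  bool contains(uint32_t id) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.count(id) != 0;
  }

  std::size_t cardinality() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::set<uint32_t> ids_;
};

// Runs 4 writers (disjoint ranges of 1000 IDs each) alongside 4 readers
// and returns the final cardinality.
std::size_t run_shared_mutex_demo() {
  SharedMutexSet ids;
  std::vector<std::thread> threads;
  for (uint32_t t = 0; t < 4; ++t) {
    threads.emplace_back([&ids, t] {
      for (uint32_t i = 0; i < 1000; ++i) ids.add(t * 10000 + i);
    });
    threads.emplace_back([&ids, t] {
      // The answer depends on timing; the call is safe either way.
      for (uint32_t i = 0; i < 1000; ++i) (void)ids.contains(t * 10000 + i);
    });
  }
  for (auto& th : threads) th.join();
  assert(ids.contains(0));  // every writer has finished by now
  return ids.cardinality();
}
```

Calling `run_shared_mutex_demo()` from `main` exercises both paths. Each `add` holds the exclusive lock only for a single insertion, which mirrors the short write window described above.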

## Dynamic 32-bit to 64-bit Upgrades

Memory efficiency meets flexibility through **lazy bitmap expansion** that starts with 32-bit storage and upgrades to 64-bit only when necessary.

`ConcurrentRoaringBitmap64` begins as a 32-bit bitmap and upgrades to 64-bit automatically when a position exceeds `UINT32_MAX`, as defined in [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h) (lines 35-40). The upgrade routine, implemented in lines 99-108, copies the 32-bit structure into a `Roaring64Map` and discards the old representation.

This optimization ensures that most workloads remain in the 32-bit regime, which uses less memory and offers faster `rank` and `contains` operations, while avoiding the cost of allocating a 64-bit bitmap for every instance.
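
The lazy upgrade can be sketched as follows. This is an assumption-laden illustration rather than the zvec routine: `LazyUpgradeSet` is a hypothetical class, and two `std::set` members stand in for the 32-bit Roaring bitmap and `Roaring64Map`. What it shows is the control flow: stay narrow until a value above `UINT32_MAX` arrives, then copy once and discard the narrow representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <mutex>
#include <set>
#include <shared_mutex>

// Hypothetical sketch: std::set<uint32_t> and std::set<uint64_t> stand in
// for the 32-bit and 64-bit Roaring structures.
class LazyUpgradeSet {
 public:
  void add(uint64_t id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (!upgraded_ && id > kMax32) upgrade_locked();
    if (upgraded_) {
      wide_.insert(id);
    } else {
      narrow_.insert(static_cast<uint32_t>(id));
    }
  }

  bool contains(uint64_t id) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    if (upgraded_) return wide_.count(id) != 0;
    if (id > kMax32) return false;  // cannot be present before the upgrade
    return narrow_.count(static_cast<uint32_t>(id)) != 0;
  }

  std::size_t cardinality() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return upgraded_ ? wide_.size() : narrow_.size();
  }

  bool is_64bit() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return upgraded_;
  }

 private:
  static constexpr uint64_t kMax32 = std::numeric_limits<uint32_t>::max();

  // Caller must already hold the exclusive lock.
  void upgrade_locked() {
    assert(!upgraded_);
    wide_.insert(narrow_.begin(), narrow_.end());  // copy existing IDs once
    narrow_.clear();                               // drop the old form
    upgraded_ = true;
  }

  mutable std::shared_mutex mutex_;
  bool upgraded_ = false;
  std::set<uint32_t> narrow_;
  std::set<uint64_t> wide_;
};
```

Performing the upgrade inside the same exclusive lock as the triggering `add` means no reader can observe a state where the IDs exist in neither representation.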

## Efficient Bulk Serialization and Range Queries

Specialized operations leverage Roaring's internal algorithms so that range analytics work at the **container level** instead of per element, and serialization runs as a single linear pass.

The `range_cardinality` method computes the number of set bits between two IDs using Roaring's `rank` operation under a shared lock ([`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h), lines 73-84). Because `rank` sums stored container cardinalities instead of visiting each element, the cost is governed by the number of containers M (at most 65,536 for a 32-bit bitmap) rather than by the number of IDs in the range, which speeds up the range scans common in inverted indexes.
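
The rank-based counting idea can be illustrated with a small sketch. The assumptions here are mine: the range is treated as closed, `[lo, hi]`, a sorted `std::vector` with binary search stands in for Roaring's containers, and locking is omitted. The function names `rank_of` and `range_cardinality` are illustrative, not the zvec signatures. The identity being shown is `count(lo..hi) = rank(hi) - rank(lo - 1)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Number of stored IDs that are <= x. Requires sorted, duplicate-free input.
uint64_t rank_of(const std::vector<uint32_t>& sorted_ids, uint32_t x) {
  auto it = std::upper_bound(sorted_ids.begin(), sorted_ids.end(), x);
  return static_cast<uint64_t>(it - sorted_ids.begin());
}

// Number of stored IDs in the closed range [lo, hi] (assumed semantics).
uint64_t range_cardinality(const std::vector<uint32_t>& sorted_ids,
                           uint32_t lo, uint32_t hi) {
  assert(std::is_sorted(sorted_ids.begin(), sorted_ids.end()));
  if (lo > hi) return 0;
  const uint64_t up_to_hi = rank_of(sorted_ids, hi);
  // Guard lo == 0 so that lo - 1 does not wrap around.
  const uint64_t below_lo = (lo == 0) ? 0 : rank_of(sorted_ids, lo - 1);
  return up_to_hi - below_lo;
}
```

Two rank lookups replace a scan over every ID in the range, which is the property the wrapper relies on.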

For persistence, `roaring_bitmap_portable_serialize` executes under a unique lock in `src/db/common/concurrent_roaring_bitmap.cc` (lines 22-30), while deserialization rebuilds the exact bitmap in one step (lines 36-46). Snapshot loading is therefore a single pass that is linear in the serialized size, with one lock acquisition per call rather than one per element.
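
A round trip of this kind can be sketched as below. This is a simplified stand-in under stated assumptions: `SnapshotSet` is a hypothetical class, and the byte layout (a host-endian `uint32_t` count followed by the IDs) is invented for the example. It is not the Roaring portable format. The exclusive lock in `serialize` mirrors what the article reports for zvec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <mutex>
#include <set>
#include <shared_mutex>
#include <vector>

// Hypothetical sketch; the layout is [count][id0][id1]... in host byte
// order, which is NOT the Roaring portable serialization format.
class SnapshotSet {
 public:
  void add(uint32_t id) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ids_.insert(id);
  }

  bool contains(uint32_t id) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.count(id) != 0;
  }

  std::size_t cardinality() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return ids_.size();
  }

  // One lock acquisition, one linear pass over the stored IDs.
  std::vector<char> serialize() const {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    assert(ids_.size() <= std::numeric_limits<uint32_t>::max());
    std::vector<uint32_t> flat;
    flat.reserve(ids_.size() + 1);
    flat.push_back(static_cast<uint32_t>(ids_.size()));
    flat.insert(flat.end(), ids_.begin(), ids_.end());
    std::vector<char> bytes(flat.size() * sizeof(uint32_t));
    std::memcpy(bytes.data(), flat.data(), bytes.size());
    return bytes;
  }

  // Validates the buffer first, then replaces the contents in one step.
  bool deserialize(const std::vector<char>& bytes) {
    if (bytes.size() < sizeof(uint32_t)) return false;
    uint32_t count = 0;
    std::memcpy(&count, bytes.data(), sizeof(count));
    const std::size_t expected =
        sizeof(uint32_t) * (static_cast<std::size_t>(count) + 1);
    if (bytes.size() != expected) return false;
    std::vector<uint32_t> values(count);
    if (count > 0) {
      std::memcpy(values.data(), bytes.data() + sizeof(uint32_t),
                  count * sizeof(uint32_t));
    }
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ids_.clear();
    ids_.insert(values.begin(), values.end());
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::set<uint32_t> ids_;
};
```

Parsing into a temporary vector before taking the lock keeps the exclusive section limited to swapping in the new contents, and a malformed buffer leaves the existing set untouched.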

## Practical Implementation Example

The following example demonstrates concurrent writers and readers operating safely on the same bitmap instance:

```cpp
#include "concurrent_roaring_bitmap.h"
#include <cstdint>
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

using namespace zvec;

// Adds 1000 consecutive IDs starting at `start`.
void writer(ConcurrentRoaringBitmap32::Ptr bmp, uint32_t start) {
    for (uint32_t i = 0; i < 1000; ++i) {
        bmp->add(start + i);           // exclusive unique lock, held per call
    }
}

// Checks one ID; the answer depends on how far the writers have progressed.
void reader(ConcurrentRoaringBitmap32::Ptr bmp, uint32_t val) {
    bool found = bmp->contains(val);   // shared lock, many threads can run together
    std::cout << val << (found ? " present" : " missing") << '\n';
}

int main() {
    auto bitmap = std::make_shared<ConcurrentRoaringBitmap32>();

    // Writers fill [0, 1000), [10000, 11000), [20000, 21000), [30000, 31000)
    std::vector<std::thread> writers;
    for (uint32_t t = 0; t < 4; ++t)
        writers.emplace_back(writer, bitmap, t * 10000);

    // Readers probe IDs that the second writer inserts, while writers run
    std::vector<std::thread> readers;
    for (uint32_t i = 0; i < 8; ++i)
        readers.emplace_back(reader, bitmap, 10000 + i * 100);

    for (auto &t : writers) t.join();
    for (auto &t : readers) t.join();

    // Range query: how many IDs between 20000 and 25000?
    // After the joins, only IDs 20000 to 20999 fall in this range.
    std::cout << "Range cardinality: "
              << bitmap->range_cardinality(20000, 25000) << '\n';
}
```

Key points demonstrated:

- Multiple writer threads safely add values using `add` with an internal `unique_lock`.
- Reader threads concurrently call `contains` without blocking each other using `shared_lock`.
- `range_cardinality` answers bulk count queries with rank operations instead of scanning individual IDs.

## Summary

- **Concurrent reads** via `std::shared_lock` let readers proceed in parallel, so high-throughput query workloads scale well across threads.
- **Minimal write contention** through short-duration `std::unique_lock` windows ensures thread safety while keeping reader stalls brief.
- **Lazy 32-bit to 64-bit upgrades** optimize memory usage and operation speed for the common case of smaller integer ranges.
- **Rank-based range queries** count at the container level rather than per element, for fast analytics on inverted indexes.
- **Efficient serialization** provides linear-time snapshot persistence with consistent locking semantics.

## Frequently Asked Questions

### How does zvec's concurrent roaring bitmap handle read-heavy workloads?

zvec's concurrent roaring bitmap handles read-heavy workloads by using `std::shared_lock` for all read operations including `contains`, `cardinality`, and `range_cardinality`. This shared locking mechanism, implemented in [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h), allows multiple threads to query the bitmap simultaneously without blocking each other. Readers wait only while a writer holds the exclusive lock.

### What locking strategy does zvec use for write operations?

zvec uses `std::unique_lock<std::shared_mutex>` for all mutating operations including `add`, `clear`, and `remove_range_closed`. According to the implementation in [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h) (lines 87-106), only one writer can modify the internal `roaring_bitmap_t` at a time. The write path is kept short, typically just a call to the underlying Roaring API, which minimizes contention windows while maintaining thread safety.

### When does zvec upgrade from 32-bit to 64-bit roaring bitmaps?

zvec upgrades from 32-bit to 64-bit roaring bitmaps only when a value exceeds `UINT32_MAX`. As implemented in [`src/db/common/concurrent_roaring_bitmap.h`](https://github.com/alibaba/zvec/blob/main/src/db/common/concurrent_roaring_bitmap.h) (lines 35-40), the `ConcurrentRoaringBitmap64` class starts as a 32-bit bitmap and performs a lazy upgrade to `Roaring64Map` when necessary. This optimization ensures that typical workloads benefit from the faster operations and lower memory footprint of 32-bit bitmaps while retaining 64-bit capability for edge cases.