# How to Handle Class ID Remapping When Merging Datasets from Different Sources

> Learn how supervision automatically remaps class IDs when merging datasets from different sources. Ensure consistent labeling with automatic vocabulary unification and index translation.

- Repository: [Roboflow/supervision](https://github.com/roboflow/supervision)
- Tags: how-to-guide
- Published: 2026-04-06

---

**Supervision automatically resolves class ID conflicts when merging DetectionDataset objects by unifying class vocabularies, constructing index translation mappings, and applying vectorized remapping to ensure consistent labeling across disparate data sources.**

When you aggregate computer vision datasets scraped from multiple platforms or annotated by different teams, conflicting numeric class IDs are inevitable. The Roboflow Supervision library eliminates this friction through an automated remapping pipeline integrated into the `DetectionDataset.merge` method. This guide explains how the library normalizes class indices when combining datasets with different label schemes.

## The Challenge of Conflicting Class IDs

When you instantiate separate `DetectionDataset` objects from independent sources, each assigns `class_id` integers based on its own local class list. A "dog" might be class `0` in Dataset A but class `2` in Dataset B. Directly concatenating these datasets without remapping would corrupt the annotations, causing the merged dataset to treat the same semantic class as different entities or merge distinct classes into one.

## The Three-Step Remapping Pipeline

The `DetectionDataset.merge` method orchestrates automatic class ID reconciliation through a standardized workflow implemented across [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py) and utility functions in [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py).

### Step 1: Unify Class Vocabularies

The process begins by collecting all distinct class names from every input dataset and sorting them alphabetically. This creates a **unified class list** that serves as the single source of truth for the merged result.

In [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py) (lines 55-63), the `merge_class_lists` function handles this collation. It aggregates the `classes` attributes from each dataset, deduplicates them, and returns a sorted master list ranging from indices `0` to `N-1` for the combined vocabulary.

### Step 2: Build Index Mappings

Once the unified list is established, the library constructs translation dictionaries for each source dataset. The `build_class_index_mapping` utility in [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py) (lines 65-80) generates a mapping dictionary formatted as `{source_index: unified_index}`.

For example, if Dataset A defines `["dog", "person"]` and the unified list is `["cat", "dog", "person"]`, the mapping dictionary for Dataset A becomes `{0: 1, 1: 2}`, moving "dog" from index 0 to 1 and "person" from 1 to 2.

### Step 3: Remap Detection Arrays

Finally, the `map_detections_class_id` function applies these translations to every annotation. Located in [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py) (lines 83-100), this function validates that all original IDs exist in the mapping keys, then uses NumPy vectorization to efficiently convert the `class_id` arrays within each `Detections` object to the new unified indices.

## Dataset Integrity Checks

Before the merge proceeds, `DetectionDataset.merge` enforces two critical consistency constraints defined in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py).

### Image Path Uniqueness

The method checks for duplicate file names across input datasets. If two datasets contain the same image path (e.g., both have `image_001.jpg`), the merge raises a `ValueError` at lines 99-104 to prevent annotation collisions.

### Memory Mode Compatibility

Supervision datasets operate in either **lazy mode** (storing only file paths) or **in-memory mode** (storing loaded image arrays). The merge validates that all input datasets share the same memory mode, rejecting mixed combinations with a clear error at lines 85-91 of [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py).

## Practical Implementation

The following example demonstrates merging two datasets with conflicting class definitions. Dataset A uses indices 0 and 1 for "dog" and "person," while Dataset B uses index 0 for "cat." The merge operation automatically remaps these to a unified scheme where "cat"=0, "dog"=1, and "person"=2.

```python
import supervision as sv
import numpy as np

# Dataset A – classes ["dog", "person"]

ds_a = sv.DetectionDataset(
    classes=["dog", "person"],
    images={"a1.jpg": np.zeros((100, 100, 3), dtype=np.uint8)},
    annotations={"a1.jpg": sv.Detections(class_id=np.array([0]), xyxy=np.array([[10, 10, 20, 20]]))},
)

# Dataset B – classes ["cat"]

ds_b = sv.DetectionDataset(
    classes=["cat"],
    images={"b1.jpg": np.zeros((100, 100, 3), dtype=np.uint8)},
    annotations={"b1.jpg": sv.Detections(class_id=np.array([0]), xyxy=np.array([[30, 30, 40, 40]]))},
)

# Merge automatically handles class ID remapping

merged = sv.DetectionDataset.merge([ds_a, ds_b])

print("Unified classes:", merged.classes)          # → ['cat', 'dog', 'person']

print("Image count:", len(merged))                # → 2

print("IDs for a1.jpg:", merged.annotations["a1.jpg"].class_id)  # → [1] (dog → index 1)

print("IDs for b1.jpg:", merged.annotations["b1.jpg"].class_id)  # → [0] (cat → index 0)

```

## Summary

- **Automatic remapping**: The `DetectionDataset.merge` method handles class ID conflicts without manual intervention by unifying vocabularies and applying vectorized index conversions via NumPy.
- **Three-step pipeline**: The process involves `merge_class_lists` for vocabulary unification, `build_class_index_mapping` for translation dictionaries, and `map_detections_class_id` for array conversion.
- **Integrity enforcement**: The merge validates image path uniqueness and memory mode compatibility to prevent data corruption.
- **Source locations**: Core logic resides in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py) with utility functions in [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py) (lines 55-100).

## Frequently Asked Questions

### What happens if two datasets have different class names for the same object?

Supervision treats class names as the source of truth rather than numeric IDs. When merging, it creates a unified alphabetical list of all unique class names across datasets. If "car" appears as class 0 in one dataset and class 5 in another, both will be remapped to the same unified index based on the sorted master list produced by `merge_class_lists`.

### Can I merge a lazy-loading dataset with one that has images loaded in memory?

No. The `DetectionDataset.merge` method explicitly prevents mixing memory modes to avoid inconsistent behavior. If you attempt to merge a lazy dataset (paths only) with an in-memory dataset (image arrays loaded), the library raises a `ValueError` at lines 85-91 of [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py). Convert both to the same mode before merging.

### Does the remapping process modify the original datasets?

No. The merge operation creates a new `DetectionDataset` instance without mutating the input datasets. The original `class_id` values in `ds_a` and `ds_b` remain unchanged; only the merged result contains the remapped indices computed by `map_detections_class_id`.

### How does Supervision handle duplicate image filenames across datasets?

The merge method checks for overlapping image paths in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py) (lines 99-104) and raises a `ValueError` if duplicates are detected. You must ensure unique filenames across all datasets before merging, either by renaming files or adjusting the image dictionary keys.