# How to Create Custom Dataset Loaders for Proprietary Annotation Formats in Roboflow Supervision

> Learn how to create custom dataset loaders for proprietary annotation formats in Roboflow Supervision. Parse your data into Detections or Classifications objects and integrate seamlessly.

- Repository: [Roboflow/supervision](https://github.com/roboflow/supervision)
- Tags: how-to-guide
- Published: 2026-04-06

---

**You can create custom dataset loaders for proprietary annotation formats in Roboflow Supervision by parsing your annotation files into standard `Detections` or `Classifications` objects and instantiating `DetectionDataset` or `ClassificationDataset` with three specific arguments: `classes`, `images`, and `annotations`.**

Roboflow Supervision ships with built-in loaders for COCO, YOLO, Pascal VOC, and folder-based classification datasets, but computer vision pipelines frequently encounter proprietary JSON, CSV, or XML schemas. By building on the core classes defined in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py), you can plug any custom format into Supervision's splitting, iteration, and export utilities without modifying the library's internals.

## Understanding the Core Architecture

Supervision's dataset system relies on a hierarchy of classes that enforce a consistent interface for data manipulation.

### BaseDataset

The abstract `BaseDataset` class at line 41 in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py) defines the contract shared by every dataset type: it declares `__len__` and `split` as abstract methods, which the concrete dataset classes implement. Because a custom loader returns one of those concrete classes, these methods are available without any extra code.

### DetectionDataset

`DetectionDataset` (line 56, [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py)) is the concrete implementation for object detection tasks. It stores `classes` as a list of strings, `image_paths` as a list of strings (populated from the constructor's `images` argument), and `annotations` as a dictionary mapping image paths to `Detections` objects. This class provides the built-in `from_coco`, `from_yolo`, and `from_pascal_voc` class methods, as well as export helpers such as `as_coco` and `as_yolo`.

### ClassificationDataset

For image-level classification, `ClassificationDataset` (line 664, [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py)) follows the same pattern but uses `Classifications` objects instead of `Detections`. It provides the `from_folder_structure` loader, supports the same `split` workflow as its detection counterpart, and exports through `as_folder_structure` rather than the detection formats.

## The Three-Part Contract for Custom Loaders

To create a valid custom loader, you must produce three specific data structures:

1. **`classes`**: A `list[str]` containing unique class names in the desired order.
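2. **`images`**: A `list[str]` of image file paths. These exact strings are also the keys of `annotations`.
3. **`annotations`**: A `dict[str, Detections]` for detection tasks (or `dict[str, Classifications]` for classification) that maps each image path to its annotation object. Every path in `images` needs an entry, including images with no objects; for detection, `sv.Detections.empty()` represents such an image.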
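Because the three structures have to agree with each other, a quick consistency check before instantiation can catch loader bugs early. The function below is a hypothetical, standard-library-only sketch (`check_loader_contract` is not part of Supervision); it assumes each annotation object exposes a `class_id` sequence, which may be `None` for `Detections`.

```python
from collections.abc import Mapping, Sequence
from types import SimpleNamespace
from typing import Any


def check_loader_contract(
    classes: Sequence[str],
    images: Sequence[str],
    annotations: Mapping[str, Any],
) -> list[str]:
    """Hypothetical helper: return human-readable problems (empty list if none)."""
    problems: list[str] = []

    # List position is the class ID, so class names must be unique.
    if len(set(classes)) != len(classes):
        problems.append("duplicate class names in `classes`")

    # Keys of `annotations` must be exactly the strings in `images`.
    image_set = set(images)
    missing = [p for p in images if p not in annotations]
    extra = [k for k in annotations if k not in image_set]
    if missing:
        problems.append(f"{len(missing)} image(s) without an annotation entry")
    if extra:
        problems.append(f"{len(extra)} annotation key(s) not listed in `images`")

    # Every class_id must be a valid index into `classes`.
    for path, ann in annotations.items():
        class_id = getattr(ann, "class_id", None)
        if class_id is None:
            continue
        if any(int(i) < 0 or int(i) >= len(classes) for i in class_id):
            problems.append(f"out-of-range class_id for {path}")

    return problems


# Usage with stand-in objects; a real loader would pass sv.Detections instances.
good = check_loader_contract(
    ["cat", "dog"],
    ["a.jpg", "b.jpg"],
    {"a.jpg": SimpleNamespace(class_id=[0, 1]), "b.jpg": SimpleNamespace(class_id=[])},
)
bad = check_loader_contract(
    ["cat", "cat"],
    ["a.jpg"],
    {"a.jpg": SimpleNamespace(class_id=[2]), "z.jpg": SimpleNamespace(class_id=None)},
)
print(good)  # []
print(bad)
```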

Once these three components are prepared, instantiation is straightforward:

```python
import supervision as sv

# class_names, image_paths, and annotations_dict are the three structures above
dataset = sv.DetectionDataset(
    classes=class_names,
    images=image_paths,
    annotations=annotations_dict,
)
```

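Once instantiated, the dataset behaves like one produced by a built-in loader: images are loaded lazily when you index or iterate over it, the `split` method divides it into train and test sets (using the `train_test_split` helper defined in [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py)), and export methods such as `as_coco` and `as_yolo` are available.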

## Implementing a Custom Object Detection Loader

Consider a proprietary JSON file with a top-level `classes` list and an `items` list, where each item names an `image` file and stores its bounding boxes as `[x_min, y_min, x_max, y_max, class_index]` rows.
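Converting that row layout into the arrays `sv.Detections` expects takes a few lines of NumPy. The hypothetical helper below (`rows_to_arrays` and its `box_format` parameter are illustrative names, not Supervision APIs) also covers two cases the layout leaves open: an image with an empty box list, and formats that store `[x_min, y_min, width, height, class_index]` instead of corner pairs.

```python
from collections.abc import Sequence

import numpy as np


def rows_to_arrays(
    rows: Sequence[Sequence[float]], box_format: str = "xyxy"
) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: turn [a, b, c, d, class_index] rows into (xyxy, class_id).

    box_format="xyxy": rows hold [x_min, y_min, x_max, y_max, class_index]
    box_format="xywh": rows hold [x_min, y_min, width, height, class_index]
    """
    # reshape(-1, 5) turns an empty list into a (0, 5) array instead of shape (0,)
    boxes = np.asarray(rows, dtype=np.float32).reshape(-1, 5)
    xyxy = boxes[:, :4].copy()
    if box_format == "xywh":
        # x_max = x_min + width, y_max = y_min + height
        xyxy[:, 2:4] += xyxy[:, 0:2]
    elif box_format != "xyxy":
        raise ValueError(f"unsupported box_format: {box_format!r}")
    class_id = boxes[:, 4].astype(int)
    return xyxy, class_id


xyxy, class_id = rows_to_arrays([[10, 20, 100, 200, 1]], box_format="xywh")
print(xyxy.tolist())  # [[10.0, 20.0, 110.0, 220.0]]
```

The two returned arrays can be passed directly as `sv.Detections(xyxy=xyxy, class_id=class_id)`.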

### JSON Format Example

```python
import json
from pathlib import Path

import numpy as np
import supervision as sv


class MyCustomLoader:
    @classmethod
    def from_json(cls, json_path: str, images_root: str) -> sv.DetectionDataset:
        """Load a proprietary JSON annotation file into a DetectionDataset.

        Expected layout:
        {"classes": [...], "items": [{"image": "...", "boxes": [[x1, y1, x2, y2, cls], ...]}]}
        """
        # Parse the proprietary JSON
        data = json.loads(Path(json_path).read_text())
        class_names: list[str] = data["classes"]
        image_paths: list[str] = []
        annotations: dict[str, sv.Detections] = {}

        for item in data["items"]:
            img_path = str(Path(images_root) / item["image"])
            image_paths.append(img_path)

            # Convert raw boxes to Detections. reshape(-1, 5) keeps an empty
            # box list valid as a (0, 5) array, so unlabeled images work too.
            boxes = np.asarray(item["boxes"], dtype=np.float32).reshape(-1, 5)
            xyxy = boxes[:, :4]
            class_id = boxes[:, 4].astype(int)

            annotations[img_path] = sv.Detections(
                xyxy=xyxy,
                class_id=class_id,
            )

        # Instantiate the dataset
        return sv.DetectionDataset(
            classes=class_names,
            images=image_paths,
            annotations=annotations,
        )
```

Usage follows the same pattern as built-in loaders:

```python
ds = MyCustomLoader.from_json(
    json_path="my_dataset/annotations.json",
    images_root="my_dataset",
)

print(ds.classes)  # the "classes" list from the JSON file, e.g. ['cat', 'dog']
print(len(ds))     # number of images
```

## Implementing a Custom Classification Loader

For classification tasks with CSV annotations (here, a header row with `image_path` and `label` columns), the pattern is identical but uses `Classifications` objects.

### CSV Format Example

```python
import csv
from pathlib import Path

import numpy as np
import supervision as sv


def classification_from_csv(csv_path: str, images_root: str) -> sv.ClassificationDataset:
    """Load image-level labels from a CSV with `image_path` and `label` columns."""
    class_set: set[str] = set()
    rows: list[dict[str, str]] = []

    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append(row)
            class_set.add(row["label"])

    class_names = sorted(class_set)  # Deterministic ordering
    class_to_id = {c: i for i, c in enumerate(class_names)}

    image_paths: list[str] = []
    annotations: dict[str, sv.Classifications] = {}

    for row in rows:
        img_path = str(Path(images_root) / row["image_path"])
        image_paths.append(img_path)
        # One label per image, stored as a length-1 class_id array
        annotations[img_path] = sv.Classifications(
            class_id=np.array([class_to_id[row["label"]]])
        )

    return sv.ClassificationDataset(
        classes=class_names,
        images=image_paths,
        annotations=annotations,
    )
```

## Registering Your Loader as a Native Method

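To expose your loader through the standard API (e.g., `sv.DetectionDataset.from_myjson()`), you can attach it to the class as an attribute. Subclassing `DetectionDataset` is the more structured option, but monkey-patching is often sufficient for proprietary workflows. Note that `MyCustomLoader.from_json` is already bound to `MyCustomLoader`, so `cls` inside it still refers to that class; this is harmless here because the function never uses `cls`.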

```python
import supervision as sv

# Attach the loader defined above to DetectionDataset
sv.DetectionDataset.from_myjson = MyCustomLoader.from_json  # type: ignore

# Now callable as if it were a native method
ds = sv.DetectionDataset.from_myjson(
    json_path="annotations.json",
    images_root="images/",
)
```

## Reference: Key Source Files

When building custom loaders, examine these reference implementations to understand parsing patterns:

| File | Purpose |
|------|---------|
| [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py) | Contains `BaseDataset`, `DetectionDataset` (line 56), and `ClassificationDataset` (line 664) definitions. |
| [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py) | Helper functions such as `train_test_split` and image-saving utilities, used internally by the `split` and export methods your dataset gets automatically. |
| [`src/supervision/dataset/formats/coco.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/formats/coco.py) | Demonstrates converting COCO JSON to `Detections` objects. |
| [`src/supervision/dataset/formats/yolo.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/formats/yolo.py) | Shows handling of separate image/annotation directory structures. |
| [`src/supervision/dataset/formats/pascal_voc.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/formats/pascal_voc.py) | XML parsing example for bounding box extraction. |
| [`src/supervision/detection/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/detection/core.py) | Definition of the `Detections` dataclass required for object detection datasets. |
| [`src/supervision/classification/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/classification/core.py) | Definition of the `Classifications` dataclass required for classification datasets. |

## Summary

- **Custom loaders require three components**: a list of class names, a list of image paths, and a dictionary mapping paths to `Detections` or `Classifications` objects.
- **Core classes reside in** [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py), with `DetectionDataset` at line 56 and `ClassificationDataset` at line 664.
- **Parsing proprietary formats** involves reading your files (JSON, CSV, XML) and converting bounding boxes or labels into NumPy arrays that instantiate `sv.Detections` or `sv.Classifications`.
- **Integration is seamless**: Once instantiated, custom datasets support splitting and lazy image loading, and can be exported (COCO, YOLO, or Pascal VOC for detection; folder structure for classification) without additional code.
- **API extension**: Use monkey-patching or subclassing to expose custom loaders as `from_myformat` class methods.

## Frequently Asked Questions

### What are the minimum requirements for creating a custom dataset loader?

You must provide three arguments to instantiate a dataset: `classes` (a list of class name strings), `images` (a list of image file paths), and `annotations` (a dictionary mapping each image path to a `Detections` or `Classifications` object). As long as these match the type signatures expected by `DetectionDataset` or `ClassificationDataset` in [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py), and every path in `images` has a matching key in `annotations`, the resulting dataset works with the built-in splitting and export methods.

### How should I handle class ordering in custom annotations?

Supervision expects `classes` to be an ordered list where the index corresponds to the `class_id` values in your `Detections` or `Classifications` objects. When parsing proprietary formats, extract unique class names, sort them deterministically (e.g., alphabetically), and create a mapping dictionary (class_name → integer_id) before constructing your annotation objects. This ensures that class indices remain consistent across training and inference.
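A sketch of that procedure in plain Python follows; `build_class_index` is a hypothetical helper, not a Supervision API. The optional `known_classes` argument is an added assumption that shows one way to keep IDs stable when a later export introduces new labels: existing names keep their positions and new ones are appended in sorted order.

```python
from collections.abc import Iterable, Sequence


def build_class_index(
    labels: Iterable[str],
    known_classes: Sequence[str] = (),
) -> tuple[list[str], dict[str, int]]:
    """Hypothetical helper: return (classes, class_name -> class_id)."""
    classes = list(known_classes)
    # Sorting makes the result independent of file and row order.
    classes.extend(sorted(set(labels) - set(classes)))
    class_to_id = {name: i for i, name in enumerate(classes)}
    return classes, class_to_id


classes, class_to_id = build_class_index(["dog", "cat", "dog", "bird"])
print(classes)      # ['bird', 'cat', 'dog']
print(class_to_id)  # {'bird': 0, 'cat': 1, 'dog': 2}

# A later export adds "fox"; existing IDs are preserved.
classes_v2, class_to_id_v2 = build_class_index(["fox", "cat"], known_classes=classes)
print(classes_v2)   # ['bird', 'cat', 'dog', 'fox']
```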

### Can custom-loaded datasets use Supervision's train/test splitting?

Yes. Because your custom loader produces standard `DetectionDataset` or `ClassificationDataset` instances (both subclasses of `BaseDataset`, defined at line 41 of [`src/supervision/dataset/core.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/core.py)), you can call their `split` method immediately; it relies on the `train_test_split` helper from [`src/supervision/dataset/utils.py`](https://github.com/roboflow/supervision/blob/main/src/supervision/dataset/utils.py). This works regardless of the original annotation format.

### Is it better to monkey-patch or subclass when adding a custom loader?

For proprietary or experimental formats, monkey-patching (assigning your function as a class attribute) is faster and avoids maintaining a fork of the library. For formats you plan to contribute back to the Roboflow Supervision repository, create a proper `@classmethod` following the pattern in `from_coco` or `from_yolo`, and submit a pull request with your implementation in `src/supervision/dataset/formats/`.