How to Implement Multi-Resolution Sliding-Window Detection with InferenceSlicer

The InferenceSlicer in the roboflow/supervision library processes large images by dividing them into overlapping tiles, executing a user-provided detection callback on each tile, and merging results back to the original coordinate system; to achieve multi-resolution detection, run the slicer multiple times with different slice_wh values and combine the per-scale Detections objects.

The InferenceSlicer class in roboflow/supervision provides a production-ready sliding-window implementation for computer vision workflows where models cannot process high-resolution imagery in a single forward pass. This guide demonstrates how to implement multi-resolution detection, running inference at several tile scales one after another and combining the results, to capture both fine-grained details and broad contextual information.

How InferenceSlicer Works (Single-Scale Foundation)

Before implementing multi-resolution strategies, understanding the single-scale pipeline is essential. The slicer orchestrates four distinct phases implemented in src/supervision/detection/tools/inference_slicer.py.

Tile Generation via _generate_offset()

The slicer creates sliding windows using the _generate_offset() method (lines 89-115), which calculates an array of (x_min, y_min, x_max, y_max) coordinates based on the slice_wh tuple (tile width/height) and overlap_wh tuple (horizontal/vertical overlap).

# From inference_slicer.py (conceptual flow)
offsets = self._generate_offset(
    resolution_wh=(image_width, image_height),
    slice_wh=self.slice_wh,
    overlap_wh=self.overlap_wh,
)

Each offset represents a crop region that the slicer will extract and process independently.
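
To make the tiling step concrete, here is a minimal pure-Python sketch of how sliding-window offsets can be derived from an image size, a tile size, and an overlap. The helper name tile_offsets is hypothetical, and the logic assumes that the stride equals the tile size minus the overlap and that edge tiles are clipped to the image. It is an approximation for illustration, not the library's _generate_offset() code.

```python
def tile_offsets(image_wh, slice_wh, overlap_wh):
    """Return (x_min, y_min, x_max, y_max) tiles covering the image.

    Hypothetical helper for illustration; assumes stride = slice - overlap
    and clips tiles that run past the right or bottom edge.
    """
    image_w, image_h = image_wh
    slice_w, slice_h = slice_wh
    stride_w = slice_w - overlap_wh[0]
    stride_h = slice_h - overlap_wh[1]
    if stride_w <= 0 or stride_h <= 0:
        raise ValueError("overlap must be smaller than the tile size")

    def starts(length, tile, stride):
        # Step by `stride` until a tile reaches the image edge
        positions = [0]
        while positions[-1] + tile < length:
            positions.append(positions[-1] + stride)
        return positions

    offsets = []
    for y in starts(image_h, slice_h, stride_h):
        for x in starts(image_w, slice_w, stride_w):
            x_max = min(x + slice_w, image_w)
            y_max = min(y + slice_h, image_h)
            offsets.append((x, y, x_max, y_max))
    return offsets


tiles = tile_offsets(image_wh=(1000, 600), slice_wh=(640, 640), overlap_wh=(100, 100))
print(tiles)  # [(0, 0, 640, 600), (540, 0, 1000, 600)]
```

In this example the two tiles share a 100-pixel-wide vertical band, which is where duplicate detections can appear.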

Parallel Callback Execution and Coordinate Correction

For each generated offset, the slicer spawns worker threads (controlled by thread_workers) that execute _run_callback() (lines 88-104). This internal method crops the image tile using crop_image() from src/supervision/utils/image.py, invokes the user-provided callback function, and shifts the returned detection coordinates back to the global image space using move_detections() (lines 22-52).

def _run_callback(self, offset, image, callback):
    x_min, y_min, x_max, y_max = offset
    tile = crop_image(image, (x_min, y_min, x_max, y_max))
    detections = callback(tile)
    return move_detections(detections, (x_min, y_min))
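
The coordinate correction itself is a plain translation. The sketch below illustrates it on bare box tuples rather than on Detections objects; shift_boxes is a hypothetical helper, not part of supervision, and it ignores everything except the box corners.

```python
def shift_boxes(boxes, offset_xy):
    """Translate tile-local (x_min, y_min, x_max, y_max) boxes into image coordinates."""
    dx, dy = offset_xy
    return [(x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in boxes]


# A tile whose top-left corner sits at (540, 0) in the full image
tile_offset = (540, 0, 1000, 600)
local_boxes = [(10, 20, 110, 220)]  # coordinates relative to the tile

global_boxes = shift_boxes(local_boxes, tile_offset[:2])
print(global_boxes)  # [(550, 20, 650, 220)]
```

Only the tile's top-left corner is needed for the shift; box width and height are unchanged.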

Overlap Handling and Result Merging

After all tiles have been processed, the slicer merges the individual Detections objects using Detections.merge() from src/supervision/detection/core.py. If overlap_filter is enabled (either OverlapFilter.NON_MAX_SUPPRESSION or OverlapFilter.NON_MAX_MERGE), the slicer applies the specified algorithm using iou_threshold to resolve duplicate detections appearing in overlapping tile regions.
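
For readers unfamiliar with non-maximum suppression, the following is a minimal greedy NMS sketch in pure Python. The names iou and greedy_nms are hypothetical, and the sketch ignores class labels for brevity, so it shows the general technique rather than the exact behavior of the library's filter.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# The first two boxes describe the same object seen in two overlapping tiles
boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (400, 400, 500, 500)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)  # [0, 2]
```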

Implementing Multi-Resolution Detection

Multi-resolution detection requires orchestrating multiple InferenceSlicer instances, each configured with a different tile size, and aggregating their results. Larger tiles (slice_wh=(1280, 1280)) capture broader contextual information, while smaller tiles (slice_wh=(640, 640)) preserve fine details.

import supervision as sv
from ultralytics import YOLO

model = YOLO("yolo11m.pt")

def tile_callback(tile):
    results = model(tile)[0]
    return sv.Detections.from_ultralytics(results)

def multi_resolution_detect(image, scales, overlap_wh=100, thread_workers=4):
    """
    Run InferenceSlicer at multiple resolutions and merge results.
    
    Args:
        image: Input image (numpy array)
        scales: List of tile sizes (e.g., [640, 960, 1280])
        overlap_wh: Overlap between adjacent tiles
        thread_workers: Parallel workers per scale
    """
    per_scale_detections = []
    
    for slice_wh in scales:
        slicer = sv.InferenceSlicer(
            callback=tile_callback,
            slice_wh=(slice_wh, slice_wh),  # square tiles
            overlap_wh=(overlap_wh, overlap_wh),
            overlap_filter=sv.OverlapFilter.NON_MAX_SUPPRESSION,
            iou_threshold=0.5,
            overlap_metric=sv.OverlapMetric.IOU,
            thread_workers=thread_workers,
        )
        detections = slicer(image)
        per_scale_detections.append(detections)
    
    # Merge all scale-specific detections
    merged = sv.Detections.merge(per_scale_detections)

    # Optional: final cross-scale NMS to remove duplicates
    merged = merged.with_nms(
        threshold=0.5,
        overlap_metric=sv.OverlapMetric.IOU,
    )
    
    return merged

# Usage
import cv2  # OpenCV loads the image as a BGR numpy array

image = cv2.imread("large_aerial_image.jpg")
detections = multi_resolution_detect(
    image=image,
    scales=[640, 960, 1280],
    overlap_wh=100,
    thread_workers=8
)

Key implementation details:

  • Per-scale NMS: Each slicer instance applies non-maximum suppression independently before returning results
  • Global merge: Detections.merge() concatenates bounding boxes, confidence scores, and class IDs from all scales into a single object
  • Cross-scale NMS: The final with_nms() call eliminates duplicate detections that appear across different resolutions

Alternative Approach: Image Pyramids vs. Tile Scaling

Instead of varying tile sizes, you can maintain a fixed tile size while scaling the input image itself. This approach requires rescaling detection coordinates back to the original resolution after inference.

def multi_scale_image_pyramid(image, scale_factors, slice_wh=640):
    per_scale = []

    for factor in scale_factors:
        # Resize the whole image by `factor` (< 1 shrinks, > 1 enlarges)
        scaled = sv.scale_image(image=image, scale_factor=factor)

        slicer = sv.InferenceSlicer(
            callback=tile_callback,
            slice_wh=(slice_wh, slice_wh),
            overlap_wh=(100, 100),
            thread_workers=4,
        )
        detections = slicer(scaled)

        # Map box coordinates back to the original image resolution
        detections.xyxy = detections.xyxy / factor
        if detections.mask is not None:
            # Masks are still at the scaled resolution and would have to be
            # resized to the original image size before merging (not shown)
            raise NotImplementedError("mask rescaling is not covered here")

        per_scale.append(detections)

    # Pass overlap_metric by keyword: the second positional argument of
    # with_nms() is class_agnostic, not the metric
    return sv.Detections.merge(per_scale).with_nms(
        threshold=0.5, overlap_metric=sv.OverlapMetric.IOU
    )

This method keeps the tile size fixed, so the number of tiles changes with the scale factor: a downscaled image yields fewer tiles, and each tile covers a larger portion of the original scene.
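
The coordinate rescaling can be checked with a small worked example. The helper rescale_boxes below is hypothetical and operates on plain tuples; it only assumes that the image was resized uniformly by a single factor.

```python
def rescale_boxes(boxes, factor):
    """Map boxes found on an image resized by `factor` back to original pixels."""
    return [tuple(value / factor for value in box) for box in boxes]


# A box detected on an image that was shrunk to half its original size
boxes_on_half_res = [(50.0, 40.0, 150.0, 120.0)]
original_boxes = rescale_boxes(boxes_on_half_res, factor=0.5)
print(original_boxes)  # [(100.0, 80.0, 300.0, 240.0)]
```

Dividing by the factor (rather than multiplying) is what undoes the resize: a box found at half resolution doubles in size when mapped back.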

Critical Configuration Parameters

slice_wh (Tuple[int, int])

Defines the tile dimensions (width, height). For multi-resolution workflows, specify progressive sizes (e.g., 640→960→1280) to balance detail capture against computational cost.
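
To get a feel for the cost side of that trade-off, the sketch below estimates how many tiles each scale produces for a given image. The helpers tiles_per_axis and tile_count are hypothetical and assume a stride of tile size minus overlap, with the last tile clipped at the image edge; the library's exact count may differ.

```python
import math


def tiles_per_axis(length, tile, overlap):
    """Number of tiles needed to cover `length` pixels along one axis."""
    if tile >= length:
        return 1
    stride = tile - overlap
    return math.ceil((length - tile) / stride) + 1


def tile_count(image_wh, slice_size, overlap):
    """Total square tiles of side `slice_size` needed to cover the image."""
    across = tiles_per_axis(image_wh[0], slice_size, overlap)
    down = tiles_per_axis(image_wh[1], slice_size, overlap)
    return across * down


counts = {size: tile_count((4000, 3000), size, overlap=100) for size in (640, 960, 1280)}
print(counts)  # {640: 48, 960: 20, 1280: 12}
```

Under these assumptions, a three-scale run on a 4000x3000 image invokes the callback 80 times in total, with the smallest tile size accounting for more than half of the calls.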

overlap_wh (Tuple[int, int])

Controls pixel overlap between adjacent tiles. Setting this to approximately 20-25% of slice_wh prevents objects from being truncated at tile boundaries. The source code in _generate_offset() calculates stride as slice_wh - overlap_wh.
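
When tile sizes vary across scales, a fixed pixel overlap represents a different fraction of each tile. One option, sketched here with a hypothetical helper overlap_from_ratio, is to derive the overlap from the tile size so the ratio stays constant.

```python
def overlap_from_ratio(slice_wh, ratio=0.2):
    """Pixel overlap equal to `ratio` of the tile size on each axis."""
    return tuple(round(side * ratio) for side in slice_wh)


settings = {}
for size in (640, 960, 1280):
    overlap = overlap_from_ratio((size, size), ratio=0.2)
    stride = (size - overlap[0], size - overlap[1])
    settings[size] = (overlap, stride)
    print(size, overlap, stride)
```

The resulting tuple can be passed as overlap_wh when constructing each per-scale slicer.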

thread_workers (int)

Specifies the number of worker threads used for tile processing. The default of 1 processes tiles one at a time, which is the safer choice when debugging or when the callback is limited by GPU memory; larger positive values run callbacks concurrently.
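
As a rough illustration of what a worker pool does with tiles, the sketch below maps a callback over tile offsets using Python's standard ThreadPoolExecutor. The function run_on_tiles is hypothetical, and the example callback only computes tile areas; a real callback would run model inference, which may not be thread-safe for every framework.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_tiles(tiles, callback, thread_workers=1):
    """Apply `callback` to every tile using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=thread_workers) as executor:
        # executor.map returns results in the same order as the input tiles
        return list(executor.map(callback, tiles))


def tile_area(tile):
    x_min, y_min, x_max, y_max = tile
    return (x_max - x_min) * (y_max - y_min)


tiles = [(0, 0, 640, 600), (540, 0, 1000, 600)]
areas = run_on_tiles(tiles, tile_area, thread_workers=2)
print(areas)  # [384000, 276000]
```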

overlap_filter (OverlapFilter)

Determines how duplicates in overlapping regions are resolved:

  • NON_MAX_SUPPRESSION: Standard NMS keeping highest confidence box
  • NON_MAX_MERGE: Merges box coordinates and confidence scores
  • NONE: Retains all detections (useful when your callback already handles overlaps)

Summary

  • The InferenceSlicer class in src/supervision/detection/tools/inference_slicer.py implements sliding-window detection via _generate_offset() for tile creation and move_detections() for coordinate correction
  • Implement multi-resolution detection by instantiating multiple slicers with different slice_wh values and merging results with Detections.merge()
  • Configure overlap_wh to prevent boundary artifacts and thread_workers to optimize throughput
  • Apply a final with_nms() call after merging cross-scale results to eliminate duplicate detections

Frequently Asked Questions

What is the difference between slice_wh and overlap_wh?

The slice_wh parameter defines the dimensions of each tile extracted from the image, while overlap_wh specifies how many pixels adjacent tiles should share. According to the source code in inference_slicer.py, the stride between tile starts is calculated as slice_wh - overlap_wh. Larger overlaps reduce the risk of missing objects at tile boundaries but increase computational overhead.

How does InferenceSlicer handle detections at tile boundaries?

The slicer uses move_detections() (defined in lines 22-52 of inference_slicer.py) to translate tile-local coordinates back to the global image space by adding the tile's (x_min, y_min) offset. When overlap_filter is enabled, the slicer applies NMS or NMM algorithms to resolve detections that appear in multiple overlapping tiles, using the iou_threshold parameter to determine merge criteria.

Can I use InferenceSlicer with instance segmentation models?

Yes. The callback function can return Detections objects containing masks, and the slicer will handle coordinate transformation for both bounding boxes and segmentation masks. The move_masks() function in src/supervision/detection/utils/masks.py handles spatial translation of mask arrays, ensuring that pixel-accurate masks align correctly when merged back to the full-resolution image.
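
Conceptually, translating a mask means pasting the tile-sized array into an empty full-image canvas at the tile's offset. The NumPy sketch below shows that idea with a hypothetical helper, place_mask, which assumes the tile lies entirely inside the image; it is not the library's move_masks() implementation.

```python
import numpy as np


def place_mask(tile_mask, offset_xy, image_wh):
    """Paste a boolean tile mask into an all-False canvas of the full image size."""
    dx, dy = offset_xy
    image_w, image_h = image_wh
    tile_h, tile_w = tile_mask.shape
    canvas = np.zeros((image_h, image_w), dtype=bool)
    # Assumes the tile fits inside the image at the given offset
    canvas[dy:dy + tile_h, dx:dx + tile_w] = tile_mask
    return canvas


tile_mask = np.array([[True, False],
                      [True, True]])
full_mask = place_mask(tile_mask, offset_xy=(3, 1), image_wh=(6, 4))
print(full_mask.astype(int))
```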

What overlap filter should I use for multi-resolution detection?

For multi-resolution pipelines, use OverlapFilter.NON_MAX_SUPPRESSION with an iou_threshold of 0.5 during the per-scale slicing phase to clean up tile boundary duplicates. After merging all scales with Detections.merge(), apply a second NMS pass using with_nms() to handle duplicates that appear across different resolutions. This two-stage filtering prevents the same object detected at different scales from appearing multiple times in final results.
