# How to Use Non-Maximum Suppression (NMS) in TensorFlow Models Vision Pipelines

> Learn how to use Non-Maximum Suppression (NMS) in TensorFlow Models vision pipelines. Optimize object detection with efficient batched NMS for thousands of proposals.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**TensorFlow Models implements a batched, tile-based NMS algorithm in [`official/vision/ops/nms.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/nms.py) that efficiently processes thousands of proposals using `sorted_non_max_suppression_padded`, which is called by both the ROI generator and detection generator layers.**

The **tensorflow/models** repository provides a production-grade **Non-Maximum Suppression (NMS)** implementation specifically optimized for object detection pipelines. Unlike standard TensorFlow ops, this custom vision NMS uses a tiled while-loop architecture that scales to tens of thousands of boxes on GPU and TPU hardware.

## Core NMS Implementation in TensorFlow Models

The primary NMS logic resides in [`official/vision/ops/nms.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/nms.py). This implementation processes input tensors of shape `[batch, num_boxes, 4]` using a fixed **NMS_TILE_SIZE** of 512. By dividing the suppression work into tiles, the algorithm avoids Python loops and executes entirely within the TensorFlow graph, enabling efficient accelerator utilization.
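To make the tiling arithmetic concrete, here is a minimal pure-Python sketch of how a box count rounds up to a whole number of tiles. The helper name `padded_box_count` is hypothetical and is not part of tensorflow/models; it only illustrates the rounding, assuming the tile size of 512 stated above.

```python
import math

NMS_TILE_SIZE = 512  # tile size as stated in this article


def padded_box_count(num_boxes, tile_size=NMS_TILE_SIZE):
    """Return (padded_count, num_tiles) for a given number of boxes.

    Hypothetical helper for illustration only; not a tensorflow/models API.
    """
    num_tiles = math.ceil(num_boxes / tile_size)
    return num_tiles * tile_size, num_tiles


for n in (300, 512, 2000):
    padded, tiles = padded_box_count(n)
    print(f"{n} boxes -> {padded} padded slots in {tiles} tile(s)")
```

With 2,000 proposals, for example, the count rounds up to 2,048 slots spread over four tiles.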

### The Tiled NMS Algorithm Architecture

The `sorted_non_max_suppression_padded` function orchestrates suppression through five distinct stages:

1. **Padding** – Input boxes are padded to a multiple of 512 so each tile contains a uniform number of boxes.
2. **Cross-tile suppression** – The `_cross_suppression` function iterates over previous tiles, clearing any box that has IoU ≥ `iou_threshold` with a higher-scoring box from an earlier tile.
3. **Self-suppression** – Within each tile, `_self_suppression` uses a while-loop to remove boxes that overlap with other boxes inside the same tile.
4. **Output size tracking** – The `_suppression_loop_body` accumulates the count of surviving boxes in the `output_size` variable.
5. **Final gathering** – The algorithm uses `tf.nn.top_k` to select the highest-scoring boxes up to `max_output_size`, reshaping results to `[batch, max_output_size, 4]` and `[batch, max_output_size]`.
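
The stages above can be sketched in plain Python. This is a simplified illustration, not the library's code: the names `iou` and `tiled_nms` are hypothetical, the tile size is shrunk to 2 for readability, padding is skipped because Python lists do not need uniform shapes, and the function returns kept indices rather than padded tensors. It assumes the boxes are already sorted by descending score.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tiled_nms(boxes, iou_threshold, max_output_size, tile_size=2):
    """Greedy NMS over score-sorted boxes, processed one tile at a time."""
    n = len(boxes)
    alive = [True] * n
    kept = []  # indices of survivors from tiles already processed
    for start in range(0, n, tile_size):
        tile = range(start, min(start + tile_size, n))
        # Cross-tile suppression: drop boxes overlapping earlier survivors.
        for i in tile:
            if any(iou(boxes[k], boxes[i]) >= iou_threshold for k in kept):
                alive[i] = False
        # Self-suppression: a surviving box removes lower-ranked overlapping
        # boxes inside the same tile.
        for i in tile:
            if not alive[i]:
                continue
            for j in range(i + 1, tile[-1] + 1):
                if alive[j] and iou(boxes[i], boxes[j]) >= iou_threshold:
                    alive[j] = False
        # Output size tracking: stop once enough boxes have survived.
        kept.extend(i for i in tile if alive[i])
        if len(kept) >= max_output_size:
            break
    # Final gathering: the first max_output_size survivors score highest.
    return kept[:max_output_size]


boxes = [
    (0, 0, 10, 10),    # highest score
    (1, 1, 11, 11),    # overlaps box 0 (same tile)
    (20, 20, 30, 30),  # separate object
    (0, 0, 10, 9),     # overlaps box 0 (earlier tile)
    (21, 21, 31, 31),  # overlaps box 2 (earlier tile)
]
print(tiled_nms(boxes, iou_threshold=0.5, max_output_size=10))  # [0, 2]
```

Because each tile is checked only against survivors of earlier tiles and then against itself, the result matches ordinary greedy NMS regardless of the tile size chosen.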

## Integration Points in Vision Pipelines

TensorFlow Models exposes NMS functionality through two primary layer classes that handle different stages of the detection workflow.

### ROI Generation with MultilevelROIGenerator

For Region Proposal Network (RPN) outputs, [`official/vision/modeling/layers/roi_generator.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/layers/roi_generator.py) implements `MultilevelROIGenerator`. This layer selects top-k boxes per FPN level, applies optional score and size filtering, and invokes `nms.sorted_non_max_suppression_padded` inside `_multilevel_propose_rois` to produce the final proposal set.

### Detection Post-Processing

The detection generator in [`official/vision/modeling/layers/detection_generator.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/layers/detection_generator.py) performs per-class NMS on refined box predictions. The `_generate_detections_v2` function unrolls the class dimension and calls `nms.sorted_non_max_suppression_padded` once per class, so overlapping detections are suppressed within each object category before the final top-k detections are selected.
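
As a rough illustration of what per-class suppression means, the pure-Python sketch below runs a plain greedy NMS separately for each class and then merges the survivors. The names `greedy_nms` and `per_class_nms` are hypothetical, and the sketch assumes one shared box per candidate rather than the per-class boxes of shape `[B, N, C, 4]` that the detection generator accepts. It is not the layer's implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept


def per_class_nms(boxes, class_scores, iou_threshold, max_num_detections):
    """Run NMS independently per class, then keep the best detections overall.

    class_scores is an N x C nested list; returns (score, class_id, box_index).
    """
    detections = []
    for c in range(len(class_scores[0])):
        scores_c = [row[c] for row in class_scores]
        for i in greedy_nms(boxes, scores_c, iou_threshold):
            detections.append((scores_c[i], c, i))
    detections.sort(reverse=True)
    return detections[:max_num_detections]


# Two heavily overlapping boxes that each win a different class.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11)]
class_scores = [[0.9, 0.1], [0.2, 0.8]]
print(per_class_nms(boxes, class_scores, iou_threshold=0.5, max_num_detections=10))
```

Both boxes survive here because each is the top candidate in its own class; a single class-agnostic pass over the same boxes would keep only one of them.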

## Code Examples

### Basic NMS Call

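To apply suppression directly to raw predictions, sort the boxes by descending score first, since the function (as its name suggests) expects sorted input:

```python
import tensorflow as tf
from official.vision.ops import nms

def apply_nms(scores, boxes, max_output, iou_thr=0.5):
    """
    scores: [batch, N]  – confidence scores per box
    boxes:  [batch, N, 4] – y1, x1, y2, x2 (float32)
    """
    # Sort by descending score; harmless if the inputs are already sorted.
    order = tf.argsort(scores, axis=-1, direction="DESCENDING")
    scores = tf.gather(scores, order, batch_dims=1)
    boxes = tf.gather(boxes, order, batch_dims=1)
    nms_scores, nms_boxes = nms.sorted_non_max_suppression_padded(
        scores, boxes, max_output_size=max_output, iou_threshold=iou_thr
    )
    return nms_scores, nms_boxes
```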

### Batched NMS for GPU/TPU Optimization

When `use_batched_nms=True`, the pipeline uses `tf.image.combined_non_max_suppression` instead of the custom implementation. The wrapper below mirrors the approach found in [`roi_generator.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/layers/roi_generator.py) lines 30-41:

```python
import tensorflow as tf

def batched_nms(scores, boxes, max_per_class, max_total, iou_thr=0.5):
    """
    scores: [batch, N]    – expanded to [batch, N, 1] for the combined op
    boxes:  [batch, N, 4] – expanded to [batch, N, 1, 4]
    """
    nmsed_boxes, nmsed_scores, _, _ = tf.image.combined_non_max_suppression(
        tf.expand_dims(boxes, axis=2),    # [B, N, 1, 4]
        tf.expand_dims(scores, axis=-1),  # [B, N, 1]
        max_output_size_per_class=max_per_class,
        max_total_size=max_total,
        iou_threshold=iou_thr,
        score_threshold=0.0,
        pad_per_class=False,
        clip_boxes=False,
    )
    return nmsed_scores, nmsed_boxes
```

### Complete ROI Generation Pipeline

For Faster R-CNN style architectures:

```python
from official.vision.modeling.layers import roi_generator

roi_gen = roi_generator.MultilevelROIGenerator(
    pre_nms_top_k=2000,
    nms_iou_threshold=0.7,
    num_proposals=1000,
    use_batched_nms=False,  # uses sorted_non_max_suppression_padded
)

proposals, proposal_scores = roi_gen(
    raw_boxes=rpn_boxes,
    raw_scores=rpn_scores,
    anchor_boxes=anchor_boxes,
    image_shape=image_shape,
    training=False,
)
```

### Class-Aware Detection Generation

For final detection outputs:

```python
from official.vision.modeling.layers import detection_generator

detections = detection_generator.generate_detections(
    boxes=box_preds,        # [B, N, C, 4]
    scores=class_logits,    # [B, N, C]
    pre_nms_top_k=5000,
    nms_iou_threshold=0.5,
    max_num_detections=100,
)
```

## Custom NMS vs. Batched NMS

Choose the appropriate implementation based on your pipeline requirements:

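- **`sorted_non_max_suppression_padded`** – Each call suppresses one set of boxes per image (`scores` of shape `[batch, num_boxes]`), so it is class-agnostic on its own; the detection generator obtains **class-aware** results by calling it once per class. Use this when you need the tiled implementation for large proposal counts, or when maintaining exact compatibility with legacy detection models. This is the default in `official/vision/` (`use_batched_nms=False`).
- **`tf.image.combined_non_max_suppression`** – A built-in op that accepts scores of shape `[batch, num_boxes, num_classes]` and applies NMS per class in a single call. When the class dimension has size 1, as in the `batched_nms` wrapper above, the result is effectively **class-agnostic**. Enable this by setting `use_batched_nms=True` in the generator configuration.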

## Summary

- TensorFlow Models provides a **tile-based NMS** in [`official/vision/ops/nms.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/nms.py) optimized for accelerator hardware.
- The `sorted_non_max_suppression_padded` function handles padding, cross-tile suppression via `_cross_suppression`, and self-suppression via `_self_suppression` automatically.
- **MultilevelROIGenerator** and **DetectionGenerator** layers abstract NMS usage for proposal generation and final detection refinement.
- For maximum GPU/TPU performance with large batches, switch to `tf.image.combined_non_max_suppression` by setting `use_batched_nms=True`.

## Frequently Asked Questions

### What is the difference between the custom NMS and TensorFlow's built-in NMS?

The custom implementation in [`official/vision/ops/nms.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/nms.py) uses a tiled algorithm that processes boxes in 512-element chunks via `_suppression_loop_body`, enabling efficient handling of very large proposal counts (10,000+) on TPU. It works on one set of boxes per image, so class-aware behavior requires one call per class. TensorFlow's `tf.image.combined_non_max_suppression` is a single built-in op that handles the batch and class dimensions itself: it takes scores of shape `[batch, num_boxes, num_classes]` and applies NMS per class.

### How do I configure the NMS threshold in Faster R-CNN pipelines?

Set the `nms_iou_threshold` parameter when instantiating `MultilevelROIGenerator` for RPN proposals (default 0.7), or pass `nms_iou_threshold` to `generate_detections` for final detection suppression (default 0.5). These values control the IoU threshold passed to `sorted_non_max_suppression_padded`.
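
To see what the two thresholds mean in practice, the pure-Python sketch below computes the IoU of one pair of boxes and checks it against both values. The helper names `iou` and `is_suppressed` are hypothetical, and the comparison `overlap >= threshold` follows the IoU ≥ `iou_threshold` rule described earlier in this article.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_suppressed(overlap, threshold):
    """A lower-scoring box is removed when its overlap reaches the threshold."""
    return overlap >= threshold


higher_scoring = (0.0, 0.0, 10.0, 10.0)
lower_scoring = (0.0, 2.0, 10.0, 12.0)  # shifted 2 units to the right

overlap = iou(higher_scoring, lower_scoring)  # 80 / 120, about 0.67
for threshold in (0.5, 0.7):
    print(f"threshold={threshold}: suppressed={is_suppressed(overlap, threshold)}")
```

The same pair is removed at the detection-stage value of 0.5 but kept at the proposal-stage value of 0.7, which is why the higher threshold lets more overlapping candidates through.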

### Can I use NMS outside of the standard ROI and detection generators?

Yes. Import `sorted_non_max_suppression_padded` directly from `official.vision.ops.nms` and apply it to any tensor of shape `[batch, num_boxes, 4]` with corresponding scores of shape `[batch, num_boxes]`. As the function name implies, the boxes should already be sorted by descending score. This is useful for custom post-processing or third-party model architectures that require TensorFlow Models' efficient tiled implementation.

### Why does the NMS implementation pad boxes to multiples of 512?

Padding gives every tile the same static shape. That keeps the while-loop in `_suppression_loop_body` compatible with accelerators such as TPUs, which require static shapes, and lets each tile's pairwise IoU computation run as one dense operation across the batch. The tiles themselves are visited in order, because each tile must be checked against the survivors of earlier tiles. `NMS_TILE_SIZE` is hardcoded at 512.