How to Use Non-Maximum Suppression (NMS) in TensorFlow Models Vision Pipelines

February 28, 2026 · tensorflow/models

TensorFlow Models implements a batched, tile-based NMS algorithm in official/vision/ops/nms.py that efficiently processes thousands of proposals using sorted_non_max_suppression_padded, which is called by both the ROI generator and detection generator layers.

The tensorflow/models repository provides a production-grade Non-Maximum Suppression (NMS) implementation optimized for object detection pipelines. Unlike the built-in tf.image NMS ops, this custom vision NMS uses a tiled while-loop architecture that scales to tens of thousands of boxes on GPU and TPU hardware.

Core NMS Implementation in TensorFlow Models

The primary NMS logic resides in official/vision/ops/nms.py. This implementation processes input tensors of shape [batch, num_boxes, 4] using a fixed NMS_TILE_SIZE of 512. By dividing the suppression work into tiles, the algorithm avoids Python loops and executes entirely within the TensorFlow graph, enabling efficient accelerator utilization.
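As a rough illustration of the tiling arithmetic described above, the following sketch computes how many boxes remain after padding and how many tiles result. The helper name padded_box_count is hypothetical and is not part of official/vision/ops/nms.py; only the tile size of 512 comes from the text.

```python
NMS_TILE_SIZE = 512  # tile size quoted in the article


def padded_box_count(num_boxes, tile_size=NMS_TILE_SIZE):
    """Return (padded_count, num_tiles) for a given number of input boxes.

    Hypothetical helper: it only mirrors the "pad to a multiple of the
    tile size" arithmetic, not the library's tensor operations.
    """
    num_tiles = -(-num_boxes // tile_size)  # ceiling division on integers
    return num_tiles * tile_size, num_tiles


print(padded_box_count(2000))   # (2048, 4)
print(padded_box_count(512))    # (512, 1)
print(padded_box_count(10000))  # (10240, 20)
```

With pre_nms_top_k=2000, for example, this arithmetic gives four tiles and 48 padded slots.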

The Tiled NMS Algorithm Architecture

The sorted_non_max_suppression_padded function orchestrates suppression through five distinct stages:

Padding – Input boxes are padded to a multiple of 512 so each tile contains a uniform number of boxes.
Cross-tile suppression – The _cross_suppression function iterates over previous tiles, clearing any box that has IoU ≥ iou_threshold with a higher-scoring box from an earlier tile.
Self-suppression – Within each tile, _self_suppression uses a while-loop to remove boxes that overlap with other boxes inside the same tile.
Output size tracking – The _suppression_loop_body accumulates the count of surviving boxes in the output_size variable.
Final gathering – The algorithm uses tf.nn.top_k to select the highest-scoring boxes up to max_output_size, reshaping results to [batch, max_output_size, 4] and [batch, max_output_size].
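
The five stages above can be sketched in plain Python for a single image. This is a simplified reference under stated assumptions, not the library's code: tiled_nms and iou are hypothetical names, a sequential greedy pass stands in for the vectorized while-loop used for self-suppression, the early exit on output_size is one reading of what the counter is for, and the input is sorted inside the function for convenience.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tiled_nms(scores, boxes, max_output_size, iou_threshold, tile_size=512):
    # Sort by descending score so earlier tiles always hold stronger boxes.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    scores = [float(scores[i]) for i in order]
    boxes = [tuple(boxes[i]) for i in order]

    # Stage 1: pad to a multiple of tile_size; padded slots start suppressed.
    pad = -len(boxes) % tile_size
    alive = [True] * len(boxes) + [False] * pad
    scores += [0.0] * pad
    boxes += [(0.0, 0.0, 0.0, 0.0)] * pad

    output_size = 0
    for start in range(0, len(boxes), tile_size):
        end = start + tile_size
        # Stage 2: cross-tile suppression against survivors of earlier tiles.
        for j in range(start, end):
            if alive[j] and any(
                alive[i] and iou(boxes[i], boxes[j]) >= iou_threshold
                for i in range(start)
            ):
                alive[j] = False
        # Stage 3: self-suppression inside the tile (greedy stand-in).
        for j in range(start, end):
            if not alive[j]:
                continue
            for k in range(j + 1, end):
                if alive[k] and iou(boxes[j], boxes[k]) >= iou_threshold:
                    alive[k] = False
        # Stage 4: track survivors; stop once enough boxes have been found.
        output_size += sum(alive[start:end])
        if output_size >= max_output_size:
            break

    # Stage 5: gather the best survivors and pad to a fixed output length.
    kept = [i for i in range(len(boxes)) if alive[i]][:max_output_size]
    missing = max_output_size - len(kept)
    out_scores = [scores[i] for i in kept] + [0.0] * missing
    out_boxes = [boxes[i] for i in kept] + [(0.0, 0.0, 0.0, 0.0)] * missing
    return out_scores, out_boxes


demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
demo_boxes = [
    (0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30),
    (0, 0, 10, 8), (20, 20, 30, 29), (50, 50, 60, 60),
]
# A tiny tile size makes the cross-tile stage visible on six boxes.
kept_scores, kept_boxes = tiled_nms(demo_scores, demo_boxes, 4, 0.5, tile_size=2)
print(kept_scores)  # [0.9, 0.7, 0.4, 0.0]
```

The output has a fixed length of max_output_size, with unused slots zero-filled, which matches the padded output shapes described in the final stage.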

Integration Points in Vision Pipelines

TensorFlow Models exposes NMS functionality through two primary layer classes that handle different stages of the detection workflow.

ROI Generation with MultilevelROIGenerator

For Region Proposal Network (RPN) outputs, official/vision/modeling/layers/roi_generator.py implements MultilevelROIGenerator. This layer selects top-k boxes per FPN level, applies optional score and size filtering, and invokes nms.sorted_non_max_suppression_padded inside _multilevel_propose_rois to produce the final proposal set.

Detection Post-Processing

The detection generator in official/vision/modeling/layers/detection_generator.py performs per-class NMS on refined box predictions. The _generate_detections_v1 function calls the core NMS wrapper to suppress overlapping detections across different object categories before selecting the final top-k detections.
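
To make the idea of per-class suppression concrete, here is a minimal plain-Python sketch. It assumes scores laid out as [N][C] and boxes as [N][C] tuples for one image; per_class_detections and greedy_nms are hypothetical helpers and do not reproduce _generate_detections_v1.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(scores, boxes, iou_threshold):
    """Indices kept by plain greedy NMS, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in kept):
            kept.append(i)
    return kept


def per_class_detections(scores, boxes, iou_threshold, max_num_detections):
    """Run NMS independently per class, then keep the global top-k."""
    candidates = []
    for c in range(len(scores[0])):
        class_scores = [row[c] for row in scores]
        class_boxes = [row[c] for row in boxes]
        for i in greedy_nms(class_scores, class_boxes, iou_threshold):
            candidates.append((class_scores[i], c, class_boxes[i]))
    candidates.sort(key=lambda d: d[0], reverse=True)
    return candidates[:max_num_detections]


b1, b2, b3 = (0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)
demo_boxes = [[b1, b1], [b2, b2], [b3, b3]]          # [N=3][C=2]
demo_scores = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]   # [N=3][C=2]
detections = per_class_detections(demo_scores, demo_boxes, 0.5, 3)
print(detections)
# [(0.9, 0, (0, 0, 10, 10)), (0.7, 1, (0, 0, 10, 9)), (0.6, 1, (20, 20, 30, 30))]
```

Note that the second box survives for class 1 even though it overlaps the top class 0 box: suppression only happens between boxes of the same class.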

Code Examples

Basic NMS Call

To apply suppression directly to raw predictions:

import tensorflow as tf
from official.vision.ops import nms

def apply_nms(scores, boxes, max_output, iou_thr=0.5):
    """
    scores: [batch, N]  – confidence scores per box
    boxes:  [batch, N, 4] – y1, x1, y2, x2 (float32)
    """
    nms_scores, nms_boxes = nms.sorted_non_max_suppression_padded(
        scores, boxes, max_output_size=max_output, iou_threshold=iou_thr
    )
    return nms_scores, nms_boxes

Batched NMS for GPU/TPU Optimization

When use_batched_nms=True, the pipeline uses tf.image.combined_non_max_suppression instead of the custom implementation. This is the approach found in roi_generator.py lines 30-41:

import tensorflow as tf

def batched_nms(scores, boxes, max_per_class, max_total, iou_thr=0.5):
    """
    scores: [batch, N]    – confidence scores per box
    boxes:  [batch, N, 4] – y1, x1, y2, x2 (float32)
    A singleton class dimension is added below, as the combined op expects.
    """
    nmsed_boxes, nmsed_scores, _, _ = tf.image.combined_non_max_suppression(
        tf.expand_dims(boxes, axis=2),    # [B, N, 1, 4]
        tf.expand_dims(scores, axis=-1),  # [B, N, 1]
        max_output_size_per_class=max_per_class,
        max_total_size=max_total,
        iou_threshold=iou_thr,
        score_threshold=0.0,
        pad_per_class=False,
        clip_boxes=False,
    )
    return nmsed_scores, nmsed_boxes

Complete ROI Generation Pipeline

For Faster R-CNN style architectures:

from official.vision.modeling.layers import roi_generator

roi_gen = roi_generator.MultilevelROIGenerator(
    pre_nms_top_k=2000,
    nms_iou_threshold=0.7,
    num_proposals=1000,
    use_batched_nms=False,  # uses sorted_non_max_suppression_padded
)

proposals, proposal_scores = roi_gen(
    raw_boxes=rpn_boxes,
    raw_scores=rpn_scores,
    anchor_boxes=anchor_boxes,
    image_shape=image_shape,
    training=False
)

Class-Aware Detection Generation

For final detection outputs:

from official.vision.modeling.layers import detection_generator

detections = detection_generator.generate_detections(
    boxes=box_preds,        # [B, N, C, 4]
    scores=class_logits,    # [B, N, C]
    pre_nms_top_k=5000,
    nms_iou_threshold=0.5,
    max_num_detections=100,
)

Custom NMS vs. Batched NMS

Choose the appropriate implementation based on your pipeline requirements:

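sorted_non_max_suppression_padded – Use this when you need the tiled implementation's scalability to very large box counts, or when maintaining exact compatibility with legacy detection models. Each call takes one score per box and a single iou_threshold, so any per-class behavior comes from the caller invoking it once per class. This is the default in official/vision/.
tf.image.combined_non_max_suppression – Select this for large-batch GPU/TPU inference when a single built-in op is preferable. The op suppresses per class; when it is fed a single class dimension, as in the ROI generator, the result is effectively class-agnostic. Enable it by setting use_batched_nms=True in the generator configuration.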

Summary

TensorFlow Models provides a tile-based NMS in official/vision/ops/nms.py optimized for accelerator hardware.
The sorted_non_max_suppression_padded function handles padding, cross-tile suppression via _cross_suppression, and self-suppression via _self_suppression automatically.
MultilevelROIGenerator and DetectionGenerator layers abstract NMS usage for proposal generation and final detection refinement.
For maximum GPU/TPU performance with large batches, switch to tf.image.combined_non_max_suppression by setting use_batched_nms=True.

Frequently Asked Questions

What is the difference between the custom NMS and TensorFlow's built-in NMS?

The custom implementation in official/vision/ops/nms.py uses a tiled algorithm that processes boxes in 512-element chunks via _suppression_loop_body, which lets it handle very large proposal counts (10,000+) on TPU. tf.image.combined_non_max_suppression is a single built-in op that takes scores with an explicit class dimension and runs suppression per batch element and per class; when the pipeline passes a single class dimension, as the ROI generator does, the result is effectively class-agnostic.

How do I configure the NMS threshold in Faster R-CNN pipelines?

Set the nms_iou_threshold parameter when instantiating MultilevelROIGenerator for RPN proposals (default 0.7), or pass nms_iou_threshold to generate_detections for final detection suppression (default 0.5). These values control the IoU threshold passed to sorted_non_max_suppression_padded.
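
A small worked example shows how the two default thresholds treat the same pair of boxes. The iou and is_suppressed helpers are illustrative only; the "IoU ≥ threshold means suppressed" rule follows the description of cross-tile suppression earlier in this article.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_suppressed(overlap, iou_threshold):
    """A lower-scoring box is dropped when its overlap reaches the threshold."""
    return overlap >= iou_threshold


box_a = (0.0, 0.0, 10.0, 10.0)  # higher-scoring box, area 100
box_b = (0.0, 0.0, 10.0, 6.0)   # lower-scoring box, area 60, fully inside box_a
overlap = iou(box_a, box_b)     # 60 / 100 = 0.6

print(is_suppressed(overlap, 0.7))  # False: survives at the RPN default
print(is_suppressed(overlap, 0.5))  # True: removed at the detection default
```

The looser 0.7 threshold keeps more overlapping proposals for the second stage, while 0.5 prunes final detections more aggressively.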

Can I use NMS outside of the standard ROI and detection generators?

Yes. Import sorted_non_max_suppression_padded directly from official.vision.ops.nms and apply it to any tensor of shape [batch, num_boxes, 4] with corresponding scores of shape [batch, num_boxes]. This is useful for custom post-processing or third-party model architectures that require TensorFlow Models' efficient tiled implementation.
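
When the boxes come from a third-party model, they may need converting to the y1, x1, y2, x2 layout shown in the basic example. The sketch below assumes a hypothetical source format of (x, y, width, height) with (x, y) as the top-left corner, and sorts by descending score as a defensive step; this article does not state whether the function requires pre-sorted input. Both helper names are illustrative.

```python
def xywh_to_yxyx(box):
    """Convert (x, y, width, height) to (y1, x1, y2, x2).

    Assumes (x, y) is the top-left corner; adjust for center-based formats.
    """
    x, y, w, h = box
    return (y, x, y + h, x + w)


def prepare_for_nms(scores, boxes_xywh):
    """Return scores and converted boxes ordered by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    sorted_scores = [scores[i] for i in order]
    sorted_boxes = [xywh_to_yxyx(boxes_xywh[i]) for i in order]
    return sorted_scores, sorted_boxes


ready_scores, ready_boxes = prepare_for_nms(
    [0.2, 0.9], [(1, 2, 3, 4), (5, 6, 7, 8)]
)
print(ready_scores)  # [0.9, 0.2]
print(ready_boxes)   # [(6, 5, 14, 12), (2, 1, 6, 4)]
```

The resulting lists can then be stacked into [batch, num_boxes] and [batch, num_boxes, 4] float32 tensors before calling the NMS function.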

Why does the NMS implementation pad boxes to multiples of 512?

Padding gives every tile the same size, so the while-loop in _suppression_loop_body can slice and compare fixed-shape blocks of 512 boxes on each iteration. Tiles are visited in score order, while the work inside each tile is vectorized across the batch and across boxes, which suits GPU/TPU execution and keeps the results deterministic. NMS_TILE_SIZE is hardcoded at 512.
