How to Use Non-Maximum Suppression (NMS) in TensorFlow Models Vision Pipelines
TensorFlow Models implements a batched, tile-based NMS algorithm in official/vision/ops/nms.py that efficiently processes thousands of proposals using sorted_non_max_suppression_padded, which is called by both the ROI generator and detection generator layers.
The tensorflow/models repository provides a production-grade Non-Maximum Suppression (NMS) implementation specifically optimized for object detection pipelines. Unlike standard TensorFlow ops, this custom vision NMS uses a tiled while-loop architecture that scales to tens of thousands of boxes on GPU and TPU hardware.
Core NMS Implementation in TensorFlow Models
The primary NMS logic resides in official/vision/ops/nms.py. This implementation processes input tensors of shape [batch, num_boxes, 4] using a fixed NMS_TILE_SIZE of 512. By dividing the suppression work into tiles, the algorithm avoids Python loops and executes entirely within the TensorFlow graph, enabling efficient accelerator utilization.
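As a rough illustration of the tiling arithmetic, the sketch below computes how much padding and how many tiles a given proposal count would need. The helper name tile_layout is hypothetical and is not a function from nms.py; only the tile size of 512 comes from the text above.

```python
NMS_TILE_SIZE = 512  # tile size stated in the text above


def tile_layout(num_boxes, tile_size=NMS_TILE_SIZE):
    """Return (padded_size, pad, num_tiles) for a given box count.

    Hypothetical helper for illustration only; not part of nms.py.
    """
    num_tiles = -(-num_boxes // tile_size)  # ceiling division
    padded_size = num_tiles * tile_size
    return padded_size, padded_size - num_boxes, num_tiles


print(tile_layout(2000))  # (2048, 48, 4)
```

With 2000 proposals, 48 padding boxes would be added so that the work splits into four equal tiles.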
The Tiled NMS Algorithm Architecture
The sorted_non_max_suppression_padded function orchestrates suppression through five distinct stages:
- Padding – Input boxes are padded to a multiple of 512 so each tile contains a uniform number of boxes.
- Cross-tile suppression – The _cross_suppression function iterates over previous tiles, clearing any box that has IoU ≥ iou_threshold with a higher-scoring box from an earlier tile.
- Self-suppression – Within each tile, _self_suppression uses a while-loop to remove boxes that overlap with other boxes inside the same tile.
- Output size tracking – The _suppression_loop_body accumulates the count of surviving boxes in the output_size variable.
- Final gathering – The algorithm uses tf.nn.top_k to select the highest-scoring boxes up to max_output_size, reshaping results to [batch, max_output_size, 4] and [batch, max_output_size].
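The five stages above can be sketched in plain Python for a single image. This is a minimal illustration under stated assumptions, not the code in nms.py: it uses Python lists instead of tensors, a hypothetical tile_size of 4 so the example stays readable, an early exit once enough boxes survive, and helper names (iou, tiled_nms) that are invented here.

```python
ZERO_BOX = (0.0, 0.0, 0.0, 0.0)  # stands for a cleared or padding box


def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tiled_nms(scores, boxes, max_output_size, iou_threshold, tile_size=4):
    """Illustrative tiled NMS for one image (hypothetical, not the library code)."""
    # Sort by descending score so earlier tiles always hold stronger boxes.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    scores = [scores[i] for i in order]
    boxes = [tuple(boxes[i]) for i in order]

    # Stage 1: pad to a multiple of the tile size.
    pad = -len(boxes) % tile_size
    scores += [0.0] * pad
    boxes += [ZERO_BOX] * pad

    output_size = 0
    for start in range(0, len(boxes), tile_size):
        if output_size >= max_output_size:
            break  # assumption of this sketch: stop once enough boxes survive
        tile = range(start, start + tile_size)
        # Stage 2: clear boxes that overlap a survivor from an earlier tile.
        for i in tile:
            if any(iou(boxes[i], boxes[j]) >= iou_threshold for j in range(start)):
                boxes[i] = ZERO_BOX
        # Stage 3: clear boxes that overlap an earlier survivor in the same tile.
        for i in tile:
            if any(iou(boxes[i], boxes[j]) >= iou_threshold for j in range(start, i)):
                boxes[i] = ZERO_BOX
        # Stage 4: track how many boxes have survived so far.
        output_size += sum(boxes[i] != ZERO_BOX for i in tile)

    # Stage 5: gather the best survivors and pad to a fixed output length.
    kept = [(s, b) for s, b in zip(scores, boxes) if b != ZERO_BOX][:max_output_size]
    kept += [(0.0, ZERO_BOX)] * (max_output_size - len(kept))
    return [s for s, _ in kept], [b for _, b in kept]


demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30),
              (21, 21, 31, 31), (40, 40, 50, 50), (0.5, 0.5, 10.5, 10.5)]
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
nms_scores, nms_boxes = tiled_nms(demo_scores, demo_boxes, 5, 0.5)
print(nms_scores)  # [0.9, 0.7, 0.5, 0.0, 0.0]
```

The last demo box falls into the second tile and is removed by cross-tile suppression against the first box, while the second and fourth boxes are removed by self-suppression inside the first tile.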
Integration Points in Vision Pipelines
TensorFlow Models exposes NMS functionality through two primary layer classes that handle different stages of the detection workflow.
ROI Generation with MultilevelROIGenerator
For Region Proposal Network (RPN) outputs, official/vision/modeling/layers/roi_generator.py implements MultilevelROIGenerator. This layer selects top-k boxes per FPN level, applies optional score and size filtering, and invokes nms.sorted_non_max_suppression_padded inside _multilevel_propose_rois to produce the final proposal set.
Detection Post-Processing
The detection generator in official/vision/modeling/layers/detection_generator.py performs per-class NMS on refined box predictions. The _generate_detections_v1 function calls the core NMS wrapper for each object category to suppress overlapping detections within that category, then selects the final top-k detections across all categories.
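A minimal sketch of what per-class suppression means, assuming plain Python lists and a simple greedy NMS in place of the tiled implementation. The names greedy_nms and per_class_detections are hypothetical and do not correspond to functions in detection_generator.py.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(scored_boxes, iou_threshold):
    """scored_boxes: list of (score, box). Returns the kept subset, best first."""
    kept = []
    for score, box in sorted(scored_boxes, key=lambda sb: sb[0], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


def per_class_detections(boxes, scores, iou_threshold=0.5, max_num_detections=100):
    """boxes: [N][C] box tuples, scores: [N][C] floats (hypothetical layout)."""
    merged = []
    for c in range(len(scores[0])):
        # Suppression runs independently inside each class.
        candidates = [(scores[n][c], boxes[n][c]) for n in range(len(scores))]
        merged += [(s, c, b) for s, b in greedy_nms(candidates, iou_threshold)]
    # Final top-k across all classes.
    merged.sort(key=lambda d: d[0], reverse=True)
    return merged[:max_num_detections]


box_a, box_b = (0, 0, 10, 10), (1, 1, 11, 11)  # IoU is about 0.68
demo_boxes = [[box_a, box_a], [box_b, box_b]]
demo_scores = [[0.9, 0.1], [0.8, 0.7]]
print(per_class_detections(demo_boxes, demo_scores))
# [(0.9, 0, (0, 0, 10, 10)), (0.7, 1, (1, 1, 11, 11))]
```

The two heavily overlapping boxes both survive because each wins in a different class; suppression only removes the weaker box within the same class.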
Code Examples
Basic NMS Call
To apply suppression directly to raw predictions:
import tensorflow as tf
from official.vision.ops import nms

def apply_nms(scores, boxes, max_output, iou_thr=0.5):
    """
    scores: [batch, N] – confidence scores per box
    boxes: [batch, N, 4] – y1, x1, y2, x2 (float32)
    """
    # Returns scores of shape [batch, max_output] and boxes of shape
    # [batch, max_output, 4].
    nms_scores, nms_boxes = nms.sorted_non_max_suppression_padded(
        scores, boxes, max_output_size=max_output, iou_threshold=iou_thr
    )
    return nms_scores, nms_boxes
Batched NMS for GPU/TPU Optimization
When use_batched_nms=True, the pipeline uses tf.image.combined_non_max_suppression instead of the custom implementation. This is the approach found in roi_generator.py lines 30-41:
import tensorflow as tf

def batched_nms(scores, boxes, max_per_class, max_total, iou_thr=0.5):
    """
    scores: [batch, N] – expanded to [batch, N, 1] for the combined op
    boxes: [batch, N, 4] – expanded to [batch, N, 1, 4] for the combined op
    """
    # The op also returns classes and valid-detection counts, unused here.
    nmsed_boxes, nmsed_scores, _, _ = tf.image.combined_non_max_suppression(
        tf.expand_dims(boxes, axis=2),    # [B, N, 1, 4]
        tf.expand_dims(scores, axis=-1),  # [B, N, 1]
        max_output_size_per_class=max_per_class,
        max_total_size=max_total,
        iou_threshold=iou_thr,
        score_threshold=0.0,
        pad_per_class=False,
        clip_boxes=False,
    )
    return nmsed_scores, nmsed_boxes
Complete ROI Generation Pipeline
For Faster R-CNN style architectures:
from official.vision.modeling.layers import roi_generator

roi_gen = roi_generator.MultilevelROIGenerator(
    pre_nms_top_k=2000,
    nms_iou_threshold=0.7,
    num_proposals=1000,
    use_batched_nms=False,  # uses sorted_non_max_suppression_padded
)

# rpn_boxes, rpn_scores, anchor_boxes, and image_shape are assumed to be
# produced earlier in the model (RPN head outputs and anchor generation).
proposals, proposal_scores = roi_gen(
    raw_boxes=rpn_boxes,
    raw_scores=rpn_scores,
    anchor_boxes=anchor_boxes,
    image_shape=image_shape,
    training=False,
)
Class-Aware Detection Generation
For final detection outputs:
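from official.vision.modeling.layers import detection_generator

# box_preds and class_logits are assumed to come from the detection head.
# The section above names the helper _generate_detections_v1; check which
# function name your checkout of detection_generator.py exposes.
detections = detection_generator.generate_detections(
    boxes=box_preds,       # [B, N, C, 4]
    scores=class_logits,   # [B, N, C]
    pre_nms_top_k=5000,
    nms_iou_threshold=0.5,
    max_num_detections=100,
)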
Custom NMS vs. Batched NMS
Choose the appropriate implementation based on your pipeline requirements:
- sorted_non_max_suppression_padded – Use this when you need class-aware NMS with different IoU thresholds per class, or when maintaining exact compatibility with legacy detection models. This is the default in official/vision/.
- tf.image.combined_non_max_suppression – Select this for large-batch GPU/TPU inference where class-agnostic suppression is acceptable. Enable this by setting use_batched_nms=True in the generator configuration.
Summary
- TensorFlow Models provides a tile-based NMS in official/vision/ops/nms.py optimized for accelerator hardware.
- The sorted_non_max_suppression_padded function handles padding, cross-tile suppression via _cross_suppression, and self-suppression via _self_suppression automatically.
- MultilevelROIGenerator and DetectionGenerator layers abstract NMS usage for proposal generation and final detection refinement.
- For maximum GPU/TPU performance with large batches, switch to tf.image.combined_non_max_suppression by setting use_batched_nms=True.
Frequently Asked Questions
What is the difference between the custom NMS and TensorFlow's built-in NMS?
The custom implementation in official/vision/ops/nms.py uses a tiled algorithm that processes boxes in 512-element chunks via _suppression_loop_body, enabling efficient handling of very large proposal counts (10,000+) on TPU. TensorFlow's tf.image.combined_non_max_suppression is a built-in batched op that runs suppression per image and per class in a single call. When it receives a single class dimension, as in the ROI generator path, the result is effectively class-agnostic.
How do I configure the NMS threshold in Faster R-CNN pipelines?
Set the nms_iou_threshold parameter when instantiating MultilevelROIGenerator for RPN proposals (default 0.7), or pass nms_iou_threshold to generate_detections for final detection suppression (default 0.5). These values control the IoU threshold passed to sorted_non_max_suppression_padded.
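To make the effect of the two thresholds concrete, the sketch below checks one pair of boxes whose IoU is 0.6 against both values. The helper names iou and survives are hypothetical, and the rule that a box is removed when IoU ≥ threshold follows the description of cross-tile suppression earlier in this article.

```python
def iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def survives(candidate, kept_box, iou_threshold):
    """True if the candidate is not suppressed by a higher-scoring kept box."""
    return iou(candidate, kept_box) < iou_threshold


top = (0, 0, 10, 10)
neighbor = (0, 2.5, 10, 12.5)  # overlaps 75 of 125 units of union area

print(round(iou(top, neighbor), 2))  # 0.6
print(survives(neighbor, top, 0.7))  # True: kept as an RPN proposal
print(survives(neighbor, top, 0.5))  # False: removed at detection time
```

The looser 0.7 threshold keeps more overlapping proposals for the second stage, while the stricter 0.5 threshold prunes near-duplicates in the final detections.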
Can I use NMS outside of the standard ROI and detection generators?
Yes. Import sorted_non_max_suppression_padded directly from official.vision.ops.nms and apply it to any tensor of shape [batch, num_boxes, 4] with corresponding scores of shape [batch, num_boxes]. This is useful for custom post-processing or third-party model architectures that require TensorFlow Models' efficient tiled implementation.
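When feeding custom predictions, it may be necessary to order each image's boxes by descending score first. The function name suggests sorted input, but that reading is an assumption worth checking against the docstring in your checkout. The sketch below shows the reordering on nested Python lists; sort_batch_by_score is a hypothetical helper, not a library function.

```python
def sort_batch_by_score(scores, boxes):
    """Reorder each image's boxes by descending score.

    scores: [batch][N] floats, boxes: [batch][N][4] (hypothetical list layout).
    """
    sorted_scores, sorted_boxes = [], []
    for image_scores, image_boxes in zip(scores, boxes):
        order = sorted(range(len(image_scores)),
                       key=lambda i: image_scores[i], reverse=True)
        sorted_scores.append([image_scores[i] for i in order])
        sorted_boxes.append([image_boxes[i] for i in order])
    return sorted_scores, sorted_boxes


raw_scores = [[0.2, 0.9, 0.5]]
raw_boxes = [[[0, 0, 1, 1], [0, 0, 2, 2], [0, 0, 3, 3]]]
sorted_scores, sorted_boxes = sort_batch_by_score(raw_scores, raw_boxes)
print(sorted_scores)  # [[0.9, 0.5, 0.2]]
print(sorted_boxes)   # [[[0, 0, 2, 2], [0, 0, 3, 3], [0, 0, 1, 1]]]
```

With tensors, the same reordering could be expressed with tf.argsort followed by tf.gather with batch_dims=1.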
Why does the NMS implementation pad boxes to multiples of 512?
The padding ensures uniform tile sizes for the while-loop operations in _suppression_loop_body. Because every tile then has the same static shape, each loop iteration runs the same fixed-size computation, which suits graph compilation on GPU/TPU and keeps behavior deterministic. NMS_TILE_SIZE is hardcoded at 512.