How to Implement Object Detection with RetinaNet Using TensorFlow Models

February 28, 2026 tensorflow/models ↗

You can implement object detection with RetinaNet by configuring the RetinaNet dataclass in official/vision/configs/retinanet.py, assembling the model via factory.build_retinanet, and training through the RetinaNetTask class which handles focal loss, anchor generation, and NMS automatically.

RetinaNet is a one-stage dense object detector that pairs a backbone and feature pyramid network (FPN) with focal-loss training, which lets it approach the accuracy of two-stage detectors while keeping single-pass inference. The TensorFlow Models repository provides a modular implementation that lets you build, train, and deploy RetinaNet without writing your own anchor generation or post-processing code. This guide walks through the architecture and provides code examples you can adapt to run RetinaNet on your own dataset.

RetinaNet Architecture Components

The implementation in tensorflow/models follows a modular design where each component is instantiated through factory functions and wired together by the RetinaNetModel class.

Backbone: Extracts multi-scale feature maps using networks like ResNet-50. Built by backbones.factory.build_backbone in official/vision/modeling/backbones/factory.py.
FPN Decoder: Merges backbone levels into a feature pyramid (P3-P7) via decoders.factory.build_decoder in official/vision/modeling/decoders/factory.py.
RetinaNet Head: Two parallel sub-heads, one for classification (num_classes × num_anchors scores per feature-map location) and one for box regression (4 × num_anchors offsets per location). Implemented in dense_prediction_heads.RetinaNetHead at official/vision/modeling/heads/dense_prediction_heads.py.
Anchor Generator: Generates multiscale anchor boxes for each pyramid level. Logic resides in anchor.Anchor within official/vision/ops/anchor.py, invoked automatically by RetinaNetModel when anchor_boxes are not supplied.
Detection Generator: Performs box decoding, score thresholding, and Non-Maximum Suppression (NMS) via detection_generator.MultilevelDetectionGenerator in official/vision/modeling/layers/detection_generator.py.
Task Controller: RetinaNetTask in official/vision/tasks/retinanet.py orchestrates the training loop, data loading, loss computation (focal + Huber), and metric tracking.
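
To get a feel for how dense the head's predictions are, the sketch below counts anchors for a square input. It is plain Python rather than library code, and it assumes pyramid levels 3 to 7 (the P3-P7 range above) with a stride of 2^level at each level. The function name count_anchors is illustrative.

```python
import math

def count_anchors(image_size, min_level=3, max_level=7,
                  num_scales=3, num_aspect_ratios=3):
    """Return (anchors per location, total anchors) for a square image."""
    per_location = num_scales * num_aspect_ratios
    total = 0
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        # Feature map side length at this level (rounded up for odd sizes).
        side = math.ceil(image_size / stride)
        total += side * side * per_location
    return per_location, total

per_location, total = count_anchors(640)
print(per_location, total)  # 9 76725
```

Most of those anchors cover background, which is the class imbalance that focal loss is meant to address.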

Building a RetinaNet Model

Start by defining a configuration object and invoking the factory. The build_retinanet function in official/vision/modeling/factory.py constructs the backbone, decoder, head, and detection generator from that configuration.

import tensorflow as tf

from official.vision.configs import retinanet as retinanet_cfg
from official.vision.modeling import factory

# Configure the model
cfg = retinanet_cfg.RetinaNet()
# COCO has 80 object categories, but its label IDs run up to 90,
# so the classifier needs 91 outputs (ID 0 is background).
cfg.num_classes = 91
cfg.input_size = [640, 640, 3]
cfg.backbone.type = 'resnet'
cfg.backbone.resnet.model_id = 50  # ResNet-50; the config field is model_id
cfg.head.num_convs = 4
cfg.head.num_filters = 256
cfg.anchor.num_scales = 3
cfg.anchor.aspect_ratios = [0.5, 1.0, 2.0]

# Build the Keras model
input_spec = tf.keras.layers.InputSpec(shape=[None] + cfg.input_size)
model = factory.build_retinanet(input_spec, cfg)
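
Before building, it can help to confirm that the input size fits the feature pyramid, since FPN-style models generally expect the height and width to be multiples of the coarsest stride, 2^max_level. The helper below is hypothetical and not part of tensorflow/models. It assumes max_level=7, matching the P3-P7 pyramid described earlier.

```python
def check_input_size(input_size, max_level=7):
    """Raise ValueError unless height and width divide evenly by 2**max_level."""
    stride = 2 ** max_level
    height, width = input_size[:2]
    if height % stride or width % stride:
        raise ValueError(
            f"input size {height}x{width} is not a multiple of {stride}")
    # Spatial size of the coarsest feature map.
    return height // stride, width // stride

# 640 / 128 = 5, so the coarsest feature map is 5x5.
print(check_input_size([640, 640, 3]))  # (5, 5)
```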

Preparing the Input Pipeline

RetinaNet expects TF-Example records containing bounding boxes and class IDs. The retinanet_input.Parser class in official/vision/dataloaders/retinanet_input.py handles augmentation and anchor matching.

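from official.common import dataset_fn
from official.vision.configs import retinanet as retinanet_cfg
from official.vision.dataloaders import input_reader
from official.vision.dataloaders import input_reader_factory
from official.vision.dataloaders import retinanet_input
from official.vision.dataloaders import tf_example_decoder

# Task-level config; point task_cfg.train_data.input_path at your TFRecords
task_cfg = retinanet_cfg.RetinaNetTask(model=cfg)

# Decodes serialized TF-Example records into image and box tensors
decoder = tf_example_decoder.TfExampleDecoder()

# Initialize parser with same anchor config as the model
parser = retinanet_input.Parser(
    output_size=cfg.input_size[:2],
    min_level=cfg.min_level,
    max_level=cfg.max_level,
    num_scales=cfg.anchor.num_scales,
    aspect_ratios=cfg.anchor.aspect_ratios,
    anchor_size=cfg.anchor.anchor_size,
    dtype='bfloat16',  # 'float32' is the usual choice when not training on TPU
    match_threshold=0.5,
    unmatched_threshold=0.5,
)

# Create the dataset reader
reader = input_reader_factory.input_reader_generator(
    params=task_cfg.train_data,
    dataset_fn=dataset_fn.pick_dataset_fn('tfrecord'),
    decoder_fn=decoder.decode,
    combine_fn=input_reader.create_combine_fn(task_cfg.train_data),
    parser_fn=parser.parse_fn(is_training=True)
)
train_dataset = reader.read()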

The parser and reader setup mirrors RetinaNetTask.build_inputs in official/vision/tasks/retinanet.py.
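
The match_threshold and unmatched_threshold arguments control how anchors are labeled against ground-truth boxes. The following is a simplified, pure-Python sketch of IoU-based assignment, not the library's matcher. An anchor counts as positive when its best IoU reaches match_threshold, negative when it falls below unmatched_threshold, and ignored in between. Boxes are assumed to be [ymin, xmin, ymax, xmax], and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def assign_anchor(anchor, gt_boxes, match_threshold=0.5,
                  unmatched_threshold=0.5):
    """Return 'positive', 'negative', or 'ignore' for one anchor."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= match_threshold:
        return "positive"
    if best < unmatched_threshold:
        return "negative"
    return "ignore"

gt = [[0, 0, 10, 10]]
print(assign_anchor([0, 0, 10, 10], gt))    # positive
print(assign_anchor([20, 20, 30, 30], gt))  # negative
# IoU is 0.8 here, which falls between the two thresholds.
print(assign_anchor([0, 0, 10, 8], gt, match_threshold=0.9,
                    unmatched_threshold=0.5))  # ignore
```

With both thresholds set to 0.5, as in the parser call above, the ignore band is empty and every anchor is either positive or negative.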

Training Implementation

The RetinaNetTask class defines the training step, aggregating focal loss for classification and Huber loss for box regression. Loss functions are defined in official/vision/losses and wired together in RetinaNetTask.build_losses in official/vision/tasks/retinanet.py.

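from official.vision.tasks import retinanet as retinanet_task

# Initialize task and model
task = retinanet_task.RetinaNetTask(task_cfg)
task.initialize(model)  # Loads pretrained backbone if configured

# Build optimizer; the learning rate scales linearly with the global batch size
optimizer = tf.keras.optimizers.SGD(
    learning_rate=0.32 * task_cfg.train_data.global_batch_size / 256.0,
    momentum=0.9
)

# Build metrics once, outside tf.function, so they are not recreated per trace
metrics = task.build_metrics(training=True)

# Training step reuses the task's logic
@tf.function
def train_step(batch):
    return task.train_step(batch, model, optimizer, metrics=metrics)

# Run training. The training reader repeats the dataset, so bound each epoch
# explicitly; steps_per_epoch is a value you choose for your dataset.
steps_per_epoch = 1000
for epoch in range(12):
    for batch in train_dataset.take(steps_per_epoch):
        logs = train_step(batch)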
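
For intuition about the box-regression term, here is a scalar sketch of the Huber loss and of a weighted sum with a classification loss. It is plain Python for readability, not the task's implementation. The delta and box_loss_weight values are assumptions chosen for illustration, and the function names are hypothetical.

```python
def huber_loss(error, delta=0.1):
    """Huber loss for one regression error: quadratic near 0, linear beyond delta."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * abs_error ** 2
    return delta * (abs_error - 0.5 * delta)

def total_loss(cls_loss, box_errors, box_loss_weight=50.0, delta=0.1):
    """Add a weighted sum of Huber box losses to a classification loss."""
    box_loss = sum(huber_loss(e, delta) for e in box_errors)
    return cls_loss + box_loss_weight * box_loss

print(huber_loss(0.05))  # small error, quadratic branch
print(huber_loss(1.0))   # large error, linear branch
print(total_loss(0.3, [0.05, -0.2, 1.0]))
```

The linear branch keeps a few badly regressed boxes from dominating the gradient, which is the usual reason to prefer Huber over a squared error here.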

Running Inference

For inference, the model accepts images and returns post-processed detections including NMS. The forward pass in RetinaNetModel.call (official/vision/modeling/retinanet_model.py) handles anchor generation, head inference, and detection generation.


# Build model for inference (optionally pass precomputed anchors)
model = factory.build_retinanet(input_spec, cfg)

# Run inference on a batch of images [batch, H, W, 3].
# image_shape is a [batch, 2] tensor of (height, width) used to clip boxes.
image_shape = tf.tile(
    tf.constant([cfg.input_size[:2]], tf.float32), [tf.shape(images)[0], 1])
outputs = model(images, image_shape=image_shape, training=False)

# Extract results
boxes = outputs['detection_boxes']         # [batch, max_detections, 4]
scores = outputs['detection_scores']       # [batch, max_detections]
classes = outputs['detection_classes']     # [batch, max_detections]
num_detections = outputs['num_detections'] # [batch]
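
The outputs are padded to a fixed max_detections per image, so downstream code usually trims them. The sketch below does this with plain Python lists for a single image, for example after converting one batch element of each tensor with .numpy().tolist(). The helper name and the 0.5 score threshold are illustrative choices, not library defaults.

```python
def filter_detections(boxes, scores, classes, num_detections, min_score=0.5):
    """Keep the valid, confident detections for one image."""
    results = []
    # Entries at index >= num_detections are padding and are skipped.
    for i in range(int(num_detections)):
        if scores[i] >= min_score:
            results.append({
                "box": boxes[i],
                "score": scores[i],
                "class_id": int(classes[i]),
            })
    return results

# Toy padded output: two real detections followed by one padding slot.
boxes = [[10, 10, 50, 50], [20, 30, 80, 90], [0, 0, 0, 0]]
scores = [0.9, 0.3, 0.0]
classes = [1, 17, 0]
kept = filter_detections(boxes, scores, classes, num_detections=2)
print(kept)  # [{'box': [10, 10, 50, 50], 'score': 0.9, 'class_id': 1}]
```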

Exporting to SavedModel and TFLite

For production deployment, configure ExportConfig flags such as output_normalized_coordinates=True and output_intermediate_features=False in your config. The model supports TFLite-compatible post-processing ops, which are selected during factory construction in official/vision/modeling/factory.py.

export_dir = '/tmp/retinanet_savedmodel'
model.save(export_dir, include_optimizer=False, signatures=None)

Summary

RetinaNet in TensorFlow Models is assembled via factory.build_retinanet using configuration dataclasses defined in official/vision/configs/retinanet.py.
The architecture combines a backbone, FPN decoder, two-branch head (classification and box regression), anchor generator, and detection generator into a single tf.keras.Model.
Training is managed by RetinaNetTask, which automatically handles focal loss, Huber loss, and dataset parsing via retinanet_input.Parser.
Inference performs automatic anchor generation, box decoding, and NMS, returning ready-to-use detection boxes, scores, and class IDs.
The implementation supports export to standard SavedModel and optimized TFLite formats for edge deployment.

Frequently Asked Questions

How does RetinaNet differ from Faster R-CNN in the TensorFlow Models repository?

RetinaNet is a one-stage detector that processes dense anchor boxes across pyramid levels in a single forward pass, while Faster R-CNN is a two-stage detector requiring a separate Region Proposal Network. According to the source code in official/vision/modeling/retinanet_model.py, RetinaNet uses the MultilevelDetectionGenerator for post-processing rather than the RPN-based proposal mechanism found in Faster R-CNN implementations.
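
As a rough picture of that post-processing step, the following sketch implements greedy non-maximum suppression in plain Python. It illustrates the general technique only. The detection generator in tensorflow/models works on batched tensors and may differ in details such as per-class handling. The 0.5 IoU threshold is an assumed value, and the function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two [ymin, xmin, ymax, xmax] boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box is far away and survives.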

Where is the focal loss implemented for RetinaNet training?

The focal loss implementation resides in official/vision/losses. The RetinaNetTask class aggregates this with Huber box regression loss in its build_losses method in official/vision/tasks/retinanet.py. The task computes these losses automatically inside train_step, requiring no manual loss configuration in model.compile().
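
To make the formula concrete, here is a scalar sketch of binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is written in plain Python and is not the code in official/vision/losses. The alpha and gamma defaults below are assumed values for illustration, and the function name is hypothetical.

```python
import math

def focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability and a 0/1 target."""
    eps = 1e-7
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    # p_t is the probability the model assigned to the true class.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy, well-classified positive contributes far less than a hard one.
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
print(easy < hard)  # True
```

Setting gamma to 0 removes the modulating factor and leaves an alpha-weighted cross-entropy, which is a quick way to sanity-check an implementation.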

Can I modify anchor scales and aspect ratios without changing the source code?

Yes. Anchor parameters are controlled through the configuration dataclass in official/vision/configs/retinanet.py. Adjust cfg.anchor.num_scales, cfg.anchor.aspect_ratios, and cfg.anchor.anchor_size before passing the config to factory.build_retinanet. The Anchor class in official/vision/ops/anchor.py consumes these parameters to generate multiscale anchors for each FPN level.
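
To see how these three settings interact, the sketch below enumerates anchor shapes for one pyramid level using a common RetinaNet convention: the base size is anchor_size × stride × 2^(scale / num_scales), and width and height are then stretched by the square root of the aspect ratio. This is an illustrative reimplementation under that assumption, not a copy of anchor.py, and anchor_size=4.0 is an assumed example value.

```python
def anchor_shapes(level, num_scales=3, aspect_ratios=(0.5, 1.0, 2.0),
                  anchor_size=4.0):
    """Return (height, width) pairs for every anchor at one pyramid level."""
    stride = 2 ** level
    shapes = []
    for scale in range(num_scales):
        base = anchor_size * stride * 2 ** (scale / num_scales)
        for ratio in aspect_ratios:
            # ratio is width / height; the area stays close to base * base.
            width = base * ratio ** 0.5
            height = base / ratio ** 0.5
            shapes.append((round(height, 1), round(width, 1)))
    return shapes

shapes = anchor_shapes(level=3)
print(len(shapes))  # 9
print(shapes[1])    # (32.0, 32.0)
```

Raising num_scales or adding aspect ratios multiplies the number of anchors per location, so the head's output channels grow with it.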

What backbones are supported for RetinaNet in this implementation?

The factory supports multiple backbones including ResNet, SpineNet, and MobileNet. The build_backbone function in official/vision/modeling/backbones/factory.py instantiates the backbone based on cfg.backbone.type. You can select the variant (e.g., ResNet-50 vs ResNet-101) via cfg.backbone.resnet.model_id in your configuration object.
