How to Implement Object Detection with RetinaNet Using TensorFlow Models
You can implement object detection with RetinaNet by configuring the RetinaNet dataclass in official/vision/configs/retinanet.py, assembling the model via factory.build_retinanet, and training through the RetinaNetTask class, which handles focal loss, anchor generation, and NMS automatically.
RetinaNet is a one-stage dense object detector that combines a backbone-FPN feature pyramid with focal-loss training to reach accuracy comparable to two-stage detectors while keeping single-pass inference. The TensorFlow Models repository provides a modular, production-ready implementation that lets you build, train, and deploy RetinaNet without writing boilerplate code for anchor generation or post-processing. This guide walks through the architecture and provides code examples you can adapt to run RetinaNet on your own dataset.
RetinaNet Architecture Components
The implementation in tensorflow/models follows a modular design where each component is instantiated through factory functions and wired together by the RetinaNetModel class.
- Backbone: Extracts multi-scale feature maps using networks like ResNet-50. Built by backbones.factory.build_backbone in official/vision/modeling/backbones/factory.py.
- FPN Decoder: Merges backbone levels into a feature pyramid (P3-P7) via decoders.factory.build_decoder in official/vision/modeling/decoders/factory.py.
- RetinaNet Head: Two parallel sub-heads for classification (num_classes × num_anchors scores) and box regression (4 × num_anchors offsets). Implemented in dense_prediction_heads.RetinaNetHead at official/vision/modeling/heads/dense_prediction_heads.py (lines 7-30).
- Anchor Generator: Generates multiscale anchor boxes for each pyramid level. Logic resides in anchor.Anchor within official/vision/ops/anchor.py, invoked automatically by RetinaNetModel when anchor_boxes are not supplied.
- Detection Generator: Performs box decoding, score thresholding, and Non-Maximum Suppression (NMS) via detection_generator.MultilevelDetectionGenerator in official/vision/modeling/layers/detection_generator.py.
- Task Controller: RetinaNetTask in official/vision/tasks/retinanet.py orchestrates the training loop, data loading, loss computation (focal + Huber), and metric tracking.
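To make the head's output sizes concrete, the sketch below counts the anchors a P3-P7 pyramid produces for a 640 × 640 input with 3 scales and 3 aspect ratios. It assumes that pyramid level l downsamples the input by 2**l and that feature-map sizes round up. The helper name count_anchors is hypothetical and is not part of the repository.

```python
import math


def count_anchors(image_size, min_level, max_level, num_scales, num_aspect_ratios):
    """Return (anchors_per_location, {level: (height, width)}, total_anchors)."""
    anchors_per_location = num_scales * num_aspect_ratios
    height, width = image_size
    feature_shapes = {}
    total = 0
    for level in range(min_level, max_level + 1):
        stride = 2 ** level  # assumption: level l has stride 2**l
        fh = math.ceil(height / stride)
        fw = math.ceil(width / stride)
        feature_shapes[level] = (fh, fw)
        total += fh * fw * anchors_per_location
    return anchors_per_location, feature_shapes, total


per_location, feature_shapes, total = count_anchors((640, 640), 3, 7, 3, 3)
print(per_location)                          # 9
print(feature_shapes[3], feature_shapes[7])  # (80, 80) (5, 5)
print(total)                                 # 76725
```

Under these assumptions the classification sub-head emits num_classes × 9 scores and the box sub-head emits 4 × 9 offsets at every one of the 8,525 feature-map locations.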
Building a RetinaNet Model
Start by defining a configuration object and invoking the factory. The build_retinanet function (lines 60-76 in official/vision/modeling/factory.py) automatically constructs the backbone, decoder, head, and detection generator.
from official.vision.configs import retinanet as retinanet_cfg
from official.vision.modeling import factory
import tensorflow as tf
# Configure the model
cfg = retinanet_cfg.RetinaNet()
cfg.num_classes = 91 # COCO label IDs run from 1 to 90; 91 also covers ID 0 (background)
cfg.input_size = [640, 640, 3]
cfg.backbone.type = 'resnet'
cfg.backbone.resnet.depth = 50
cfg.head.num_convs = 4
cfg.head.num_filters = 256
cfg.anchor.num_scales = 3
cfg.anchor.aspect_ratios = [0.5, 1.0, 2.0]
# Build the Keras model
input_spec = tf.keras.layers.InputSpec(shape=[None] + cfg.input_size)
model = factory.build_retinanet(input_spec, cfg)
Preparing the Input Pipeline
RetinaNet expects TF-Example records containing bounding boxes and class IDs. The retinanet_input.Parser class in official/vision/dataloaders/retinanet_input.py handles augmentation and anchor matching.
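from official.common import dataset_fn
from official.vision.configs import retinanet as retinanet_cfg
from official.vision.dataloaders import input_reader
from official.vision.dataloaders import input_reader_factory
from official.vision.dataloaders import retinanet_input
from official.vision.dataloaders import tf_example_decoder
# Task-level config (assumed here to be the RetinaNetTask config dataclass);
# point task_cfg.train_data.input_path at your TFRecord files
task_cfg = retinanet_cfg.RetinaNetTask(model=cfg)
# Decoder that turns serialized TF-Examples into tensors
decoder = tf_example_decoder.TfExampleDecoder()
# Initialize parser with same anchor config as the model
parser = retinanet_input.Parser(
    output_size=cfg.input_size[:2],
    min_level=cfg.min_level,
    max_level=cfg.max_level,
    num_scales=cfg.anchor.num_scales,
    aspect_ratios=cfg.anchor.aspect_ratios,
    anchor_size=cfg.anchor.anchor_size,
    dtype='bfloat16',
    match_threshold=0.5,
    unmatched_threshold=0.5,
)
# Create the dataset reader
reader = input_reader_factory.input_reader_generator(
    params=task_cfg.train_data,
    dataset_fn=dataset_fn.pick_dataset_fn('tfrecord'),
    decoder_fn=decoder.decode,
    combine_fn=input_reader.create_combine_fn(task_cfg.train_data),
    parser_fn=parser.parse_fn(is_training=True)
)
train_dataset = reader.read()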
The parser and reader instantiation logic mirrors the implementation in RetinaNetTask.build_inputs (lines 20-50 in official/vision/tasks/retinanet.py).
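The match_threshold and unmatched_threshold arguments control how anchors are labeled against ground-truth boxes. The following is a minimal pure-Python sketch of IoU-based label assignment, assuming boxes are in (ymin, xmin, ymax, xmax) order and that an anchor is positive at or above match_threshold, negative below unmatched_threshold, and ignored in between. The helper names iou and assign_labels are hypothetical, and the repository's matcher may handle ties and edge cases differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(anchors, gt_boxes, match_threshold=0.5, unmatched_threshold=0.5):
    """Label each anchor 1 (positive), 0 (negative), or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        # Best overlap with any ground-truth box (0.0 if there are none).
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= match_threshold:
            labels.append(1)
        elif best < unmatched_threshold:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


anchors = [(0, 0, 10, 10), (0, 5, 10, 15), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 12)]
labels_equal = assign_labels(anchors, gt_boxes)
labels_band = assign_labels(anchors, gt_boxes, match_threshold=0.6, unmatched_threshold=0.4)
print(labels_equal)  # [1, 0, 0]
print(labels_band)   # [1, -1, 0]
```

With both thresholds at 0.5, as in the parser call above, there is no ignore band: every anchor is either positive or negative.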
Training Implementation
The RetinaNetTask class manages the training loop, aggregating focal loss for classification and Huber loss for box regression. Loss functions are defined in official/vision/losses and wired together in task.build_losses (lines 21-28 in tasks/retinanet.py).
from official.vision.tasks import retinanet as retinanet_task
# Initialize task and model
task = retinanet_task.RetinaNetTask(task_cfg)
task.initialize(model) # Loads pretrained backbone if configured
# Build optimizer
optimizer = tf.keras.optimizers.SGD(
    learning_rate=0.32 * task_cfg.train_data.global_batch_size / 256.0,
    momentum=0.9
)
# Build metrics once, outside the traced function, so that no new
# metric variables are created on every call
metrics = task.build_metrics()
# Training step reuses the task's logic
@tf.function
def train_step(batch):
    return task.train_step(batch, model, optimizer, metrics=metrics)
# Run training. If train_dataset repeats indefinitely, bound the inner
# loop, for example with train_dataset.take(n) for a step count n of your choosing.
for epoch in range(12):
    for batch in train_dataset:
        logs = train_step(batch)
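The two losses the task aggregates can be sketched in plain Python for a single prediction. This is an illustration of the formulas, not the code in official/vision/losses. The function names are hypothetical, alpha=0.25 and gamma=2.0 are the values reported in the RetinaNet paper, and delta=0.1 is an illustrative choice rather than a value read from the repository config.

```python
import math


def focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one probability in (0, 1) and a 0/1 target."""
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def huber_loss(pred, target, delta=0.1):
    """Huber loss: quadratic for small errors, linear beyond delta."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)


easy = focal_loss(0.9, 1)  # confident and correct
hard = focal_loss(0.1, 1)  # confident and wrong
print(easy < hard)         # True
print(huber_loss(0.05, 0.0), huber_loss(1.0, 0.0))
```

Setting gamma to 0 reduces the focal loss to alpha-weighted cross-entropy, which is a quick way to check an implementation.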
Running Inference
For inference, the model accepts images and returns post-processed detections including NMS. The forward pass in RetinaNetModel.call (lines 84-115 in retinanet_model.py) handles anchor generation, head inference, and detection generation.
# Build model for inference (optionally pass precomputed anchors)
model = factory.build_retinanet(input_spec, cfg)
# Run inference on a batch of images [batch, H, W, 3]
outputs = model(images, training=False)
# Extract results
boxes = outputs['detection_boxes'] # [batch, max_detections, 4]
scores = outputs['detection_scores'] # [batch, max_detections]
classes = outputs['detection_classes'] # [batch, max_detections]
num_detections = outputs['num_detections'] # [batch]
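The NMS step that produces these outputs can be illustrated with a greedy, single-class sketch in pure Python. It assumes (ymin, xmin, ymax, xmax) boxes; the function names and default thresholds are hypothetical and are not the settings of MultilevelDetectionGenerator, which also handles multiple classes and pyramid levels.

```python
def box_iou(a, b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.05, max_detections=100):
    """Return indices of the boxes that survive, highest score first."""
    # Drop low-scoring boxes, then visit the rest in descending score order.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
        if len(kept) == max_detections:
            break
    return kept


demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
demo_scores = [0.9, 0.8, 0.7, 0.01]
kept = greedy_nms(demo_boxes, demo_scores)
print(kept)  # [0, 2]
```

Box 1 is suppressed because it overlaps the higher-scoring box 0 with an IoU of about 0.68, and box 3 is dropped by the score threshold before suppression runs.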
Exporting to SavedModel and TFLite
For production deployment, configure ExportConfig flags such as output_normalized_coordinates=True and output_intermediate_features=False in your config. The model supports TFLite-compatible post-processing ops injected during factory construction (lines 33-40 in factory.py).
export_dir = '/tmp/retinanet_savedmodel'
model.save(export_dir, include_optimizer=False, signatures=None)
Summary
- RetinaNet in TensorFlow Models is assembled via factory.build_retinanet using configuration dataclasses defined in official/vision/configs/retinanet.py.
- The architecture combines a backbone, FPN decoder, dual-purpose head, anchor generator, and detection generator into a single tf.keras.Model.
- Training is managed by RetinaNetTask, which automatically handles focal loss, Huber loss, and dataset parsing via retinanet_input.Parser.
- Inference performs automatic anchor generation, box decoding, and NMS, returning ready-to-use detection boxes, scores, and class IDs.
- The implementation supports export to standard SavedModel and optimized TFLite formats for edge deployment.
Frequently Asked Questions
How does RetinaNet differ from Faster R-CNN in the TensorFlow Models repository?
RetinaNet is a one-stage detector that processes dense anchor boxes across pyramid levels in a single forward pass, while Faster R-CNN is a two-stage detector requiring a separate Region Proposal Network. According to the source code in official/vision/modeling/retinanet_model.py, RetinaNet uses the MultilevelDetectionGenerator for post-processing rather than the RPN-based proposal mechanism found in Faster R-CNN implementations.
Where is the focal loss implemented for RetinaNet training?
The focal loss implementation resides in official/vision/losses. The RetinaNetTask class aggregates this with Huber box regression loss in its build_losses method (lines 21-28 in official/vision/tasks/retinanet.py). The task computes these losses automatically inside train_step, requiring no manual loss configuration in model.compile().
Can I modify anchor scales and aspect ratios without changing the source code?
Yes. Anchor parameters are controlled through the configuration dataclass in official/vision/configs/retinanet.py. Adjust cfg.anchor.num_scales, cfg.anchor.aspect_ratios, and cfg.anchor.anchor_size before passing the config to factory.build_retinanet. The Anchor class in official/vision/ops/anchor.py consumes these parameters to generate multiscale anchors for each FPN level.
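As a rough illustration of how these three parameters interact, the sketch below derives anchor heights and widths for one pyramid level. It assumes a common RetinaNet convention: the base size is anchor_size × stride × 2^(scale / num_scales), and each aspect ratio is width / height with area held constant. The helper name anchor_shapes is hypothetical, and the exact formula in official/vision/ops/anchor.py may differ in detail.

```python
def anchor_shapes(level, num_scales, aspect_ratios, anchor_size):
    """Return a (height, width) pair for every anchor shape at one level."""
    stride = 2 ** level  # assumption: level l has stride 2**l
    result = []
    for scale in range(num_scales):
        # Intermediate octave scales: 2^0, 2^(1/num_scales), ...
        base = anchor_size * stride * 2 ** (scale / num_scales)
        for ratio in aspect_ratios:
            # assumption: ratio = width / height, area stays base ** 2
            width = base * ratio ** 0.5
            height = base / ratio ** 0.5
            result.append((height, width))
    return result


level3 = anchor_shapes(level=3, num_scales=3, aspect_ratios=[0.5, 1.0, 2.0], anchor_size=4.0)
print(len(level3))  # 9
print(level3[1])    # (32.0, 32.0)
```

Raising anchor_size enlarges every anchor proportionally, while adding aspect ratios or scales increases the number of anchors per location and therefore the head's output channels.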
What backbones are supported for RetinaNet in this implementation?
The factory supports multiple backbones including ResNet, SpineNet, and MobileNet. The build_retinanet function in official/vision/modeling/factory.py calls backbones.factory.build_backbone (in official/vision/modeling/backbones/factory.py), which instantiates the backbone based on cfg.backbone.type. You can configure the depth (e.g., ResNet-50 vs ResNet-101) via cfg.backbone.resnet.depth in your configuration object.