How to Use SpineNet Backbone for Vision Models: A Complete Implementation Guide

February 28, 2026 tensorflow/models ↗

SpineNet is a scale-permuted backbone available in the TensorFlow Model Garden. It replaces the traditional feature pyramid with a directed acyclic graph of cross-scale feature connections and targets object detection and segmentation, where its authors report accuracy gains over ResNet-FPN backbones.

SpineNet departs from scale-decreasing backbone designs by permuting the order of feature scales and resampling features between them. Proposed by Du et al. (2020), who found the permutation with neural architecture search, and implemented in the tensorflow/models repository, the architecture repeatedly fuses multi-resolution features through a directed acyclic graph (DAG). This guide shows how to configure and integrate SpineNet into computer vision pipelines using the official TensorFlow Model Garden implementation.

Understanding the SpineNet Architecture

Unlike conventional backbones that use a bottom-up pathway followed by a top-down FPN (Feature Pyramid Network), SpineNet employs a scale-permuted network where feature maps at different resolutions are repeatedly resampled and fused.

The architecture processes input through three distinct stages: a stem consisting of 7×7 convolution and max-pooling followed by initial bottleneck blocks at level 2; a scale-permuted body that builds a DAG according to block specifications; and endpoints that unify channel depths via 1×1 convolutions. In official/vision/modeling/backbones/spinenet.py, the SpineNet class implements this flow, returning a dictionary of multi-scale tensors ready for downstream heads such as RetinaNet or Mask-RCNN.

Core Components of the TensorFlow Implementation

Block Specifications and BlockSpec

The network topology is defined by SPINENET_BLOCK_SPECS, a list of tuples in spinenet.py that specify the DAG structure. Each entry follows the format (level, block_fn, (input_offset0, input_offset1), is_output), determining the target resolution level, block type (bottleneck or residual), parent connections, and whether the block serves as an output endpoint.

The BlockSpec class in spinenet.py serves as a lightweight container for these entries, parsing the tuple format into accessible attributes. These specifications determine how features flow through the network, with each block receiving inputs from two parent features at potentially different scales.
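The tuple format described above can be sketched in a few lines of plain Python. This is an illustration only: `BlockSpecSketch`, `parse_block_specs`, `check_is_dag`, and the example entries are hypothetical names and values, not the contents of spinenet.py. The sketch also assumes that parent offsets index into a list that begins with two stem features, followed by the blocks built so far.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockSpecSketch:
    """Hypothetical stand-in for a block specification entry."""
    level: int                      # target resolution level
    block_fn: str                   # 'bottleneck' or 'residual'
    input_offsets: Tuple[int, int]  # indices of the two parent features
    is_output: bool                 # whether the block is an output endpoint


def parse_block_specs(raw_specs) -> List[BlockSpecSketch]:
    """Turn (level, block_fn, offsets, is_output) tuples into objects."""
    return [BlockSpecSketch(*entry) for entry in raw_specs]


def check_is_dag(specs: List[BlockSpecSketch], num_stem_blocks: int = 2) -> None:
    """Verify that every parent offset points at an already-built feature."""
    for i, spec in enumerate(specs):
        available = num_stem_blocks + i  # stem features plus earlier blocks
        for offset in spec.input_offsets:
            if not 0 <= offset < available:
                raise ValueError(f"block {i} references missing parent {offset}")


# Made-up entries for illustration; not the real SPINENET_BLOCK_SPECS.
EXAMPLE_SPECS = [
    (2, 'bottleneck', (0, 1), False),
    (4, 'residual', (0, 1), False),
    (3, 'bottleneck', (2, 3), True),
]

specs = parse_block_specs(EXAMPLE_SPECS)
check_is_dag(specs)
```

Because each block may only reference features that already exist, the resulting graph is acyclic by construction, whatever order the levels appear in.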

Scaling Map for Model Variants

SpineNet provides multiple model variants (49S, 49, 96, 143, 143L, 190) through the SCALING_MAP dictionary in spinenet.py. This mapping translates a model_id into hyperparameters including the filter-size scale, the number of block repeats, and the alpha parameter that controls intermediate channel dimensions during resampling. Larger IDs generally denote larger models; for instance, model_id='143' is a high-capacity variant suited to demanding detection tasks.
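As a rough illustration of how such a lookup table can drive model construction, the sketch below maps a variant name to its hyperparameters and applies the filter-size scale to a base channel count. The variant names, key names, and numeric values are placeholders chosen for this example; consult SCALING_MAP in spinenet.py for the real entries.

```python
# Placeholder table; keys and values are illustrative, not the real SCALING_MAP.
EXAMPLE_SCALING_MAP = {
    'small': {'filter_size_scale': 0.5, 'block_repeats': 1, 'resample_alpha': 0.5},
    'large': {'filter_size_scale': 1.0, 'block_repeats': 3, 'resample_alpha': 1.0},
}


def lookup_scaling(model_id, scaling_map=EXAMPLE_SCALING_MAP):
    """Return a copy of the hyperparameters for a variant, or fail loudly."""
    if model_id not in scaling_map:
        known = ', '.join(sorted(scaling_map))
        raise ValueError(f"unknown model_id {model_id!r}; expected one of: {known}")
    return dict(scaling_map[model_id])


def scaled_filters(base_filters, filter_size_scale):
    """Scale a base channel count by the variant's filter-size scale."""
    return int(base_filters * filter_size_scale)


params = lookup_scaling('small')
print(scaled_filters(64, params['filter_size_scale']))
```

Keeping every variant in one table means a single model_id string is enough to select a consistent set of hyperparameters.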

Resampling with Alpha

Central to cross-scale fusion is the _resample_with_alpha method, which brings parent features to a common spatial resolution and channel depth. Each parent feature first passes through a 1×1 convolution whose width is scaled by the alpha factor, then undergoes spatial adjustment (down-sampling via strided 3×3 convolution or up-sampling via nearest-neighbor interpolation), and is finally projected to the channel depth the target block expects. This resampling lets the network aggregate semantically rich features across disparate resolutions.
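The resampling logic can be summarized as a small planning function. This is a sketch under stated assumptions, not the method from spinenet.py: `plan_resample` is a hypothetical helper that only lists the operations, and it assumes alpha scales the intermediate channel width while a final 1×1 convolution matches the target depth.

```python
def plan_resample(input_width, target_width, input_filters, target_filters, alpha=0.5):
    """List the operations needed to match one parent feature to a target block."""
    mid_filters = int(input_filters * alpha)  # alpha narrows the intermediate width
    steps = [f"conv1x1 -> {mid_filters} channels"]
    if input_width > target_width:
        factor = input_width // target_width
        steps.append(f"downsample x{factor} (strided 3x3 conv)")
    elif input_width < target_width:
        factor = target_width // input_width
        steps.append(f"upsample x{factor} (nearest neighbor)")
    # Equal widths need no spatial change; only the channels are adjusted.
    steps.append(f"conv1x1 -> {target_filters} channels")
    return steps


for step in plan_resample(80, 20, input_filters=256, target_filters=128):
    print(step)
```

A smaller alpha reduces the cost of the spatial resampling step, since the resampling runs on fewer channels.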

Factory Registration

The backbone integrates with Model Garden's configuration system through build_spinenet, decorated with @factory.register_backbone_builder. This function instantiates the SpineNet class from the backbone config together with the matching SCALING_MAP entry, so a backbone can be built from YAML configs or Python config objects without specifying each architecture parameter by hand.

Configuring SpineNet for Vision Tasks

SpineNet exposes configuration through the official hyperparameter system. A typical backbone configuration specifies the model variant, output levels, and regularization:

backbone_config = {
    'type': 'spinenet',
    'spinenet': {
        'model_id': '143',          # Options: 49S, 49, 96, 143, 143L, 190
        'min_level': 3,
        'max_level': 7,
        'stochastic_depth_drop_rate': 0.2,
    }
}

The model_id determines architecture parameters via SCALING_MAP, while min_level and max_level specify which feature pyramid levels (typically 3 through 7) to return for downstream task heads.
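To make the level numbers concrete, the following sketch computes the spatial size of each returned feature map for a square input, assuming the usual convention that level l has stride 2^l. The helper `feature_map_sizes` is hypothetical and not part of the Model Garden API.

```python
import math


def feature_map_sizes(input_size, min_level=3, max_level=7):
    """Return {level_name: spatial_size}, assuming level l has stride 2**l."""
    return {
        str(level): math.ceil(input_size / 2 ** level)
        for level in range(min_level, max_level + 1)
    }


sizes = feature_map_sizes(640)
for level, size in sizes.items():
    print(f"level {level}: {size}x{size}")
```

For a 640×640 input this gives 80, 40, 20, 10, and 5 pixels per side for levels 3 through 7.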

Implementation Examples

Building a SpineNet Backbone Manually

For custom training loops or research experiments, instantiate SpineNet directly using the factory builder:

import tensorflow as tf
import tf_keras

from official.vision.configs import backbones, common
from official.vision.modeling.backbones import spinenet

# Define input specifications for 640×640 RGB images
input_spec = tf_keras.layers.InputSpec(shape=[None, 640, 640, 3])

# Configure normalization and activation
norm_act_cfg = common.NormActivation(
    activation='relu',
    use_sync_bn=False,
    norm_momentum=0.99,
    norm_epsilon=0.001)

# Configure backbone parameters. build_spinenet reads the variant-specific
# settings via backbone_config.get(), so use the one-of Backbone config.
backbone_cfg = backbones.Backbone(
    type='spinenet',
    spinenet=backbones.SpineNet(
        model_id='143',
        min_level=3,
        max_level=7,
        stochastic_depth_drop_rate=0.2))

# Build the model
spinenet_backbone = spinenet.build_spinenet(
    input_specs=input_spec,
    backbone_config=backbone_cfg,
    norm_activation_config=norm_act_cfg)

# Run a dummy batch; the output is a dict with keys '3', '4', '5', '6', '7'
input_tensor = tf.random.uniform([1, 640, 640, 3])
features = spinenet_backbone(input_tensor)

This returns a dictionary mapping level names to feature tensors, which can be fed directly into detection heads in retinanet.py or maskrcnn.py.

Training with Official Experiment Configs

For standard benchmarks, use the Model Garden CLI with pre-configured YAML files:

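MODEL_DIR=/tmp/spinenet_retinanet
# official.vision.train is the Model Garden training driver; --experiment
# selects the registered base config that the YAML file then overrides.
python -m official.vision.train \
  --experiment=retinanet_spinenet_coco \
  --mode=train_and_eval \
  --model_dir=${MODEL_DIR} \
  --config_file=official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml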

The configuration file specifies SpineNet parameters under the backbone key:

backbone:
  type: spinenet
  spinenet:
    model_id: '96'
    min_level: 3
    max_level: 7
    stochastic_depth_drop_rate: 0.2

The training loop automatically invokes factory.build_backbone → spinenet.build_spinenet, requiring no additional Python code for backbone construction.

Integrating into Custom Keras Models

Embed SpineNet as a submodule in custom architectures:

import tensorflow as tf
import tf_keras

from official.vision.configs import backbones, common
from official.vision.modeling.backbones import spinenet

class MyDetector(tf_keras.Model):
    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = tf_keras.layers.Conv2D(256, 3, padding='same', activation='relu')
        self.classifier = tf_keras.layers.Dense(num_classes)

    def call(self, inputs):
        feats = self.backbone(inputs)
        level4 = feats['4']  # Shape: [B, H/16, W/16, C]
        x = self.head(level4)
        # Global average pool over the spatial dimensions before classifying
        logits = self.classifier(tf.reduce_mean(x, axis=[1, 2]))
        return logits

# Instantiate with SpineNet-143. The spatial dimensions are fixed here because
# the backbone derives each block's target resolution from the input size.
backbone = spinenet.build_spinenet(
    input_specs=tf_keras.layers.InputSpec(shape=[None, 640, 640, 3]),
    backbone_config=backbones.Backbone(
        type='spinenet',
        spinenet=backbones.SpineNet(
            model_id='143',
            min_level=3,
            max_level=7,
            stochastic_depth_drop_rate=0.2)),
    norm_activation_config=common.NormActivation(activation='relu'))

detector = MyDetector(backbone, num_classes=90)

Key Source Files

The SpineNet implementation spans several locations in the TensorFlow Model Garden:

official/vision/modeling/backbones/spinenet.py – Core implementation containing SpineNet class, BlockSpec, SPINENET_BLOCK_SPECS, SCALING_MAP, _resample_with_alpha, and build_spinenet factory function.
official/vision/modeling/backbones/__init__.py – Exports SpineNet and SpineNetMobile symbols.
official/vision/configs/backbones.py – Dataclass definitions for SpineNet configuration objects.
official/vision/configs/experiments/retinanet/*.yaml – Example configurations integrating SpineNet with RetinaNet detectors.
official/vision/modeling/backbones/spinenet_mobile.py – Mobile-optimized variant with identical builder interface.
official/vision/modeling/backbones/spinenet_test.py – Unit tests verifying construction and serialization.

Summary

SpineNet replaces monotonic feature pyramids with a scale-permuted DAG architecture that repeatedly resamples and fuses cross-scale features.
The topology is defined by SPINENET_BLOCK_SPECS and scaled via SCALING_MAP to produce variants from 49S to 190.
Use build_spinenet in official/vision/modeling/backbones/spinenet.py to instantiate the backbone from configuration objects.
Integration requires specifying model_id, min_level/max_level, and normalization parameters through the factory builder or YAML configs.
The backbone returns a dictionary of multi-scale tensors suitable for RetinaNet, Mask-RCNN, or custom detection heads.

Frequently Asked Questions

What distinguishes SpineNet from ResNet-FPN architectures?

SpineNet eliminates the strict bottom-up-then-top-down flow of ResNet-FPN by allowing features to permute across scales throughout the network depth. According to the implementation in spinenet.py, the _resample_with_alpha method enables parents at any level to fuse into child blocks at different resolutions, creating a directed acyclic graph rather than a pyramid. This permits richer multi-scale feature reuse and typically yields higher accuracy on object detection benchmarks.

Which SpineNet model variant should I select?

Select the model_id based on your accuracy target and compute budget. The SCALING_MAP in spinenet.py defines variants from the lightweight 49S to the high-capacity 190; mobile-optimized variants live separately in spinenet_mobile.py. For a balance of accuracy and cost, 96 and 143 are reasonable middle grounds, and the RetinaNet experiment configs include a SpineNet-96 example.

Can SpineNet serve as a backbone for architectures other than RetinaNet?

Yes, SpineNet functions as a drop-in backbone for any architecture whose heads consume multi-scale features. The model returned by build_spinenet outputs a feature dictionary compatible with Mask-RCNN, Cascade-RCNN, or custom heads. The tensorflow/models repository includes examples for both RetinaNet and Mask-RCNN in official/vision/configs/experiments/.

How does stochastic depth regularization work in SpineNet?

The stochastic_depth_drop_rate parameter applies survival probability regularization during training, randomly dropping residual blocks to prevent overfitting. This is particularly valuable for deeper variants like 143L and 190. According to the source implementation, this rate is passed through the backbone config and applied within the block construction logic to improve generalization without affecting inference speed.
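The idea can be shown with a minimal pure-Python sketch that operates on lists of floats instead of tensors. It is not the Model Garden layer: `stochastic_depth_residual` is a hypothetical helper, and it assumes the common formulation in which the surviving residual branch is divided by the keep probability during training so that its expected value matches inference.

```python
import random


def stochastic_depth_residual(shortcut, residual, drop_rate, training, rng=random):
    """Combine a shortcut with a residual branch that may be dropped in training."""
    if not training or drop_rate == 0.0:
        # Inference: the residual branch is always kept, unscaled.
        return [s + r for s, r in zip(shortcut, residual)]
    keep_prob = 1.0 - drop_rate
    if rng.random() < drop_rate:
        # Branch dropped: the block reduces to the identity shortcut.
        return list(shortcut)
    # Branch kept: rescale so the expectation matches inference behavior.
    return [s + r / keep_prob for s, r in zip(shortcut, residual)]


print(stochastic_depth_residual([1.0, 2.0], [0.5, 0.5], drop_rate=0.2, training=False))
```

Since the random drop only happens when training is true, the inference path always runs the full block.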
