# How to Use SpineNet Backbone for Vision Models: A Complete Implementation Guide

> Implement the SpineNet backbone for vision models using TensorFlow Model Garden, and learn how it improves accuracy for object detection and segmentation with this complete guide.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**SpineNet is a scale-permuted backbone architecture available in TensorFlow Model Garden that replaces traditional feature pyramids with a directed acyclic graph of cross-scale feature connections, delivering superior accuracy for object detection and segmentation tasks.**

SpineNet departs from monotonic pyramid backbones: instead of reducing resolution stage by stage, it arranges its blocks in a scale-permuted order found by neural architecture search and resamples features between them. Originally proposed by Du et al. (2020) and implemented in the `tensorflow/models` repository, the architecture repeatedly fuses multi-resolution features through a directed acyclic graph (DAG) of cross-scale connections. This guide shows how to configure SpineNet and integrate it into computer vision pipelines using the official TensorFlow Model Garden implementation.

## Understanding the SpineNet Architecture

Unlike conventional backbones that use a bottom-up pathway followed by a top-down FPN (Feature Pyramid Network), **SpineNet** employs a **scale-permuted network** where feature maps at different resolutions are repeatedly resampled and fused. 

The architecture processes input through three distinct stages: a stem consisting of 7×7 convolution and max-pooling followed by initial bottleneck blocks at level 2; a scale-permuted body that builds a DAG according to block specifications; and endpoints that unify channel depths via 1×1 convolutions. In [`official/vision/modeling/backbones/spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py), the `SpineNet` class implements this flow, returning a dictionary of multi-scale tensors ready for downstream heads such as RetinaNet or Mask-RCNN.
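To make the level numbers concrete, the sketch below computes the spatial size of each endpoint for a given input. It assumes the usual feature-pyramid convention that a feature at level L has stride 2^L relative to the input (so the level-2 stem output is at 1/4 resolution). `level_sizes` is a hypothetical helper that only does the arithmetic; it does not touch the Model Garden code.

```python
import math


def level_sizes(input_size, min_level, max_level):
    """Spatial size of each pyramid level, assuming level L has stride 2**L.

    Uses ceil so odd sizes round up, as SAME-padded stride-2 layers do.
    """
    return {
        str(level): math.ceil(input_size / 2**level)
        for level in range(min_level, max_level + 1)
    }


if __name__ == "__main__":
    # 640x640 input with the default endpoints, levels 3 through 7.
    print(level_sizes(640, 3, 7))
    # → {'3': 80, '4': 40, '5': 20, '6': 10, '7': 5}
```

This is also why input sizes that are multiples of 2^max_level (128 for level 7) keep every level evenly divisible.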

## Core Components of the TensorFlow Implementation

### Block Specifications and BlockSpec

The network topology is defined by **`SPINENET_BLOCK_SPECS`**, a list of tuples in [`spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py) that encodes the DAG. Each entry has the form `(level, block_fn, (input_offset0, input_offset1), is_output)`: the block's target resolution level, its block type (`'bottleneck'` or `'residual'`), the indices of its two parent features, and whether the block serves as an output endpoint.

The `BlockSpec` class in [`spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py) is a lightweight container for these entries that exposes each tuple field as an attribute (`level`, `block_fn`, `input_offsets`, `is_output`). Together, the specifications determine how features flow through the network: each block receives two parent features, which may sit at different scales.
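The wiring can be sketched in plain Python: a `BlockSpec`-like container plus a loop that resolves each block's two `input_offsets` against the list of features built so far. The three spec tuples are hypothetical, not entries from `SPINENET_BLOCK_SPECS`, and the sketch assumes (as a simplification) that offsets index into a list seeded with the two level-2 stem outputs. It tracks only levels, not tensors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockSpec:
    """Plain-Python stand-in for one (level, block_fn, offsets, is_output) tuple."""
    level: int
    block_fn: str
    input_offsets: Tuple[int, int]
    is_output: bool


# Hypothetical specs for illustration only (NOT the real SPINENET_BLOCK_SPECS).
SPECS = [
    (4, 'residual', (0, 1), False),
    (3, 'bottleneck', (1, 2), True),
    (5, 'bottleneck', (2, 3), True),
]


def wire_dag(spec_tuples, num_stem_features=2):
    """Return (parents, outputs): (index, level) parent pairs per block and output blocks."""
    specs: List[BlockSpec] = [BlockSpec(*t) for t in spec_tuples]
    # Indices 0..num_stem_features-1 are the stem outputs, assumed to be level 2.
    levels = [2] * num_stem_features
    parents, outputs = [], []
    for spec in specs:
        i0, i1 = spec.input_offsets
        if max(i0, i1) >= len(levels):
            raise ValueError(f"offset refers to a block that does not exist yet: {spec}")
        parents.append(((i0, levels[i0]), (i1, levels[i1])))
        levels.append(spec.level)  # the new block becomes a candidate parent
        if spec.is_output:
            outputs.append((len(levels) - 1, spec.level))
    return parents, outputs


if __name__ == "__main__":
    parents, outputs = wire_dag(SPECS)
    for block_id, pair in enumerate(parents, start=2):
        print(f"block {block_id} <- parents {pair}")
    print("outputs (index, level):", outputs)
```

Because every offset must point at an already-built feature, the resulting graph is acyclic by construction, while parents can still come from any earlier level.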

### Scaling Map for Model Variants

SpineNet provides multiple model variants (49S, 49, 96, 143, 143L, 190) through the **`SCALING_MAP`** dictionary in `spinenet.py`. This mapping translates a `model_id` into hyperparameters: the filter-size scale (`filter_size_scale`), the number of block repeats (`block_repeats`), the resampling alpha (`resample_alpha`) that controls intermediate channel dimensions, and the endpoint channel count (`endpoints_num_filters`). Larger numbers indicate larger models with more capacity; for instance, `model_id='143'` is a high-capacity variant suited to demanding detection tasks.
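As an illustration of the lookup that `build_spinenet` performs, here is a plain-Python sketch. `SCALING_SKETCH`, `scaling_params`, and `scaled_filters` are hypothetical names. The values follow the SpineNet paper's description of the variants (49S narrows feature dimensions by 0.65; 96, 143, and 190 repeat each block 2, 3, and 4 times; 143 and 190 raise alpha to 1.0; 190 widens features by 1.3), and 143L and `endpoints_num_filters` are omitted; check `SCALING_MAP` in `spinenet.py` for the values the library actually uses.

```python
# Hypothetical stand-in for SCALING_MAP; values follow the SpineNet paper's
# description of the variants and should be checked against spinenet.py.
SCALING_SKETCH = {
    '49S': {'filter_size_scale': 0.65, 'resample_alpha': 0.5, 'block_repeats': 1},
    '49': {'filter_size_scale': 1.0, 'resample_alpha': 0.5, 'block_repeats': 1},
    '96': {'filter_size_scale': 1.0, 'resample_alpha': 0.5, 'block_repeats': 2},
    '143': {'filter_size_scale': 1.0, 'resample_alpha': 1.0, 'block_repeats': 3},
    '190': {'filter_size_scale': 1.3, 'resample_alpha': 1.0, 'block_repeats': 4},
}


def scaling_params(model_id):
    """Look up variant settings; accepts 96 or '96' since YAML may yield an int."""
    key = str(model_id)
    if key not in SCALING_SKETCH:
        raise ValueError(f'SpineNet-{key} is not a valid architecture.')
    return SCALING_SKETCH[key]


def scaled_filters(base_filters, model_id):
    """Channel count after a simple multiply-and-truncate by filter_size_scale (assumed)."""
    return int(base_filters * scaling_params(model_id)['filter_size_scale'])


if __name__ == "__main__":
    print(scaling_params(96))
    print(scaled_filters(64, '49S'), scaled_filters(64, '190'))
```

Casting `model_id` to a string before the lookup makes the same table usable from both Python configs (`'96'`) and YAML files that may parse an unquoted `96` as an integer.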

### Resampling with Alpha

Central to cross-scale fusion is the **`_resample_with_alpha`** method, which brings each parent feature to the target block's spatial resolution and channel depth. A parent first passes through a 1×1 convolution that scales its channel count by the alpha factor (for example, alpha = 0.5 keeps half the channels). It is then resized: down-sampled with a stride-2 3×3 convolution (followed by stride-2 max pooling when a larger reduction is needed) or up-sampled with nearest-neighbor interpolation. A final 1×1 convolution matches the channel count of the target block. This resampling lets the network aggregate semantically rich features across disparate resolutions, while the alpha factor keeps its computational cost down.
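The order of operations can be traced with shapes alone. Following the SpineNet paper's description, a 1×1 convolution first reduces channels to alpha × C, the spatial step follows (stride-2 3×3 convolution, then max pooling for further halvings, or nearest-neighbor up-sampling), and a final 1×1 convolution sets the target channel count. `resample_plan` is a hypothetical helper; it ignores normalization, activations, and any block-type-specific channel handling in the real method, and it assumes power-of-two size ratios.

```python
def resample_plan(in_size, in_channels, target_size, target_channels, alpha=0.5):
    """List the ops that map a parent feature to a target level's shape.

    Sizes are square spatial widths; power-of-two ratios are assumed.
    """
    mid = int(in_channels * alpha)
    ops = [f'conv1x1 {in_channels}->{mid}']
    size = in_size
    if size > target_size:
        # First halving uses a stride-2 3x3 conv; further halvings use max pooling.
        size //= 2
        ops.append(f'conv3x3/s2 -> {size}')
        while size > target_size:
            size //= 2
            ops.append(f'maxpool/s2 -> {size}')
    elif size < target_size:
        scale = target_size // size
        size *= scale
        ops.append(f'nearest_upsample x{scale} -> {size}')
    ops.append(f'conv1x1 {mid}->{target_channels}')
    return ops, (size, target_channels)


if __name__ == "__main__":
    # Level-5 parent (20x20, 256 ch) feeding a level-3 block (80x80, 128 ch).
    for op in resample_plan(20, 256, 80, 128)[0]:
        print(op)
```

Applying alpha before the spatial step means the expensive strided convolution and up-sampling run on the reduced channel count, which is where the savings come from.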

### Factory Registration

The backbone integrates with Model Garden's configuration system through **`build_spinenet`**, which is decorated with `@factory.register_backbone_builder('spinenet')`. This function instantiates the `SpineNet` class with the parameters that `SCALING_MAP` assigns to the requested `model_id`, so the backbone can be constructed from YAML configs or config objects without specifying those parameters by hand.
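The registration pattern can be sketched with hypothetical names (`BACKBONE_BUILDERS`, `register_backbone_builder`, `build_backbone`, `build_spinenet_sketch`) to show how a decorator keyed by a type string lets a config's `type: spinenet` field select the builder. This is not Model Garden's `factory` module; its real builders also take input specs, a normalization config, and a regularizer, and return a Keras model.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a backbone type string to its builder.
BACKBONE_BUILDERS: Dict[str, Callable[..., dict]] = {}


def register_backbone_builder(key: str):
    """Decorator that records a builder function under `key`."""
    def decorator(fn):
        if key in BACKBONE_BUILDERS:
            raise KeyError(f'Backbone builder {key!r} already registered.')
        BACKBONE_BUILDERS[key] = fn
        return fn
    return decorator


@register_backbone_builder('spinenet')
def build_spinenet_sketch(config: dict) -> dict:
    # Stand-in for model construction: just describe what would be built.
    return {'arch': f"SpineNet-{config['spinenet']['model_id']}"}


def build_backbone(config: dict) -> dict:
    """Dispatch on config['type'], the way a config-driven factory would."""
    backbone_type = config['type']
    if backbone_type not in BACKBONE_BUILDERS:
        raise ValueError(f'Unknown backbone type: {backbone_type!r}')
    return BACKBONE_BUILDERS[backbone_type](config)


if __name__ == "__main__":
    cfg = {'type': 'spinenet', 'spinenet': {'model_id': '96'}}
    print(build_backbone(cfg))  # {'arch': 'SpineNet-96'}
```

Registration happens as a side effect of importing the module that defines the builder, so the generic factory never needs to import SpineNet by name.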

## Configuring SpineNet for Vision Tasks

SpineNet exposes configuration through the official hyperparameter system. A typical backbone configuration specifies the model variant, output levels, and regularization:

```python
backbone_config = {
    'type': 'spinenet',
    'spinenet': {
        'model_id': '143',          # Options: 49S, 49, 96, 143, 143L, 190
        'min_level': 3,
        'max_level': 7,
        'stochastic_depth_drop_rate': 0.2,
    }
}
```

The `model_id` determines architecture parameters via `SCALING_MAP`, while `min_level` and `max_level` specify which feature pyramid levels (typically 3 through 7) to return for downstream task heads.
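Before building anything, a dictionary like this can be sanity-checked. `check_backbone_config` is a hypothetical helper: it verifies `model_id` against the variant names listed in the comment above, checks that the level range and drop rate are sensible, and derives the endpoint keys the backbone should return (string levels such as `'3'`). The defaults of 3 and 7 for missing levels are an assumption of this sketch.

```python
VALID_MODEL_IDS = {'49S', '49', '96', '143', '143L', '190'}


def check_backbone_config(config):
    """Validate a spinenet backbone dict and return the expected endpoint keys."""
    if config.get('type') != 'spinenet':
        raise ValueError(f"expected type 'spinenet', got {config.get('type')!r}")
    params = config['spinenet']
    model_id = str(params['model_id'])
    if model_id not in VALID_MODEL_IDS:
        raise ValueError(f'unknown SpineNet variant: {model_id}')
    # Assumed defaults when the levels are omitted.
    lo, hi = params.get('min_level', 3), params.get('max_level', 7)
    if lo > hi:
        raise ValueError(f'min_level ({lo}) must not exceed max_level ({hi})')
    rate = params.get('stochastic_depth_drop_rate', 0.0)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f'stochastic_depth_drop_rate out of [0, 1]: {rate}')
    return [str(level) for level in range(lo, hi + 1)]


if __name__ == "__main__":
    backbone_config = {
        'type': 'spinenet',
        'spinenet': {'model_id': '143', 'min_level': 3, 'max_level': 7,
                     'stochastic_depth_drop_rate': 0.2},
    }
    print(check_backbone_config(backbone_config))  # ['3', '4', '5', '6', '7']
```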

## Implementation Examples

### Building a SpineNet Backbone Manually

For custom training loops or research experiments, instantiate SpineNet directly using the factory builder:

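```python
import tensorflow as tf
import tf_keras

from official.vision.configs import backbones as backbones_cfg
from official.vision.configs import common as common_cfg
from official.vision.modeling.backbones import spinenet

# Input specification for 640x640 RGB images (batch size left dynamic)
input_spec = tf_keras.layers.InputSpec(shape=[None, 640, 640, 3])

# Normalization and activation settings shared by all SpineNet blocks
norm_act_cfg = common_cfg.NormActivation(
    activation='relu',
    use_sync_bn=False,
    norm_momentum=0.99,
    norm_epsilon=0.001)

# build_spinenet expects a one-of Backbone config whose active branch is `spinenet`
backbone_cfg = backbones_cfg.Backbone(
    type='spinenet',
    spinenet=backbones_cfg.SpineNet(
        model_id='143',
        min_level=3,
        max_level=7,
        stochastic_depth_drop_rate=0.2))

# Build the model; the '143' hyperparameters come from SCALING_MAP
spinenet_backbone = spinenet.build_spinenet(
    input_specs=input_spec,
    backbone_config=backbone_cfg,
    norm_activation_config=norm_act_cfg)

# Run a dummy batch; the output is a dict with keys '3', '4', '5', '6', '7'
input_tensor = tf.random.uniform([1, 640, 640, 3])
features = spinenet_backbone(input_tensor, training=False)
for level, feat in features.items():
    print(level, feat.shape)
```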

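This returns a dictionary mapping level names to feature tensors. Model Garden's RetinaNet and Mask-RCNN models ([`retinanet_model.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/retinanet_model.py) and [`maskrcnn_model.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/maskrcnn_model.py)) consume this kind of multi-level dictionary; because SpineNet already produces the feature pyramid, it is typically paired with an `identity` decoder rather than an FPN.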

### Training with Official Experiment Configs

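For standard benchmarks, use the Model Garden training entry point with a registered experiment and a pre-configured YAML file:

```bash
MODEL_DIR=/tmp/spinenet_retinanet
python3 -m official.vision.train \
  --experiment=retinanet_spinenet_coco \
  --mode=train_and_eval \
  --model_dir=${MODEL_DIR} \
  --config_file=official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml
```

The `*_tpu.yaml` configs target a TPU runtime; on other hardware you will likely need to override settings such as `runtime.distribution_strategy` through `--params_override`.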

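The configuration file sets SpineNet parameters under `task.model.backbone`; an abridged excerpt:

```yaml
task:
  model:
    backbone:
      type: spinenet
      spinenet:
        model_id: '96'
        min_level: 3
        max_level: 7
        stochastic_depth_drop_rate: 0.2
    decoder:
      type: identity  # SpineNet already outputs the multi-scale pyramid
```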

The training loop automatically invokes `factory.build_backbone` → `spinenet.build_spinenet`, requiring no additional Python code for backbone construction.

### Integrating into Custom Keras Models

Embed SpineNet as a submodule in custom architectures:

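```python
import tensorflow as tf
import tf_keras

from official.vision.configs import backbones as backbones_cfg
from official.vision.configs import common as common_cfg
from official.vision.modeling.backbones import spinenet


class MyDetector(tf_keras.Model):
    """Toy classification head on top of a single SpineNet level."""

    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = tf_keras.layers.Conv2D(256, 3, padding='same', activation='relu')
        self.classifier = tf_keras.layers.Dense(num_classes)

    def call(self, inputs, training=None):
        feats = self.backbone(inputs, training=training)
        level4 = feats['4']  # Shape: [B, H/16, W/16, C]
        x = self.head(level4)
        # Global average pooling over the spatial axes, then class logits
        logits = self.classifier(tf.reduce_mean(x, axis=[1, 2]))
        return logits


# Instantiate with SpineNet-143. SpineNet derives its resampling sizes from
# the input width, so give the InputSpec a static spatial size.
backbone = spinenet.build_spinenet(
    input_specs=tf_keras.layers.InputSpec(shape=[None, 640, 640, 3]),
    backbone_config=backbones_cfg.Backbone(
        type='spinenet',
        spinenet=backbones_cfg.SpineNet(
            model_id='143',
            min_level=3,
            max_level=7,
            stochastic_depth_drop_rate=0.2)),
    norm_activation_config=common_cfg.NormActivation(
        activation='relu', use_sync_bn=False,
        norm_momentum=0.99, norm_epsilon=0.001))

detector = MyDetector(backbone, num_classes=90)
logits = detector(tf.random.uniform([2, 640, 640, 3]))  # Shape: [2, 90]
```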

## Key Source Files

The SpineNet implementation spans several locations in the TensorFlow Model Garden:

- **[`official/vision/modeling/backbones/spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py)** – Core implementation containing `SpineNet` class, `BlockSpec`, `SPINENET_BLOCK_SPECS`, `SCALING_MAP`, `_resample_with_alpha`, and `build_spinenet` factory function.
- **[`official/vision/modeling/backbones/__init__.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/__init__.py)** – Exports `SpineNet` and `SpineNetMobile` symbols.
- **[`official/vision/configs/backbones.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/backbones.py)** – Dataclass definitions for SpineNet configuration objects.
- **`official/vision/configs/experiments/retinanet/*.yaml`** – Example configurations integrating SpineNet with RetinaNet detectors.
- **[`official/vision/modeling/backbones/spinenet_mobile.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet_mobile.py)** – Mobile-optimized variant with identical builder interface.
- **[`official/vision/modeling/backbones/spinenet_test.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet_test.py)** – Unit tests verifying construction and serialization.

## Summary

- **SpineNet** replaces monotonic feature pyramids with a scale-permuted DAG architecture that repeatedly resamples and fuses cross-scale features.
- The topology is defined by **`SPINENET_BLOCK_SPECS`** and scaled via **`SCALING_MAP`** to produce variants from 49S to 190.
- Use **`build_spinenet`** in [`official/vision/modeling/backbones/spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py) to instantiate the backbone from configuration objects.
- Integration requires specifying `model_id`, `min_level`/`max_level`, and normalization parameters through the factory builder or YAML configs.
- The backbone returns a dictionary of multi-scale tensors suitable for RetinaNet, Mask-RCNN, or custom detection heads.

## Frequently Asked Questions

### What distinguishes SpineNet from ResNet-FPN architectures?

**SpineNet eliminates the strict bottom-up-then-top-down flow** of ResNet-FPN by allowing features to permute across scales throughout the network depth. In the implementation in [`spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py), the `_resample_with_alpha` method lets parents at any level fuse into child blocks at different resolutions, creating a directed acyclic graph rather than a pyramid. This permits richer multi-scale feature reuse and, according to the SpineNet paper, yields higher accuracy than ResNet-FPN on object detection benchmarks.

### Which SpineNet model variant should I select?

Select the **`model_id`** based on your accuracy requirements and compute budget. The `SCALING_MAP` in [`spinenet.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/backbones/spinenet.py) defines variants from the lightweight **49S** (SpineNet-49 with uniformly narrower feature dimensions) to the high-capacity **190**. For a balance of accuracy and cost, **96** and **143** are common middle-ground choices; the RetinaNet experiment configs include SpineNet-96 (`coco_spinenet96_tpu.yaml`). For mobile-oriented models, use the separate `SpineNetMobile` backbone in `spinenet_mobile.py`.

### Can SpineNet serve as a backbone for architectures other than RetinaNet?

**Yes**, SpineNet can replace other backbones in any model whose decoder or heads consume multi-scale features. The `build_spinenet` factory returns feature dictionaries that Mask-RCNN, Cascade-RCNN, or custom heads can use. The `tensorflow/models` repository includes SpineNet examples for both RetinaNet and Mask-RCNN under `official/vision/configs/experiments/`.

### How does stochastic depth regularization work in SpineNet?

The **`stochastic_depth_drop_rate`** parameter enables stochastic depth during training: for each example, a block's residual branch is randomly dropped so that only the shortcut passes through, which regularizes the network and reduces overfitting. It matters most for the deeper variants such as 143L and 190, which stack more blocks. In the implementation, `build_spinenet` passes the configured value to `SpineNet` as `init_stochastic_depth_rate`, and the rate applied to each block is scaled by the block's position in the network. No branches are dropped at inference, so the setting does not affect inference speed.
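A minimal sketch of the two pieces involved, under stated assumptions: a per-block rate that grows linearly with depth (assumed here as `drop_rate * i / n` for 1-based block index `i`, so the last block gets the full configured rate), and a per-example keep-or-drop decision that rescales survivors by `1 / keep_prob` so the expected output during training matches inference. `block_drop_rates` and `stochastic_depth` are hypothetical helpers working on plain Python numbers rather than tensors.

```python
import random


def block_drop_rates(drop_rate, num_blocks):
    """Linearly increasing drop rates; block i (1-based) gets drop_rate * i / n."""
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError('drop_rate must be within [0, 1].')
    return [drop_rate * i / num_blocks for i in range(1, num_blocks + 1)]


def stochastic_depth(shortcut, residual, drop_rate, training, rng=random):
    """Return shortcut + (possibly dropped) residual for one example.

    At inference the residual branch is always kept and not rescaled.
    """
    if not training or drop_rate == 0.0:
        return shortcut + residual
    keep_prob = 1.0 - drop_rate
    keep = rng.random() < keep_prob  # Bernoulli(keep_prob) draw per example
    return shortcut + (residual / keep_prob if keep else 0.0)


if __name__ == "__main__":
    print(block_drop_rates(0.2, 4))
    rng = random.Random(0)
    samples = [stochastic_depth(0.0, 2.0, 0.2, True, rng) for _ in range(10000)]
    print(sum(samples) / len(samples))  # close to 2.0, the inference-time value
```

The rescaling is what lets the trained weights be used unchanged at inference: on average, the training-time output equals the deterministic inference output.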