# How to Implement Image Segmentation Models with TensorFlow Models

> Implement image segmentation models using TensorFlow Models. Build networks, add decoders and heads, and configure losses with the official Vision library for efficient end-to-end training.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**To implement image segmentation models using the TensorFlow Models repository, assemble a backbone network, optional decoder, and segmentation head via the `SegmentationModel` class, then configure specialized losses and metrics from the official Vision library for end-to-end training.**

The `tensorflow/models` repository provides a modular framework for building both semantic and instance segmentation pipelines. By composing reusable components, from ResNet backbones to DeepLab-style fusion heads, you can implement image segmentation models without writing low-level TensorFlow operations from scratch.

## Core Architecture Components

The segmentation stack consists of four interconnected components defined in `official/vision/modeling/`. The **backbone** extracts hierarchical features, the **decoder** optionally upsamples and enriches these features, the **segmentation head** produces per-pixel logits, and an optional **mask-scoring head** refines instance mask quality.

### SegmentationModel Class

Located in [`official/vision/modeling/segmentation_model.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/segmentation_model.py), the `SegmentationModel` class (lines 64-76) acts as the orchestration layer. Its constructor accepts:

- `backbone`: Any `tf_keras.Model` returning a dictionary of feature maps keyed by level (e.g., `{'2': feat2, '3': feat3, ...}`)
- `decoder`: Optional upsampling module (omit to connect backbone directly to head)
- `head`: A `SegmentationHead` instance producing logits
- `mask_scoring_head`: Optional `MaskScoring` layer for instance segmentation

The `call` method executes these stages sequentially, returning a dictionary containing `logits` and optionally `mask_scores`.
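The staged forward pass described above can be sketched in plain Python. This illustrates the data flow only and is not the library's implementation: `run_segmentation_stages` and the stand-in callables are hypothetical, and the exact arguments each real stage receives (for example, that the head sees both backbone and decoder features, or that the mask-scoring head consumes the logits) are assumptions.

```python
from typing import Any, Callable, Dict, Optional


def run_segmentation_stages(
    inputs: Any,
    backbone: Callable[[Any], Dict[Any, Any]],
    head: Callable[[Dict[Any, Any], Dict[Any, Any]], Any],
    decoder: Optional[Callable[[Dict[Any, Any]], Dict[Any, Any]]] = None,
    mask_scoring_head: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the backbone -> decoder -> head order."""
    backbone_features = backbone(inputs)
    # Without a decoder, the head consumes the backbone features directly.
    decoder_features = decoder(backbone_features) if decoder else backbone_features
    logits = head(backbone_features, decoder_features)
    outputs = {"logits": logits}
    # The mask-scoring branch only contributes when it is configured.
    if mask_scoring_head is not None:
        outputs["mask_scores"] = mask_scoring_head(logits)
    return outputs


# Stand-in stages that just tag their input, to make the flow visible.
toy_outputs = run_segmentation_stages(
    inputs="image",
    backbone=lambda x: {"features": f"backbone({x})"},
    head=lambda b, d: f"head({d['features']})",
)
```

Because the decoder and the mask-scoring head are optional, the same helper covers a plain backbone-plus-head model and a model with all four stages.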

### Segmentation Head Implementation

The `SegmentationHead` class in [`official/vision/modeling/heads/segmentation_heads.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/heads/segmentation_heads.py) (lines 91-100) generates pixel-wise classifications. Key configuration parameters include:

- **feature_fusion**: Controls low-level feature integration. Options include `deeplabv3plus`, `pyramid_fusion`, `panoptic_fpn_fusion`, or `None` (lines 334-376).
- **upsample_factor**: Nearest-neighbor upsampling ratio applied after convolutions (lines 60-62).
- **num_convs** and **num_filters**: Define the depth and width of the convolutional stack before the final 1×1 classifier (lines 63-88).

The final output layer is a `Conv2D` with `num_classes` filters, optionally followed by `logit_activation` (softmax or sigmoid).
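To make the head's three steps concrete (fuse features, classify each pixel, upsample), here is a minimal NumPy sketch. It is not the library code: the helper names are hypothetical, the fusion is reduced to a resize and channel concatenation, and nearest-neighbor resizing is used throughout for simplicity, whereas the real head also applies convolutions and normalization.

```python
import numpy as np


def nearest_upsample(features: np.ndarray, factor: int) -> np.ndarray:
    """Repeats every pixel `factor` times along height and width ([H, W, C])."""
    return features.repeat(factor, axis=0).repeat(factor, axis=1)


def fuse_by_concat(high_level: np.ndarray, low_level: np.ndarray) -> np.ndarray:
    """Brings coarse features to the fine grid, then concatenates channels."""
    factor = low_level.shape[0] // high_level.shape[0]
    return np.concatenate([nearest_upsample(high_level, factor), low_level], axis=-1)


def classify_1x1(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """A 1x1 convolution is a per-pixel matrix product over channels."""
    return features @ kernel


rng = np.random.default_rng(0)
high = rng.normal(size=(8, 8, 16))    # coarse, semantically rich features
low = rng.normal(size=(32, 32, 4))    # fine, low-level features
fused = fuse_by_concat(high, low)                         # [32, 32, 20]
logits = classify_1x1(fused, rng.normal(size=(20, 21)))   # [32, 32, 21]
full_res_logits = nearest_upsample(logits, factor=4)      # [128, 128, 21]
```

The shapes show why fusion and `upsample_factor` are separate settings: fusion decides at which resolution the classifier runs, and the upsampling factor decides how far the logits are enlarged afterwards.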

### Mask Scoring for Instance Segmentation

For Mask R-CNN-style architectures, the `MaskScoring` class (also in `segmentation_heads.py`, lines 24-45) predicts mask quality. It applies depth-wise convolutions, resizes features to `fc_input_size`, and passes them through fully-connected layers (lines 97-104) to produce per-class mask confidence scores.

## Loss Functions and Evaluation Metrics

The framework provides specialized utilities for segmentation training and evaluation.

**SegmentationLosses** (in [`official/vision/losses/segmentation_losses.py`](https://github.com/tensorflow/models/blob/main/official/vision/losses/segmentation_losses.py)) combines pixel-wise cross-entropy, focal loss, and Dice loss terms. It accepts an `ignore_label` parameter to exclude unannotated pixels, which are common in datasets like Cityscapes.
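The effect of an ignore label on a pixel-wise cross-entropy can be sketched in NumPy. This is an illustration of the masking idea under simple assumptions, not the library's loss: `masked_softmax_cross_entropy` is a hypothetical function, and `255` is used as the ignore value only as an example.

```python
import numpy as np


def masked_softmax_cross_entropy(logits, labels, ignore_label=255):
    """Mean cross-entropy over pixels whose label is not `ignore_label`.

    logits: [H, W, num_classes] float array; labels: [H, W] int array.
    """
    valid = labels != ignore_label
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Ignored pixels get a placeholder class so indexing stays in range.
    safe_labels = np.where(valid, labels, 0)
    picked = np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    # Zero out ignored pixels and average over the valid ones only.
    return float(-(picked * valid).sum() / max(valid.sum(), 1))


logits = np.zeros((2, 2, 3))
labels = np.array([[0, 1], [255, 2]])
loss = masked_softmax_cross_entropy(logits, labels)  # uniform logits give log(3)
```

Dividing by the number of valid pixels rather than by `H * W` keeps the loss scale independent of how much of an image is unannotated.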

**SegmentationMetrics** (in [`official/vision/evaluation/segmentation_metrics.py`](https://github.com/tensorflow/models/blob/main/official/vision/evaluation/segmentation_metrics.py)) computes **Mean Intersection-over-Union (mIoU)**, per-class IoU, and boundary F-scores. Both utilities automatically handle the model's output dictionary, including optional `mask_scores`.
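For reference, mean IoU can be computed from a confusion matrix as in the following NumPy sketch. It shows the metric's definition rather than the library's implementation; `mean_iou` is a hypothetical function, and skipping classes that never appear is one common convention among several.

```python
import numpy as np


def mean_iou(predictions, labels, num_classes, ignore_label=255):
    """Returns (per-class IoU, mean IoU) for integer [H, W] class maps."""
    valid = labels != ignore_label
    preds, gts = predictions[valid], labels[valid]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    confusion = np.bincount(
        gts * num_classes + preds, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    # Classes absent from both prediction and ground truth are skipped (NaN).
    per_class = np.where(union > 0, intersection / np.maximum(union, 1), np.nan)
    return per_class, float(np.nanmean(per_class))


predictions = np.array([[0, 0], [1, 1]])
labels = np.array([[0, 1], [1, 255]])
per_class_iou, miou = mean_iou(predictions, labels, num_classes=2)
```

In this toy case each class has one correctly labeled pixel and a union of two pixels, so both per-class IoUs and the mean are 0.5; the pixel labeled `255` does not enter the computation.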

## Data Pipeline Configuration

The `SegmentationInput` class in [`official/vision/dataloaders/segmentation_input.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/segmentation_input.py) parses TFRecord datasets containing paired images and uint8 masks. It yields a dictionary:

```python
{
    'inputs': image_tensor,          # [H, W, 3] float32
    'groundtruths': {
        'label': mask_tensor,        # [H, W, 1] int32
    }
}
```

The loader supports on-the-fly augmentation including random flipping, scaling, and cropping during training.
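Whatever augmentation is used, the image and its mask must receive the same geometric transform. The NumPy sketch below shows that idea for flipping and cropping; it is not the loader's code, and `random_flip_and_crop` is a hypothetical helper.

```python
import numpy as np


def random_flip_and_crop(image, mask, crop_size, rng):
    """Applies one random horizontal flip and one crop to both arrays.

    image: [H, W, 3]; mask: [H, W, 1]; crop_size: (height, width).
    """
    if rng.random() < 0.5:
        # Flip along the width axis for image and mask together.
        image, mask = image[:, ::-1], mask[:, ::-1]
    crop_h, crop_w = crop_size
    top = rng.integers(0, image.shape[0] - crop_h + 1)
    left = rng.integers(0, image.shape[1] - crop_w + 1)
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    # The same window is used for both, so pixels stay aligned with labels.
    return image[window], mask[window]


rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64, 1), dtype=np.int32)
cropped_image, cropped_mask = random_flip_and_crop(image, mask, (32, 48), rng)
```

Sampling the flip decision and the crop window once, then applying them to both arrays, is what keeps every pixel paired with its label.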

## End-to-End Implementation Example

The following workflow assembles a segmentation model with a ResNet-50 backbone and DeepLabV3+ fusion, then trains it on a TFRecord dataset:

```python
import tensorflow as tf
import tensorflow.keras as tf_keras
from official.vision.modeling.segmentation_model import SegmentationModel
from official.vision.modeling.heads.segmentation_heads import SegmentationHead
from official.vision.modeling.backbones import resnet
from official.vision.dataloaders.segmentation_input import SegmentationInput
from official.vision.losses.segmentation_losses import SegmentationLosses
from official.vision.evaluation.segmentation_metrics import SegmentationMetrics

# 1. Build backbone (ResNet-50)
backbone = resnet.ResNet(
    model_id=50,
    output_stride=16,
    include_top=False,
    norm_momentum=0.99,
    norm_epsilon=0.001)

# 2. Build segmentation head with DeepLabV3+ feature fusion
head = SegmentationHead(
    num_classes=21,               # Pascal VOC classes
    level=4,
    num_convs=2,
    num_filters=256,
    feature_fusion='deeplabv3plus',
    low_level=2,
    low_level_num_filters=48,
    upsample_factor=4,
    use_sync_bn=False,
    norm_momentum=0.99,
    norm_epsilon=0.001)

# 3. Assemble model (no decoder in this example)
model = SegmentationModel(backbone=backbone, decoder=None, head=head)

# 4. Configure loss and metrics
losses = SegmentationLosses(
    loss_type='softmax_cross_entropy',
    ignore_label=-1)

metrics = SegmentationMetrics(
    num_classes=21,
    ignore_label=-1)

model.compile(
    optimizer=tf_keras.optimizers.Adam(learning_rate=1e-4),
    loss=losses,
    metrics=[metrics])

# 5. Create TFRecord data pipeline
train_input = SegmentationInput(
    file_pattern='gs://my-bucket/dataset/train-*-of-*.tfrecord',
    is_training=True,
    batch_size=8,
    input_size=(512, 512))

train_dataset = train_input.make_dataset()

# 6. Train
model.fit(train_dataset, epochs=50)
```

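This example connects a ResNet-50 feature extractor directly to a DeepLabV3+ style head, omitting a separate decoder module. With `feature_fusion='deeplabv3plus'`, the head merges the level-4 features with the level-2 features, which sit at 1/4 of the input resolution, so `upsample_factor=4` brings the logits back to the input resolution.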
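The resolution arithmetic behind this example can be checked with a few lines of Python. The sketch assumes that feature level `l` has stride `2**l` relative to the input and that the fused features sit at the `low_level` resolution before `upsample_factor` is applied; `level_stride` and `logits_size` are hypothetical helpers, not library functions.

```python
def level_stride(level: int) -> int:
    """Feature level l is downsampled by 2**l relative to the input."""
    return 2 ** level


def logits_size(input_size, low_level, upsample_factor):
    """Spatial size of the logits after fusion at `low_level` and upsampling."""
    stride = level_stride(low_level)
    return tuple(dim // stride * upsample_factor for dim in input_size)


print(level_stride(4))  # 16
print(logits_size((512, 512), low_level=2, upsample_factor=4))  # (512, 512)
```

If `low_level` or `upsample_factor` changes, the same calculation shows whether the logits still match the label resolution or need an extra resize before the loss.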

## Customization Strategies

To adapt the framework for specific research needs, modify these key components:

**Custom Decoders**: Subclass `tf_keras.Model` to process `backbone_features` and return decoder tensors, then pass this instance as the `decoder` argument to `SegmentationModel`.

**Multi-Scale Inference**: Wrap `model.predict` in a `tf.function` that processes image pyramids, resizes resulting logits, and averages predictions across scales.

**Panoptic Segmentation**: Set `feature_fusion='panoptic_fpn_fusion'` and configure the `decoder_min_level` and `decoder_max_level` parameters (lines 49-55 in [`segmentation_heads.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/heads/segmentation_heads.py)) to control FPN hierarchy integration.

**Custom Loss Functions**: While `SegmentationLosses` supports softmax cross-entropy, focal, and Dice losses, you can pass any callable to `model.compile(loss=my_custom_loss)` for specialized objectives like Lovász-Softmax.

**Mask Quality Estimation**: Instantiate `MaskScoring(num_classes, fc_input_size, ...)` and provide it as `mask_scoring_head` when building `SegmentationModel` for Mask Scoring R-CNN implementations.
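The multi-scale inference strategy listed above can be sketched independently of TensorFlow. The NumPy version below is a simplified stand-in: `resize_nearest` and `multi_scale_predict` are hypothetical helpers, `predict_fn` represents any function that maps an `[H, W, 3]` image to `[H, W, num_classes]` logits, and nearest-neighbor resizing is used where a real pipeline would more likely use bilinear interpolation.

```python
import numpy as np


def resize_nearest(array, size):
    """Nearest-neighbor resize of an [H, W, C] array to `size` = (h, w)."""
    rows = np.arange(size[0]) * array.shape[0] // size[0]
    cols = np.arange(size[1]) * array.shape[1] // size[1]
    return array[rows][:, cols]


def multi_scale_predict(predict_fn, image, scales=(0.5, 1.0, 2.0)):
    """Averages logits predicted on rescaled copies of `image`."""
    height, width = image.shape[:2]
    accumulated = None
    for scale in scales:
        scaled_size = (max(1, round(height * scale)), max(1, round(width * scale)))
        logits = predict_fn(resize_nearest(image, scaled_size))
        # Bring every prediction back to the original resolution before averaging.
        logits = resize_nearest(logits, (height, width))
        accumulated = logits if accumulated is None else accumulated + logits
    return accumulated / len(scales)


# Toy predictor: treats the first two image channels as logits.
image = np.full((8, 8, 3), 2.0)
averaged = multi_scale_predict(lambda x: x[..., :2], image)
```

Averaging logits rather than class predictions keeps the confidence information from each scale; taking the argmax happens once, after the average.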

## Essential Source Files

When implementing custom segmentation architectures, reference these specific files in the tensorflow/models repository:

- **[`official/vision/modeling/segmentation_model.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/segmentation_model.py)**: Core `SegmentationModel` class tying components together.
- **[`official/vision/modeling/heads/segmentation_heads.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/heads/segmentation_heads.py)**: `SegmentationHead` and `MaskScoring` implementations.
- **[`official/vision/losses/segmentation_losses.py`](https://github.com/tensorflow/models/blob/main/official/vision/losses/segmentation_losses.py)**: Loss computation utilities.
- **[`official/vision/evaluation/segmentation_metrics.py`](https://github.com/tensorflow/models/blob/main/official/vision/evaluation/segmentation_metrics.py)**: mIoU and boundary F-score metrics.
- **[`official/vision/dataloaders/segmentation_input.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/segmentation_input.py)**: TFRecord parsing and augmentation pipeline.
- **`docs/vision/semantic_segmentation.ipynb`**: End-to-end training notebook for Cityscapes and Pascal VOC.

## Summary

- **Assemble components**: Use `SegmentationModel` to combine backbone, decoder, and head from `official/vision/modeling/`.
- **Configure fusion**: Set `feature_fusion` in `SegmentationHead` to `deeplabv3plus` or `panoptic_fpn_fusion` for multi-scale feature integration.
- **Handle data**: Use `SegmentationInput` to parse TFRecords with image and mask pairs.
- **Train and evaluate**: Use `SegmentationLosses` for pixel-wise classification objectives and `SegmentationMetrics` for mIoU calculation.
- **Extend functionality**: Add `MaskScoring` heads for instance segmentation or custom decoders for specific architectural requirements.

## Frequently Asked Questions

### What backbone architectures are supported for segmentation in TensorFlow Models?

The framework supports any `tf_keras.Model` following the backbone API convention, returning feature maps as a level-indexed dictionary. Pre-implemented options include ResNet (50/101) and EfficientNet variants located in `official/vision/modeling/backbones/`. You can also inject custom backbones provided they output the expected feature dictionary format.
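The level-indexed output convention can be illustrated with a toy NumPy backbone. This is only a sketch of the output format: `toy_backbone` and `average_pool_2x` are hypothetical, pooling replaces real convolutional stages, and the key type used here (the level as a string) is an assumption to verify against the installed library version.

```python
import numpy as np


def average_pool_2x(features):
    """Halves height and width of an [H, W, C] array by 2x2 averaging."""
    h, w, c = features.shape
    trimmed = features[: h // 2 * 2, : w // 2 * 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def toy_backbone(image, min_level=2, max_level=5):
    """Hypothetical backbone: level l has stride 2**l relative to the input."""
    features, endpoints = image, {}
    for level in range(1, max_level + 1):
        features = average_pool_2x(features)
        if level >= min_level:
            # Key type is an assumption; check what the head expects.
            endpoints[str(level)] = features
    return endpoints


endpoints = toy_backbone(np.ones((64, 64, 3)))
shapes = {level: feats.shape for level, feats in endpoints.items()}
```

For a 64×64 input, the returned maps shrink from 16×16 at level 2 to 2×2 at level 5, which is the hierarchy a head or decoder selects from by level.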

### How do I handle datasets with ignored or unlabeled pixels?

Pass the `ignore_label` parameter (typically `-1` or `255`) to both `SegmentationLosses` and `SegmentationMetrics` during initialization. These classes automatically mask out these indices when computing cross-entropy losses or IoU metrics, ensuring invalid pixels do not affect gradient updates or evaluation scores.

### Can I use this framework for instance segmentation or only semantic segmentation?

While primarily designed for semantic segmentation, the framework supports instance segmentation through the `MaskScoring` class. Instantiate this head and pass it as `mask_scoring_head` when creating `SegmentationModel`. For full Mask R-CNN implementations, combine this with detection components from the official detection modeling library.

### What is the difference between using a decoder and using feature fusion in the head?

The **decoder** is a separate network module (e.g., U-Net style upsampling) that processes backbone features before they reach the head. **Feature fusion** (configured via `feature_fusion` in `SegmentationHead`) happens inside the head itself, merging low-level backbone features with high-level features using operations like DeepLabV3+ or FPN-style aggregation. You can use both together or independently depending on your architecture requirements.