# How to Preprocess Images for Vision Models Using TensorFlow Models

> Learn to preprocess images for vision models with TensorFlow Models. Discover a production-ready pipeline for normalization, geometric transforms, and photometric augmentation.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**TensorFlow Models provides a modular, production-ready preprocessing pipeline in [`official/vision/ops/preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py) that normalizes pixel values, applies geometric transforms, and performs photometric augmentation to prepare raw images for computer vision tasks.**

The `tensorflow/models` repository contains a comprehensive stack of image preprocessing utilities designed for classification, object detection, and segmentation workflows. Because these operations are built from TensorFlow ops, they run inside `tf.data` input pipelines in parallel with accelerator compute, which avoids Python-side bottlenecks during training and inference.

## The Three Stages of Preprocessing

The preprocessing pipeline follows a logical three-stage architecture implemented in [`official/vision/ops/preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py). Each stage handles distinct aspects of image preparation.

### Normalization

The **normalization** stage converts raw pixel data into a standardized format suitable for neural network input. The `normalize_image` function first calls `tf.image.convert_image_dtype` to cast the image to `float32`. For integer inputs such as `uint8`, this also rescales values to `[0, 1]`; float inputs pass through unchanged, so a float image must already be in `[0, 1]` for the default statistics to apply. The function then subtracts a per-channel mean and divides by a per-channel standard deviation, using ImageNet-derived statistics by default: `MEAN_NORM = (0.485, 0.456, 0.406)` and `STDDEV_NORM = (0.229, 0.224, 0.225)`.
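The arithmetic is easy to check on a single pixel. The pure-Python sketch below mirrors the two steps described above for one RGB value. It is an illustration rather than the library code, and the helper names are hypothetical. The second helper shows that a float pixel already in `[0, 255]` yields the same result when the statistics are multiplied by 255.

```python
MEAN_NORM = (0.485, 0.456, 0.406)
STDDEV_NORM = (0.229, 0.224, 0.225)

def normalize_pixel(rgb_uint8, mean=MEAN_NORM, std=STDDEV_NORM):
    """Hypothetical helper: rescale a uint8 RGB pixel to [0, 1], then standardize per channel."""
    scaled = [v / 255.0 for v in rgb_uint8]  # the uint8 -> [0, 1] rescale
    return tuple((s - m) / sd for s, m, sd in zip(scaled, mean, std))

def normalize_pixel_0_255(rgb_float, mean=MEAN_NORM, std=STDDEV_NORM):
    """Hypothetical helper: same result for float pixels in [0, 255], via 255-scaled statistics."""
    mean_255 = [255.0 * m for m in mean]
    std_255 = [255.0 * s for s in std]
    return tuple((v - m) / sd for v, m, sd in zip(rgb_float, mean_255, std_255))

print(normalize_pixel((124, 116, 104)))
print(normalize_pixel_0_255((124.0, 116.0, 104.0)))
```

Feeding a `[0, 255]` float image through the `[0, 1]` statistics would instead leave values on the order of hundreds rather than a few units around zero, which is why the input range matters.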

### Geometric Transforms

**Geometric transforms** handle spatial operations including resizing, cropping, padding, and flipping. Key functions include `resize_and_crop_image`, `resize_and_crop_image_v2`, `random_crop_image`, and `random_horizontal_flip`. Annotations must stay aligned with the transformed image: the flip functions accept them directly, while the resize functions return an `image_info` tensor that companion helpers such as `resize_and_crop_boxes` and `resize_and_crop_masks` use to apply the same scale and offset. The expected layouts are bounding boxes in normalized `[y_min, x_min, y_max, x_max]` format, masks as `[N, H, W]` tensors, and keypoints as `[N, K, 2]` coordinates.
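Converting between the normalized box format and pixel coordinates only needs the image height and width. A minimal pure-Python sketch with hypothetical helper names follows; in the library, `official/vision/ops/box_ops.py` provides tensor versions such as `denormalize_boxes`.

```python
def denormalize_box(box, height, width):
    """Hypothetical helper: normalized [y_min, x_min, y_max, x_max] -> pixel coordinates."""
    y_min, x_min, y_max, x_max = box
    return [y_min * height, x_min * width, y_max * height, x_max * width]

def normalize_box(box, height, width):
    """Hypothetical helper: pixel coordinates -> normalized [0, 1] coordinates."""
    y_min, x_min, y_max, x_max = box
    return [y_min / height, x_min / width, y_max / height, x_max / width]

# A box spanning the right half of a 480 x 640 image, vertically centered.
pixel_box = denormalize_box([0.25, 0.5, 0.75, 1.0], height=480, width=640)
print(pixel_box)
```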

### Photometric Augmentation

**Photometric augmentation** improves model robustness through color space manipulations. The `color_jitter` function composes `random_brightness`, `random_contrast`, and `random_saturation` operations, each drawing perturbation factors from uniform distributions. Additionally, `random_jpeg_quality` simulates compression artifacts by re-encoding images at random quality levels between 20 and 100.
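Each jitter step is easiest to reason about as a blend between the image and a reference. The pure-Python sketch below assumes torchvision `ColorJitter`-style semantics: a factor drawn uniformly from `[max(0, 1 - s), 1 + s]` for strength `s`, brightness blending toward black, and saturation blending toward the pixel's grayscale value (ITU-R 601 luma weights assumed). The helper names are hypothetical, and the TF Models implementation may differ in details such as clipping.

```python
import random

def sample_factor(strength, rng):
    """Draw a jitter factor uniformly from [max(0, 1 - strength), 1 + strength]."""
    return rng.uniform(max(0.0, 1.0 - strength), 1.0 + strength)

def clip255(v):
    """Keep a channel value inside the uint8 range."""
    return min(255.0, max(0.0, v))

def adjust_brightness(rgb, factor):
    # Blend toward black: factor < 1 darkens, factor > 1 brightens.
    return tuple(clip255(c * factor) for c in rgb)

def adjust_saturation(rgb, factor):
    # Blend toward the pixel's grayscale value (ITU-R 601 luma weights assumed).
    gray = 0.299 * rgb[0] + 0.587 * rgb[1] + 0.114 * rgb[2]
    return tuple(clip255(gray + factor * (c - gray)) for c in rgb)

rng = random.Random(1234)          # fixed seed for reproducibility
f = sample_factor(0.2, rng)        # brightness factor in [0.8, 1.2]
pixel = adjust_saturation(adjust_brightness((200.0, 100.0, 50.0), f), sample_factor(0.2, rng))
print(f, pixel)
```

Under the same assumption, contrast blends toward the mean grayscale intensity of the whole image, so it needs the full image rather than a single pixel.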

## Core Preprocessing Operations

### Input Handling and Resizing

The pipeline accepts rank-3 tensors `[H, W, C]` or raw JPEG bytes for optimized decoding paths. For object detection tasks, `resize_and_crop_image` (RetinaNet style) computes desired sizes while preserving aspect ratio, applies random scale jitter via `aug_scale_min` and `aug_scale_max` parameters, and pads to stride-aligned dimensions. The function returns both the transformed image and an `image_info` tensor recording original dimensions, final size, scaling factors, and crop offsets.
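The size and offset bookkeeping behind RetinaNet-style resizing can be sketched in pure Python. This is an approximation of the behavior described above (aspect-preserving scale, scale jitter, random crop offset), not the library's code: the function and parameter names are hypothetical, and the real op additionally resizes the pixels and pads to `padded_size`.

```python
def retinanet_resize_params(image_hw, desired_hw, random_scale=1.0, offset_frac=(0.0, 0.0)):
    """Hypothetical sketch of RetinaNet-style resize bookkeeping (no pixels touched).

    random_scale stands in for a draw from [aug_scale_min, aug_scale_max];
    offset_frac stands in for two uniform draws in [0, 1) that place the crop.
    Returns (scaled_hw, image_scale, offset).
    """
    h, w = image_hw
    # Jitter the target size, then pick the largest aspect-preserving scale that fits it.
    target_h, target_w = (round(random_scale * d) for d in desired_hw)
    scale = min(target_h / h, target_w / w)
    scaled = [round(h * scale), round(w * scale)]
    image_scale = [scaled[0] / h, scaled[1] / w]
    # A scaled image larger than desired_hw is cropped back to it at a random offset.
    max_offset = [max(0, s - d) for s, d in zip(scaled, desired_hw)]
    offset = [int(m * f) for m, f in zip(max_offset, offset_frac)]
    return scaled, image_scale, offset

print(retinanet_resize_params((480, 640), (512, 512)))
```

With no jitter, a 480 × 640 image fits a 512 × 512 target at scale 0.8 (384 × 512), and the remaining rows are left to padding. With `random_scale=1.5`, the scaled image overflows the target and a crop offset is chosen.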

Alternatively, `resize_and_crop_image_v2` (Faster-RCNN style) enforces short-side and long-side constraints before jitter and padding, providing different scaling behavior for two-stage detectors.
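The short-side/long-side rule and stride-aligned padding reduce to a few lines of arithmetic. The pure-Python sketch below illustrates that logic under the assumption that sizes are rounded after scaling. The helper names are hypothetical, jitter and cropping are omitted, and the library op works on tensors.

```python
import math

def faster_rcnn_scaled_size(image_hw, short_side, long_side):
    """Hypothetical sketch: scale the short side to short_side, unless the long
    side would then exceed long_side, in which case scale the long side instead."""
    h, w = image_hw
    by_short = short_side / min(h, w)
    scaled = [round(h * by_short), round(w * by_short)]
    if max(scaled) > long_side:
        by_long = long_side / max(h, w)
        scaled = [round(h * by_long), round(w * by_long)]
    return scaled

def stride_aligned(size_hw, stride=32):
    """Round each dimension up to the next multiple of stride."""
    return [int(math.ceil(s / stride)) * stride for s in size_hw]

print(faster_rcnn_scaled_size((480, 640), 800, 1333))   # short side drives the scale
print(faster_rcnn_scaled_size((500, 1500), 800, 1333))  # long-side cap kicks in
print(stride_aligned([1333, 1333]))
```

With COCO-style settings (short side 800, long side 1333) and no jitter, neither side exceeds 1333 pixels, and 1333 rounds up to 1344 at stride 32, so a 1344 × 1344 canvas holds a scaled image of either orientation.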

### Random Cropping and Flipping

The `random_crop_image` function utilizes `tf.image.sample_distorted_bounding_box` to sample crops respecting user-specified aspect ratio and area ranges. It updates bounding box coordinates through `resize_and_crop_boxes` to reflect the new crop window.
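Re-expressing a box relative to a crop window is a shift, a rescale, and a clip. The pure-Python sketch below shows that arithmetic for a crop given in normalized `[y0, x0, y1, x1]` coordinates. It is a generic illustration with hypothetical helper names, not the implementation behind `random_crop_image`, and it leaves filtering of boxes that end up empty to the caller.

```python
def crop_normalized_box(box, crop):
    """Hypothetical helper: re-express a normalized [y_min, x_min, y_max, x_max] box
    relative to a crop window given as normalized [y0, x0, y1, x1]."""
    cy0, cx0, cy1, cx1 = crop
    ch, cw = cy1 - cy0, cx1 - cx0
    y_min, x_min, y_max, x_max = box
    # Shift by the crop origin, then rescale by the crop size.
    shifted = [(y_min - cy0) / ch, (x_min - cx0) / cw, (y_max - cy0) / ch, (x_max - cx0) / cw]
    # Clip to [0, 1] so the parts of the box outside the crop are cut away.
    return [min(1.0, max(0.0, v)) for v in shifted]

def is_empty(box):
    """True when clipping left the box with zero height or width."""
    return box[2] <= box[0] or box[3] <= box[1]

print(crop_normalized_box([0.4, 0.4, 0.8, 0.8], [0.0, 0.0, 0.5, 0.5]))
```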

For augmentation, `random_horizontal_flip` and `random_vertical_flip` flip the image with probability `prob` and apply the same flip to any annotations passed in: `horizontal_flip_boxes`/`vertical_flip_boxes` transform normalized boxes, and `horizontal_flip_masks`/`vertical_flip_masks` transform segmentation masks.
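The flip rules for normalized boxes are short enough to write out. In the pure-Python sketch below (hypothetical helper names), a horizontal flip maps each x coordinate to `1 - x` and swaps the min and max so that `x_min <= x_max` still holds; the vertical flip does the same for y, and a mask flip reverses each row.

```python
def horizontal_flip_box(box):
    """Mirror a normalized [y_min, x_min, y_max, x_max] box left-right."""
    y_min, x_min, y_max, x_max = box
    # x -> 1 - x, with min and max swapped so x_min <= x_max still holds.
    return [y_min, 1.0 - x_max, y_max, 1.0 - x_min]

def vertical_flip_box(box):
    """Mirror a normalized box top-bottom."""
    y_min, x_min, y_max, x_max = box
    return [1.0 - y_max, x_min, 1.0 - y_min, x_max]

def horizontal_flip_mask(mask):
    """Mirror an [H][W] mask (nested lists) left-right by reversing each row."""
    return [row[::-1] for row in mask]

print(horizontal_flip_box([0.1, 0.2, 0.5, 0.6]))
```

Because `1 - x` assumes a `[0, 1]` coordinate range, the flip must run while boxes are still normalized; pixel-coordinate boxes would need `width - x` instead.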

## Implementation Examples by Task

### Classification Preprocessing (ResNet-Style)

This recipe decodes JPEG bytes, applies center cropping, and includes photometric jitter for training:

```python
import tensorflow as tf
from official.vision.ops import preprocess_ops as pp

def preprocess_for_classification(image_bytes, training=True):
    # Decode JPEG bytes into a uint8 [H, W, 3] tensor
    image = tf.image.decode_jpeg(image_bytes, channels=3)

    # Center-crop a square covering 87.5% of the short side, then resize to 224×224
    image = pp.center_crop_image(image, center_crop_fraction=0.875)
    image = tf.image.resize(image, [224, 224], method=tf.image.ResizeMethod.BILINEAR)

    if training:
        # Random horizontal flip; the function returns (image, boxes, masks)
        image, _, _ = pp.random_horizontal_flip(image, prob=0.5)
        # Photometric jitter
        image = pp.color_jitter(
            image, brightness=0.2, contrast=0.2, saturation=0.2, seed=1234
        )

    # Pixels are in [0, 255] here, so use float32 with the 255-scaled statistics
    image = tf.cast(image, tf.float32)
    image = pp.normalize_image(image, offset=pp.MEAN_RGB, scale=pp.STDDEV_RGB)
    return image
```

### Object Detection Preprocessing (Faster-RCNN Style)

This example handles bounding boxes and masks through geometric transforms:

```python
import tensorflow as tf
from official.vision.ops import box_ops
from official.vision.ops import preprocess_ops as pp

def preprocess_for_detection(image, boxes, masks=None, training=True):
    """image: uint8 [H, W, 3]; boxes: normalized [N, 4] as [y_min, x_min, y_max, x_max]."""
    image_shape = tf.shape(image)[0:2]

    # 1️⃣ Photometric jitter on the raw uint8 image (training only)
    if training:
        image = pp.color_jitter(
            image, brightness=0.1, contrast=0.1, saturation=0.1, seed=42
        )

    # 2️⃣ Normalize while pixels are still integers (ImageNet statistics)
    image = pp.normalize_image(image)

    # 3️⃣ Random flip (training only); flipping expects *normalized* boxes
    if training:
        image, boxes, masks = pp.random_horizontal_flip(
            image, normalized_boxes=boxes, masks=masks, prob=0.5
        )

    # 4️⃣ Resize (short side 800, long side ≤ 1333) and pad to a stride-32 canvas;
    #    1344×1344 fits both landscape and portrait images
    image, image_info = pp.resize_and_crop_image_v2(
        image,
        short_side=800,
        long_side=1333,
        padded_size=[1344, 1344],
        aug_scale_min=0.8 if training else 1.0,
        aug_scale_max=1.2 if training else 1.0,
    )

    # 5️⃣ Apply the same scale and crop offset to boxes (in pixels) and masks
    image_scale, offset = image_info[2, :], image_info[3, :]
    boxes = box_ops.denormalize_boxes(boxes, image_shape)
    boxes = pp.resize_and_crop_boxes(boxes, image_scale, image_info[1, :], offset)
    if masks is not None:
        masks = pp.resize_and_crop_masks(
            masks, image_scale, tf.cast(image_info[1, :], tf.int32), offset
        )

    return image, boxes, masks, image_info
```
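The example hands box bookkeeping to `resize_and_crop_boxes`, passing it the scale and offset stored in `image_info`. The pure-Python sketch below shows the arithmetic this is assumed to perform on pixel-coordinate boxes: multiply by the scale, subtract the crop offset, then clip to the output size. The helper is hypothetical, and the library's exact clipping bound may differ.

```python
def resize_and_crop_box(box, image_scale, output_hw, offset):
    """Hypothetical sketch: move a pixel-coordinate [y_min, x_min, y_max, x_max] box
    into the resized-and-cropped frame, then clip it to the output size."""
    sy, sx = image_scale
    oy, ox = offset
    out_h, out_w = output_hw
    y_min, x_min, y_max, x_max = box
    # Scale into the resized image, then shift by the crop offset.
    moved = [y_min * sy - oy, x_min * sx - ox, y_max * sy - oy, x_max * sx - ox]
    # Clip each coordinate to [0, dimension]; the library's upper bound may differ.
    limits = [float(out_h), float(out_w), float(out_h), float(out_w)]
    return [min(lim, max(0.0, v)) for v, lim in zip(moved, limits)]

print(resize_and_crop_box([100.0, 200.0, 300.0, 400.0], (1.2, 1.2), (512, 512), (32, 128)))
```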

### Segmentation Preprocessing (DeepLab-Style)

For semantic segmentation, labels require nearest-neighbor resizing to preserve class IDs:

```python
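import tensorflow as tf
from official.vision.ops import preprocess_ops as pp

def preprocess_for_segmentation(image, label, training=True, ignore_label=255):
    """image: uint8 [H, W, 3]; label: integer class map [H, W, 1]."""
    # Photometric jitter on the raw uint8 image (training only)
    if training:
        image = pp.color_jitter(image, brightness=0.2, contrast=0.2, saturation=0.2)

    # Normalization (ImageNet statistics)
    image = pp.normalize_image(image)

    # Treat the label map as one float mask, [1, H, W, 1], so it can follow the image
    label = tf.cast(label, tf.float32)[tf.newaxis]

    # Random flip applied jointly to image and label (training only)
    if training:
        image, _, label = pp.random_horizontal_flip(image, masks=label, prob=0.5)

    # Resize with scale jitter, then crop/pad to 512×512
    image, image_info = pp.resize_and_crop_image(
        image,
        desired_size=[512, 512],
        padded_size=[512, 512],
        aug_scale_min=0.5 if training else 1.0,
        aug_scale_max=2.0 if training else 1.0,
    )

    # Apply the same scale/offset to the label with resize_and_crop_masks, which
    # resizes with nearest-neighbor so class IDs survive. Shifting by +1 first lets
    # the zero padding be told apart from class 0 and mapped to ignore_label.
    label = pp.resize_and_crop_masks(
        label + 1.0, image_info[2, :], [512, 512], image_info[3, :]
    )
    label -= 1.0
    label = tf.where(label < 0, tf.cast(ignore_label, tf.float32), label)
    label = tf.cast(tf.squeeze(label, axis=0), tf.int32)  # [512, 512, 1]
    return image, label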
```
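Why nearest-neighbor? Interpolating methods blend neighboring labels and can produce values that are not valid class IDs. The pure-Python comparison below uses hypothetical 1-D helpers for brevity and assumes a half-pixel sampling convention; it contrasts nearest-neighbor with linear resampling on a row of labels.

```python
def nearest_resize_1d(values, out_len):
    """Nearest-neighbor resampling: each output sample copies exactly one input value."""
    in_len = len(values)
    return [values[min(in_len - 1, int((i + 0.5) * in_len / out_len))] for i in range(out_len)]

def linear_resize_1d(values, out_len):
    """Linear resampling with half-pixel centers: neighboring values are blended."""
    in_len = len(values)
    out = []
    for i in range(out_len):
        x = (i + 0.5) * in_len / out_len - 0.5
        x = min(max(x, 0.0), in_len - 1)   # clamp to the valid sample range
        lo = int(x)
        hi = min(lo + 1, in_len - 1)
        t = x - lo
        out.append(values[lo] * (1 - t) + values[hi] * t)
    return out

labels = [0, 0, 7, 7]  # class IDs along one row
print(nearest_resize_1d(labels, 6))  # only 0s and 7s
print(linear_resize_1d(labels, 6))   # includes in-between values such as 1.17
```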

## Key Source Files and Utilities

The preprocessing ecosystem spans several directories within the repository:

- **[`official/vision/ops/preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py)** – Core image-level operations including `normalize_image`, `resize_and_crop_image`, `random_horizontal_flip`, and `color_jitter`.
- **[`official/vision/utils/object_detection/preprocessor.py`](https://github.com/tensorflow/models/blob/main/official/vision/utils/object_detection/preprocessor.py)** – High-level orchestration combining geometric transforms with box and mask handling.
- **[`official/vision/ops/augment.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/augment.py)** – Low-level photometric helpers (`brightness`, `contrast`, `saturation`, `blend`) used by the jitter functions.
- **[`official/vision/utils/object_detection/box_list.py`](https://github.com/tensorflow/models/blob/main/official/vision/utils/object_detection/box_list.py)** – Box-list wrapper with utilities like `clip_boxes` and `horizontal_flip_boxes` for annotation manipulation.
- **[`research/object_detection/core/preprocessor.py`](https://github.com/tensorflow/models/blob/main/research/object_detection/core/preprocessor.py)** – Legacy preprocessing implementations used by research configurations.
- **[`research/slim/preprocessing/preprocessing_factory.py`](https://github.com/tensorflow/models/blob/main/research/slim/preprocessing/preprocessing_factory.py)** – Factory mapping string names to concrete functions for SLIM model compatibility.

## Summary

- **Normalization** uses ImageNet statistics `(0.485, 0.456, 0.406)` for mean and `(0.229, 0.224, 0.225)` for standard deviation via `normalize_image` in [`preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py).
- **Geometric transforms** update both images and annotations simultaneously, ensuring bounding boxes and masks remain aligned after flipping or cropping.
- **Two resizing modes** support different detection architectures: `resize_and_crop_image` for single-stage detectors and `resize_and_crop_image_v2` for two-stage Faster-RCNN style models.
- **Pure TensorFlow operations** run inside `tf.data` pipelines, so preprocessing proceeds in parallel with accelerator compute instead of stalling on Python code.
- **Modular design** allows chaining primitives for custom classification, detection, or segmentation workflows.

## Frequently Asked Questions

### What is the difference between `resize_and_crop_image` and `resize_and_crop_image_v2`?

`resize_and_crop_image` follows RetinaNet-style preprocessing by computing a desired size while keeping aspect ratio, applying random scale jitter, and padding to stride-aligned dimensions. `resize_and_crop_image_v2` implements Faster-RCNN-style logic by first enforcing a short-side length, then applying a long-side cap, before optional jitter and padding. Both return an `image_info` tensor containing scaling factors and offset coordinates necessary for mapping predictions back to original image coordinates.
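Mapping a prediction back to the original image reverses the two recorded steps: add the crop offset back, then divide by the scale. The pure-Python sketch below assumes, as in the detection example, that row 2 of `image_info` holds `[y_scale, x_scale]` and row 3 holds `[y_offset, x_offset]`; the helper name is hypothetical.

```python
def to_original_coords(box, image_scale, offset):
    """Hypothetical sketch: map a [y_min, x_min, y_max, x_max] box predicted on the
    resized-and-cropped canvas back to original-image pixel coordinates."""
    sy, sx = image_scale
    oy, ox = offset
    y_min, x_min, y_max, x_max = box
    # Undo the crop first (add the offset back), then undo the resize (divide by scale).
    return [(y_min + oy) / sy, (x_min + ox) / sx, (y_max + oy) / sy, (x_max + ox) / sx]

print(to_original_coords([88.0, 112.0, 328.0, 352.0], (1.2, 1.2), (32, 128)))
```

Clipping the result to the original height and width stored in row 0 of `image_info` removes any overshoot into padded regions.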

### How does TensorFlow Models handle bounding boxes during image flipping?

The `random_horizontal_flip` and `random_vertical_flip` functions accept optional `normalized_boxes` and `masks` arguments alongside the image tensor. When a flip occurs (with probability `prob`), they apply `horizontal_flip_boxes` or `vertical_flip_boxes` to the box coordinates and `horizontal_flip_masks` or `vertical_flip_masks` to the masks, so annotations stay synchronized with the augmented image. Because a horizontal flip maps each normalized x coordinate to `1 - x`, boxes must still be in normalized coordinates when they are flipped.

### What normalization statistics does the pipeline use by default?

According to the source code in [`official/vision/ops/preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py), the default normalization constants follow ImageNet training statistics: `MEAN_NORM = (0.485, 0.456, 0.406)` for the RGB channels and `STDDEV_NORM = (0.229, 0.224, 0.225)` for the corresponding standard deviations. These are the same values torchvision uses for its ImageNet-pretrained models, and they keep inputs consistent when fine-tuning backbones trained on ImageNet. They assume pixels scaled to `[0, 1]`; for float images in `[0, 255]`, the module also defines the 255-scaled constants `MEAN_RGB` and `STDDEV_RGB`.

### Can these preprocessing functions execute inside a `tf.data` pipeline?

Yes. The functions in [`preprocess_ops.py`](https://github.com/tensorflow/models/blob/main/official/vision/ops/preprocess_ops.py) are built from TensorFlow ops rather than NumPy or Python-side computation, so they can be called inside `tf.data.Dataset.map()`, which traces them into a graph. Preprocessing then runs in parallel on the host CPU while the accelerator handles forward and backward passes, which makes these functions suitable for large-scale training workflows.