# How to Implement Video Classification Models with TensorFlow Models

> Implement video classification models using TensorFlow Models official/vision stack. Train on datasets like Kinetics-400 easily with 3D backbones and config-driven training.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: tutorial
- Published: 2026-02-28

---

**To implement video classification models with TensorFlow Models, use the modular config-driven stack in `official/vision` that wires together experiment configurations, 3D backbones, and task logic to train on video datasets like Kinetics-400 without modifying core training code.**

The TensorFlow Models repository provides a framework for building video classification models on top of a declarative configuration system. By leveraging the `official/vision` components, you can assemble a complete training pipeline, from data ingestion through model construction to evaluation, by editing Python dataclass configs rather than rewriting training code.

## Configuration-Driven Architecture

The implementation follows a strict **config → data → model → task** pattern. All hyperparameters are centralized in [`official/vision/configs/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/video_classification.py), which defines `DataConfig` for datasets like Kinetics or UCF-101, `VideoClassificationModel` for architecture selection, and experiment factories (e.g., `video_classification_kinetics400`) that assemble complete `ExperimentConfig` objects with trainer and optimizer settings.
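To make the layering concrete, here is a minimal sketch of nested, immutable configs built with plain `dataclasses`. The class names mirror the roles described above, but the reduced field sets, default values, and the `with_shorter_clips` helper are assumptions made for illustration; they are not the definitions found in `video_classification.py`.

```python
import dataclasses
from typing import Tuple


@dataclasses.dataclass(frozen=True)
class DataConfig:
    """Stand-in dataset config (hypothetical, reduced field set)."""
    name: str = "kinetics400"
    feature_shape: Tuple[int, int, int, int] = (64, 224, 224, 3)  # (T, H, W, C)
    num_classes: int = 400
    global_batch_size: int = 128


@dataclasses.dataclass(frozen=True)
class ModelConfig:
    """Stand-in model config: backbone choice plus head options."""
    backbone_type: str = "resnet_3d"
    dropout_rate: float = 0.2


@dataclasses.dataclass(frozen=True)
class TaskConfig:
    """Groups the data and model configs that a task would consume."""
    train_data: DataConfig = dataclasses.field(default_factory=DataConfig)
    model: ModelConfig = dataclasses.field(default_factory=ModelConfig)


def with_shorter_clips(task: TaskConfig, num_frames: int) -> TaskConfig:
    """Returns a copy of `task` whose training clips have `num_frames` frames."""
    _, height, width, channels = task.train_data.feature_shape
    new_data = dataclasses.replace(
        task.train_data, feature_shape=(num_frames, height, width, channels))
    return dataclasses.replace(task, train_data=new_data)


base = TaskConfig()
short = with_shorter_clips(base, num_frames=32)
print(short.train_data.feature_shape)  # (32, 224, 224, 3)
print(base.train_data.feature_shape)   # (64, 224, 224, 3), base is unchanged
```

Keeping every hyperparameter in one nested structure means an experiment variant is a small, reviewable diff against a base config instead of a change to training code.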

### Experiment Factory Pattern

Rather than manually instantiating classes, you retrieve pre-built configurations via `exp_factory.get_exp_config`. This factory, located in [`official/core/exp_factory.py`](https://github.com/tensorflow/models/blob/main/official/core/exp_factory.py), returns a complete experiment specification including batch sizes, learning rate schedules, and warm-up steps configured through the `add_trainer` helper in the config file.

## Data Pipeline and Preprocessing

The input pipeline is handled by [`official/vision/dataloaders/video_input.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/video_input.py), which parses TFRecord or TFDS video examples and applies temporal sampling strategies. The parser performs frame decoding and random augmentations (crop, flip, rotation, AutoAugment/RandAugment), then returns a feature dictionary of the form `{'image': <tensor>}` alongside the label. Each clip matches `DataConfig.feature_shape`, which is `(T, H, W, C)` per example; batching adds the leading dimension, giving `(batch, T, H, W, C)`.
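As a rough illustration of temporal sampling, the sketch below picks frame indices for a clip: a random start offset during training and a centered window during evaluation. The function name, its parameters, and the wrap-around policy for short videos are assumptions made for this example, not the behavior of the library's own sampling ops.

```python
import random
from typing import List, Optional


def sample_frame_indices(total_frames: int, num_frames: int, stride: int,
                         training: bool,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Picks `num_frames` indices spaced `stride` apart from a video.

    Training uses a random start offset; evaluation takes the centered window.
    Indices wrap around when the video is shorter than the sampled span.
    """
    if total_frames <= 0 or num_frames <= 0 or stride <= 0:
        raise ValueError("total_frames, num_frames and stride must be positive")
    span = (num_frames - 1) * stride + 1       # frames covered by one clip
    max_start = max(total_frames - span, 0)    # last valid start offset
    if training:
        rng = rng or random.Random()
        start = rng.randint(0, max_start)
    else:
        start = max_start // 2
    return [(start + i * stride) % total_frames for i in range(num_frames)]


print(sample_frame_indices(300, num_frames=8, stride=4, training=False))
# [135, 139, 143, 147, 151, 155, 159, 163]
```

Passing a seeded `random.Random` makes the training-time offsets reproducible, which helps when debugging a data pipeline.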

### Input Reader Construction

The [`official/vision/dataloaders/input_reader_factory.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/input_reader_factory.py) module constructs the final `tf.data` pipeline using the video parser, handling distributed reading and batching across TPU or GPU workers.

## Model Definition and 3D Backbones

The `VideoClassificationModel` class in [`official/vision/modeling/video_classification_model.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/video_classification_model.py) serves as a thin wrapper around 3D convolutional backbones. It aggregates features from the backbone (such as ResNet-3D or SlowFast defined in [`official/vision/configs/backbones_3d.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/backbones_3d.py)), applies global pooling, optional dropout, and projects to `num_classes` via a dense layer.
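The pooling and projection steps of such a head reduce to two small operations, sketched here in dependency-free Python on a flattened feature map. This is a toy illustration under stated assumptions: the function names and list-based layout are hypothetical, dropout is left out because it is inactive at inference time, and the real model uses Keras layers.

```python
from typing import List, Sequence


def global_average_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Averages channel vectors over every (t, h, w) position.

    `features` is the backbone output flattened to [T*H*W, C].
    """
    if not features:
        raise ValueError("features must contain at least one position")
    num_channels = len(features[0])
    return [sum(pos[c] for pos in features) / len(features)
            for c in range(num_channels)]


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Projects a C-dim vector to logits; `weights` is [C, num_classes]."""
    return [sum(x[c] * weights[c][k] for c in range(len(x))) + bias[k]
            for k in range(len(bias))]


# Two positions with two channels each, projected to three classes.
features = [[1.0, 2.0], [3.0, 4.0]]
pooled = global_average_pool(features)       # [2.0, 3.0]
logits = dense(pooled,
               weights=[[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]],
               bias=[0.0, 0.1, 0.2])
print(pooled, logits)
```

Because pooling collapses the temporal and spatial axes, the dense layer's input size depends only on the backbone's channel count, not on clip length or resolution.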

### Backbone Selection

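You specify the backbone architecture through the model configuration's `backbone.type` field. When the [`official/vision/modeling/factory_3d.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/factory_3d.py) module builds the model, the backbone is constructed from this type string, so you can switch between registered architectures without code changes. Check [`official/vision/configs/backbones_3d.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/backbones_3d.py) for the type strings available in your checkout.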

## Task Logic and Training Loop

`VideoClassificationTask` in [`official/vision/tasks/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/tasks/video_classification.py) orchestrates the entire training process. It builds the Keras model via `factory_3d.build_model`, loads optional pretrained checkpoints, constructs the input pipeline, and defines the loss function (categorical or binary cross-entropy) and metrics (top-1/top-5 accuracy, AUC, per-class recall).
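Top-1 and top-5 accuracy are both instances of top-k accuracy, which the following dependency-free sketch computes from raw logits. It is an illustration of the metric's definition with made-up scores, not the Keras metric classes a task would normally use.

```python
from typing import Sequence


def top_k_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int],
                   k: int) -> float:
    """Fraction of examples whose true label is among the k highest logits."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(logits, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


logits = [
    [0.1, 2.0, 0.3, 1.5],   # true class 1: ranked first
    [1.2, 0.4, 3.1, 2.2],   # true class 3: ranked second
    [0.9, 0.8, 0.1, 0.2],   # true class 2: ranked last
]
labels = [1, 3, 2]
print(top_k_accuracy(logits, labels, k=1))  # 0.3333333333333333
print(top_k_accuracy(logits, labels, k=2))  # 0.6666666666666666
```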

The task implementation handles mixed-precision training automatically and defines the forward pass logic for both training and validation steps.
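Float16 training generally depends on loss scaling, so a framework-free sketch of the bookkeeping may help. The `DynamicLossScale` class below is hypothetical and written for illustration only; it mimics the usual policy (halve the scale and skip the step on non-finite gradients, double it after a run of stable steps) and is not Model Garden or Keras code.

```python
import math
from typing import List, Optional


class DynamicLossScale:
    """Tracks a loss scale that shrinks on overflow and grows when stable."""

    def __init__(self, initial_scale: float = 2.0 ** 15,
                 growth_steps: int = 2000):
        self.scale = initial_scale
        self.growth_steps = growth_steps
        self._good_steps = 0

    def scale_loss(self, loss: float) -> float:
        """Multiplies the loss so small gradients survive float16 rounding."""
        return loss * self.scale

    def unscale(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Returns true-magnitude gradients, or None if the step is skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.scale = max(self.scale / 2.0, 1.0)  # back off after overflow
            self._good_steps = 0
            return None
        self._good_steps += 1
        grads = [g / self.scale for g in scaled_grads]
        if self._good_steps >= self.growth_steps:
            self.scale *= 2.0                        # try a larger scale again
            self._good_steps = 0
        return grads


scaler = DynamicLossScale(initial_scale=1024.0, growth_steps=2)
print(scaler.unscale([float("inf"), 2048.0]))  # None (scale drops to 512.0)
print(scaler.unscale([512.0, 1024.0]))         # [1.0, 2.0]
print(scaler.unscale([256.0]))                 # [0.5] (scale grows to 1024.0)
print(scaler.scale)                            # 1024.0
```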

## End-to-End Implementation Example

The following code demonstrates how to assemble a complete video classification training setup using the factory pattern:

```python
import tensorflow as tf

from official.core import exp_factory
# Import the task module before looking up the experiment: it pulls in the
# video classification configs, which register the experiment names.
from official.vision.tasks import video_classification as video_task

# 1️⃣ Load the experiment configuration for Kinetics‑400.
exp_cfg = exp_factory.get_exp_config('video_classification_kinetics400')

# 2️⃣ Optionally override hyper‑parameters.
exp_cfg.trainer = exp_cfg.trainer.replace(
    steps_per_loop=1000,        # custom step granularity
    optimizer_config=exp_cfg.trainer.optimizer_config.replace(
        optimizer={'type': 'adam'},   # switch optimizer
    )
)

# 3️⃣ Build the model (the task constructs it from the task config).
task = video_task.VideoClassificationTask(exp_cfg.task)
model = task.build_model()   # ↳ builds the 3D backbone + classification head

# 4️⃣ Inspect the model summary.
model.summary()

# 5️⃣ (Optional) Compile and run a quick sanity‑check fit.
# The classification head is expected to return logits, hence from_logits=True.
model.compile(
    optimizer='sgd',
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

# Dummy data: DataConfig.feature_shape is (T, H, W, C); add a batch dimension.
# Lower the batch size if memory is tight.
feature_shape = list(exp_cfg.task.train_data.feature_shape)
num_classes = exp_cfg.task.train_data.num_classes
dummy_x = tf.random.uniform([4] + feature_shape)
dummy_y = tf.one_hot(
    tf.random.uniform([4], maxval=num_classes, dtype=tf.int32), num_classes)
# Feed the same {'image': ...} dictionary structure that the parser emits.
model.fit({'image': dummy_x}, dummy_y, epochs=1)
```

## Running Distributed Training

Once configured, launch training with the Model Garden driver script [`official/vision/train.py`](https://github.com/tensorflow/models/blob/main/official/vision/train.py), selecting the experiment by name with the `--experiment` flag and supplying overrides through `--config_file` or `--params_override`. The script builds the task and a `Trainer` instance (configured by the experiment's `trainer` field) and executes distributed training on TPUs or GPUs using the strategy defined in the runtime configuration.
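A launch command for a flag-driven driver can be assembled programmatically, as in the sketch below. It assumes the script accepts the Model Garden's common flags (`--experiment`, `--mode`, `--model_dir`, `--params_override`); the `build_train_command` helper, the script path passed to it, and the output directory are hypothetical and should be adapted to your checkout.

```python
import shlex
from typing import Dict, List, Optional


def build_train_command(script: str, experiment: str, model_dir: str,
                        mode: str = "train_and_eval",
                        overrides: Optional[Dict[str, object]] = None
                        ) -> List[str]:
    """Assembles an argv list for a flag-driven training script."""
    cmd = ["python3", script,
           f"--experiment={experiment}",
           f"--mode={mode}",
           f"--model_dir={model_dir}"]
    if overrides:
        # Dotted keys such as `trainer.train_steps` address nested fields.
        joined = ",".join(
            f"{key}={value}" for key, value in sorted(overrides.items()))
        cmd.append(f"--params_override={joined}")
    return cmd


cmd = build_train_command(
    script="official/vision/train.py",   # assumed driver path
    experiment="video_classification_kinetics400",
    model_dir="/tmp/kinetics400_run",    # hypothetical output directory
    overrides={"trainer.train_steps": 1000,
               "task.train_data.global_batch_size": 64},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids shell-quoting problems.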

## Summary

- **Use `exp_factory.get_exp_config`** to retrieve pre-built experiment configurations for standard datasets like Kinetics-400 or UCF-101.
- **Modify [`official/vision/configs/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/video_classification.py)** to change hyperparameters, backbones (ResNet-3D, SlowFast), or augmentation strategies without touching training logic.
- **Leverage `VideoClassificationTask`** to handle model construction, checkpoint loading, loss computation, and metric tracking in a single class.
- **Process video data** through [`official/vision/dataloaders/video_input.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/video_input.py), which handles temporal sampling, decoding, and augmentations for TFRecord or TFDS sources.
- **Scale to distributed hardware** by launching [`official/vision/train.py`](https://github.com/tensorflow/models/blob/main/official/vision/train.py) with your experiment name, which sets up the `Trainer` and the distribution strategy for TPU or GPU clusters.

## Frequently Asked Questions

### How do I switch from ResNet-3D to SlowFast backbone?

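Update the `backbone.type` field in your experiment configuration to the target backbone's type string (written here as `'slowfast'`) before building the task. The available type strings are declared in [`official/vision/configs/backbones_3d.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/backbones_3d.py), so confirm there that the one you need is registered in your checkout. The [`factory_3d.py`](https://github.com/tensorflow/models/blob/main/official/vision/modeling/factory_3d.py) module then builds the model with the selected backbone, and `VideoClassificationModel` applies its pooling and projection layers to whatever feature shape that backbone outputs.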

### Can I use custom video datasets instead of Kinetics-400?

Yes. Create a new configuration function in [`official/vision/configs/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/video_classification.py) (or in your own module), decorate it with `@exp_factory.register_config_factory` under a new experiment name, and have it return an `ExperimentConfig` whose custom `DataConfig` points to your TFRecord files. Adjust `feature_shape` and `num_classes` to match your clip dimensions and label space, make sure the defining module is imported so that the factory is registered, then call `exp_factory.get_exp_config` with your new factory name.

### Where is the mixed-precision training logic implemented?

Mixed-precision handling is built into `VideoClassificationTask` in [`official/vision/tasks/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/tasks/video_classification.py). When the optimizer is wrapped in a Keras `LossScaleOptimizer`, the training step scales the loss before computing gradients and unscales the gradients before applying them. The precision mode itself is selected through the experiment's `runtime` configuration (the `mixed_precision_dtype` field, for example `float16` or `bfloat16`) rather than through the optimizer settings.

### How do I add custom augmentations to the video pipeline?

Modify the parser in [`official/vision/dataloaders/video_input.py`](https://github.com/tensorflow/models/blob/main/official/vision/dataloaders/video_input.py) or adjust the augmentation field in your `DataConfig` (defined in [`official/vision/configs/video_classification.py`](https://github.com/tensorflow/models/blob/main/official/vision/configs/video_classification.py); it appears as `aug_type` in recent versions, so verify the exact name in your checkout). The existing implementation supports RandAugment and AutoAugment policies, and you can extend the parser's training-time image processing to inject custom temporal or spatial transformations. Apply any custom spatial transformation identically to every frame of a clip so that the clip stays temporally consistent.