# How to Fine-Tune BERT Models Using TensorFlow Model Garden: A Complete Guide

> Learn to fine-tune BERT models with TensorFlow Model Garden. Our guide covers the unified training driver for NLP benchmarks like GLUE and SQuAD.

- Repository: [tensorflow/models](https://github.com/tensorflow/models)
- Tags: how-to-guide
- Published: 2026-02-28

---

**TensorFlow Model Garden provides a unified training driver ([`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py)) that orchestrates experiment configurations, model architectures, and task-specific settings to fine-tune pre-trained BERT encoders on downstream NLP benchmarks including GLUE and SQuAD.**

The `tensorflow/models` repository contains an NLP framework that simplifies BERT fine-tuning through declarative YAML configurations and a centralized training script. Using the Model Garden's experiment factory pattern, you can fine-tune BERT on classification, question answering, and retrieval tasks without modifying source code.

## Understanding the Model Garden Training Architecture

The training system relies on three interconnected configuration layers that together define how BERT is fine-tuned for a specific downstream task.

### Experiment Registry and Task Selection

The `exp_factory` module registers predefined experiment types that map task names to their implementations. According to the source code in [`official/nlp/configs/finetuning_experiments.py`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/finetuning_experiments.py), available experiments include `bert/sentence_prediction` for GLUE tasks and `bert/squad` for question answering. The training driver selects the appropriate task logic based on the `--experiment` flag passed to [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py).

### Model Configuration via YAML

BERT architecture parameters are defined in YAML files such as [`official/nlp/configs/models/bert_en_uncased_base.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/models/bert_en_uncased_base.yaml). These configurations specify the hidden size, the number of transformer layers, the number of attention heads, and dropout rates. The `BertEncoder` class in [`official/nlp/modeling/networks/bert_encoder.py`](https://github.com/tensorflow/models/blob/main/official/nlp/modeling/networks/bert_encoder.py) takes these values as constructor arguments to build the encoder network used during fine-tuning.
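The pre-trained checkpoint directory used in this guide, `uncased_L-12_H-768_A-12`, encodes the same architecture values in its name (layers, hidden size, attention heads). The sketch below decodes that naming convention and derives the per-head dimension. The helper name `parse_bert_name` and the keys of the returned dictionary are chosen for illustration and are not taken from the Model Garden YAML schema.

```python
import re


def parse_bert_name(name: str) -> dict:
    """Extracts layer count, hidden size, and head count from a BERT name."""
    match = re.search(r"L-(\d+)_H-(\d+)_A-(\d+)", name)
    if match is None:
        raise ValueError(f"no L-/H-/A- fields found in {name!r}")
    num_layers, hidden_size, num_heads = (int(g) for g in match.groups())
    if hidden_size % num_heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return {
        "num_layers": num_layers,
        "hidden_size": hidden_size,
        "num_attention_heads": num_heads,
        # Each attention head works on an equal slice of the hidden vector.
        "size_per_head": hidden_size // num_heads,
    }


shape = parse_bert_name("uncased_L-12_H-768_A-12")
print(shape)
```

A check like this can catch a mismatch between the model config and the checkpoint before a long job is launched.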

### Task-Specific Experiment Configurations

Each downstream task requires a dedicated experiment configuration file. For example, [`official/nlp/configs/experiments/glue_mnli_matched.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/experiments/glue_mnli_matched.yaml) defines data paths, evaluation metrics, and initialization checkpoints for GLUE-MNLI, while [`official/nlp/configs/experiments/squad_v1.1.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/experiments/squad_v1.1.yaml) configures the SQuAD reading comprehension task. These files reference pre-trained weights through parameters such as `task.hub_module_url` (a TensorFlow Hub URL) or `task.init_checkpoint` (a checkpoint path).

## Preparing Your Dataset for BERT Fine-Tuning

Before launching training, you must convert raw text data into TF-Record format optimized for BERT input pipelines.

### Generating TF-Records with create_finetuning_data.py

The script [`official/nlp/data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/main/official/nlp/data/create_finetuning_data.py) handles data preprocessing for classification, SQuAD, retrieval, and tagging tasks. It tokenizes input text using the specified vocabulary file, creates training and evaluation splits, and writes serialized TF-Record files along with a metadata file containing sequence length and dataset statistics.

```bash
export GLUE_DIR=~/glue
export VOCAB_FILE=~/uncased_L-12_H-768_A-12/vocab.txt
export TASK_NAME=MNLI
export OUTPUT_DATA_DIR=gs://my-bucket/datasets

python3 models/official/nlp/data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${VOCAB_FILE} \
  --train_data_output_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_meta_data \
  --fine_tuning_task_type=classification \
  --classification_task_name=${TASK_NAME} \
  --max_seq_length=128
```
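To make the preprocessing step more concrete, the following sketch shows how a sentence pair is typically turned into fixed-length BERT features. It is a simplified illustration, not the code in `create_finetuning_data.py`: it splits on whitespace instead of using WordPiece, the toy `vocab` is invented, and the feature names (`input_ids`, `input_mask`, `segment_ids`) follow the common BERT convention and are assumed here.

```python
from typing import Dict, List


def encode_pair(text_a: str, text_b: str, vocab: Dict[str, int],
                max_seq_length: int) -> Dict[str, List[int]]:
    """Builds padded BERT-style features for one sentence pair."""
    if max_seq_length < 3:
        raise ValueError("max_seq_length must leave room for [CLS] and two [SEP]")
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()

    # Trim the longer side until the pair fits next to the 3 special tokens.
    while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS] + first sentence + [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [vocab.get(token, vocab["[UNK]"]) for token in tokens]
    input_mask = [1] * len(input_ids)

    # Pad every feature to the same fixed length.
    padding = max_seq_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * padding
    input_mask += [0] * padding
    segment_ids += [0] * padding
    return {"input_ids": input_ids, "input_mask": input_mask,
            "segment_ids": segment_ids}


# Toy vocabulary for illustration only; real runs use the BERT vocab.txt file.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "cat": 5, "sat": 6, "a": 7, "slept": 8}
features = encode_pair("The cat sat", "A cat slept", vocab, max_seq_length=10)
print(features)
```

Every example ends up with exactly `max_seq_length` entries per feature, which is why the same value has to be used for preprocessing and for training.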

## Executing the Fine-Tuning Training Driver

The unified training driver [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py) parses the combined configuration, sets up the distribution strategy (for example, `mirrored` for multi-GPU training or `tpu` for TPUs), applies mixed-precision training when requested, and launches the training loop via `train_lib.run_experiment`.
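One way to picture the "combined configuration" is as a stack of nested dictionaries where later layers win. The sketch below is an assumption-laden illustration rather than the driver's real code: the `deep_merge` helper, the layer contents, and the key names other than `runtime.distribution_strategy` and `task.train_data.input_path` are hypothetical.

```python
from functools import reduce


def deep_merge(base: dict, update: dict) -> dict:
    """Returns a new dict where values in `update` override those in `base`."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the nested dict are preserved.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical layers, listed from lowest to highest priority.
registered_defaults = {
    "runtime": {"distribution_strategy": "mirrored"},
    "task": {"train_data": {"input_path": "", "seq_length": 128}},
}
model_yaml = {"task": {"model": {"encoder": {"num_layers": 12, "hidden_size": 768}}}}
experiment_yaml = {"task": {"train_data": {"input_path": "placeholder.tf_record"}}}
cli_overrides = {
    "runtime": {"distribution_strategy": "tpu"},
    "task": {"train_data": {"input_path": "gs://my-bucket/squad/train.tf_record"}},
}

final_config = reduce(
    deep_merge, [registered_defaults, model_yaml, experiment_yaml, cli_overrides]
)
print(final_config)
```

The ordering matters: a value passed on the command line replaces the same key from a YAML file, while untouched keys such as the encoder settings survive from earlier layers.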

### Fine-Tuning BERT on GLUE Classification Tasks

To fine-tune BERT on a sentence prediction task such as MNLI on GPUs with the mirrored distribution strategy:

```bash
PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=gs://my-bucket/datasets/MNLI_train.tf_record
PARAMS=${PARAMS},task.validation_data.input_path=gs://my-bucket/datasets/MNLI_eval.tf_record
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

python3 models/official/nlp/train.py \
  --experiment=bert/sentence_prediction \
  --mode=train_and_eval \
  --model_dir=gs://my-bucket/bert_mnli_ckpt \
  --config_file=models/official/nlp/configs/models/bert_en_uncased_base.yaml \
  --config_file=models/official/nlp/configs/experiments/glue_mnli_matched.yaml \
  --params_override="${PARAMS}"
```

### Fine-Tuning BERT on SQuAD with TPU Acceleration

For question answering tasks on TPU with pre-emption recovery support:

```bash
PARAMS=runtime.distribution_strategy=tpu
PARAMS=${PARAMS},task.train_data.input_path=gs://my-bucket/squad/train.tf_record
# Use ${HOME} rather than ~ here: bash does not expand ~ in the middle of a value.
PARAMS=${PARAMS},task.validation_data.input_path=${HOME}/squad/dev-v1.1.json
PARAMS=${PARAMS},task.validation_data.vocab_file=${HOME}/uncased_L-12_H-768_A-12/vocab.txt
PARAMS=${PARAMS},task.init_checkpoint=${HOME}/uncased_L-12_H-768_A-12/bert_model.ckpt

python3 models/official/nlp/train.py \
  --experiment=bert/squad \
  --mode=train_and_eval \
  --model_dir=gs://my-bucket/bert_squad_ckpt \
  --config_file=models/official/nlp/configs/models/bert_en_uncased_base.yaml \
  --config_file=models/official/nlp/configs/experiments/squad_v1.1.yaml \
  --tpu=$TPU_NAME \
  --params_override="${PARAMS}"
```

### Configuration Override Parameters

The `--params_override` flag accepts comma-separated key-value pairs that modify YAML configurations at runtime. Common overrides include `runtime.distribution_strategy` (mirrored/tpu), `task.train_data.input_path` for training data locations, and `task.init_checkpoint` for loading pre-trained weights from local checkpoints rather than TensorFlow Hub.
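The dotted keys in these overrides address nested configuration fields. The sketch below shows one way such a string could be applied to a nested dictionary. It is a simplification under stated assumptions: the `apply_overrides` helper is hypothetical, every value is kept as a string, and values that themselves contain commas are not handled, whereas the real parser is more capable.

```python
import copy


def apply_overrides(config: dict, overrides: str) -> dict:
    """Applies 'a.b.c=value,x.y=value' style overrides to a nested dict."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for pair in overrides.split(","):
        if not pair.strip():
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        dotted_key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"override {pair!r} is missing '='")
        *parents, leaf = dotted_key.strip().split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value.strip()
    return result


base = {
    "runtime": {"distribution_strategy": "tpu"},
    "task": {"train_data": {"input_path": ""}},
}
merged = apply_overrides(
    base,
    "runtime.distribution_strategy=mirrored,"
    "task.train_data.input_path=gs://my-bucket/datasets/MNLI_train.tf_record",
)
print(merged)
```

Building the override string one key at a time, as the `PARAMS` variable does in the commands above, keeps each setting on its own line and easy to review.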

## Key Implementation Files in the Repository

Understanding the source structure helps debug and extend fine-tuning workflows:

- [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py) - Main training driver that orchestrates distribution strategies and experiment execution
- [`official/nlp/modeling/networks/bert_encoder.py`](https://github.com/tensorflow/models/blob/main/official/nlp/modeling/networks/bert_encoder.py) - Contains the `BertEncoder` class that implements the transformer architecture
- [`official/nlp/configs/finetuning_experiments.py`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/finetuning_experiments.py) - Registers experiment types like `bert/sentence_prediction` and `bert/squad`
- [`official/nlp/data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/main/official/nlp/data/create_finetuning_data.py) - Preprocessing utility for generating TF-Record datasets
- [`official/nlp/configs/models/bert_en_uncased_base.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/models/bert_en_uncased_base.yaml) - Base BERT architecture configuration
- [`official/nlp/configs/experiments/glue_mnli_matched.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/experiments/glue_mnli_matched.yaml) - GLUE task-specific settings
- [`official/nlp/configs/experiments/squad_v1.1.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/experiments/squad_v1.1.yaml) - SQuAD task-specific settings

## Summary

- The unified training driver [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py) lets you fine-tune BERT through declarative experiment configurations.
- The system separates concerns into model configs (architecture), experiment configs (task settings), and runtime overrides (distribution strategy, paths).
- Data preparation requires converting raw text to TF-Records using [`official/nlp/data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/main/official/nlp/data/create_finetuning_data.py) before training.
- The `exp_factory` registry supports multiple task types including `bert/sentence_prediction` for GLUE and `bert/squad` for question answering.
- Training runs on GPUs (MirroredStrategy) or TPUs, optionally with mixed precision, and long-running jobs can resume from saved checkpoints after a pre-emption.

## Frequently Asked Questions

### What is the difference between model configs and experiment configs in Model Garden?

Model configs (like [`bert_en_uncased_base.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/models/bert_en_uncased_base.yaml)) define the neural network architecture parameters, including hidden size, number of layers, and dropout rates. Experiment configs (like [`glue_mnli_matched.yaml`](https://github.com/tensorflow/models/blob/main/official/nlp/configs/experiments/glue_mnli_matched.yaml)) specify task-specific settings, including data paths, evaluation metrics, and initialization checkpoints. The training driver merges these configurations at runtime into a single, complete experiment configuration.

### Can I fine-tune BERT on custom datasets using Model Garden?

Yes. First, preprocess your data using [`official/nlp/data/create_finetuning_data.py`](https://github.com/tensorflow/models/blob/main/official/nlp/data/create_finetuning_data.py) with `--fine_tuning_task_type=classification` (or appropriate task type) to generate TF-Records. Then create a custom YAML experiment configuration pointing to your data paths, or use `--params_override` to specify `task.train_data.input_path` and `task.validation_data.input_path` when launching [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py).

### How do I resume training after a TPU pre-emption?

The training driver in [`official/nlp/train.py`](https://github.com/tensorflow/models/blob/main/official/nlp/train.py) automatically supports pre-emption recovery on TPUs by periodically saving checkpoints to the directory specified by `--model_dir`. When restarting the job with the same `--model_dir` and `--tpu` flags, the driver detects the latest checkpoint and resumes training from that step without manual intervention.
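The resume logic amounts to "find the newest checkpoint in `--model_dir` and continue from its step". The sketch below imitates that with plain files. It assumes checkpoints are named `ckpt-<step>.index`, which is a common TensorFlow layout but is not confirmed by this guide, and the helper names are hypothetical. Real jobs rely on TensorFlow's own checkpoint utilities rather than filename parsing.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme: ckpt-<step>.index
_CKPT_PATTERN = re.compile(r"^ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir: Path) -> Optional[int]:
    """Returns the highest checkpoint step found in model_dir, if any."""
    steps = []
    for path in model_dir.iterdir():
        match = _CKPT_PATTERN.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    # Compare as integers so that step 10000 ranks above step 2000.
    return max(steps, default=None)


def starting_step(model_dir: Path) -> int:
    """Step to start from: 0 for a fresh run, else the latest saved step."""
    step = latest_checkpoint_step(model_dir)
    return 0 if step is None else step


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    fresh_start = starting_step(model_dir)      # empty directory
    for step in (1000, 2000, 10000):            # simulate saved checkpoints
        (model_dir / f"ckpt-{step}.index").touch()
    resumed_start = starting_step(model_dir)    # after a simulated restart

print(fresh_start, resumed_start)
```

This is also why reusing the same `--model_dir` matters: pointing a restarted job at a new, empty directory would begin again from step 0.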

### Should I use TensorFlow Hub URLs or local checkpoints for initialization?

Both methods are supported. Use `task.hub_module_url` (e.g., `https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4`) to download pre-trained weights automatically, or specify `task.init_checkpoint` with a local path to a BERT checkpoint file (e.g., `~/bert_model.ckpt`) for offline or custom pre-trained models. The local checkpoint approach is preferred when working in air-gapped environments or with custom pre-training outputs.