How to Fine-Tune BERT Models Using TensorFlow Model Garden: A Complete Guide

February 28, 2026 tensorflow/models

TensorFlow Model Garden provides a unified training driver (official/nlp/train.py) that orchestrates experiment configurations, model architectures, and task-specific settings to fine-tune pre-trained BERT encoders on downstream NLP benchmarks including GLUE and SQuAD.

The tensorflow/models repository contains an NLP framework that simplifies BERT fine-tuning through declarative YAML configurations and a centralized training script. With the Model Garden's experiment factory pattern, you can fine-tune BERT on classification, question answering, and retrieval tasks without modifying source code.

Understanding the Model Garden Training Architecture

The training system relies on three interconnected configuration layers: the experiment registry, the model configuration, and the task-specific experiment configuration. Together they determine how BERT is fine-tuned for a given downstream task.

Experiment Registry and Task Selection

The exp_factory module registers predefined experiment types that map task names to their implementations. According to the source code in official/nlp/configs/finetuning_experiments.py, available experiments include bert/sentence_prediction for GLUE tasks and bert/squad for question answering. The training driver selects the appropriate task logic based on the --experiment flag passed to official/nlp/train.py.

Model Configuration via YAML

BERT architecture parameters are defined in YAML files such as official/nlp/configs/models/bert_en_uncased_base.yaml. These configurations specify the hidden size, number of transformer layers, number of attention heads, and dropout rates. The BertEncoder class in official/nlp/modeling/networks/bert_encoder.py is instantiated with these values to construct the encoder network used during fine-tuning.
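To see how these architecture parameters determine model size, the following sketch counts the trainable weights of a standard BERT encoder. The function name is hypothetical, and the concrete values used below (a 30522-token vocabulary, 3072 intermediate units, 512 positions, 2 segment types) are assumptions based on the commonly published BERT-base uncased settings, not values read from the YAML file.

```python
def bert_encoder_param_count(vocab_size, hidden_size, num_layers,
                             intermediate_size, max_positions, type_vocab_size):
    """Counts trainable weights of a BERT encoder with a pooler layer."""
    layer_norm = 2 * hidden_size  # scale and bias vectors

    # Word, position and segment-type embedding tables, then a layer norm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size
    embeddings += layer_norm

    # Query, key, value and output projections, each with a bias vector.
    attention = 4 * (hidden_size * hidden_size + hidden_size)
    # Two dense layers: hidden -> intermediate -> hidden, each with a bias.
    feed_forward = (hidden_size * intermediate_size + intermediate_size
                    + intermediate_size * hidden_size + hidden_size)
    per_layer = attention + layer_norm + feed_forward + layer_norm

    # Dense layer applied to the [CLS] position.
    pooler = hidden_size * hidden_size + hidden_size
    return embeddings + num_layers * per_layer + pooler


# Assumed BERT-base uncased settings (L-12, H-768).
total = bert_encoder_param_count(
    vocab_size=30522, hidden_size=768, num_layers=12,
    intermediate_size=3072, max_positions=512, type_vocab_size=2)
print(f"{total:,}")  # 109,482,240
```

The number of attention heads does not appear in the formula: heads split the hidden dimension rather than adding weights, so changing it alters how attention is computed but not the parameter count.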

Task-Specific Experiment Configurations

Each downstream task requires a dedicated experiment configuration file. For example, official/nlp/configs/experiments/glue_mnli_matched.yaml defines data paths, evaluation metrics, and initialization checkpoints for GLUE-MNLI, while official/nlp/configs/experiments/squad_v1.1.yaml configures the SQuAD reading comprehension task. These files point to pre-trained weights through either task.init_checkpoint (a checkpoint path) or task.hub_module_url (a TensorFlow Hub URL).
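The layering of a model config and an experiment config can be pictured as a recursive dictionary merge in which later sources win on conflicting keys. The sketch below is a simplified illustration, not Model Garden's own merging code; deep_merge is a hypothetical helper and the field names in the two dictionaries are stand-ins for YAML contents.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Returns a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override replaces the base.
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-ins for the two YAML files (field names are illustrative only).
model_config = {
    "task": {"model": {"encoder": {"hidden_size": 768, "num_layers": 12}}}
}
experiment_config = {
    "task": {
        "model": {"num_classes": 3},
        "train_data": {"seq_length": 128, "global_batch_size": 32},
    }
}

combined = deep_merge(model_config, experiment_config)
print(sorted(combined["task"]["model"]))
```

Because the merge copies its inputs, the same base model config can be reused with several experiment configs without one run leaking settings into another.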

Preparing Your Dataset for BERT Fine-Tuning

Before launching training, you must convert raw text data into the TFRecord format that the BERT input pipeline reads.

Generating TFRecords with create_finetuning_data.py

The script official/nlp/data/create_finetuning_data.py handles data preprocessing for classification, SQuAD, retrieval, and tagging tasks. It tokenizes input text using the specified vocabulary file, processes the training and evaluation splits, and writes serialized TFRecord files along with a metadata file containing the sequence length and dataset statistics.

export GLUE_DIR=~/glue
export VOCAB_FILE=~/uncased_L-12_H-768_A-12/vocab.txt
export TASK_NAME=MNLI
export OUTPUT_DATA_DIR=gs://my-bucket/datasets

python3 models/official/nlp/data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${VOCAB_FILE} \
  --train_data_output_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DATA_DIR}/${TASK_NAME}_meta_data \
  --fine_tuning_task_type=classification \
  --classification_task_name=${TASK_NAME} \
  --max_seq_length=128
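To make the role of --max_seq_length concrete, here is a minimal sketch of how a sentence pair is typically packed into fixed-length BERT inputs. It is not the script's implementation: build_features is a hypothetical helper, the tiny vocabulary is made up, and real preprocessing uses WordPiece tokenization rather than whole words.

```python
from typing import Dict, List


def build_features(tokens_a: List[str], tokens_b: List[str],
                   vocab: Dict[str, int],
                   max_seq_length: int) -> Dict[str, List[int]]:
    """Packs a sentence pair into fixed-length BERT-style inputs."""
    if max_seq_length < 3:
        raise ValueError("max_seq_length must leave room for [CLS] and two [SEP]")
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve three positions for the special tokens and trim the longer
    # sequence one token at a time until the pair fits.
    while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [vocab.get(token, vocab["[UNK]"]) for token in tokens]
    input_mask = [1] * len(input_ids)

    # Pad every feature to the same fixed length.
    padding = max_seq_length - len(input_ids)
    return {
        "input_ids": input_ids + [vocab["[PAD]"]] * padding,
        "input_mask": input_mask + [0] * padding,
        "segment_ids": segment_ids + [0] * padding,
    }


# Made-up vocabulary for illustration only.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "the": 4, "cat": 5, "sat": 6, "a": 7, "slept": 8}
features = build_features("the cat sat".split(), "a cat slept".split(),
                          toy_vocab, max_seq_length=10)
print(features["input_ids"])
```

Every example ends up with exactly max_seq_length entries per feature, which is why the sequence length recorded in the metadata file has to match the value the training job expects.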

Executing the Fine-Tuning Training Driver

The unified training driver official/nlp/train.py parses the combined configuration, sets up the distribution strategy (for example MirroredStrategy on GPUs or TPUStrategy on TPUs), applies mixed-precision training when requested, and launches the training loop via train_lib.run_experiment.

Fine-Tuning BERT on GLUE Classification Tasks

To fine-tune BERT on sentence prediction tasks like MNLI using GPU with mirrored strategy:

PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=gs://my-bucket/datasets/MNLI_train.tf_record
PARAMS=${PARAMS},task.validation_data.input_path=gs://my-bucket/datasets/MNLI_eval.tf_record
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

python3 models/official/nlp/train.py \
  --experiment=bert/sentence_prediction \
  --mode=train_and_eval \
  --model_dir=gs://my-bucket/bert_mnli_ckpt \
  --config_file=models/official/nlp/configs/models/bert_en_uncased_base.yaml \
  --config_file=models/official/nlp/configs/experiments/glue_mnli_matched.yaml \
  --params_override=${PARAMS}

Fine-Tuning BERT on SQuAD with TPU Acceleration

For question answering on a TPU, with checkpoints written to --model_dir so that a pre-empted job can resume:

PARAMS=runtime.distribution_strategy=tpu
PARAMS=${PARAMS},task.train_data.input_path=gs://my-bucket/squad/train.tf_record
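# Use ${HOME} rather than ~ here: the shell does not expand ~ in the middle of an assignment value.
PARAMS=${PARAMS},task.validation_data.input_path=${HOME}/squad/dev-v1.1.json
PARAMS=${PARAMS},task.validation_data.vocab_file=${HOME}/uncased_L-12_H-768_A-12/vocab.txt
PARAMS=${PARAMS},task.init_checkpoint=${HOME}/uncased_L-12_H-768_A-12/bert_model.ckpt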

python3 models/official/nlp/train.py \
  --experiment=bert/squad \
  --mode=train_and_eval \
  --model_dir=gs://my-bucket/bert_squad_ckpt \
  --config_file=models/official/nlp/configs/models/bert_en_uncased_base.yaml \
  --config_file=models/official/nlp/configs/experiments/squad_v1.1.yaml \
  --tpu=$TPU_NAME \
  --params_override=${PARAMS}

Configuration Override Parameters

The --params_override flag accepts comma-separated key-value pairs that modify YAML configurations at runtime. Common overrides include runtime.distribution_strategy (mirrored/tpu), task.train_data.input_path for training data locations, and task.init_checkpoint for loading pre-trained weights from local checkpoints rather than TensorFlow Hub.
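The comma-separated form maps naturally onto a nested dictionary keyed by the dotted path. The sketch below shows that mapping only; parse_overrides is a hypothetical helper, not the parser Model Garden uses, and it keeps every value as a string and does not handle values that themselves contain commas.

```python
def parse_overrides(spec: str) -> dict:
    """Turns 'a.b=1,a.c=x' into {'a': {'b': '1', 'c': 'x'}}."""
    result: dict = {}
    for pair in filter(None, spec.split(",")):
        # Split on the first '=' only, so values may contain '=' themselves.
        dotted_key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        *parents, leaf = dotted_key.strip().split(".")
        node = result
        for name in parents:
            # Walk down the dotted path, creating levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value.strip()
    return result


params = ("runtime.distribution_strategy=mirrored,"
          "task.train_data.input_path=gs://my-bucket/datasets/MNLI_train.tf_record")
overrides = parse_overrides(params)
print(overrides["runtime"]["distribution_strategy"])
```

A dictionary in this shape can then be layered over the values loaded from the YAML files, which is why overrides given on the command line take effect without editing any config file.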

Key Implementation Files in the Repository

Understanding the source structure helps debug and extend fine-tuning workflows:

official/nlp/train.py - Main training driver that orchestrates distribution strategies and experiment execution
official/nlp/modeling/networks/bert_encoder.py - Contains the BertEncoder class that implements the transformer architecture
official/nlp/configs/finetuning_experiments.py - Registers experiment types like bert/sentence_prediction and bert/squad
official/nlp/data/create_finetuning_data.py - Preprocessing utility for generating TFRecord datasets
official/nlp/configs/models/bert_en_uncased_base.yaml - Base BERT architecture configuration
official/nlp/configs/experiments/glue_mnli_matched.yaml - GLUE task-specific settings
official/nlp/configs/experiments/squad_v1.1.yaml - SQuAD task-specific settings

Summary

The unified training driver official/nlp/train.py fine-tunes BERT through declarative experiment configurations.
The system separates concerns into model configs (architecture), experiment configs (task settings), and runtime overrides (distribution strategy, paths).
Data preparation requires converting raw text to TFRecords using official/nlp/data/create_finetuning_data.py before training.
The exp_factory registry supports multiple task types including bert/sentence_prediction for GLUE and bert/squad for question answering.
Training supports GPUs (MirroredStrategy), TPUs, and mixed precision, and resumes from checkpoints in --model_dir after a pre-emption.

Frequently Asked Questions

What is the difference between model configs and experiment configs in Model Garden?

Model configs (like bert_en_uncased_base.yaml) define the neural network architecture parameters including hidden size, number of layers, and dropout rates. Experiment configs (like glue_mnli_matched.yaml) specify task-specific settings including data paths, evaluation metrics, and initialization checkpoints. The training driver merges these configurations at runtime into the complete experiment configuration.

Can I fine-tune BERT on custom datasets using Model Garden?

Yes. First, preprocess your data using official/nlp/data/create_finetuning_data.py with --fine_tuning_task_type=classification (or the appropriate task type) to generate TFRecords. Then create a custom YAML experiment configuration pointing to your data paths, or use --params_override to specify task.train_data.input_path and task.validation_data.input_path when launching official/nlp/train.py.

How do I resume training after a TPU pre-emption?

The training driver in official/nlp/train.py automatically supports pre-emption recovery on TPUs by periodically saving checkpoints to the directory specified by --model_dir. When restarting the job with the same --model_dir and --tpu flags, the driver detects the latest checkpoint and resumes training from that step without manual intervention.
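The resume step amounts to finding the most recent checkpoint in the model directory. The sketch below illustrates the idea for a local directory only. It assumes checkpoints are written as ckpt-<step>.index files, which is an assumption about the on-disk layout, and latest_checkpoint_prefix is a hypothetical helper. In TensorFlow itself, tf.train.latest_checkpoint performs this lookup from the checkpoint state file and also works with gs:// paths.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming scheme: ckpt-<step>.index (one index file per checkpoint).
_CKPT_INDEX = re.compile(r"^(ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(model_dir: str) -> Optional[str]:
    """Returns the prefix of the highest-numbered checkpoint, or None."""
    best_step, best_prefix = -1, None
    for path in Path(model_dir).iterdir():
        match = _CKPT_INDEX.match(path.name)
        # Compare steps numerically so that 2000 ranks above 900.
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = str(path.with_name(match.group(1)))
    return best_prefix


# Simulate a model directory left behind by an interrupted job.
with tempfile.TemporaryDirectory() as model_dir:
    for name in ("checkpoint", "ckpt-100.index", "ckpt-900.index", "ckpt-2000.index"):
        Path(model_dir, name).touch()
    resume_from = latest_checkpoint_prefix(model_dir)
    resume_name = Path(resume_from).name

print(resume_name)
```

If the directory holds no checkpoint the function returns None, which corresponds to a fresh run that starts from the initialization checkpoint or Hub module instead.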

Should I use TensorFlow Hub URLs or local checkpoints for initialization?

Both methods are supported. Use task.hub_module_url (e.g., https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4) to download pre-trained weights automatically, or specify task.init_checkpoint with a local path to a BERT checkpoint file (e.g., ~/bert_model.ckpt) for offline or custom pre-trained models. The local checkpoint approach is preferred when working in air-gapped environments or with custom pre-training outputs.
