How to Use NLP Encoders and Pre-trained Models in TensorFlow Models

The TensorFlow Models repository provides a unified build_encoder factory in official/nlp/configs/encoders.py that constructs any transformer-based encoder (BERT, ALBERT, BigBird, etc.) from a configuration object, with pre-trained weights loadable via TF-Hub or converted checkpoints.

The tensorflow/models official NLP package simplifies working with NLP encoders and pre-trained models through a consistent configuration-driven API. Whether you need BERT for classification or BigBird for long-document processing, the repository offers a single entry point to construct, load, and fine-tune transformer architectures without rewriting boilerplate instantiation code.

Configuring Encoder Architectures

All supported encoders are defined through encoder configuration dataclasses in official/nlp/configs/encoders.py. Each architecture has its own dataclass (such as BertEncoderConfig, AlbertEncoderConfig, or BigBirdEncoderConfig) that exposes hyperparameters including hidden_size, num_layers, num_attention_heads, and dropout_rate.
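Transformer encoders split hidden_size evenly across attention heads, so hidden_size is conventionally a multiple of num_attention_heads. The check below is a hypothetical helper written for illustration. It is not part of official.nlp, and it computes the per-head width for a pair of these hyperparameters:

```python
def head_dim(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head width, rejecting configs that do not split evenly.

    Hypothetical helper for illustration; not part of official.nlp.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by "
            f"num_attention_heads={num_attention_heads}")
    return hidden_size // num_attention_heads


if __name__ == "__main__":
    print(head_dim(768, 12))   # BERT-base sizes
    print(head_dim(1024, 16))  # sizes used in the BigBird example
```

Both example configurations give each head 64 dimensions.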

To select an encoder type, pass the architecture-specific config to the matching field of EncoderConfig (a OneOfConfig subclass) and name that field in type:

from official.nlp.configs import encoders
from official.nlp.configs.encoders import EncoderConfig

my_cfg = EncoderConfig(
    type="bigbird",
    bigbird=encoders.BigBirdEncoderConfig(
        hidden_size=1024,
        num_layers=12,
        num_attention_heads=16,
        max_position_embeddings=4096,
        dropout_rate=0.1,
        norm_first=True,
    )
)

The type field determines which sub-config the factory reads. All other sub-configs are ignored, keeping the API simple while exposing every encoder’s full parameter set.
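The "one-of" behaviour can be pictured with plain dataclasses. The sketch below is a simplified stand-in and not the repository's class. ToyEncoderConfig, ToyBertConfig and ToyBigBirdConfig are hypothetical names that model only a few fields. The real EncoderConfig also offers a get() method that returns the active sub-config, and the toy mirrors that idea:

```python
from dataclasses import dataclass, field
from typing import Optional, Union


# Hypothetical stand-ins for the real encoder dataclasses (few fields only).
@dataclass
class ToyBertConfig:
    hidden_size: int = 768
    num_layers: int = 12


@dataclass
class ToyBigBirdConfig:
    hidden_size: int = 768
    num_layers: int = 12
    max_position_embeddings: int = 4096


@dataclass
class ToyEncoderConfig:
    """One-of container: `type` names the sub-config that is actually used."""
    type: Optional[str] = "bert"
    bert: ToyBertConfig = field(default_factory=ToyBertConfig)
    bigbird: ToyBigBirdConfig = field(default_factory=ToyBigBirdConfig)

    def get(self) -> Optional[Union[ToyBertConfig, ToyBigBirdConfig]]:
        # Return only the selected sub-config; the other fields are ignored.
        if self.type is None:
            return None
        if self.type not in ("bert", "bigbird"):
            raise ValueError(f"Unknown encoder type: {self.type!r}")
        return getattr(self, self.type)


cfg = ToyEncoderConfig(type="bigbird",
                       bigbird=ToyBigBirdConfig(hidden_size=1024))
print(cfg.get())
```

Here the unused bert field keeps its defaults and never influences the result. This matches how the factory reads only the sub-config named by type.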

Building Encoders with the Factory Pattern

The build_encoder function serves as the Gin-configurable factory that transforms configuration objects into ready-to-use tf.keras.layers.Layer instances:

from official.nlp.configs.encoders import build_encoder

encoder = build_encoder(my_cfg)

When invoked, build_encoder (defined in official/nlp/configs/encoders.py) performs three main steps:

  • Resolves the encoder network for the selected type (e.g., AlbertEncoder from official.nlp.modeling.networks for type="albert")
  • Builds an embedding layer automatically or reuses one passed via the embedding_layer= argument
  • Wires encoder-specific attention and mask objects (such as layers.BigBirdAttention for BigBird)

The returned layer produces a dictionary of outputs containing sequence_output and pooled_output, compatible with downstream task heads.
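The dispatch-by-type pattern behind such a factory can be sketched without TensorFlow. This is a hypothetical toy and not the library's code. toy_build_encoder, the registry, and the dict "layers" are illustrative stand-ins. It shows two of the steps above: selecting a builder by type, and reusing a caller-supplied embedding layer when one is given.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: config "type" string -> builder function.
_BUILDERS: Dict[str, Callable[..., dict]] = {}


def register(name: str) -> Callable:
    """Decorator that records a builder under a type name."""
    def wrap(fn: Callable[..., dict]) -> Callable[..., dict]:
        _BUILDERS[name] = fn
        return fn
    return wrap


def _embedding(hidden_size: int, embedding_layer: Optional[dict]) -> dict:
    # Reuse a caller-supplied embedding layer, otherwise create a new one.
    if embedding_layer is not None:
        return embedding_layer
    return {"kind": "embedding", "width": hidden_size}


@register("bert")
def _build_bert(hidden_size: int, embedding_layer: Optional[dict] = None) -> dict:
    return {"arch": "bert", "attention": "full",
            "embedding": _embedding(hidden_size, embedding_layer)}


@register("bigbird")
def _build_bigbird(hidden_size: int, embedding_layer: Optional[dict] = None) -> dict:
    # Stands in for wiring a block-sparse attention layer and its masks.
    return {"arch": "bigbird", "attention": "block_sparse",
            "embedding": _embedding(hidden_size, embedding_layer)}


def toy_build_encoder(encoder_type: str, hidden_size: int,
                      embedding_layer: Optional[dict] = None) -> dict:
    """Dispatch on the type name, mirroring the shape of a config factory."""
    if encoder_type not in _BUILDERS:
        raise ValueError(f"Unknown encoder type: {encoder_type!r}")
    return _BUILDERS[encoder_type](hidden_size, embedding_layer=embedding_layer)
```

A registry keeps each architecture's construction logic separate, while callers depend on a single entry point.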

Loading Pre-trained Weights

The repository supports two primary methods for loading pre-trained weights into constructed encoders: TensorFlow Hub modules and converted legacy checkpoints.

Loading from TensorFlow Hub

For models available on TF-Hub (such as BERT-base), use the get_encoder_from_hub utility in official/nlp/tasks/utils.py:

from official.nlp.tasks import utils as task_utils

hub_path = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
hub_encoder = task_utils.get_encoder_from_hub(hub_path)

This function constructs the three required input placeholders (input_word_ids, input_mask, input_type_ids), feeds them to a Hub KerasLayer, and returns a tf.keras.Model whose output dictionary format matches native encoders.
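The three inputs follow the usual BERT convention. input_word_ids holds [CLS] segment A [SEP] (segment B [SEP]) padded to a fixed length. input_mask is 1 for real tokens and 0 for padding. input_type_ids is 0 for the first segment and 1 for the second. The sketch below assumes token ids are already produced by a tokenizer. pack_inputs is a hypothetical helper, and its default special-token ids (101, 102, 0) are assumed to match the bert_en_uncased vocabulary. Check them against your vocab file.

```python
from typing import Dict, List, Optional


def pack_inputs(tokens_a: List[int], tokens_b: Optional[List[int]] = None,
                seq_length: int = 128, cls_id: int = 101, sep_id: int = 102,
                pad_id: int = 0) -> Dict[str, List[int]]:
    """Pack one or two token-id segments into BERT-style model inputs."""
    word_ids = [cls_id] + list(tokens_a) + [sep_id]
    type_ids = [0] * len(word_ids)              # segment A, including [CLS]
    if tokens_b:
        word_ids += list(tokens_b) + [sep_id]
        type_ids += [1] * (len(tokens_b) + 1)   # segment B and its [SEP]
    if len(word_ids) > seq_length:
        raise ValueError("Sequence too long; truncate the segments first.")
    n_pad = seq_length - len(word_ids)
    return {
        "input_word_ids": word_ids + [pad_id] * n_pad,
        "input_mask": [1] * len(word_ids) + [0] * n_pad,  # 1 = real token
        "input_type_ids": type_ids + [0] * n_pad,
    }
```

Batched into int32 tensors of shape [batch, seq_length], a dictionary like this is the kind of input the wrapped Hub encoder expects.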

Restoring from Legacy Checkpoints

When working with original TensorFlow 1 checkpoints, use the converter scripts in official/nlp/tools/ to produce TF2-compatible checkpoints:

python -m official.nlp.tools.tf2_bert_encoder_checkpoint_converter \
    --bert_config_file=/tmp/bert/bert_config.json \
    --checkpoint_to_convert=/tmp/bert/bert_model.ckpt \
    --converted_checkpoint_path=/tmp/bert_tf2_ckpt

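After conversion, restore the weights into an encoder built with the same hyperparameters as the original model. The key passed to tf.train.Checkpoint (encoder below) must match the name under which the converter saved the network (encoder by default):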

import tensorflow as tf

ckpt = tf.train.Checkpoint(encoder=encoder)
ckpt.restore("/tmp/bert_tf2_ckpt").expect_partial()

Similar converters exist for ALBERT (tf2_albert_encoder_checkpoint_converter.py) and other architectures.
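A converted checkpoint only restores cleanly into an encoder whose shapes match the original model. Original BERT releases record those shapes in a bert_config.json file. The mapping below is a hedged sketch. The key correspondence to BertEncoderConfig field names is our assumption and should be verified against official/nlp/configs/encoders.py. encoder_kwargs_from_bert_json is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed correspondence: bert_config.json key -> BertEncoderConfig field.
BERT_JSON_TO_ENCODER_CFG = {
    "vocab_size": "vocab_size",
    "hidden_size": "hidden_size",
    "num_hidden_layers": "num_layers",
    "num_attention_heads": "num_attention_heads",
    "hidden_act": "hidden_activation",
    "intermediate_size": "intermediate_size",
    "hidden_dropout_prob": "dropout_rate",
    "attention_probs_dropout_prob": "attention_dropout_rate",
    "max_position_embeddings": "max_position_embeddings",
    "type_vocab_size": "type_vocab_size",
    "initializer_range": "initializer_range",
}


def encoder_kwargs_from_bert_json(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a parsed bert_config.json dict into encoder-config kwargs.

    Keys without a known mapping (some releases add extras) are skipped.
    """
    return {BERT_JSON_TO_ENCODER_CFG[k]: v
            for k, v in raw.items() if k in BERT_JSON_TO_ENCODER_CFG}
```

In practice you would load the file with json.load, pass the result through this helper, and build encoders.BertEncoderConfig(**kwargs) inside EncoderConfig(type="bert", ...) before calling build_encoder and restoring.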

Integrating Encoders into Downstream Tasks

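The official NLP tasks, such as sentence prediction, read the encoder configuration from their task config (the model.encoder field) and call build_encoder inside build_model(), so you rarely need to instantiate the encoder manually. The Sentence Prediction Task demonstrates this pattern:

from official.nlp.tasks import sentence_prediction

task_cfg = sentence_prediction.SentencePredictionConfig(
    model=sentence_prediction.ModelConfig(
        encoder=my_cfg,  # the EncoderConfig defined earlier
        num_classes=3,
    ),
)
task = sentence_prediction.SentencePredictionTask(task_cfg)

# build_model() calls build_encoder(my_cfg) and adds a classification head.
model = task.build_model()

This keeps the encoder configuration centralized while the task supplies input loading (build_inputs), model assembly (build_model), and loss and metric computation (build_losses, build_metrics). A Task is not a Keras model and has no compile() method. It is normally driven by the Model Garden trainer (for example official.core.train_lib.run_experiment), though the model returned by build_model() can also be trained with Keras directly.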
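For classification, the sentence prediction task trains the classifier's logits with sparse categorical cross-entropy (from_logits=True, as we read the task's build_losses). The stdlib sketch below shows what that loss computes for integer labels: the mean of -log softmax(logits)[label]. It is an illustration, not the library's implementation.

```python
import math
from typing import Sequence


def sparse_softmax_cross_entropy(logits: Sequence[Sequence[float]],
                                 labels: Sequence[int]) -> float:
    """Mean cross-entropy between integer labels and unnormalized logits."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("logits and labels must be non-empty and aligned")
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]  # -log softmax(row)[label]
    return total / len(labels)
```

With uniform logits over three classes the loss is log 3 (about 1.0986). This is why an untrained 3-way classifier typically starts near that value.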

Complete End-to-End Example

The following script demonstrates fine-tuning BERT on a classification task using the configuration-driven API:

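import tensorflow as tf
from official.nlp.configs import encoders
from official.nlp.tasks import sentence_prediction

# 1. Configure BERT-base

cfg = encoders.EncoderConfig(
    type="bert",
    bert=encoders.BertEncoderConfig(
        hidden_size=768,
        num_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        dropout_rate=0.1,
        max_position_embeddings=512,
    )
)

# 2. Describe the task. To start from pre-trained weights, set init_checkpoint
#    to a converted TF2 checkpoint, e.g. "/path/to/bert_tf2_ckpt".

task_cfg = sentence_prediction.SentencePredictionConfig(
    init_checkpoint="",
    model=sentence_prediction.ModelConfig(encoder=cfg, num_classes=3),
)
task = sentence_prediction.SentencePredictionTask(task_cfg)

# 3. Build the classifier (build_model() calls build_encoder(cfg) internally)
#    and restore init_checkpoint into the encoder, if one is set.

model = task.build_model()
task.initialize(model)

# 4. Load data. Assumes TFRecords with int64 features input_ids, input_mask,
#    segment_ids (length 128) and label_ids; adapt the spec to your files.

SEQ_LEN = 128
feature_spec = {
    "input_ids": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "input_mask": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "segment_ids": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "label_ids": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    ex = tf.io.parse_single_example(record, feature_spec)
    features = {
        "input_word_ids": tf.cast(ex["input_ids"], tf.int32),
        "input_mask": tf.cast(ex["input_mask"], tf.int32),
        "input_type_ids": tf.cast(ex["segment_ids"], tf.int32),
    }
    return features, ex["label_ids"]

train_ds = tf.data.TFRecordDataset("train.tfrecord").map(parse).shuffle(10_000).batch(32)
val_ds = tf.data.TFRecordDataset("dev.tfrecord").map(parse).batch(32)

# 5. Train. The classifier outputs logits, hence from_logits=True.

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, epochs=3, validation_data=val_ds)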
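As a sanity check on the BERT-base configuration in step 1, the parameter count can be worked out by hand. The sketch below assumes the standard BERT layout: token, position and type embeddings plus LayerNorm; per layer, four attention projections, two LayerNorms and a two-layer feed-forward block, all with biases; and a dense pooler. vocab_size=30522 and type_vocab_size=2 are taken as the usual BERT defaults. num_attention_heads does not appear because splitting into heads adds no parameters. bert_param_count is a hypothetical helper. Treat the result as an estimate to compare against what Keras reports for the built model (for example via model.summary()).

```python
def bert_param_count(vocab_size: int = 30522, hidden_size: int = 768,
                     num_layers: int = 12, intermediate_size: int = 3072,
                     max_position_embeddings: int = 512,
                     type_vocab_size: int = 2) -> int:
    """Estimate trainable parameters of a standard BERT encoder."""
    h, f = hidden_size, intermediate_size
    layer_norm = 2 * h  # gamma and beta
    embeddings = ((vocab_size + max_position_embeddings + type_vocab_size) * h
                  + layer_norm)
    attention = 4 * (h * h + h)      # query, key, value, output (with biases)
    ffn = (h * f + f) + (f * h + h)  # two dense layers (with biases)
    per_layer = attention + layer_norm + ffn + layer_norm
    pooler = h * h + h
    return embeddings + num_layers * per_layer + pooler


if __name__ == "__main__":
    print(bert_param_count())  # BERT-base sizes from step 1
```

For BERT-base this gives 109,482,240 parameters, the familiar "110M" figure. Most of them sit in the 12 transformer layers (about 7.1M each) and the roughly 23.4M-parameter token embedding table.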

Summary

  • build_encoder in official/nlp/configs/encoders.py is the central factory for constructing transformer encoders from configuration objects
  • EncoderConfig uses a type field to select between architectures (BERT, ALBERT, BigBird) while ignoring unused sub-configs
  • Pre-trained weights load via get_encoder_from_hub for TF-Hub models or checkpoint converters for legacy TF-1 weights
  • Task APIs automatically invoke build_encoder, streamlining the path from configuration to training loop
  • The encoder dataclasses expose architecture hyperparameters such as hidden size, number of layers, and attention heads; some, such as BertEncoderConfig, also control normalization ordering via norm_first

Frequently Asked Questions

How do I switch between different encoder architectures?

Change the type parameter in EncoderConfig and provide the corresponding sub-config. For example, set type="albert" and populate the albert= field with AlbertEncoderConfig, or use type="bert" with BertEncoderConfig. The factory automatically instantiates the correct network class from official.nlp.modeling.networks based on this selection.

Can I load pre-trained weights from TensorFlow Hub?

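Yes. Use task_utils.get_encoder_from_hub(hub_url) from official/nlp/tasks/utils.py to wrap a Hub module. It returns a Keras Model whose output dictionary matches the native encoders. Within the Task API you do not pass the Hub model instance yourself. Instead, set hub_module_url on the task config (for example SentencePredictionConfig), and build_model() calls get_encoder_from_hub for you. hub_module_url and init_checkpoint cannot both be set.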

How do I convert legacy TensorFlow 1 checkpoints for TF2 encoders?

Run the architecture-specific converter scripts located in official/nlp/tools/. For BERT, execute python -m official.nlp.tools.tf2_bert_encoder_checkpoint_converter with the --bert_config_file, --checkpoint_to_convert, and --converted_checkpoint_path arguments. ALBERT has an equivalent converter (tf2_albert_encoder_checkpoint_converter). The output checkpoint restores via tf.train.Checkpoint into a TF2 encoder built with matching hyperparameters.

Should I use build_encoder directly or the Task API?

Use the Task API for standard fine-tuning workflows: it calls build_encoder for you and supplies input loading, model assembly, losses, and metrics to the Model Garden trainer. Call build_encoder directly when you need custom embedding layers, specialized weight restoration logic, or when integrating the encoder into non-standard architectures outside the official task framework.
