How to Use NLP Encoders and Pre-trained Models in TensorFlow Models
The TensorFlow Models repository provides a unified build_encoder factory in official/nlp/configs/encoders.py that constructs any transformer-based encoder (BERT, ALBERT, BigBird, etc.) from a configuration object, with pre-trained weights loadable via TF-Hub or converted checkpoints.
The tensorflow/models official NLP package simplifies working with NLP encoders and pre-trained models through a consistent configuration-driven API. Whether you need BERT for classification or BigBird for long-document processing, the repository offers a single entry point to construct, load, and fine-tune transformer architectures without rewriting boilerplate instantiation code.
Configuring Encoder Architectures
All supported encoders are defined through encoder configuration dataclasses located in official/nlp/configs/encoders.py. Each architecture has its own dataclass—such as BertEncoderConfig, AlbertEncoderConfig, or BigBirdEncoderConfig—that exposes hyperparameters including hidden_size, num_layers, num_attention_heads, and dropout_rate.
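These hyperparameters are not independent of one another. Multi-head attention splits the hidden width evenly across the heads, so hidden_size should be a multiple of num_attention_heads. The sketch below is a minimal sanity check you could run before building a config; attention_head_size is a hypothetical helper written for this article, not a function from the repository.

```python
def attention_head_size(hidden_size: int, num_attention_heads: int) -> int:
    """Return the per-head width, or raise if the split is uneven.

    Hypothetical helper for illustration; not part of tensorflow/models.
    """
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) is not a multiple of "
            f"num_attention_heads ({num_attention_heads})"
        )
    return hidden_size // num_attention_heads


# BERT-base style settings: 768 hidden units split over 12 heads.
bert_base_head = attention_head_size(768, 12)
# A wider setting: 1024 hidden units split over 16 heads.
wide_head = attention_head_size(1024, 16)
print(bert_base_head, wide_head)
```

Both settings give 64 units per head, which is why changing hidden_size usually means revisiting num_attention_heads as well.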
To select an encoder type, wrap the specific config inside the EncoderConfig (OneOfConfig) wrapper:
from official.nlp.configs import encoders
from official.nlp.configs.encoders import EncoderConfig

my_cfg = EncoderConfig(
    type="bigbird",
    bigbird=encoders.BigBirdEncoderConfig(
        hidden_size=1024,
        num_layers=12,
        num_attention_heads=16,
        max_position_embeddings=4096,
        dropout_rate=0.1,
        norm_first=True,
    ),
)
The type field determines which sub-config the factory reads. All other sub-configs are ignored, keeping the API simple while exposing every encoder’s full parameter set.
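To make the selection mechanism concrete, here is a simplified, standard-library stand-in for the one-of pattern. It is a sketch of the idea only: ToyEncoderConfig, ToyBertConfig, and ToyBigBirdConfig are hypothetical names, and the real EncoderConfig carries more sub-configs and more fields than shown here.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ToyBertConfig:
    hidden_size: int = 768
    num_layers: int = 12


@dataclass
class ToyBigBirdConfig:
    hidden_size: int = 768
    num_layers: int = 12
    max_position_embeddings: int = 4096


@dataclass
class ToyEncoderConfig:
    """Holds one sub-config per architecture; `type` names the active one."""
    type: Optional[str] = "bert"
    bert: ToyBertConfig = field(default_factory=ToyBertConfig)
    bigbird: ToyBigBirdConfig = field(default_factory=ToyBigBirdConfig)

    def get(self):
        """Return the sub-config named by `type`; the others are ignored."""
        if self.type is None:
            return None
        sub_configs = {f.name for f in fields(self)} - {"type"}
        if self.type not in sub_configs:
            raise ValueError(f"Unknown encoder type: {self.type!r}")
        return getattr(self, self.type)


toy_cfg = ToyEncoderConfig(
    type="bigbird",
    bigbird=ToyBigBirdConfig(hidden_size=1024),
)
selected = toy_cfg.get()
print(type(selected).__name__, selected.hidden_size)
```

The unused bert sub-config still exists with its defaults, but nothing reads it while type is "bigbird". Switching architectures is therefore a one-field change plus whatever overrides the new sub-config needs.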
Building Encoders with the Factory Pattern
The build_encoder function serves as the Gin-configurable factory that transforms configuration objects into ready-to-use tf.keras.layers.Layer instances:
from official.nlp.configs.encoders import build_encoder
encoder = build_encoder(my_cfg)
When invoked, build_encoder performs three critical operations according to the source code in official/nlp/configs/encoders.py:
- Resolves the chosen encoder class (e.g., BigBirdEncoder) from official.nlp.modeling.networks
- Builds an embedding layer automatically or reuses one passed via the embedding_layer= argument
- Wires encoder-specific attention and mask objects (such as layers.BigBirdAttention for BigBird)
The returned layer produces a dictionary of outputs containing sequence_output and pooled_output, compatible with downstream task heads.
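The first two steps of that flow (resolving a class from the type string, then creating or reusing an embedding layer) can be sketched with toy classes. Everything here is a stand-in under stated assumptions: ToyEmbedding, ToyEncoder, and toy_build_encoder are hypothetical names, the outputs are zero-filled lists rather than tensors, and the attention wiring step is left out.

```python
from typing import Callable, Dict, List, Optional


class ToyEmbedding:
    """Stand-in for an embedding layer; a real one owns a weight matrix."""

    def __init__(self, vocab_size: int, width: int) -> None:
        self.vocab_size = vocab_size
        self.width = width


class ToyEncoder:
    """Stand-in encoder that only mimics the shape of the output dictionary."""

    def __init__(self, name: str, embedding: ToyEmbedding) -> None:
        self.name = name
        self.embedding = embedding

    def __call__(self, inputs: Dict[str, List[int]]) -> Dict[str, list]:
        seq_len = len(inputs["input_word_ids"])
        width = self.embedding.width
        return {
            # One vector per input token.
            "sequence_output": [[0.0] * width for _ in range(seq_len)],
            # One vector summarizing the whole sequence.
            "pooled_output": [0.0] * width,
        }


# Step 1: a registry that resolves the encoder builder from the type string.
_ENCODER_BUILDERS: Dict[str, Callable[[ToyEmbedding], ToyEncoder]] = {
    "bert": lambda emb: ToyEncoder("bert", emb),
    "bigbird": lambda emb: ToyEncoder("bigbird", emb),
}


def toy_build_encoder(
    encoder_type: str,
    hidden_size: int,
    embedding_layer: Optional[ToyEmbedding] = None,
) -> ToyEncoder:
    if encoder_type not in _ENCODER_BUILDERS:
        raise ValueError(f"Unknown encoder type: {encoder_type!r}")
    # Step 2: create an embedding layer unless the caller supplied one.
    if embedding_layer is None:
        embedding_layer = ToyEmbedding(vocab_size=30522, width=hidden_size)
    return _ENCODER_BUILDERS[encoder_type](embedding_layer)


shared = ToyEmbedding(vocab_size=30522, width=768)
first = toy_build_encoder("bert", 768, embedding_layer=shared)
second = toy_build_encoder("bigbird", 768, embedding_layer=shared)
outputs = first({"input_word_ids": [101, 7592, 102]})
print(first.embedding is second.embedding, sorted(outputs))
```

Passing the same embedding object to both calls means the two encoders share it, which is the use case the embedding_layer= argument serves. Omitting the argument gives each encoder its own embedding.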
Loading Pre-trained Weights
The repository supports two primary methods for loading pre-trained weights into constructed encoders: TensorFlow Hub modules and converted legacy checkpoints.
Loading from TensorFlow Hub
For models available on TF-Hub (such as BERT-base), use the get_encoder_from_hub utility in official/nlp/tasks/utils.py:
from official.nlp.tasks import utils as task_utils
hub_path = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
hub_encoder = task_utils.get_encoder_from_hub(hub_path)
This function constructs the three required input placeholders (input_word_ids, input_mask, input_type_ids), feeds them to a Hub KerasLayer, and returns a tf.keras.Model whose output dictionary format matches native encoders.
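The three inputs are easier to reason about with a concrete example. The sketch below builds them as plain Python lists for a single-segment sequence; make_encoder_inputs is a hypothetical helper, the token ids are illustrative, and a padding id of 0 is an assumption. In practice these values would be integer tensors with a leading batch dimension, normally produced by a tokenizer or preprocessing model.

```python
from typing import Dict, List


def make_encoder_inputs(
    token_ids: List[int],
    max_seq_length: int,
    pad_id: int = 0,
) -> Dict[str, List[int]]:
    """Pad one single-segment sequence and build the three encoder inputs.

    Hypothetical helper for illustration; not part of tensorflow/models.
    """
    if len(token_ids) > max_seq_length:
        raise ValueError("token_ids is longer than max_seq_length")
    num_pad = max_seq_length - len(token_ids)
    return {
        # Token ids, right-padded to the fixed sequence length.
        "input_word_ids": token_ids + [pad_id] * num_pad,
        # 1 marks a real token, 0 marks padding to be ignored by attention.
        "input_mask": [1] * len(token_ids) + [0] * num_pad,
        # Segment ids: all 0 because there is only one segment.
        "input_type_ids": [0] * max_seq_length,
    }


# Illustrative ids for a four-token sequence padded to length 6.
example = make_encoder_inputs([101, 7592, 2088, 102], max_seq_length=6)
for name, values in example.items():
    print(name, values)
```

For sentence-pair tasks the second segment would carry input_type_ids of 1, which this single-segment sketch does not cover.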
Restoring from Legacy Checkpoints
When working with original TensorFlow 1 checkpoints, use the converter scripts in official/nlp/tools/ to produce TF-2 compatible formats:
python -m official.nlp.tools.tf2_bert_encoder_checkpoint_converter \
    --bert_config_file=/tmp/bert_ckpt/bert_config.json \
    --checkpoint_to_convert=/tmp/bert_ckpt/bert_model.ckpt \
    --converted_checkpoint_path=/tmp/bert_tf2_ckpt
After conversion, restore weights into your built encoder:
import tensorflow as tf
ckpt = tf.train.Checkpoint(encoder=encoder)
ckpt.restore("/tmp/bert_tf2_ckpt").expect_partial()
Similar converters exist for ALBERT (tf2_albert_encoder_checkpoint_converter.py) and other architectures.
Integrating Encoders into Downstream Tasks
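The official NLP tasks take a task config whose model sub-config holds the EncoderConfig. The task calls build_encoder internally when it builds its model, so you rarely need to instantiate the encoder manually. The sentence prediction task demonstrates this pattern:
import tensorflow as tf
from official.nlp.tasks import sentence_prediction

task_config = sentence_prediction.SentencePredictionConfig(
    model=sentence_prediction.ModelConfig(
        encoder=my_cfg,  # the EncoderConfig defined earlier
        num_classes=3,
    ),
)
task = sentence_prediction.SentencePredictionTask(task_config)
# build_model() builds the encoder from the config and adds a classification head.
model = task.build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
    # The classifier returns logits, so the loss must be told so.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)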
This approach ensures the encoder configuration remains centralized while the task handles input preprocessing, model assembly, and metric computation.
Complete End-to-End Example
The following script demonstrates fine-tuning BERT on a classification task using the configuration-driven API:
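import tensorflow as tf
from official.nlp.configs import encoders
from official.nlp.tasks import sentence_prediction

# 1. Configure BERT-base
cfg = encoders.EncoderConfig(
    type="bert",
    bert=encoders.BertEncoderConfig(
        hidden_size=768,
        num_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        dropout_rate=0.1,
        max_position_embeddings=512,
    ),
)

# 2. Assemble the downstream task; build_model() calls build_encoder(cfg) internally
task_cfg = sentence_prediction.SentencePredictionConfig(
    model=sentence_prediction.ModelConfig(encoder=cfg, num_classes=3),
)
task = sentence_prediction.SentencePredictionTask(task_cfg)
model = task.build_model()

# 3. Optional: restore a converted TF-2 checkpoint into the model's encoder
# ckpt = tf.train.Checkpoint(encoder=model.checkpoint_items["encoder"])
# ckpt.restore("/path/to/bert_tf2_ckpt").expect_partial()

# 4. Compile; the classifier returns logits
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# 5. Train. Serialized records must be parsed into (features, label) pairs.
# Assumption: records store int64 features under the names below, padded to SEQ_LEN.
SEQ_LEN = 128
FEATURE_SPEC = {
    "input_ids": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "input_mask": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "segment_ids": tf.io.FixedLenFeature([SEQ_LEN], tf.int64),
    "label_ids": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    ex = tf.io.parse_single_example(record, FEATURE_SPEC)
    features = {
        "input_word_ids": tf.cast(ex["input_ids"], tf.int32),
        "input_mask": tf.cast(ex["input_mask"], tf.int32),
        "input_type_ids": tf.cast(ex["segment_ids"], tf.int32),
    }
    return features, tf.cast(ex["label_ids"], tf.int32)

train_ds = tf.data.TFRecordDataset("train.tfrecord").map(parse).batch(32)
val_ds = tf.data.TFRecordDataset("dev.tfrecord").map(parse).batch(32)
model.fit(train_ds, epochs=3, validation_data=val_ds)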
Summary
- build_encoder in official/nlp/configs/encoders.py is the central factory for constructing transformer encoders from configuration objects
- EncoderConfig uses a type field to select between architectures (BERT, ALBERT, BigBird) while ignoring unused sub-configs
- Pre-trained weights load via get_encoder_from_hub for TF-Hub models or checkpoint converters for legacy TF-1 weights
- Task APIs automatically invoke build_encoder, streamlining the path from configuration to training loop
- All encoder dataclasses expose full hyperparameter control including hidden sizes, attention heads, and normalization ordering
Frequently Asked Questions
How do I switch between different encoder architectures?
Change the type parameter in EncoderConfig and provide the corresponding sub-config. For example, set type="albert" and populate the albert= field with AlbertEncoderConfig, or use type="bert" with BertEncoderConfig. The factory automatically instantiates the correct network class from official.nlp.modeling.networks based on this selection.
Can I load pre-trained weights from TensorFlow Hub?
Yes. Use task_utils.get_encoder_from_hub(hub_url) from official/nlp/tasks/utils.py to wrap a Hub module. This returns a Keras Model whose output dictionary matches the native encoders. To use a Hub encoder through the task API, set the hub_module_url field on the task config (for example, SentencePredictionConfig) rather than passing a model instance; the task then builds the encoder from that URL instead of from the encoder config.
How do I convert legacy TensorFlow 1 checkpoints for TF2 encoders?
Run the architecture-specific converter scripts located in official/nlp/tools/. For BERT, execute python -m official.nlp.tools.tf2_bert_encoder_checkpoint_converter with the --bert_config_file, --checkpoint_to_convert, and --converted_checkpoint_path arguments. ALBERT and other encoders have equivalent converters; check each script's flag definitions, since flag names can differ between architectures. The output checkpoint restores into TF2 encoder instances using tf.train.Checkpoint.
Should I use build_encoder directly or the Task API?
Use the Task API for standard fine-tuning workflows, as it handles the build_encoder call, input pipelines, losses, and metrics for you. Call build_encoder directly when you need custom embedding layers, specialized weight restoration logic, or when integrating the encoder into non-standard architectures outside the official task framework.