How to Create a Custom Model Architecture that Integrates with AutoModel: A Complete Guide
To integrate a custom model with Hugging Face Transformers' AutoModel ecosystem, you must define a PreTrainedConfig subclass with a unique model_type, implement a PreTrainedModel subclass exposing config_class, register both with AutoConfig.register and AutoModel.register, and use trust_remote_code=True when loading from remote repositories.
The Hugging Face Transformers library provides the AutoClass API (including AutoModel, AutoModelForImageClassification, and other task-specific variants) to dynamically instantiate model architectures from configuration objects. When you create a custom model architecture that integrates with AutoModel, you enable users to load your model using the standard from_pretrained workflow without modifying the library source code, gaining full compatibility with the Trainer API and Hub sharing capabilities.
Step 1: Define a Custom Configuration Class
Every model in the Transformers ecosystem requires a configuration object that specifies hyperparameters and architecture metadata. You must subclass PreTrainedConfig (older Transformers releases spell it PretrainedConfig) and assign a unique string to the model_type class attribute. This identifier is the key that AutoConfig uses to locate your configuration class; the Auto model classes then map that configuration class to your model class.
According to the official documentation in docs/source/en/custom_models.md, the configuration must call super().__init__(**kwargs) to preserve parent fields like name_or_path and transformers_version. The model_type value must be unique to avoid collisions with existing architectures (e.g., "bert", "gpt2", "resnet").
# custom_resnet/configuration_resnet.py
from transformers import PreTrainedConfig


class ResnetConfig(PreTrainedConfig):
    """Configuration for a custom ResNet model."""

    # Must not collide with a built-in type; "resnet" is already taken.
    model_type = "custom-resnet"

    def __init__(
        self,
        block_type="bottleneck",
        layers=[3, 4, 6, 3],
        num_classes=1000,
        input_channels=3,
        cardinality=1,
        base_width=64,
        stem_width=64,
        stem_type="",
        avg_down=False,
        **kwargs,
    ):
        # Validate early so a bad config fails before any model is built.
        if block_type not in ["basic", "bottleneck"]:
            raise ValueError("`block_type` must be 'basic' or 'bottleneck'")

        self.block_type = block_type
        self.layers = layers
        self.num_classes = num_classes
        self.input_channels = input_channels
        self.cardinality = cardinality
        self.base_width = base_width
        self.stem_width = stem_width
        self.stem_type = stem_type
        self.avg_down = avg_down
        super().__init__(**kwargs)  # Preserve parent configuration fields
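A quick way to check that a configuration class serializes correctly is to round-trip it through save_pretrained and from_pretrained. The sketch below is only an illustration: TinyDemoConfig and its model_type "tiny-demo-config" are hypothetical names, and the import fallback assumes that older releases export the base class only as PretrainedConfig.

```python
import tempfile

try:
    from transformers import PreTrainedConfig
except ImportError:  # assumption: older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class TinyDemoConfig(PreTrainedConfig):
    """Hypothetical config used only to illustrate the round trip."""

    model_type = "tiny-demo-config"  # hypothetical, not a built-in type

    def __init__(self, hidden_size=8, num_classes=3, **kwargs):
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        super().__init__(**kwargs)  # keeps the parent fields intact


with tempfile.TemporaryDirectory() as tmp_dir:
    # save_pretrained writes config.json into the directory.
    TinyDemoConfig(hidden_size=16).save_pretrained(tmp_dir)
    # from_pretrained rebuilds the object from that file.
    reloaded = TinyDemoConfig.from_pretrained(tmp_dir)

print(reloaded.model_type, reloaded.hidden_size, reloaded.num_classes)
```

Because every constructor argument has a default, the class can be instantiated with no arguments, which the serialization code relies on when it works out which values differ from the defaults.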
Step 2: Implement the Model Class
Your model must inherit from PreTrainedModel and expose the configuration class via the config_class attribute. There is no separate task-specific base class: a model with a classification head subclasses PreTrainedModel just like the backbone does. The config_class attribute binds the model to its configuration, enabling AutoModel to instantiate the correct class when loading from a config file.
As implemented in the base classes, PreTrainedModel provides critical methods like save_pretrained, from_pretrained, and gradient checkpointing utilities. The following example shows both a backbone model and a task-specific classification head, both referencing ResnetConfig via config_class.
# custom_resnet/modeling_resnet.py
import torch
from transformers import PreTrainedModel
from timm.models.resnet import BasicBlock, Bottleneck, ResNet

from .configuration_resnet import ResnetConfig

BLOCK_MAPPING = {"basic": BasicBlock, "bottleneck": Bottleneck}


class ResnetModel(PreTrainedModel):
    """Backbone that returns hidden features."""

    config_class = ResnetConfig  # Links model to configuration

    def __init__(self, config: ResnetConfig):
        super().__init__(config)
        block = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, pixel_values):
        return self.model.forward_features(pixel_values)


class ResnetModelForImageClassification(PreTrainedModel):
    """Classification head on top of the backbone."""

    config_class = ResnetConfig

    def __init__(self, config: ResnetConfig):
        super().__init__(config)
        block = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, pixel_values, labels=None):
        logits = self.model(pixel_values)
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
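The classification head above returns a dictionary that contains a loss only when labels are supplied, which is the shape the Trainer API expects from a model's output. The same contract can be exercised without timm using a much smaller model. This is a minimal sketch: TinyClassifierConfig, TinyClassifier, and the "tiny-classifier" model_type are hypothetical names, not part of the library.

```python
import torch
from torch import nn
from transformers import PreTrainedModel

try:
    from transformers import PreTrainedConfig
except ImportError:  # assumption: older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class TinyClassifierConfig(PreTrainedConfig):
    model_type = "tiny-classifier"  # hypothetical identifier

    def __init__(self, input_dim=8, num_classes=3, **kwargs):
        self.input_dim = input_dim
        self.num_classes = num_classes
        super().__init__(**kwargs)


class TinyClassifier(PreTrainedModel):
    config_class = TinyClassifierConfig  # binds the model to its config

    def __init__(self, config):
        super().__init__(config)
        self.classifier = nn.Linear(config.input_dim, config.num_classes)
        self.post_init()  # runs the library's weight-init hooks

    def forward(self, features, labels=None):
        logits = self.classifier(features)
        output = {"logits": logits}
        if labels is not None:
            # Only compute a loss when the caller supplies targets.
            output["loss"] = nn.functional.cross_entropy(logits, labels)
        return output


model = TinyClassifier(TinyClassifierConfig())
features = torch.randn(4, 8)          # batch of 4 examples, 8 features each
labels = torch.tensor([0, 1, 2, 1])   # one class index per example

with_labels = model(features, labels=labels)
without_labels = model(features)
print(sorted(with_labels), sorted(without_labels))
```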
Step 3: Register with Auto Classes
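Registration happens in two places. AutoConfig.register adds your model_type to the configuration mapping, and each Auto model class's register method adds an entry to its _LazyAutoMapping in src/transformers/models/auto/auto_factory.py. Together they let AutoModel.from_pretrained resolve the model_type in config.json to your configuration class, and that configuration class to your model class. You must register the configuration with AutoConfig first, then register the model classes with the appropriate AutoModel variants.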
The register method in _BaseAutoModelClass (the base for all Auto classes) accepts the config class and model class as arguments, inserting them into the mapping dictionary that from_pretrained consults at runtime.
from transformers import AutoConfig, AutoModel, AutoModelForImageClassification

from custom_resnet.configuration_resnet import ResnetConfig
from custom_resnet.modeling_resnet import ResnetModel, ResnetModelForImageClassification

# Register the configuration under its own model_type. The name passed here
# must equal ResnetConfig.model_type and must not belong to a built-in model.
AutoConfig.register(ResnetConfig.model_type, ResnetConfig)

# Register backbone for generic AutoModel
AutoModel.register(ResnetConfig, ResnetModel)

# Register task-specific model
AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassification)
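To see the effect of registration in isolation, the following sketch registers a throwaway config and model and then asks the Auto classes to build them. ToyConfig, ToyModel, and the "toy-registered" model_type are hypothetical names chosen for this illustration; the calls themselves (AutoConfig.register, AutoModel.register, AutoConfig.for_model, AutoModel.from_config) are the library's public entry points.

```python
from torch import nn
from transformers import AutoConfig, AutoModel, PreTrainedModel

try:
    from transformers import PreTrainedConfig
except ImportError:  # assumption: older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class ToyConfig(PreTrainedConfig):
    model_type = "toy-registered"  # hypothetical, not a built-in type

    def __init__(self, hidden_size=4, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class ToyModel(PreTrainedModel):
    config_class = ToyConfig

    def __init__(self, config):
        super().__init__(config)
        self.proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.post_init()

    def forward(self, inputs):
        return self.proj(inputs)


# Step 1: model_type string -> config class.
AutoConfig.register(ToyConfig.model_type, ToyConfig)
# Step 2: config class -> model class, for the generic AutoModel.
AutoModel.register(ToyConfig, ToyModel)

# The Auto classes can now resolve both halves of the chain.
config = AutoConfig.for_model("toy-registered", hidden_size=6)
model = AutoModel.from_config(config)
print(type(config).__name__, type(model).__name__)
```

Registration lives in process memory only, so it has to run again in every new Python session (for example from your package's __init__.py) before the Auto classes can find the custom types.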
Step 4: Save, Load, and Distribute
Once registered, your custom model architecture supports the full Transformers persistence API. When saving, both the configuration (config.json) and model weights (pytorch_model.bin or model.safetensors) are written to the specified directory.
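For models hosted on the Hugging Face Hub or any remote repository, where the class definitions ship with the checkpoint instead of being imported and registered in your process, you must pass trust_remote_code=True to from_pretrained. This flag, validated in _BaseAutoModelClass.from_pretrained within src/transformers/models/auto/auto_factory.py, allows dynamic execution of the custom Python files required to instantiate your architecture, so enable it only for repositories whose code you have reviewed.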
# Assumes the imports and registrations from Step 3 have run in this process.

# Save locally
config = ResnetConfig()
config.save_pretrained("custom_resnet")
model = ResnetModelForImageClassification(config)
model.save_pretrained("custom_resnet")

# Load from local directory. The classes are already registered here, so no
# remote code has to be executed and trust_remote_code is not needed.
local_model = AutoModelForImageClassification.from_pretrained("custom_resnet")

# Load from the Hub after pushing
hub_model = AutoModelForImageClassification.from_pretrained(
    "username/custom-resnet", trust_remote_code=True
)
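For the Hub load above to work on a machine that has never imported your package, the repository has to carry the code and tell the Auto classes where to find it. The library exposes register_for_auto_class for this purpose. The sketch below shows the calls on hypothetical HubDemoConfig and HubDemoModel classes. It assumes that, in a real project, the classes live in importable module files such as configuration_*.py and modeling_*.py, because classes defined directly in a script or notebook are not exported when saving.

```python
from torch import nn
from transformers import PreTrainedModel

try:
    from transformers import PreTrainedConfig
except ImportError:  # assumption: older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class HubDemoConfig(PreTrainedConfig):
    model_type = "hub-demo"  # hypothetical identifier

    def __init__(self, hidden_size=4, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)


class HubDemoModel(PreTrainedModel):
    config_class = HubDemoConfig

    def __init__(self, config):
        super().__init__(config)
        self.proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.post_init()

    def forward(self, inputs):
        return self.proj(inputs)


# Mark the classes for export. Configurations default to "AutoConfig";
# models need the name of the Auto class that should load them.
HubDemoConfig.register_for_auto_class()
HubDemoModel.register_for_auto_class("AutoModel")
```

When the classes come from module files, a later save_pretrained or push_to_hub is expected to copy those files next to the weights and record them in config.json, which is what from_pretrained with trust_remote_code=True relies on.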
Summary
- Custom Configuration: Subclass PreTrainedConfig, set a unique model_type, and call super().__init__(**kwargs) to maintain compatibility with the Transformers ecosystem.
- Model Implementation: Inherit from PreTrainedModel, assign your config class to config_class, and implement the forward pass.
- Registration: Use AutoConfig.register and AutoModel.register (or task-specific variants) to insert your classes into the global _LazyAutoMapping in src/transformers/models/auto/auto_factory.py.
- Remote Execution: Always specify trust_remote_code=True when loading custom models from the Hub to enable dynamic code execution.
Frequently Asked Questions
What happens if I don't specify a unique model_type?
If you try to register a model_type that a built-in architecture already uses (e.g., "bert" or "gpt2"), AutoConfig.register raises a ValueError unless you explicitly pass exist_ok=True. If you skip registration and only ship a config.json with a conflicting model_type, AutoConfig resolves it to the built-in class, so AutoModel.from_pretrained instantiates the wrong architecture or fails when the checkpoint does not match what that architecture expects.
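The collision is easy to reproduce. In this sketch, ClashingConfig is a hypothetical class that deliberately reuses the built-in "bert" identifier so the registration fails.

```python
from transformers import AutoConfig

try:
    from transformers import PreTrainedConfig
except ImportError:  # assumption: older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class ClashingConfig(PreTrainedConfig):
    model_type = "bert"  # deliberately collides with the built-in BERT config


try:
    AutoConfig.register("bert", ClashingConfig)
except ValueError as err:
    collision_error = str(err)  # registration was rejected
else:
    collision_error = None

print(collision_error)
```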
Why is trust_remote_code=True mandatory for custom models?
The trust_remote_code=True flag is required whenever the classes are not already registered in your process, because AutoModel.from_pretrained must then execute arbitrary Python code from your repository (specifically your modeling_*.py and configuration_*.py files) to instantiate classes that don't exist in the core library. As noted in the _BaseAutoModelClass.from_pretrained implementation in src/transformers/models/auto/auto_factory.py, this security gate prevents silent execution of untrusted code.
Can I register multiple task-specific heads for the same architecture?
Yes. You can register one backbone class with AutoModel and multiple task-specific classes (e.g., AutoModelForImageClassification, AutoModelForSemanticSegmentation) with the same ResnetConfig. Each registration maps the config to a different model class within the respective Auto class's registry, allowing users to load the appropriate head for their task.
Where does the registration logic update the internal mappings?
The registration logic lives in src/transformers/models/auto/auto_factory.py within the _BaseAutoModelClass.register method. This method updates the _LazyAutoMapping that from_pretrained and from_config consult to resolve a configuration class to the matching model class, effectively making your custom model a first-class citizen of the Auto ecosystem.
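The idea behind that mapping can be shown with a few lines of plain Python. The following is a deliberately simplified sketch and not the library's actual _LazyAutoMapping code: ToyAutoMapping, DemoConfig, and DemoModel are hypothetical names, and the real implementation also handles lazy imports and the built-in models.

```python
class ToyAutoMapping:
    """Simplified stand-in for an Auto class registry (hypothetical)."""

    def __init__(self):
        self._config_to_model = {}

    def register(self, config_class, model_class, exist_ok=False):
        # The model must declare the same config class it is registered under.
        declared = getattr(model_class, "config_class", None)
        if declared is not None and declared is not config_class:
            raise ValueError("model_class.config_class does not match config_class")
        # Refuse to overwrite an existing entry unless explicitly allowed.
        if config_class in self._config_to_model and not exist_ok:
            raise ValueError(f"{config_class.__name__} is already registered")
        self._config_to_model[config_class] = model_class

    def from_config(self, config):
        # Look up the model class by the *type* of the config instance.
        try:
            model_class = self._config_to_model[type(config)]
        except KeyError:
            raise ValueError(f"Unrecognized config: {type(config).__name__}") from None
        return model_class(config)


class DemoConfig:
    model_type = "demo"


class DemoModel:
    config_class = DemoConfig

    def __init__(self, config):
        self.config = config


registry = ToyAutoMapping()
registry.register(DemoConfig, DemoModel)
built = registry.from_config(DemoConfig())
print(type(built).__name__)
```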