Model Weight Loading in Transformers: Safetensors vs Legacy PyTorch .bin Formats

Transformers loads model weights from .safetensors files by default, using memory-mapped reads via safetensors.torch.load_file(). It falls back to legacy .bin checkpoints via torch.load(), guarded by a PyTorch 2.6+ version check, when safetensors files are absent or when safetensors loading is explicitly disabled (use_safetensors=False in from_pretrained(), or prefer_safe=False in load_sharded_checkpoint()).

When you instantiate a model with from_pretrained() in the Hugging Face Transformers library, the framework runs a weight loading pipeline that must safely deserialize what can be billions of parameters from disk. According to the huggingface/transformers source code, the library prioritizes the Safetensors format for its security and memory efficiency, while maintaining backward compatibility with legacy PyTorch .bin checkpoints through conditional fallback logic in src/transformers/trainer.py (the Trainer checkpoint-resume path) and src/transformers/core_model_loading.py.

How Transformers Detects Checkpoint File Formats

The model weight loading process begins with file detection logic that scans the checkpoint directory for specific filename constants defined in src/transformers/utils/__init__.py.

File Detection Logic in trainer.py

In src/transformers/trainer.py, the _load_best_model method and checkpoint resume logic implement the primary detection branch. The code checks for SAFE_WEIGHTS_NAME (defined as "model.safetensors" at lines 263‑264 of utils/__init__.py) before considering legacy alternatives.

If os.path.isfile(safe_weights_file) evaluates to True, the loader immediately selects the Safetensors path. Otherwise, it falls back to checking for pytorch_model.bin or adapter_model.bin.
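The branch described above can be sketched with the standard library alone. This is a minimal illustration, not the code in trainer.py: the helper name pick_weights_file and its return shape are hypothetical, the prefer_safe argument simply mirrors the flag name used in this article, and only the two filenames come from the text. The final fallback to a lone Safetensors file is also an assumption made for the sketch.

```python
import os
import tempfile

# Filenames as quoted in the article; the helper below is hypothetical.
SAFE_WEIGHTS_NAME = "model.safetensors"
WEIGHTS_NAME = "pytorch_model.bin"


def pick_weights_file(checkpoint_dir, prefer_safe=True):
    """Return (format, path) for the weights file to load, or (None, None)."""
    safe_path = os.path.join(checkpoint_dir, SAFE_WEIGHTS_NAME)
    bin_path = os.path.join(checkpoint_dir, WEIGHTS_NAME)
    if prefer_safe and os.path.isfile(safe_path):
        return "safetensors", safe_path
    if os.path.isfile(bin_path):
        return "bin", bin_path
    # Assumption for this sketch: use safetensors if it is the only file present
    if os.path.isfile(safe_path):
        return "safetensors", safe_path
    return None, None


# Demo: a checkpoint directory that contains both formats
with tempfile.TemporaryDirectory() as ckpt:
    for name in (SAFE_WEIGHTS_NAME, WEIGHTS_NAME):
        open(os.path.join(ckpt, name), "wb").close()
    default_choice = pick_weights_file(ckpt)[0]
    legacy_choice = pick_weights_file(ckpt, prefer_safe=False)[0]
```

With both files present, the default call selects the Safetensors file, and the legacy file is chosen only when the preference is switched off.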

The prefer_safe Flag

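The detection logic respects a boolean prefer_safe parameter that defaults to True. This parameter belongs to the load_sharded_checkpoint() helper used on the Trainer checkpoint-resume path, not to from_pretrained(). When prefer_safe=False is passed, the helper skips Safetensors files in favor of legacy .bin files when both are present. The equivalent switch for from_pretrained() is use_safetensors: leaving it unset tries Safetensors first and falls back to .bin, use_safetensors=False skips Safetensors files, and use_safetensors=True requires them.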

Loading Mechanisms: Zero-Copy vs Pickle Deserialization

Once the format is detected, the library invokes fundamentally different deserialization mechanisms that impact security, memory usage, and speed.

Safetensors Zero-Copy Loading

For .safetensors files, Transformers calls safetensors.torch.load_file(<path>, device="cpu"). This implementation performs a zero-copy memory mapping operation that reads tensor data directly from disk without executing arbitrary code or creating unnecessary memory copies.

The Safetensors format stores only raw tensor buffers and metadata, eliminating the Python pickle deserialization attack surface entirely. This path requires no version checks or safety guards because the file format itself is strictly limited to numerical data.
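To make the "raw buffers plus metadata" claim concrete, the sketch below writes and reads a toy file that follows the published Safetensors layout: an 8-byte little-endian header length, a JSON header, then the tensor bytes. It uses only the standard library and is not the safetensors package itself. The function names, the tensor name linear.bias, and the values are all made up for illustration.

```python
import json
import os
import struct
import tempfile


def write_toy_safetensors(path, name, values):
    """Write one float32 vector in the safetensors layout (toy writer, not the library)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)  # JSON header: names, dtypes, shapes, offsets
        f.write(data)  # raw tensor bytes, nothing executable


def read_header(path):
    """Return the JSON header without reading any tensor bytes."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.safetensors")
    write_toy_safetensors(toy_path, "linear.bias", [0.5, -1.0, 2.0])
    toy_header = read_header(toy_path)
```

Everything a loader needs to locate a tensor is in the JSON header, so parsing the file involves decoding JSON and slicing bytes rather than reconstructing Python objects.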

PyTorch .bin with Safety Guards

For legacy .bin checkpoints, the library uses torch.load(<path>, map_location="cpu", weights_only=True). However, before executing this call, Transformers runs check_torch_load_is_safe() from src/transformers/utils/import_utils.py (lines 63‑71).

This safety function enforces PyTorch version ≥ 2.6 because of CVE-2025-32434, a critical remote code execution vulnerability in torch.load that affects earlier PyTorch releases even when weights_only=True is set. The flaw is in PyTorch, not in Python's pickle module itself. If the installed PyTorch version is older, the function raises a ValueError, which blocks potentially unsafe model weight loading.
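The shape of such a version gate can be sketched without importing torch. The code below is a hypothetical stand-in, not the body of check_torch_load_is_safe(): the function names, the error message, and the choice of exception type are assumptions for illustration, and the version string is passed in explicitly so the example runs anywhere.

```python
import re

MIN_TORCH = (2, 6)  # minimum version named in the article


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.6.0+cu124'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def ensure_safe_torch_load(torch_version):
    """Hypothetical guard: refuse to continue on versions older than MIN_TORCH."""
    if parse_major_minor(torch_version) < MIN_TORCH:
        raise ValueError(
            f"torch {torch_version} is older than 2.6; upgrade before loading .bin files"
        )


def is_allowed(torch_version):
    """Convenience wrapper that turns the guard into a boolean."""
    try:
        ensure_safe_torch_load(torch_version)
        return True
    except ValueError:
        return False
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10" would sort before "2.6" and a newer release would be rejected by mistake.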

Security and Performance Implications

The divergence between these two model weight loading pathways has significant operational consequences for ML pipelines.

CVE-2025-32434 and torch.load Restrictions

The requirement for PyTorch 2.6+ when loading .bin files stems from CVE-2025-32434, a remote code execution vulnerability in torch.load that could be triggered on earlier PyTorch releases even with weights_only=True. The check_torch_load_is_safe() guard in import_utils.py ensures that users cannot accidentally execute malicious code embedded in legacy checkpoint files on vulnerable PyTorch versions.

Safetensors checkpoints are immune to this vulnerability because they bypass Python's pickle mechanism entirely, using a custom binary format that only stores tensor shapes, dtypes, and raw byte buffers.
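Why pickle-based checkpoints carry this risk at all can be shown with a harmless standard-library example. In the sketch below, an object's __reduce__ method tells pickle to call an arbitrary callable during loading; here it is the benign built-in sorted, but a hostile file could name something destructive instead. The allowlisting unpickler follows the "restricting globals" pattern from the Python documentation and is only similar in spirit to what weights_only=True does; the class names and the allowlist contents are illustrative assumptions.

```python
import io
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])"
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload())
unpickled = pickle.loads(blob)  # runs sorted(...) and returns its result


class AllowlistUnpickler(pickle.Unpickler):
    """Only resolve globals that are explicitly allowed (illustrative allowlist)."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


# The same bytes are rejected once globals are restricted
try:
    AllowlistUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The plain pickle.loads call never returns a Payload instance; it returns whatever the embedded callable produced, which is the behavior a format without code references rules out by construction.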

Memory Efficiency Benefits

Safetensors provides memory-mapped file loading, allowing the operating system to load tensor pages on demand rather than copying the entire checkpoint into RAM before transferring to GPU. This reduces peak memory consumption during model weight loading, particularly for large models like LLMs where checkpoints may exceed 100GB.
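The on-demand access pattern can be illustrated with Python's mmap module. This is a simplified sketch under stated assumptions, not how the safetensors library is implemented: the file is a bare array of float32 values with no header, and the helper name read_floats is hypothetical.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32


def read_floats(path, start, count):
    """Decode `count` float32 values starting at element index `start` via mmap."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            begin = start * FLOAT_SIZE
            end = begin + count * FLOAT_SIZE
            # Only the requested byte range is copied out of the mapping
            return list(struct.unpack(f"<{count}f", mapped[begin:end]))


with tempfile.TemporaryDirectory() as tmp:
    blob_path = os.path.join(tmp, "weights.raw")
    with open(blob_path, "wb") as f:
        f.write(struct.pack("<1000f", *[float(i) for i in range(1000)]))
    window = read_floats(blob_path, start=500, count=3)
```

The mapping covers the whole file, but the process only touches the bytes behind the requested slice, which is the property that keeps peak memory low when a checkpoint is much larger than the tensors needed at a given moment.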

The legacy .bin format requires full deserialization into Python objects before the state dict can be applied to the model, consuming additional memory and CPU cycles during the pickle unpickling process.

Practical Code Examples

Loading with Default Safetensors Preference

from transformers import AutoModel

# Automatically selects model.safetensors if present
model = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf")

When executing this code, Transformers checks for SAFE_WEIGHTS_NAME ("model.safetensors") in the cache directory and invokes safetensors.torch.load_file() if found.

Forcing Legacy .bin Format

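from transformers import AutoModel

model = AutoModel.from_pretrained(
    "bert-base-uncased",
    use_safetensors=False,  # Skips safetensors files, forces torch.load on .bin
)

from_pretrained() does not accept prefer_safe; its switch for this behavior is use_safetensors. Setting it to False skips Safetensors files and loads pytorch_model.bin through torch.load(), so PyTorch 2.6+ is needed to pass the check_torch_load_is_safe() validation.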

Manual Safetensors Loading

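import safetensors.torch
from transformers import AutoConfig, AutoModel

# Direct file loading without from_pretrained
state_dict = safetensors.torch.load_file("model.safetensors", device="cpu")

# The config must match the architecture the weights were saved from;
# bert-base-uncased is only an example here
config = AutoConfig.from_pretrained("bert-base-uncased")
model = AutoModel.from_config(config)
model.load_state_dict(state_dict, strict=False)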

This demonstrates the zero-copy loading mechanism that underlies the automatic pipeline.

Safe Legacy Loading with Version Check

import torch
from transformers.utils.import_utils import check_torch_load_is_safe

# Explicit safety validation: raises a ValueError if torch < 2.6
check_torch_load_is_safe()

state_dict = torch.load(
    "pytorch_model.bin",
    map_location="cpu",
    weights_only=True,
)

This mirrors the internal safety logic that protects against CVE‑2025‑32434.

Summary

  • Default Behavior: Transformers automatically prefers Safetensors (.safetensors) for model weight loading, using safetensors.torch.load_file() for zero-copy, memory-mapped access.
  • Security Model: Safetensors eliminates pickle deserialization vulnerabilities entirely, while legacy .bin files require PyTorch 2.6+ and weights_only=True to mitigate CVE‑2025‑32434 via check_torch_load_is_safe().
  • File Detection: The library checks for SAFE_WEIGHTS_NAME constants in src/transformers/utils/__init__.py and implements the selection logic in src/transformers/trainer.py and core_model_loading.py.
  • Backward Compatibility: Passing use_safetensors=False to from_pretrained(), or prefer_safe=False to load_sharded_checkpoint(), forces legacy .bin loading, and the library maintains support for both formats in sharded and non-sharded checkpoints.

Frequently Asked Questions

What is the default format for model weight loading in Transformers?

The default format is Safetensors (.safetensors). When you call from_pretrained(), the library looks for model.safetensors first based on the SAFE_WEIGHTS_NAME constant defined in src/transformers/utils/__init__.py. If present, it loads via safetensors.torch.load_file(); otherwise, it falls back to pytorch_model.bin with safety checks.

Why does Transformers require PyTorch 2.6 for .bin files?

Transformers enforces PyTorch 2.6 or newer when loading legacy .bin checkpoints to protect against CVE-2025-32434, a critical remote code execution vulnerability in torch.load that affects earlier PyTorch releases even when weights_only=True is set. The check_torch_load_is_safe() function in src/transformers/utils/import_utils.py raises a ValueError if the installed version is older, so that weights_only=True is only relied on where it is actually safe.

Can I convert existing .bin checkpoints to safetensors?

Yes. The Transformers library includes conversion utilities in src/transformers/safetensors_conversion.py that can transform legacy .bin checkpoints into the Safetensors format. Additionally, when you upload a model to the Hugging Face Hub, the platform often automatically converts .bin files to .safetensors variants if the safe files are missing, making the secure format available for future downloads.

How does sharding work with safetensors vs .bin formats?

Both formats support sharded checkpoints through index files. For Safetensors, the library looks for model.safetensors.index.json (defined as SAFE_WEIGHTS_INDEX_NAME in utils/__init__.py), while legacy checkpoints use pytorch_model.bin.index.json. The index maps each tensor name to the shard file that contains it. The loading logic in trainer.py and core_model_loading.py handles sharded loading transparently: for the safe format, each shard file is read with safetensors.torch.load_file, while sharded .bin loading goes through the load_sharded_checkpoint utility with the same PyTorch 2.6 safety requirements.
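As a rough illustration of how an index file drives sharded loading, the sketch below groups tensor names by the shard that holds them, which is the information a loader needs to open each shard once. It assumes the common index layout with a weight_map object; the tensor names, shard filenames, and the helper group_by_shard are made up for the example.

```python
import json
from collections import defaultdict

# Toy index in the shape of model.safetensors.index.json; all names are invented.
index_text = json.dumps({
    "metadata": {"total_size": 48},
    "weight_map": {
        "embed.weight": "model-00001-of-00002.safetensors",
        "layer.0.weight": "model-00001-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})


def group_by_shard(index_json):
    """Map each shard filename to the sorted tensor names it contains."""
    weight_map = json.loads(index_json)["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


shard_plan = group_by_shard(index_text)
```

A loader can then iterate over shard_plan, load one shard at a time, and apply its tensors to the model before moving on, so no more than one shard needs to be resident at once.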
