# How PEFT Adapter Loading Integrates with Base PreTrainedModel in Transformers

> Learn how PEFT adapter loading integrates with Hugging Face Transformers PreTrainedModel. Discover how adapter weights are injected and trained efficiently.

- Repository: [huggingface/transformers](https://github.com/huggingface/transformers)
- Tags: how-to-guide
- Published: 2026-02-21

---

**PEFT adapter loading in Hugging Face Transformers works through the `PeftAdapterMixin` class, which `PreTrainedModel` inherits. Calling `load_adapter()` injects adapter layers into the existing model in place and loads their weights, so the base weights stay frozen and only the adapter parameters are candidates for training.**

Parameter‑Efficient Fine‑Tuning (PEFT) lets practitioners adapt massive pre‑trained models by training only small adapter layers instead of full weights. In the Hugging Face `transformers` repository, the integration between PEFT adapters and the base `PreTrainedModel` is handled by a specialized mixin that preserves the original model architecture while enabling dynamic attachment of LoRA, IA³, and other adapter types.
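To put "small adapter layers" into numbers, the sketch below counts the parameters a LoRA adapter adds to a single linear layer. It assumes the standard LoRA factorization (an `r × d_in` matrix and a `d_out × r` matrix); the function names and the 4096-wide layer are illustrative choices, not values taken from the library.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the dense weight matrix that LoRA leaves frozen."""
    return d_in * d_out


d_in = d_out = 4096  # illustrative projection size
rank = 8             # illustrative LoRA rank

adapter = lora_param_count(d_in, d_out, rank)
full = full_param_count(d_in, d_out)
fraction = adapter / full

print(f"{adapter:,} adapter params vs {full:,} full params ({fraction:.2%})")
# prints: 65,536 adapter params vs 16,777,216 full params (0.39%)
```

Under these assumptions the adapter is well under one percent of the layer it modifies, which is why adapter checkpoints are small compared with the base model.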

## The PeftAdapterMixin Architecture

The integration centers on **`PeftAdapterMixin`**, located in [`src/transformers/integrations/peft.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/peft.py). `PreTrainedModel` inherits this mixin, so every model class, including those instantiated through the `AutoModel` factories, can host PEFT adapters without changes to the underlying architecture code.

### Core Methods and Attributes

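The mixin adds public methods that cover the adapter lifecycle, including:

- **`load_adapter()`** – Reads an adapter checkpoint from a local folder or Hub repo, injects the adapter layers, and loads the adapter weights.
- **`add_adapter()`** – Attaches a fresh, untrained adapter described by a PEFT config object.
- **`set_adapter()`**, **`enable_adapters()`**, and **`disable_adapters()`** – Select which loaded adapter is active, or switch adapters on and off.
- **`get_adapter_state_dict()`** – Returns only the adapter weights.

Saving goes through the regular `save_pretrained()`. Fusing adapter weights into the base parameters is provided by the PEFT library, as `merge_and_unload()` on a `PeftModel`.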

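`load_adapter()` does not move the model into a separate wrapper object. The adapter layers are injected into the existing modules, and the adapter configuration is recorded on the model in `peft_config`. The `base_model` property that every `PreTrainedModel` already exposes keeps returning the underlying backbone module.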

## Step‑by‑Step Adapter Loading Flow

Calling [`model.load_adapter(adapter_path, adapter_name="default")`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/peft.py) executes a strict initialization sequence:

1. **Version Validation** – The method invokes `check_peft_version(min_version=MIN_PEFT_VERSION)` to ensure PEFT ≥ 0.18.0 is installed, preventing compatibility errors.
2. **State‑Dict Remapping** – Checkpoint keys are rewritten so they line up with the parameter names of the injected modules. The `PEFT_TYPE_TO_PREFIX_MAPPING` constant supplies the adapter‑specific parameter prefix (e.g., `lora_`) used to recognize adapter weights for each adapter type.
3. **Adapter Injection** – Depending on the adapter type (LoRA, IA³, etc.), PEFT's `inject_adapter_in_model` swaps the targeted modules (for LoRA, typically linear layers) for adapter-aware versions in place. The model keeps its class and module hierarchy, and the injected layers add the adapter computation during the forward pass.
4. **Memory and Precision Handling** – The method respects the model’s `torch_dtype` and supports `low_cpu_mem_usage=True`, loading adapters with reduced RAM overhead while forcing critical layers (like layer norms) to `float32` for numerical stability.
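The first two steps can be sketched in plain Python. This is a simplified illustration, not the library's code: `parse_version`, `version_at_least`, and `remap_adapter_keys` are hypothetical helpers, the version parser is far cruder than a real one, and the key layout assumes a LoRA checkpoint whose keys carry a `base_model.model.` wrapper prefix and whose live parameter names include the adapter name after `lora_A` / `lora_B`.

```python
import re

MIN_VERSION = "0.18.0"  # the minimum quoted in step 1


def parse_version(text):
    """Turn '0.18.1.dev0' into (0, 18, 1); non-numeric suffixes are ignored."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    numbers += [0] * (3 - len(numbers))  # pad '0.18' to (0, 18, 0)
    return tuple(numbers)


def version_at_least(installed, minimum=MIN_VERSION):
    """True when the installed version meets the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def remap_adapter_keys(state_dict, adapter_name="default", wrapper_prefix="base_model.model."):
    """Drop the wrapper prefix and insert the adapter name after lora_A / lora_B."""
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(wrapper_prefix):
            key = key[len(wrapper_prefix):]
        key = re.sub(r"\.(lora_[AB])\.", rf".\1.{adapter_name}.", key)
        remapped[key] = value
    return remapped


# Placeholder strings stand in for tensors
checkpoint = {
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight": "A0",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight": "B0",
}
remapped = remap_adapter_keys(checkpoint)

for key in remapped:
    print(key)
```

After remapping, each key names a parameter relative to the model itself, with the adapter name identifying which adapter the weights belong to.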

## Trainer Integration and Base Model Extraction

When a `Trainer` receives a PEFT‑wrapped model, it must occasionally access the raw backbone, for example to save full checkpoints or export to ONNX. The utility **[`extract_base_model_from_peft`](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py)** in [`src/transformers/trainer_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py) safely unwraps the model. If the input is not a PEFT wrapper, the function returns the object unchanged, so the same code path works for plain and adapter-equipped models.
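The "unwrap if wrapped, otherwise pass through" behavior can be illustrated with a small duck-typed helper. This is a sketch of the pattern only, not the utility named above: `unwrap_peft` and both stand-in classes are hypothetical, and the only real API it leans on is the `get_base_model()` accessor that PEFT's `PeftModel` provides.

```python
def unwrap_peft(model):
    """Return the underlying model if `model` looks like a PEFT wrapper, else `model` itself."""
    get_base = getattr(model, "get_base_model", None)
    if callable(get_base):
        return get_base()
    return model


class PlainModel:
    """Stand-in for an ordinary model with no wrapper around it."""


class FakePeftWrapper:
    """Stand-in that mimics the `get_base_model()` accessor of a PEFT wrapper."""

    def __init__(self, inner):
        self._inner = inner

    def get_base_model(self):
        return self._inner


plain = PlainModel()
wrapped = FakePeftWrapper(plain)

print(unwrap_peft(plain) is plain)    # unwrapped input is returned unchanged
print(unwrap_peft(wrapped) is plain)  # wrapper yields the inner model
```

Checking for the accessor instead of the wrapper's class keeps the helper usable when PEFT is not installed.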

## Saving, Reloading, and Merging Adapters

Adapters are persisted independently of the base weights. When an adapter is loaded and you call `model.save_pretrained()`, the implementation in [`src/transformers/modeling_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py) detects it and writes the adapter weights together with an **`adapter_config.json`** instead of a full copy of the base model. That config records which base checkpoint the adapter belongs to, so a later `from_pretrained()` call on the directory loads the base model and re‑attaches the adapter; the `adapter_name` argument (default `"default"`) controls the name it is registered under.
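To make the role of `adapter_config.json` concrete, here is a minimal standard-library sketch of how a loader could recognize an adapter directory. The helper `describe_adapter_dir` is hypothetical and the sample config is abbreviated; `peft_type` and `base_model_name_or_path` are fields PEFT writes to that file.

```python
import json
import tempfile
from pathlib import Path


def describe_adapter_dir(directory):
    """Return (peft_type, base_model) if the directory holds an adapter, else None."""
    config_path = Path(directory) / "adapter_config.json"
    if not config_path.is_file():
        return None  # plain model directory: nothing to re-attach
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("peft_type"), config.get("base_model_name_or_path")


# Demo with a throwaway directory standing in for "./lora-finetuned"
with tempfile.TemporaryDirectory() as tmp:
    before = describe_adapter_dir(tmp)
    sample = {
        "peft_type": "LORA",
        "base_model_name_or_path": "meta-llama/Meta-Llama-3-8B",
        "r": 8,
    }
    (Path(tmp) / "adapter_config.json").write_text(json.dumps(sample), encoding="utf-8")
    after = describe_adapter_dir(tmp)

print(before)
print(after)
```

The base-model reference is what lets an adapter directory stand in for a full checkpoint path when reloading.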

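To collapse the adapter into the base model for production deployment (e.g., TorchScript or ONNX), merge it. For LoRA, merging adds the scaled low-rank product of the adapter matrices to each original linear weight, after which the adapter layers can be dropped. The PEFT library exposes this as `merge_and_unload()` on a `PeftModel`.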
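The arithmetic behind a LoRA merge fits in a few lines. The sketch below uses plain Python lists instead of tensors and assumes the standard LoRA scaling of `alpha / rank`; `matmul` and `merge_lora` are illustrative helpers, not library functions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # shape (out=2, in=3)
lora_a = [[1.0, 2.0, 3.0]]                   # shape (rank=1, in=3)
lora_b = [[0.5], [-1.0]]                     # shape (out=2, rank=1)
alpha, rank = 2.0, 1

merged = merge_lora(weight, lora_a, lora_b, alpha, rank)

# The merged layer must match the base layer plus the adapter path
x = [[1.0], [1.0], [1.0]]
unmerged_out = [
    [base[0] + (alpha / rank) * extra[0]]
    for base, extra in zip(matmul(weight, x), matmul(lora_b, matmul(lora_a, x)))
]
merged_out = matmul(merged, x)

print(merged)                      # [[2.0, 2.0, 5.0], [-2.0, -3.0, -6.0]]
print(merged_out == unmerged_out)  # True
```

Because the merged weight reproduces the adapter's contribution exactly, the exported model needs no adapter layers at inference time.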

## Practical Code Examples

The following snippets demonstrate the complete lifecycle of PEFT adapter integration.

```python
# 1️⃣ Load a base model and attach a PEFT adapter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
adapter_repo = "my-org/llama-lora-adapter"

# Load the base model with memory-efficient settings
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach the adapter; its layers are injected into the model in place
model.load_adapter(adapter_repo, adapter_name="default")
model.train()  # Training mode; the base weights remain frozen
```

```python
# 2️⃣ Use the adapter-equipped model inside a Trainer
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora-finetuned",
    per_device_train_batch_size=4,
    num_train_epochs=3,
    fp16=True,
)

trainer = Trainer(
    model=model,  # Model with the adapter attached
    args=training_args,
    train_dataset=my_dataset,  # Placeholder: a tokenized dataset prepared elsewhere
)

trainer.train()
```

```python
# 3️⃣ Save the adapter and reload it on top of the base model
model.save_pretrained("./lora-finetuned")
tokenizer.save_pretrained("./lora-finetuned")

# Later reload: the adapter is re-attached automatically because
# adapter_config.json is present in the directory
model = AutoModelForCausalLM.from_pretrained("./lora-finetuned")
tokenizer = AutoTokenizer.from_pretrained("./lora-finetuned")
```

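```python
# 4️⃣ Merge adapter weights into the base model for export
# Merging is done here through the PEFT library's PeftModel wrapper
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
peft_model = PeftModel.from_pretrained(base, "./lora-finetuned")
merged = peft_model.merge_and_unload()  # Folds LoRA deltas into the base linear layers

# The resulting folder contains a standard model without PEFT layers
merged.save_pretrained("./merged-model")
```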

## Key Implementation Files

| File | Role |
|------|------|
| [`src/transformers/integrations/peft.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/peft.py) | Contains `PeftAdapterMixin` and its adapter methods, including `load_adapter`. |
| [`src/transformers/trainer_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py) | Provides `extract_base_model_from_peft` to unwrap adapters for checkpointing. |
| [`src/transformers/modeling_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py) | Houses compatibility hooks for gradient checkpointing (lines 3094‑3096) and logic to save `adapter_config.json`. |
| [`tests/peft_integration/test_peft_integration.py`](https://github.com/huggingface/transformers/blob/main/tests/peft_integration/test_peft_integration.py) | Integration test suite validating `from_pretrained` with adapters and pipeline usage. |

## Summary

- **`PeftAdapterMixin`** injects PEFT capabilities into every `PreTrainedModel` without altering the base architecture.
- **`load_adapter()`** injects adapter layers in place, leaves the base weights frozen, and handles state‑dict remapping and version checks.
- **`extract_base_model_from_peft`** allows the `Trainer` to safely access the frozen backbone for saving and export.
- Adapters are persisted as adapter weights plus **`adapter_config.json`**, are re-attached automatically by `from_pretrained()`, and can be merged back into the base weights with PEFT's `merge_and_unload()`.
- The integration respects `torch_dtype` and `low_cpu_mem_usage`, ensuring efficient training on large models.

## Frequently Asked Questions

### How does `load_adapter()` modify the base model structure?

The method changes the model in place instead of creating a separate wrapper. The modules targeted by the adapter config (for LoRA, typically linear layers) are swapped for adapter-aware versions that keep the original weights frozen and add the trainable adapter parameters alongside them. The model's class and the rest of its module hierarchy are unchanged.

### Can I use gradient checkpointing with PEFT adapters?

Yes. `gradient_checkpointing_enable()` in [`modeling_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py) (lines 3094‑3096) contains a safeguard for models with a loaded adapter: it calls `enable_input_require_grads()` so that the embedding outputs require gradients. With frozen base weights, the checkpointed segments would otherwise have no gradient path back to the adapter parameters.

### What does `save_pretrained()` write when an adapter is loaded?

With an adapter loaded, **`save_pretrained()`** writes the adapter weights and `adapter_config.json` instead of a full copy of the base model. The config references the base checkpoint, so the resulting directory is small enough to share on the Hub and can still be passed to `from_pretrained()`. To obtain a complete standalone checkpoint, merge the adapter into the base weights first and save the merged model.

### How does the Trainer handle PEFT‑wrapped models during checkpointing?

The `Trainer` calls `extract_base_model_from_peft` internally when it needs to access the raw model for operations like saving or exporting. This utility unwraps the PEFT layer if present, ensuring that checkpoints contain the correct model reference while training continues on the adapter‑wrapped instance.