# How to Integrate Headroom with the Agno Framework: Complete Implementation Guide

> Integrate Headroom with the Agno framework using HeadroomAgnoModel. Optimize context directly in Agno's model pipeline for enhanced AI agents.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-10

---

**Headroom provides a first-class integration for the Agno AI-agent framework through the `HeadroomAgnoModel` wrapper, which injects context optimization directly into Agno's model invocation pipeline while maintaining full compatibility with Agno's logging and tool-loop machinery.**

The `chopratejas/headroom` repository ships a dedicated integration module for the Agno framework (formerly Phidata) that requires no changes to existing agent code. The integration wraps any Agno model and applies Headroom's compression pipeline before each LLM call, automatically converting messages between Agno's native format and the OpenAI-style dicts that Headroom's optimization engine requires.

## Architecture Overview

The Agno integration consists of three core components that work together to provide seamless context compression. According to the Headroom source code, these components handle model wrapping, provider detection, and observability.

### HeadroomAgnoModel Wrapper

The `HeadroomAgnoModel` class in [`headroom/integrations/agno/model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/model.py) inherits from `agno.models.base.Model` and intercepts all standard Agno methods including `invoke`, `ainvoke`, and `invoke_stream`. This wrapper maintains a thread-safe metrics history and tracks running totals of tokens saved across requests.
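
To make the thread-safety point concrete, here is a minimal sketch of how a running total and a bounded history can be guarded with a lock. The `SavingsTracker` class, its `record` method, and the `max_history` parameter are hypothetical names chosen for illustration; they are not taken from Headroom's source, which may organize its metrics differently.

```python
import threading


class SavingsTracker:
    """Hypothetical tracker; a sketch, not Headroom's actual class."""

    def __init__(self, max_history=100):
        self._lock = threading.Lock()
        self._max_history = max_history
        self.history = []
        self.total_tokens_saved = 0

    def record(self, tokens_before, tokens_after):
        """Store one request's numbers and update the running total."""
        saved = max(tokens_before - tokens_after, 0)
        with self._lock:  # guard the read-modify-write on shared state
            self.total_tokens_saved += saved
            self.history.append(
                {"before": tokens_before, "after": tokens_after, "saved": saved}
            )
            if len(self.history) > self._max_history:
                self.history.pop(0)  # drop the oldest entry
        return saved


def run_demo():
    """Record 4 x 500 requests from worker threads, 20 tokens saved each."""
    tracker = SavingsTracker()

    def worker():
        for _ in range(500):
            tracker.record(tokens_before=120, tokens_after=100)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker
```

Calling `run_demo()` returns a tracker whose total reflects every recorded request, while the history keeps only the most recent `max_history` entries.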

When processing requests, the wrapper converts Agno `Message` objects to OpenAI-style dictionaries, executes the `TransformPipeline`, then converts results back to Agno `Message` objects. Extended-thinking blocks used by Claude are preserved untouched to ensure provider compatibility.
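
The round trip described above can be sketched as follows. Everything here is an assumption made for illustration: `SimpleMessage` is a reduced stand-in for Agno's `Message`, `toy_transform` is a placeholder for the real `TransformPipeline` (it only truncates long strings), and the `"thinking"` block shape follows the Anthropic-style content-block convention.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SimpleMessage:
    """Stand-in for an Agno Message (hypothetical, reduced to three fields)."""
    role: str
    content: Any
    tool_call_id: Optional[str] = None


def to_openai_dict(msg):
    """Convert a message object to an OpenAI-style dict."""
    d = {"role": msg.role, "content": msg.content}
    if msg.tool_call_id is not None:
        d["tool_call_id"] = msg.tool_call_id
    return d


def from_openai_dict(d):
    """Convert an OpenAI-style dict back to a message object."""
    return SimpleMessage(d["role"], d.get("content"), d.get("tool_call_id"))


def has_thinking_block(d):
    """True when the content is a block list containing a thinking block."""
    content = d.get("content")
    return isinstance(content, list) and any(
        isinstance(block, dict) and block.get("type") == "thinking"
        for block in content
    )


def toy_transform(dicts, max_chars=40):
    """Placeholder for the real pipeline: truncates long string content."""
    out = []
    for d in dicts:
        text = d.get("content")
        if has_thinking_block(d) or not isinstance(text, str):
            out.append(d)  # pass through untouched
        elif len(text) > max_chars:
            out.append({**d, "content": text[:max_chars] + "..."})
        else:
            out.append(d)
    return out


def optimize(messages):
    """Message objects -> dicts -> transform -> message objects."""
    dicts = [to_openai_dict(m) for m in messages]
    return [from_openai_dict(d) for d in toy_transform(dicts)]
```

In this sketch, a message whose content list contains a thinking block comes back with identical content, while an over-long plain-text message is shortened.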

### Provider Detection System

Located in [`headroom/integrations/agno/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/providers.py), the `get_headroom_provider` function inspects the wrapped Agno model's class name, module path, or model ID to automatically select the appropriate Headroom token-counting backend. This supports OpenAI, Anthropic, Google, Cohere, and other providers, ensuring accurate token estimation regardless of the underlying LLM.
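
As an illustration of this kind of inspection, the sketch below checks the class name, then the module path, then the model ID against a keyword table. The `detect_provider` function, the keyword table, and the `"openai"` fallback are all assumptions for this example; they do not reproduce the logic or return values of `get_headroom_provider`.

```python
# Keyword rules are illustrative assumptions, checked in order.
PROVIDER_KEYWORDS = [
    ("anthropic", ("anthropic", "claude")),
    ("google", ("google", "gemini")),
    ("cohere", ("cohere",)),
    ("openai", ("openai", "gpt")),
]


def detect_provider(model, default="openai"):
    """Guess a provider from class name, module path, then model ID."""
    cls = type(model)
    candidates = [cls.__name__, cls.__module__, str(getattr(model, "id", ""))]
    for text in candidates:
        lowered = text.lower()
        for provider, keywords in PROVIDER_KEYWORDS:
            if any(keyword in lowered for keyword in keywords):
                return provider
    return default  # assumed fallback when nothing matches


# Fake model classes for demonstration only; real Agno models are not needed.
class Claude:
    __module__ = "agno.models.anthropic"

    def __init__(self, id):
        self.id = id


class OpenAIChat:
    __module__ = "agno.models.openai"

    def __init__(self, id):
        self.id = id


class CustomModel:
    __module__ = "my_app.models"

    def __init__(self, id):
        self.id = id
```

With these fakes, `detect_provider(CustomModel(id="gemini-1.5-pro"))` falls through the class name and module path and matches on the model ID.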

### Observability Hooks

The [`headroom/integrations/agno/hooks.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/hooks.py) file implements optional `HeadroomPreHook` and `HeadroomPostHook` classes that expose token-saving metrics and emit alerts when requests exceed configurable thresholds. These hooks integrate with Agno's native pre/post-hook system for real-time monitoring.
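
The threshold-alert idea can be sketched in plain Python. `SketchPostHook` below is a hypothetical class: it takes a bare token count, whereas the real `HeadroomPostHook` plugs into Agno's hook system and receives Agno's own objects. Only the `token_alert_threshold` parameter name and the `get_summary` method name are borrowed from the usage shown later in this guide; the summary keys are assumptions.

```python
import logging

logger = logging.getLogger("headroom_sketch")


class SketchPostHook:
    """Hypothetical post-hook: counts requests and flags large ones."""

    def __init__(self, token_alert_threshold=10_000):
        self.token_alert_threshold = token_alert_threshold
        self.requests = 0
        self.alerts = 0
        self.max_tokens = 0

    def __call__(self, tokens_used):
        """Record one request and warn if it exceeds the threshold."""
        self.requests += 1
        self.max_tokens = max(self.max_tokens, tokens_used)
        if tokens_used > self.token_alert_threshold:
            self.alerts += 1
            logger.warning(
                "Request used %d tokens (threshold %d)",
                tokens_used,
                self.token_alert_threshold,
            )

    def get_summary(self):
        """Return aggregate statistics collected so far."""
        return {
            "requests": self.requests,
            "alerts": self.alerts,
            "max_tokens": self.max_tokens,
        }
```

Keeping the counters on the hook instance means one hook can be shared across many runs and queried once at the end.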

## Installation and Setup

Install Headroom with Agno support through the `agno` optional extra. If the Agno framework is not already present in your environment, install it separately as well.

```bash
pip install "headroom-ai[agno]"
pip install agno
```

The integration exports its public API through [`headroom/integrations/agno/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/__init__.py), making all wrapper classes and utilities available from the main integration namespace.

## Basic Integration Patterns

### Wrapping an Agno Model

To integrate Headroom with the Agno framework, wrap any Agno model instance with `HeadroomAgnoModel` before passing it to your `Agent`. This approach requires no changes to existing agent configurations or tool definitions.

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

# Wrap any Agno model with Headroom optimization
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

# Use the wrapped model with a standard Agno agent
agent = Agent(model=model)
response = agent.run("What is the capital of France?")

print(response.content)  # text of the agent's reply
print(f"Tokens saved: {model.total_tokens_saved}")
```

The wrapped model forwards all Agno-specific methods after applying Headroom transforms, allowing the agent to benefit from context compression while preserving native Agno functionality like tool loops and logging.
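
The general "transform, then forward" pattern can be sketched with plain delegation. This is an illustration only: `TransformingWrapper`, `EchoModel`, and `keep_last_two` are hypothetical names, and the real `HeadroomAgnoModel` subclasses Agno's base `Model` rather than relying on `__getattr__`.

```python
class TransformingWrapper:
    """Hypothetical wrapper: transforms messages, then forwards the call."""

    def __init__(self, wrapped, transform):
        self.wrapped = wrapped
        self.transform = transform

    def invoke(self, messages, **kwargs):
        """Apply the transform before delegating to the wrapped model."""
        return self.wrapped.invoke(self.transform(messages), **kwargs)

    def __getattr__(self, name):
        # Called only when normal lookup fails; forward to the wrapped model.
        if name == "wrapped":  # avoid infinite recursion before __init__ runs
            raise AttributeError(name)
        return getattr(self.wrapped, name)


class EchoModel:
    """Fake model for demonstration; reports how many messages it received."""

    id = "echo-1"

    def invoke(self, messages, **kwargs):
        return f"received {len(messages)} messages"


def keep_last_two(messages):
    """Toy stand-in for compression: keep only the last two messages."""
    return messages[-2:]
```

Because only `invoke` is intercepted here, every other attribute (such as `id`) resolves on the wrapped model unchanged, which is the property that lets calling code treat the wrapper like the original.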

### Standalone Message Optimization

For scenarios requiring optimization without the full Agent wrapper, use the `optimize_messages` utility function. This function processes raw message dictionaries independently of the Agno agent lifecycle.

```python
from headroom.integrations.agno import optimize_messages

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Analyse this huge JSON payload..."},
]

# Returns the optimized messages plus a metrics dict
opt_msgs, metrics = optimize_messages(
    messages,
    model="gpt-4o",  # model name for token estimation
)

print(f"Saved {metrics['tokens_saved']} tokens")
```

## Advanced Usage Patterns

### Adding Observability Hooks

Implement pre- and post-hooks to monitor token usage and receive alerts when requests exceed specified thresholds. Instantiate hooks manually or use the `create_headroom_hooks` convenience factory.

```python
from headroom.integrations.agno import (
    HeadroomAgnoModel,
    HeadroomPreHook,
    HeadroomPostHook,
    create_headroom_hooks,
)
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Create wrapped model
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

# Option 1: Manual instantiation
pre_hook = HeadroomPreHook()
post_hook = HeadroomPostHook(token_alert_threshold=10_000)

# Option 2: Factory method (replaces the hooks created above; pick one option)
pre_hook, post_hook = create_headroom_hooks(
    token_alert_threshold=5_000,
    log_level="DEBUG",
)

agent = Agent(
    model=model,
    pre_hooks=[pre_hook],
    post_hooks=[post_hook],
)

# Process multiple requests
for query in ["Summarize the latest AI news.", "Write a short poem."]:
    agent.run(query)

print(f"Total tokens saved: {model.total_tokens_saved}")
print("Post-hook summary:", post_hook.get_summary())
```

### Async Usage for High-Throughput Applications

The wrapper supports Agno's async model methods, `aresponse` and `aresponse_stream`, so a wrapped model works in async agents for high-throughput scenarios. These are model-level methods that the agent calls internally; at the agent level, use Agno's async entry points such as `arun` and `aprint_response`, exactly as you would with an unwrapped model.

```python
import asyncio
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel


async def main() -> None:
    model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
    agent = Agent(model=model)

    # Async non-streaming response
    resp = await agent.arun("Explain quantum tunnelling.")
    print(resp.content)

    # Async streaming response, printed as chunks arrive
    await agent.aprint_response("Give me a story.", stream=True)


asyncio.run(main())
```

## Summary

- **`HeadroomAgnoModel`** in [`headroom/integrations/agno/model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/model.py) provides a drop-in wrapper that inherits from Agno's base Model class and intercepts `invoke`, `ainvoke`, and streaming methods to apply compression.
- **Automatic provider detection** via `get_headroom_provider` in [`headroom/integrations/agno/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/providers.py) selects the correct token-counting backend based on the wrapped model's class name or model ID.
- **Message conversion** handles bidirectional transformation between Agno `Message` objects and OpenAI-style dicts, preserving extended-thinking blocks for Claude compatibility.
- **Observability hooks** in [`headroom/integrations/agno/hooks.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/hooks.py) offer pre- and post-processing monitoring with configurable token thresholds and summary statistics.
- **Thread-safe metrics** tracking allows monitoring of `total_tokens_saved` across concurrent requests in async applications.
- **Standalone optimization** via `optimize_messages` enables context compression outside of the standard Agent workflow.

## Frequently Asked Questions

### How does HeadroomAgnoModel handle message format conversion?

The wrapper automatically converts Agno `Message` objects to OpenAI-style dictionaries before running the `TransformPipeline`, then converts the optimized results back to Agno `Message` objects. This bidirectional conversion happens in [`headroom/integrations/agno/model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/model.py) and ensures that Agno's logging and tool-loop machinery continue to function unchanged while Headroom applies its compression algorithms.

### Can I use Headroom with async Agno agents?

Yes. `HeadroomAgnoModel` supports Agno's async model methods, including `aresponse` and `aresponse_stream`. The wrapper keeps a thread-safe metrics history, so token savings are tracked accurately in high-throughput applications where many concurrent requests share one wrapped model.

### Does the integration support all LLM providers available in Agno?

The integration wraps any Agno model, and provider detection in [`headroom/integrations/agno/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/providers.py) then picks a matching token-counting backend. The `get_headroom_provider` function inspects the wrapped model's class name, module path, or model ID to select the backend for OpenAI, Anthropic, Google, Cohere, and other providers.

### What are the observability hooks used for?

`HeadroomPreHook` and `HeadroomPostHook` in [`headroom/integrations/agno/hooks.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/agno/hooks.py) provide optional pre- and post-processing hooks that expose token-saving metrics and emit alerts when requests exceed configured thresholds. These integrate with Agno's native hook system to enable real-time monitoring and debugging of context compression performance without modifying core agent logic.