How to Integrate Headroom with the Agno Framework: Complete Implementation Guide

Headroom provides a first-class integration for the Agno AI-agent framework through the HeadroomAgnoModel wrapper, which injects context optimization directly into Agno's model invocation pipeline while maintaining full compatibility with Agno's logging and tool-loop machinery.

The chopratejas/headroom repository ships with a dedicated integration module that lets you integrate Headroom with the Agno framework (formerly Phidata) without modifying existing agent code. The integration wraps any Agno model to apply Headroom's compression pipeline before each LLM call, automatically converting messages between Agno's native format and the OpenAI-style dicts required by Headroom's optimization engine.

Architecture Overview

The Agno integration consists of three core components that work together to provide seamless context compression. According to the Headroom source code, these components handle model wrapping, provider detection, and observability.

HeadroomAgnoModel Wrapper

The HeadroomAgnoModel class in headroom/integrations/agno/model.py inherits from agno.models.base.Model and intercepts all standard Agno methods including invoke, ainvoke, and invoke_stream. This wrapper maintains a thread-safe metrics history and tracks running totals of tokens saved across requests.
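
The class below is a minimal sketch of what a thread-safe metrics history with a running total can look like. RequestMetrics and MetricsTracker are hypothetical names used for illustration; they are not Headroom's actual classes, and the real wrapper may store different fields.

```python
import threading
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestMetrics:
    """Token counts for a single optimized request (hypothetical record type)."""

    tokens_before: int
    tokens_after: int

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after


class MetricsTracker:
    """Bounded, lock-protected metrics history plus a running total (hypothetical)."""

    def __init__(self, max_history: int = 100) -> None:
        self._lock = threading.Lock()
        self._history: deque = deque(maxlen=max_history)  # oldest entries are evicted
        self._total_saved = 0

    def record(self, metrics: RequestMetrics) -> None:
        # Append and add under one lock so concurrent record() calls
        # cannot interleave and lose an increment.
        with self._lock:
            self._history.append(metrics)
            self._total_saved += metrics.tokens_saved

    @property
    def total_tokens_saved(self) -> int:
        # The total covers every request ever recorded, including evicted ones.
        with self._lock:
            return self._total_saved

    def history(self) -> list:
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._history)
```

Bounding the history keeps memory flat in long-running agents, while the separate counter keeps the lifetime total exact.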

When processing requests, the wrapper converts Agno Message objects to OpenAI-style dictionaries, executes the TransformPipeline, then converts results back to Agno Message objects. Extended-thinking blocks used by Claude are preserved untouched to ensure provider compatibility.
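
A minimal sketch of that round trip follows, assuming the transform keeps message count and order. AgnoLikeMessage, to_openai_dicts, from_openai_dicts, and run_pipeline are hypothetical names; the real converter in model.py operates on Agno's own Message class and would need to handle more fields (tool calls, images, and so on). The point illustrated is that provider-specific payloads such as thinking blocks never enter the optimizer and are copied back unchanged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

OpenAIDict = Dict[str, Any]


@dataclass
class AgnoLikeMessage:
    """Hypothetical stand-in for Agno's Message class (illustration only)."""

    role: str
    content: Optional[str] = None
    # Opaque provider-specific payload, e.g. Claude extended-thinking blocks.
    thinking: Optional[Any] = None


def to_openai_dicts(messages: List[AgnoLikeMessage]) -> List[OpenAIDict]:
    # Only role and content are exposed to the optimizer.
    return [{"role": m.role, "content": m.content} for m in messages]


def from_openai_dicts(dicts: List[OpenAIDict], originals: List[AgnoLikeMessage]) -> List[AgnoLikeMessage]:
    if len(dicts) != len(originals):
        raise ValueError("sketch assumes the transform keeps message count and order")
    # Rebuild messages, copying the untouched thinking payload from each original.
    return [
        AgnoLikeMessage(role=d["role"], content=d["content"], thinking=orig.thinking)
        for d, orig in zip(dicts, originals)
    ]


def run_pipeline(
    messages: List[AgnoLikeMessage],
    transform: Callable[[List[OpenAIDict]], List[OpenAIDict]],
) -> List[AgnoLikeMessage]:
    optimized = transform(to_openai_dicts(messages))
    return from_openai_dicts(optimized, messages)
```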

Provider Detection System

Located in headroom/integrations/agno/providers.py, the get_headroom_provider function inspects the wrapped Agno model's class name, module path, or model ID to automatically select the appropriate Headroom token-counting backend. This covers OpenAI, Anthropic, Google, Cohere, and other providers, so token estimates are computed with a backend suited to the underlying LLM.
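
The function below is a rough sketch of this kind of keyword-based detection. guess_provider and the labels it returns are hypothetical: the sketch returns only a string and does not reproduce get_headroom_provider's actual matching rules or return type.

```python
def guess_provider(model: object) -> str:
    """Guess a provider label from class name, module path, and model id (hypothetical heuristic)."""
    cls = type(model)
    haystack = " ".join(
        [cls.__name__, cls.__module__, str(getattr(model, "id", "") or "")]
    ).lower()
    # Ordered rules: the first provider with a matching keyword wins.
    rules = [
        ("anthropic", ("anthropic", "claude")),
        ("google", ("google", "gemini")),
        ("cohere", ("cohere",)),
        ("openai", ("openai", "gpt")),
    ]
    for provider, keywords in rules:
        if any(keyword in haystack for keyword in keywords):
            return provider
    return "generic"  # hypothetical fallback label
```

Checking several signals matters because a model ID alone can be ambiguous (for example, OpenAI-compatible gateways serving non-OpenAI models), while the module path usually reveals which Agno provider class is in use.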

Observability Hooks

The headroom/integrations/agno/hooks.py file implements optional HeadroomPreHook and HeadroomPostHook classes that expose token-saving metrics and emit alerts when requests exceed configurable thresholds. These hooks integrate with Agno's native pre/post-hook system for real-time monitoring.
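
The class below sketches only the threshold-alert logic such a post-hook might contain. TokenAlertHook, its call signature, and its summary fields are hypothetical; it does not implement Agno's hook interface or mirror HeadroomPostHook's internals.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("token_alert_sketch")


@dataclass
class TokenAlertHook:
    """Hypothetical post-hook: tallies per-request tokens and warns above a threshold."""

    token_alert_threshold: int = 10_000
    requests: int = 0
    alerts: int = 0
    total_tokens: int = 0
    total_saved: int = 0

    def __call__(self, tokens_sent: int, tokens_saved: int) -> None:
        self.requests += 1
        self.total_tokens += tokens_sent
        self.total_saved += tokens_saved
        if tokens_sent > self.token_alert_threshold:
            self.alerts += 1
            logger.warning(
                "Request sent %d tokens, above the %d-token threshold",
                tokens_sent,
                self.token_alert_threshold,
            )

    def summary(self) -> dict:
        # Aggregate counters suitable for periodic logging or dashboards.
        return {
            "requests": self.requests,
            "alerts": self.alerts,
            "total_tokens": self.total_tokens,
            "total_saved": self.total_saved,
        }
```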

Installation and Setup

Install Headroom with Agno support using the optional extras dependency. You must also install the Agno framework separately if not already present in your environment.

pip install "headroom-ai[agno]"
pip install agno

The integration exports its public API through headroom/integrations/agno/__init__.py, making all wrapper classes and utilities available from the main integration namespace.

Basic Integration Patterns

Wrapping an Agno Model

To integrate Headroom with the Agno framework, wrap any Agno model instance with HeadroomAgnoModel before passing it to your Agent. This approach requires zero changes to existing agent configurations or tool definitions.

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

# Wrap any Agno model with Headroom optimization
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

# Use the wrapped model with a standard Agno agent
agent = Agent(model=model)
response = agent.run("What is the capital of France?")

print(response.content)  # agent.run returns a response object; .content holds the text
print(f"Tokens saved: {model.total_tokens_saved}")

The wrapped model forwards all Agno-specific methods after applying Headroom transforms, allowing the agent to benefit from context compression while preserving native Agno functionality like tool loops and logging.
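
To make the forwarding idea concrete, here is a simplified composition-based sketch: transform the messages, then delegate to the inner model, and pass every other attribute through. CompressingWrapper is hypothetical and deliberately simpler than HeadroomAgnoModel, which, per the source, subclasses Agno's Model rather than delegating through __getattr__.

```python
from typing import Any, Callable, Dict, List

Messages = List[Dict[str, Any]]


class CompressingWrapper:
    """Hypothetical composition-based wrapper: transform messages, then delegate."""

    def __init__(self, inner: Any, transform: Callable[[Messages], Messages]) -> None:
        self._inner = inner
        self._transform = transform

    def invoke(self, messages: Messages, **kwargs: Any) -> Any:
        # Compress first, then forward to the wrapped model unchanged.
        return self._inner.invoke(self._transform(messages), **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails; guard against recursion
        # if _inner has not been set yet (e.g. during copying).
        if name == "_inner":
            raise AttributeError(name)
        return getattr(self._inner, name)
```

Subclassing, as the real wrapper does, has the advantage that isinstance checks inside Agno still pass; plain delegation is shorter but can fail such checks.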

Standalone Message Optimization

For scenarios that need optimization without running an Agno Agent, use the optimize_messages utility function. It processes raw OpenAI-style message dictionaries independently of the Agno agent lifecycle.

from headroom.integrations.agno import optimize_messages

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Analyse this huge JSON payload..."},
]

opt_msgs, metrics = optimize_messages(
    messages,
    model="gpt-4o",  # model name used for token estimation
)

print(f"Saved {metrics['tokens_saved']} tokens")
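
To show the shape of the (optimized_messages, metrics) contract, here is a deliberately naive stand-in. toy_optimize and estimate_tokens are hypothetical; truncating long JSON arrays and counting roughly four characters per token are simplifications chosen for illustration, not Headroom's compression algorithm or tokenizer.

```python
import json
from typing import Any, Dict, List, Tuple

Message = Dict[str, Any]


def estimate_tokens(text: str) -> int:
    # Crude ~4 characters per token heuristic, for illustration only.
    return len(text) // 4


def _text(message: Message) -> str:
    content = message.get("content")
    return content if isinstance(content, str) else ""


def toy_optimize(messages: List[Message], max_items: int = 3) -> Tuple[List[Message], Dict[str, int]]:
    """Naive stand-in: truncate long top-level JSON arrays, return (messages, metrics)."""
    optimized = []
    for message in messages:
        new_message = dict(message)  # never mutate the caller's dicts
        content = message.get("content")
        if isinstance(content, str):
            try:
                data = json.loads(content)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                data = None
            if isinstance(data, list) and len(data) > max_items:
                omitted = len(data) - max_items
                new_message["content"] = json.dumps(data[:max_items]) + f" (+{omitted} more items omitted)"
        optimized.append(new_message)

    before = sum(estimate_tokens(_text(m)) for m in messages)
    after = sum(estimate_tokens(_text(m)) for m in optimized)
    return optimized, {"tokens_before": before, "tokens_after": after, "tokens_saved": before - after}
```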

Advanced Usage Patterns

Adding Observability Hooks

Implement pre- and post-hooks to monitor token usage and receive alerts when requests exceed specified thresholds. Instantiate hooks manually or use the create_headroom_hooks convenience factory.

from headroom.integrations.agno import (
    HeadroomAgnoModel,
    HeadroomPreHook,
    HeadroomPostHook,
    create_headroom_hooks,
)
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Create wrapped model
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

# Option 1: Manual instantiation
pre_hook = HeadroomPreHook()
post_hook = HeadroomPostHook(token_alert_threshold=10_000)

# Option 2: Factory method (use instead of Option 1, since it rebinds both hooks)
# pre_hook, post_hook = create_headroom_hooks(
#     token_alert_threshold=5_000,
#     log_level="DEBUG",
# )

agent = Agent(
    model=model,
    pre_hooks=[pre_hook],
    post_hooks=[post_hook],
)

# Process multiple requests
for query in ["Summarize the latest AI news.", "Write a short poem."]:
    agent.run(query)

print(f"Total tokens saved: {model.total_tokens_saved}")
print("Post-hook summary:", post_hook.get_summary())

Async Usage for High-Throughput Applications

The wrapper also supports Agno's async code paths: the model-level aresponse and aresponse_stream methods run through Headroom, so you can drive the agent with arun exactly as you would with a standard Agno model.

import asyncio
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

async def main() -> None:
    model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
    agent = Agent(model=model)

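    # Async non-streaming response
    resp = await agent.arun("Explain quantum tunnelling.")
    print(resp.content)

    # Async streaming response. In Agno 2.x, arun(..., stream=True) returns an
    # async iterator directly; in Agno 1.x, await the call first:
    #     async for chunk in await agent.arun("Give me a story.", stream=True):
    async for chunk in agent.arun("Give me a story.", stream=True):
        print(getattr(chunk, "content", None) or "", end="", flush=True)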

asyncio.run(main())

Summary

  • HeadroomAgnoModel in headroom/integrations/agno/model.py provides a drop-in wrapper that inherits from Agno's base Model class and intercepts invoke, ainvoke, and streaming methods to apply compression.
  • Automatic provider detection via get_headroom_provider in headroom/integrations/agno/providers.py selects the correct token-counting backend based on the wrapped model's class name or model ID.
  • Message conversion handles bidirectional transformation between Agno Message objects and OpenAI-style dicts, preserving extended-thinking blocks for Claude compatibility.
  • Observability hooks in headroom/integrations/agno/hooks.py offer pre- and post-processing monitoring with configurable token thresholds and summary statistics.
  • Thread-safe metrics tracking allows monitoring of total_tokens_saved across concurrent requests in async applications.
  • Standalone optimization via optimize_messages enables context compression outside of the standard Agent workflow.

Frequently Asked Questions

How does HeadroomAgnoModel handle message format conversion?

The wrapper automatically converts Agno Message objects to OpenAI-style dictionaries before running the TransformPipeline, then converts the optimized results back to Agno Message objects. This bidirectional conversion happens in headroom/integrations/agno/model.py and ensures that Agno's logging and tool-loop machinery continue to function unchanged while Headroom applies its compression algorithms.

Can I use Headroom with async Agno agents?

Yes. HeadroomAgnoModel fully supports Agno's async methods including aresponse and aresponse_stream. The wrapper maintains thread-safe metrics history, making it suitable for high-throughput async applications where multiple concurrent requests need token savings tracked accurately.

Does the integration support all LLM providers available in Agno?

The integration can wrap any Agno model. Automatic provider detection in headroom/integrations/agno/providers.py then picks a token-counting backend: the get_headroom_provider function inspects the wrapped model's class name, module path, or model ID to select the appropriate backend for OpenAI, Anthropic, Google, Cohere, and other providers.

What are the observability hooks used for?

HeadroomPreHook and HeadroomPostHook in headroom/integrations/agno/hooks.py provide optional pre- and post-processing hooks that expose token-saving metrics and emit alerts when requests exceed configured thresholds. These integrate with Agno's native hook system to enable real-time monitoring and debugging of context compression performance without modifying core agent logic.
