How to Integrate Headroom with the Agno Framework: Complete Implementation Guide
Headroom provides a first-class integration for the Agno AI-agent framework through the HeadroomAgnoModel wrapper, which injects context optimization directly into Agno's model invocation pipeline while maintaining full compatibility with Agno's logging and tool-loop machinery.
The chopratejas/headroom repository ships with a dedicated integration module that lets you add Headroom to the Agno framework (formerly Phidata) without modifying existing agent code. The integration wraps any Agno model and applies Headroom's compression pipeline before each LLM call, automatically converting messages between Agno's native format and the OpenAI-style dicts required by Headroom's optimization engine.
Architecture Overview
The Agno integration consists of three core components that work together to provide seamless context compression. According to the Headroom source code, these components handle model wrapping, provider detection, and observability.
HeadroomAgnoModel Wrapper
The HeadroomAgnoModel class in headroom/integrations/agno/model.py inherits from agno.models.base.Model and intercepts all standard Agno methods including invoke, ainvoke, and invoke_stream. This wrapper maintains a thread-safe metrics history and tracks running totals of tokens saved across requests.
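One way to picture the thread-safe bookkeeping described above is a small tracker guarded by a lock. This is only a sketch under stated assumptions: `SavingsTracker` and its `record` method are hypothetical names, and only the `total_tokens_saved` attribute name is taken from this article. It is not Headroom's actual implementation.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class SavingsTracker:
    """Hypothetical stand-in for the wrapper's metrics bookkeeping."""

    total_tokens_saved: int = 0
    history: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def record(self, tokens_before: int, tokens_after: int) -> int:
        """Record one request and return the tokens it saved."""
        saved = max(tokens_before - tokens_after, 0)
        with self._lock:  # guard the history and the running total together
            self.history.append(
                {"before": tokens_before, "after": tokens_after, "saved": saved}
            )
            self.total_tokens_saved += saved
        return saved


tracker = SavingsTracker()
threads = [
    threading.Thread(target=tracker.record, args=(1_000, 600)) for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tracker.total_tokens_saved)  # 8 requests x 400 tokens saved each = 3200
```

Holding the lock around both the append and the increment keeps the history and the running total consistent with each other when several threads report at once.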
When processing requests, the wrapper converts Agno Message objects to OpenAI-style dictionaries, executes the TransformPipeline, then converts results back to Agno Message objects. Extended-thinking blocks used by Claude are preserved untouched to ensure provider compatibility.
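The convert, transform, convert-back cycle can be sketched with stand-in types. Everything here is an assumption made for illustration: `Message` is a simplified dataclass rather than Agno's class, the `thinking` field is a hypothetical slot for extended-thinking text, and `compress_whitespace` is a toy per-message transform standing in for the real TransformPipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    """Simplified stand-in for an Agno message object (not the real class)."""

    role: str
    content: str
    thinking: Optional[str] = None  # hypothetical slot for extended-thinking text


def compress_whitespace(message: dict) -> dict:
    """Toy transform: collapse runs of whitespace in the content."""
    return {**message, "content": " ".join(message["content"].split())}


def optimize(
    messages: list[Message], transform: Callable[[dict], dict]
) -> list[Message]:
    """Run `transform` over OpenAI-style dicts, skipping thinking-bearing messages."""
    optimized = []
    for msg in messages:
        if msg.thinking is not None:
            optimized.append(msg)  # passed through untouched
            continue
        as_dict = {"role": msg.role, "content": msg.content}  # object -> dict
        new_dict = transform(as_dict)
        optimized.append(  # dict -> object
            Message(role=new_dict["role"], content=new_dict["content"])
        )
    return optimized


history = [
    Message("user", "Summarise   this\n\n  report."),
    Message("assistant", "Working  on it.", thinking="step 1 ... step 2"),
]
result = optimize(history, compress_whitespace)
print(result[0].content)  # Summarise this report.
```

A real pipeline may also drop or merge messages, which a per-message transform like this one cannot express; the sketch only shows the conversion boundary and the pass-through rule.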
Provider Detection System
Located in headroom/integrations/agno/providers.py, the get_headroom_provider function inspects the wrapped Agno model's class name, module path, or model ID to automatically select the appropriate Headroom token-counting backend. This supports OpenAI, Anthropic, Google, Cohere, and other providers, ensuring accurate token estimation regardless of the underlying LLM.
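The inspection order described above (class name, module path, model ID) might look roughly like the following. This is a hedged sketch, not the body of get_headroom_provider: `detect_provider`, the hint table, the stand-in model classes, and the fallback value are all assumptions.

```python
# Substring hints mapped to provider names; the table itself is an assumption.
PROVIDER_HINTS = (
    ("anthropic", "anthropic"),
    ("claude", "anthropic"),
    ("openai", "openai"),
    ("gpt", "openai"),
    ("google", "google"),
    ("gemini", "google"),
    ("cohere", "cohere"),
)


def detect_provider(model, default: str = "openai") -> str:
    """Guess a provider from the class name, module path, then model ID."""
    cls = type(model)
    candidates = (
        cls.__name__.lower(),
        cls.__module__.lower(),
        str(getattr(model, "id", "")).lower(),
    )
    for text in candidates:
        for needle, provider in PROVIDER_HINTS:
            if needle in text:
                return provider
    return default  # assumed fallback when nothing matches


class Claude:
    """Stand-in model class; the module path is set by hand for the demo."""

    __module__ = "agno.models.anthropic"

    def __init__(self, id: str):
        self.id = id


class CustomModel:
    """Stand-in whose class name and module carry no provider hint."""

    __module__ = "myapp.models"

    def __init__(self, id: str):
        self.id = id


print(detect_provider(Claude(id="claude-sonnet")))        # anthropic
print(detect_provider(CustomModel(id="gemini-1.5-pro")))  # google
```

Checking the class name before the model ID means a recognisable wrapper class wins even when the ID is a custom deployment name.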
Observability Hooks
The headroom/integrations/agno/hooks.py file implements optional HeadroomPreHook and HeadroomPostHook classes that expose token-saving metrics and emit alerts when requests exceed configurable thresholds. These hooks integrate with Agno's native pre/post-hook system for real-time monitoring.
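To illustrate the threshold-alert idea, here is a minimal callable that counts requests and logs a warning when one exceeds a limit. It is a sketch only: `ThresholdPostHook` is a hypothetical class, it receives a plain token count rather than the run objects Agno passes to real hooks, and the shape of the summary dict is assumed. Only the `token_alert_threshold` and `get_summary` names come from this article.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("headroom_sketch")


@dataclass
class ThresholdPostHook:
    """Hypothetical post-hook: tracks usage and flags oversized requests."""

    token_alert_threshold: int = 10_000
    requests: int = 0
    total_tokens: int = 0
    alerts: list = field(default_factory=list)

    def __call__(self, tokens_used: int) -> None:
        self.requests += 1
        self.total_tokens += tokens_used
        if tokens_used > self.token_alert_threshold:
            message = (
                f"request used {tokens_used} tokens "
                f"(threshold {self.token_alert_threshold})"
            )
            self.alerts.append(message)
            logger.warning(message)

    def get_summary(self) -> dict:
        """Return aggregate counters; the key names are assumptions."""
        return {
            "requests": self.requests,
            "total_tokens": self.total_tokens,
            "alerts": len(self.alerts),
        }


hook = ThresholdPostHook(token_alert_threshold=10_000)
hook(4_000)   # below the threshold, no alert
hook(12_000)  # above the threshold, one alert
print(hook.get_summary())
```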
Installation and Setup
Install Headroom with Agno support using the optional extras dependency. You must also install the Agno framework separately if not already present in your environment.
pip install "headroom-ai[agno]"
pip install agno
The integration exports its public API through headroom/integrations/agno/__init__.py, making all wrapper classes and utilities available from the main integration namespace.
Basic Integration Patterns
Wrapping an Agno Model
To integrate Headroom with the Agno framework, wrap any Agno model instance with HeadroomAgnoModel before passing it to your Agent. This approach requires no changes to existing agent configurations or tool definitions.
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel
# Wrap any Agno model with Headroom optimization
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
# Use the wrapped model with a standard Agno agent
agent = Agent(model=model)
response = agent.run("What is the capital of France?")
print(response.content)  # the run result carries the reply text in .content
print(f"Tokens saved: {model.total_tokens_saved}")
The wrapped model forwards all Agno-specific methods after applying Headroom transforms, allowing the agent to benefit from context compression while preserving native Agno functionality like tool loops and logging.
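The transform-then-forward behaviour can be sketched with plain delegation. Note the swap: the article says HeadroomAgnoModel subclasses Agno's base Model, whereas this sketch uses `__getattr__` delegation to keep the example dependency-free. `EchoModel`, `CompressingWrapper`, and `drop_consecutive_duplicates` are hypothetical names.

```python
class EchoModel:
    """Stand-in for a real model; reports how many messages it received."""

    id = "echo-1"

    def invoke(self, messages: list) -> str:
        return f"received {len(messages)} messages"


def drop_consecutive_duplicates(messages: list) -> list:
    """Toy transform: remove a message identical to the one before it."""
    kept = []
    for msg in messages:
        if not kept or kept[-1] != msg:
            kept.append(msg)
    return kept


class CompressingWrapper:
    """Applies a transform before invoke and forwards everything else."""

    def __init__(self, wrapped, transform):
        self._wrapped = wrapped
        self._transform = transform

    def invoke(self, messages: list) -> str:
        # Intercepted call: optimize first, then delegate.
        return self._wrapped.invoke(self._transform(messages))

    def __getattr__(self, name):
        # Only reached for attributes the wrapper does not define itself.
        return getattr(self._wrapped, name)


wrapper = CompressingWrapper(EchoModel(), drop_consecutive_duplicates)
messages = [
    {"role": "user", "content": "ping"},
    {"role": "user", "content": "ping"},
    {"role": "user", "content": "pong"},
]
print(wrapper.invoke(messages))  # received 2 messages
print(wrapper.id)                # echo-1
```

Because only the intercepted methods are overridden, callers that rely on other attributes of the wrapped model keep working unchanged.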
Standalone Message Optimization
For scenarios requiring optimization without the full Agent wrapper, use the optimize_messages utility function. This function processes raw message dictionaries independently of the Agno agent lifecycle.
from headroom.integrations.agno import optimize_messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Analyse this huge JSON payload..."},
]
opt_msgs, metrics = optimize_messages(
    messages,
    model="gpt-4o",  # model name for token estimation
)
print(f"Saved {metrics['tokens_saved']} tokens")
Advanced Usage Patterns
Adding Observability Hooks
Implement pre- and post-hooks to monitor token usage and receive alerts when requests exceed specified thresholds. Instantiate hooks manually or use the create_headroom_hooks convenience factory.
from headroom.integrations.agno import (
    HeadroomAgnoModel,
    HeadroomPreHook,
    HeadroomPostHook,
    create_headroom_hooks,
)
from agno.agent import Agent
from agno.models.openai import OpenAIChat
# Create wrapped model
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
# Option 1: Manual instantiation
pre_hook = HeadroomPreHook()
post_hook = HeadroomPostHook(token_alert_threshold=10_000)
# Option 2: Factory method (use one option or the other;
# as written, these assignments replace the hooks from Option 1)
pre_hook, post_hook = create_headroom_hooks(
    token_alert_threshold=5_000,
    log_level="DEBUG",
)
agent = Agent(
    model=model,
    pre_hooks=[pre_hook],
    post_hooks=[post_hook],
)
# Process multiple requests
for query in ["Summarize the latest AI news.", "Write a short poem."]:
    agent.run(query)
print(f"Total tokens saved: {model.total_tokens_saved}")
print("Post-hook summary:", post_hook.get_summary())
Async Usage for High-Throughput Applications
The wrapper supports Agno's async model methods, including aresponse and aresponse_stream, so a wrapped model can be used in async agents in the same way as a standard Agno model.
import asyncio
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel
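async def main() -> None:
    model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
    agent = Agent(model=model)
    # Async non-streaming run. At the agent level the entry point is arun;
    # aresponse and aresponse_stream are model-level methods.
    resp = await agent.arun("Explain quantum tunnelling.")
    print(resp.content)
    # Async streaming run. Depending on the Agno version, the result of
    # arun(..., stream=True) may need to be awaited to obtain the iterator.
    async for chunk in agent.arun("Give me a story.", stream=True):
        print(chunk.content or "", end="", flush=True)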
asyncio.run(main())
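For high-throughput use, several prompts are typically dispatched concurrently rather than one after another. The following sketch shows that fan-out pattern with `asyncio.gather`. The `fake_arun` coroutine is a hypothetical stand-in for an agent call so that the example runs without Agno or Headroom installed.

```python
import asyncio


async def fake_arun(prompt: str) -> str:
    """Stand-in for an async agent call; sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return prompt.upper()


async def run_batch(prompts: list) -> list:
    """Dispatch all prompts concurrently and return results in input order."""
    return await asyncio.gather(*(fake_arun(p) for p in prompts))


results = asyncio.run(run_batch(["explain tunnelling", "tell a story"]))
print(results)  # ['EXPLAIN TUNNELLING', 'TELL A STORY']
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which call finishes first, so each result can be matched back to its prompt by position.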
Summary
- HeadroomAgnoModel in headroom/integrations/agno/model.py provides a drop-in wrapper that inherits from Agno's base Model class and intercepts invoke, ainvoke, and streaming methods to apply compression.
- Automatic provider detection via get_headroom_provider in headroom/integrations/agno/providers.py selects the correct token-counting backend based on the wrapped model's class name or model ID.
- Message conversion handles bidirectional transformation between Agno Message objects and OpenAI-style dicts, preserving extended-thinking blocks for Claude compatibility.
- Observability hooks in headroom/integrations/agno/hooks.py offer pre- and post-processing monitoring with configurable token thresholds and summary statistics.
- Thread-safe metrics tracking allows monitoring of total_tokens_saved across concurrent requests in async applications.
- Standalone optimization via optimize_messages enables context compression outside of the standard Agent workflow.
Frequently Asked Questions
How does HeadroomAgnoModel handle message format conversion?
The wrapper automatically converts Agno Message objects to OpenAI-style dictionaries before running the TransformPipeline, then converts the optimized results back to Agno Message objects. This bidirectional conversion happens in headroom/integrations/agno/model.py and ensures that Agno's logging and tool-loop machinery continue to function unchanged while Headroom applies its compression algorithms.
Can I use Headroom with async Agno agents?
Yes. HeadroomAgnoModel fully supports Agno's async methods including aresponse and aresponse_stream. The wrapper maintains thread-safe metrics history, making it suitable for high-throughput async applications where multiple concurrent requests need token savings tracked accurately.
Does the integration support all LLM providers available in Agno?
The integration supports any provider that Agno supports through automatic provider detection in headroom/integrations/agno/providers.py. The get_headroom_provider function inspects the wrapped model's class name, module path, or model ID to select appropriate token-counting backends for OpenAI, Anthropic, Google, Cohere, and other providers.
What are the observability hooks used for?
HeadroomPreHook and HeadroomPostHook in headroom/integrations/agno/hooks.py provide optional pre- and post-processing hooks that expose token-saving metrics and emit alerts when requests exceed configured thresholds. These integrate with Agno's native hook system to enable real-time monitoring and debugging of context compression performance without modifying core agent logic.