
How to Integrate Headroom with LangChain Using HeadroomChatModel

June 7, 2026 chopratejas/headroom ↗

HeadroomChatModel is a drop-in wrapper for any LangChain BaseChatModel that intercepts every request, runs the message list through Headroom's TransformPipeline, and forwards the optimized payload to your underlying LLM provider.

The Headroom library integrates with LangChain applications through the HeadroomChatModel class. According to the chopratejas/headroom source code, this wrapper adds automatic context compression, tag protection, and prompt optimization without requiring changes to your existing LangChain chains, agents, or LCEL compositions. You instantiate it by passing any compatible chat model, and all subsequent invoke, stream, and ainvoke calls then run through Headroom's optimization pipeline.

How HeadroomChatModel Works

HeadroomChatModel subclasses LangChain's BaseChatModel and acts as a transparent proxy. In headroom/integrations/langchain/chat_model.py, the wrapper implements the core execution flow:

Interception: When you call invoke(), stream(), or ainvoke(), the wrapper first triggers optimize_messages (lines 150-190).
Pipeline Construction: The optimize_messages function extracts the provider name using get_headroom_provider and the model identifier via get_model_name_from_langchain, then builds a TransformPipeline configured with your current HeadroomConfig.
Transformation: The raw BaseMessage list passes through the pipeline, applying transforms such as search compression, tag protection, and smart crushing.
Forwarding: The optimized message list is sent to the underlying LangChain model (e.g., ChatOpenAI, ChatAnthropic).
Response: The LLM response returns unchanged to the caller, while optional HeadroomCallbackHandler instances record metrics like token usage and latency.
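
The five steps above can be sketched in plain Python. This is a minimal illustration of the wrapper pattern only, not Headroom's implementation: `Message`, `FakeChatModel`, `OptimizingWrapper`, and `collapse_whitespace` are hypothetical stand-ins, and the real pipeline applies much richer transforms than whitespace collapsing.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    """Hypothetical, simplified stand-in for a LangChain BaseMessage."""

    role: str
    content: str


class FakeChatModel:
    """Hypothetical stand-in for the underlying model; records what it receives."""

    def __init__(self) -> None:
        self.received: list[Message] = []

    def invoke(self, messages: list[Message]) -> str:
        self.received = messages
        return f"echo: {messages[-1].content}"


def collapse_whitespace(messages: list[Message]) -> list[Message]:
    """Toy transform that squeezes runs of whitespace (assumption, for illustration)."""
    return [Message(m.role, " ".join(m.content.split())) for m in messages]


@dataclass
class OptimizingWrapper:
    """Hypothetical wrapper: intercept, transform, forward, return the reply unchanged."""

    inner: FakeChatModel
    transforms: list[Callable[[list[Message]], list[Message]]] = field(default_factory=list)

    def optimize_messages(self, messages: list[Message]) -> list[Message]:
        # Apply each transform in order, like stages of a pipeline
        for transform in self.transforms:
            messages = transform(messages)
        return messages

    def invoke(self, messages: list[Message]) -> str:
        optimized = self.optimize_messages(messages)  # interception and transformation
        return self.inner.invoke(optimized)  # forwarding; the reply is passed back as-is


inner = FakeChatModel()
wrapped = OptimizingWrapper(inner, transforms=[collapse_whitespace])
reply = wrapped.invoke([Message("user", "hello     there\n\n  world")])
```

The caller only ever talks to the wrapper, which is why the pattern needs no changes elsewhere in a chain: the inner model sees the already-optimized messages, and the reply comes back untouched.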

Because HeadroomChatModel replaces the model rather than injecting middleware, it works natively with streaming, tool calls, function calling, and LangChain Expression Language (LCEL) compositions.

HeadroomChatModel Integration Examples

Basic Model Wrapping

Wrap any existing LangChain chat model to instantly enable Headroom optimization:

from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Create a regular LangChain chat model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrap it; all calls now go through Headroom's pipeline
optimized_llm = HeadroomChatModel(llm)

# Normal invoke works as usual
response = optimized_llm.invoke("Explain the difference between memoization and caching.")
print(response.content)

Key files: headroom/integrations/langchain/chat_model.py (the invoke implementation) and headroom/integrations/langchain/providers.py (provider mapping).

Streaming and Tool Calls

The wrapper compresses the prompt once, before the request is sent. Response chunks then stream back from the underlying model as usual, so you keep real-time output while reducing input token costs:

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

llm = ChatOpenAI(model="gpt-4o", streaming=True)
wrapped = HeadroomChatModel(llm)

# Streaming generation: the instruction goes in a system message and the
# text to summarize in the human message, so the model replies to the user turn
for chunk in wrapped.stream([
    SystemMessage(content="Summarize the following text in bullet points."),
    HumanMessage(content="Here is the text you want summarized..."),
]):
    print(chunk.content, end="", flush=True)  # prints as chunks arrive

LCEL Composition with HeadroomRunnable

For LangChain Expression Language pipelines, use HeadroomRunnable to enable functional composition while preserving optimization:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel, HeadroomRunnable

llm = ChatOpenAI(model="gpt-4o")
optimized = HeadroomChatModel(llm)

# Turn the model into a Runnable for LCEL pipelines
runnable = HeadroomRunnable(optimized)

# The prompt template turns the input dict into chat messages for the model
prompt = ChatPromptTemplate.from_messages([("human", "{question}")])

# Simple chain: prompt → LLM → parse
chain = prompt | runnable | StrOutputParser()

print(chain.invoke({"question": "What are the main benefits of using Headroom?"}))

HeadroomRunnable is defined in headroom/integrations/langchain/chat_model.py as a RunnableLambda that internally calls HeadroomChatModel.invoke, ensuring every step respects the optimization pipeline.

Customizing Optimization Modes

Control which transforms run by passing a custom HeadroomConfig:

from langchain_openai import ChatOpenAI
from headroom import HeadroomConfig, HeadroomMode
from headroom.integrations import HeadroomChatModel

config = HeadroomConfig(mode=HeadroomMode.COMPACT)   # fewer transforms, faster

llm = ChatOpenAI(model="gpt-4o")
optimized = HeadroomChatModel(llm, config=config)

print(optimized.invoke("Write a concise tweet about AI safety.").content)

Available modes in headroom/__init__.py include OPTIMIZE (full pipeline), COMPACT (minimal transforms), and SAFE (preservation-focused).

Configuring the Optimization Pipeline

The integration relies on helper functions in headroom/integrations/langchain/providers.py to map LangChain model objects to concrete Headroom providers (OpenAIProvider, AnthropicProvider, etc.). The get_headroom_provider function identifies the provider type, while get_model_name_from_langchain extracts the underlying model identifier required by the TransformPipeline.
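
To make the mapping idea concrete, here is a minimal sketch of class-based provider detection. Everything in it is an assumption for illustration: the two classes are stand-ins rather than the real LangChain classes, `guess_provider` and `guess_model_name` are hypothetical names, and the actual logic in providers.py may inspect models quite differently.

```python
class ChatOpenAI:
    """Hypothetical stand-in; not the langchain_openai class."""

    def __init__(self, model: str) -> None:
        self.model_name = model


class ChatAnthropic:
    """Hypothetical stand-in; not the langchain_anthropic class."""

    def __init__(self, model: str) -> None:
        self.model = model


# Assumed lookup table, for illustration only
PROVIDER_BY_CLASS = {"ChatOpenAI": "openai", "ChatAnthropic": "anthropic"}


def guess_provider(llm: object) -> str:
    """Map a model object to a provider name by its class name."""
    return PROVIDER_BY_CLASS.get(type(llm).__name__, "unknown")


def guess_model_name(llm: object) -> str:
    """Return the first string-valued model attribute found on the object."""
    for attr in ("model_name", "model"):
        value = getattr(llm, attr, None)
        if isinstance(value, str):
            return value
    return "unknown"


openai_llm = ChatOpenAI("gpt-4o")
anthropic_llm = ChatAnthropic("claude-example")  # placeholder model name
```

Checking more than one attribute matters in a sketch like this because chat model classes do not all store the model identifier under the same name.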

You can toggle optimization behavior globally or per-instance via HeadroomConfig (defined in headroom/__init__.py). The wrapper forwards this configuration to the pipeline on every request, allowing dynamic mode switching without re-instantiating the model.
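
The reason per-request forwarding allows mode switching can be shown with a small sketch. It assumes only what the paragraph above states, that the configuration is read on every request; `Mode`, `Config`, and `ConfigAwareWrapper` are hypothetical stand-ins that reuse the mode names from this article, not Headroom's classes.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Hypothetical enum reusing the mode names mentioned in this article."""

    OPTIMIZE = "optimize"
    COMPACT = "compact"
    SAFE = "safe"


@dataclass
class Config:
    """Hypothetical stand-in for a configuration object."""

    mode: Mode = Mode.OPTIMIZE


class ConfigAwareWrapper:
    """Hypothetical wrapper that reads its config at call time, not at construction."""

    def __init__(self, config: Config) -> None:
        self.config = config

    def invoke(self, prompt: str) -> str:
        mode = self.config.mode  # looked up on every request
        return f"[{mode.value}] {prompt}"


wrapper = ConfigAwareWrapper(Config())
first = wrapper.invoke("hello")
wrapper.config.mode = Mode.SAFE  # switch modes without rebuilding the wrapper
second = wrapper.invoke("hello")
```

Had the wrapper copied the mode in `__init__`, the second call would still use the original mode; reading it inside `invoke` is what makes the switch take effect.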

Summary

HeadroomChatModel wraps any LangChain BaseChatModel and transparently applies context optimization through the TransformPipeline.
Located in headroom/integrations/langchain/chat_model.py, the wrapper supports invoke and stream along with their asynchronous counterparts, ainvoke and astream.
Optimization flow: raw messages → optimize_messages → pipeline transforms → underlying LLM → response.
LCEL compatible: Use HeadroomRunnable for functional composition in expression language chains.
Configurable: Pass HeadroomConfig instances to control modes (OPTIMIZE, COMPACT, SAFE) and specific transforms.
Provider mapping: Automatic detection of LLM providers via headroom/integrations/langchain/providers.py.

Frequently Asked Questions

What is the difference between HeadroomChatModel and HeadroomCallbackHandler?

HeadroomChatModel actively modifies messages before they reach the LLM by running them through the TransformPipeline, whereas HeadroomCallbackHandler only records metrics like token usage and latency. Due to LangChain's design, callbacks cannot modify message content, so the wrapper pattern is required for actual optimization.

Can I use HeadroomChatModel with async LangChain operations?

Yes. HeadroomChatModel implements both invoke and ainvoke methods (as well as stream and astream), allowing it to handle asynchronous workflows natively. The async implementations follow the same optimization flow as their synchronous counterparts.
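
As a rough sketch of what "the same optimization flow" means, the example below routes a synchronous and an asynchronous entry point through one shared step. `optimize` and `SyncAsyncWrapper` are hypothetical stand-ins, and `asyncio.sleep(0)` merely takes the place of awaiting a real provider call.

```python
import asyncio


def optimize(messages: list[str]) -> list[str]:
    """Toy optimization that drops empty messages (assumption, for illustration)."""
    return [m for m in messages if m.strip()]


class SyncAsyncWrapper:
    """Hypothetical wrapper whose sync and async paths share one optimize step."""

    def __init__(self) -> None:
        self.forwarded: list[list[str]] = []

    def invoke(self, messages: list[str]) -> str:
        optimized = optimize(messages)
        self.forwarded.append(optimized)
        return f"{len(optimized)} messages"

    async def ainvoke(self, messages: list[str]) -> str:
        optimized = optimize(messages)  # same step as the sync path
        self.forwarded.append(optimized)
        await asyncio.sleep(0)  # stands in for awaiting the provider
        return f"{len(optimized)} messages"


wrapper = SyncAsyncWrapper()
sync_result = wrapper.invoke(["hi", "", "there"])
async_result = asyncio.run(wrapper.ainvoke(["hi", "", "there"]))
```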

How does HeadroomChatModel map LangChain models to Headroom providers?

The integration uses helper functions get_headroom_provider and get_model_name_from_langchain from headroom/integrations/langchain/providers.py to inspect the wrapped LangChain model instance. These functions map the model class (e.g., ChatOpenAI) to a HeadroomProvider instance and extract the specific model name (e.g., gpt-4o) required by the optimization pipeline.

Is HeadroomChatModel compatible with tool calling and function calling?

Yes. Because HeadroomChatModel subclasses BaseChatModel and passes the transformed message list directly to the underlying model's native methods, all LangChain features including tool calling, function calling, and structured output parsing work without modification. The transforms preserve the message structure and metadata required for these features.
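
A small sketch can illustrate what preserving structure and metadata looks like in practice. It is not one of Headroom's transforms: `ToolAwareMessage` and `truncate_tool_output` are hypothetical, and simple truncation stands in for whatever compression the real pipeline performs.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToolAwareMessage:
    """Hypothetical message type carrying the ID that links a tool result to its call."""

    role: str
    content: str
    tool_call_id: Optional[str] = None


def truncate_tool_output(msg: ToolAwareMessage, limit: int = 20) -> ToolAwareMessage:
    """Shorten long tool output while leaving role and tool_call_id untouched."""
    if msg.role != "tool" or len(msg.content) <= limit:
        return msg
    # replace() copies every field except the one being changed
    return replace(msg, content=msg.content[:limit] + "...")


tool_msg = ToolAwareMessage("tool", "x" * 50, tool_call_id="call_1")
user_msg = ToolAwareMessage("user", "y" * 50)

short_tool = truncate_tool_output(tool_msg)
same_user = truncate_tool_output(user_msg)
```

Only the content changes; the role and the call ID survive, which is the property a transform needs for the provider to match each tool result with the call that produced it.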
