# How to Integrate Headroom with LangChain Using HeadroomChatModel

> Integrate Headroom with LangChain easily using HeadroomChatModel. Optimize LLM payloads with Headroom's TransformPipeline for enhanced performance.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-07

---

**HeadroomChatModel is a drop-in wrapper for any LangChain `BaseChatModel` that intercepts every request, runs the message list through Headroom's TransformPipeline, and forwards the optimized payload to your underlying LLM provider.**

Headroom integrates with LangChain through the `HeadroomChatModel` class. According to the `chopratejas/headroom` source code, this wrapper applies context compression, tag protection, and prompt optimization without requiring changes to your existing LangChain chains, agents, or LCEL compositions. You instantiate it by passing any compatible chat model; every subsequent `invoke`, `stream`, and `ainvoke` call then runs through Headroom's optimization pipeline.

## How HeadroomChatModel Works

`HeadroomChatModel` subclasses LangChain's `BaseChatModel` and acts as a transparent proxy. In [`headroom/integrations/langchain/chat_model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/chat_model.py), the wrapper implements the core execution flow:

1. **Interception**: When you call `invoke()`, `stream()`, or `ainvoke()`, the wrapper first triggers `optimize_messages` (lines 150-190).
2. **Pipeline Construction**: The `optimize_messages` function extracts the provider name using `get_headroom_provider` and the model identifier via `get_model_name_from_langchain`, then builds a `TransformPipeline` configured with your current `HeadroomConfig`.
3. **Transformation**: The raw `BaseMessage` list passes through the pipeline, applying transforms such as search compression, tag protection, and smart crushing.
4. **Forwarding**: The optimized message list is sent to the underlying LangChain model (e.g., `ChatOpenAI`, `ChatAnthropic`).
5. **Response**: The LLM response returns unchanged to the caller, while optional `HeadroomCallbackHandler` instances record metrics like token usage and latency.

Because `HeadroomChatModel` replaces the model rather than injecting middleware, it works natively with streaming, tool calls, function calling, and LangChain Expression Language (LCEL) compositions.
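
The five steps above can be sketched with plain Python objects. This is a minimal illustration of the wrap-and-forward pattern, not Headroom's implementation: `Message`, `FakeModel`, `OptimizingWrapper`, and `squeeze_whitespace` are hypothetical stand-ins, and the only name borrowed from the source is the `optimize_messages` method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    role: str      # "system", "user", "assistant", ...
    content: str


# Hypothetical transform: collapse runs of whitespace in every message.
def squeeze_whitespace(messages: List[Message]) -> List[Message]:
    return [Message(m.role, " ".join(m.content.split())) for m in messages]


class FakeModel:
    """Stand-in for an underlying chat model; records what it receives."""

    def __init__(self) -> None:
        self.received: List[Message] = []

    def invoke(self, messages: List[Message]) -> Message:
        self.received = list(messages)
        return Message("assistant", f"saw {len(messages)} message(s)")


class OptimizingWrapper:
    """Illustrative wrapper: transform the messages, then forward them."""

    def __init__(self, inner: FakeModel, transforms: List[Callable]) -> None:
        self.inner = inner
        self.transforms = transforms

    def optimize_messages(self, messages: List[Message]) -> List[Message]:
        for transform in self.transforms:   # steps 2-3: run the pipeline
            messages = transform(messages)
        return messages

    def invoke(self, messages: List[Message]) -> Message:
        optimized = self.optimize_messages(messages)  # step 1: intercept
        return self.inner.invoke(optimized)           # steps 4-5: forward, return


inner = FakeModel()
wrapper = OptimizingWrapper(inner, [squeeze_whitespace])
reply = wrapper.invoke([Message("user", "Explain   caching \n\n briefly.")])
```

The inner model only ever sees the transformed list, while the caller gets the inner model's reply untouched, which is the property that lets a wrapper of this kind stand in wherever the original model was used.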

## HeadroomChatModel Integration Examples

### Basic Model Wrapping

Wrap any existing LangChain chat model to instantly enable Headroom optimization:

```python
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel

# Create a regular LangChain chat model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrap it – all calls now go through Headroom's pipeline
optimized_llm = HeadroomChatModel(llm)

# Normal invoke works as usual
response = optimized_llm.invoke("Explain the difference between memoization and caching.")
print(response.content)
```

*Key files*: [`headroom/integrations/langchain/chat_model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/chat_model.py) (the `invoke` implementation), [`headroom/integrations/langchain/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/providers.py) (provider mapping).

### Streaming and Tool Calls

The wrapper optimizes the prompt once, before the request is sent. Response chunks then stream back from the underlying model as they arrive, so you get lower token costs without changing streaming behavior:

```python
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o", streaming=True)
wrapped = HeadroomChatModel(llm)

# The instruction goes in the system message, the text to summarize in the user message
messages = [
    SystemMessage(content="Summarize the user's text in bullet points."),
    HumanMessage(content="Here is the text you want summarized..."),
]

# Streaming generation: chunks are printed as they arrive
for chunk in wrapped.stream(messages):
    print(chunk.content, end="", flush=True)
```

### LCEL Composition with HeadroomRunnable

For LangChain Expression Language pipelines, use `HeadroomRunnable` to enable functional composition while preserving optimization:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel, HeadroomRunnable

llm = ChatOpenAI(model="gpt-4o")
optimized = HeadroomChatModel(llm)

# Turn the model into a Runnable for LCEL pipelines
runnable = HeadroomRunnable(optimized)

# The prompt turns the input dict into chat messages the model can accept
prompt = ChatPromptTemplate.from_template("{question}")

# Simple chain: prompt → LLM → parse
chain = prompt | runnable | StrOutputParser()

print(chain.invoke({"question": "What are the main benefits of using Headroom?"}))
```

`HeadroomRunnable` is defined in [`headroom/integrations/langchain/chat_model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/chat_model.py) as a `RunnableLambda` that internally calls `HeadroomChatModel.invoke`, ensuring every step respects the optimization pipeline.

### Customizing Optimization Modes

Control which transforms run by passing a custom `HeadroomConfig`:

```python
from langchain_openai import ChatOpenAI
from headroom import HeadroomConfig, HeadroomMode
from headroom.integrations import HeadroomChatModel

config = HeadroomConfig(mode=HeadroomMode.COMPACT)   # fewer transforms, faster

llm = ChatOpenAI(model="gpt-4o")
optimized = HeadroomChatModel(llm, config=config)

# invoke returns a message object; print only its text
print(optimized.invoke("Write a concise tweet about AI safety.").content)
```

Available modes in [`headroom/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/__init__.py) include `OPTIMIZE` (full pipeline), `COMPACT` (minimal transforms), and `SAFE` (preservation-focused).

## Configuring the Optimization Pipeline

The integration relies on helper functions in [`headroom/integrations/langchain/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/providers.py) to map LangChain model objects to concrete Headroom providers (`OpenAIProvider`, `AnthropicProvider`, etc.). The `get_headroom_provider` function identifies the provider type, while `get_model_name_from_langchain` extracts the underlying model identifier required by the `TransformPipeline`.
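
As a rough illustration of what such helpers might do, the sketch below maps a model object to a provider key by class name and reads the model identifier from an attribute. Everything here is hypothetical: `PROVIDER_BY_CLASS`, `guess_provider`, and `guess_model_name` are not Headroom functions, the `ChatOpenAI` class is a local stand-in, and the attribute names `model_name` and `model` are assumptions about how a model object might expose its identifier.

```python
from typing import Optional

# Hypothetical lookup table; the real mapping lives in providers.py.
PROVIDER_BY_CLASS = {
    "ChatOpenAI": "openai",
    "ChatAnthropic": "anthropic",
}


def guess_provider(model: object) -> str:
    """Map a model object to a provider key by its class name."""
    return PROVIDER_BY_CLASS.get(type(model).__name__, "unknown")


def guess_model_name(model: object) -> Optional[str]:
    """Return the first model-name attribute that is set, if any."""
    for attr in ("model_name", "model"):
        value = getattr(model, attr, None)
        if isinstance(value, str) and value:
            return value
    return None


class ChatOpenAI:  # stand-in with the same class name, not the real class
    def __init__(self, model: str) -> None:
        self.model_name = model


llm = ChatOpenAI(model="gpt-4o")
provider = guess_provider(llm)
model_name = guess_model_name(llm)
```

A fallback value such as `"unknown"` is one way to handle unrecognized model classes; how Headroom itself treats them is not stated in this guide.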

You can toggle optimization behavior globally or per-instance via `HeadroomConfig` (defined in [`headroom/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/__init__.py)). The wrapper forwards this configuration to the pipeline on every request, allowing dynamic mode switching without re-instantiating the model.
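
The idea of reading the configuration on every request can be sketched as follows. This is a toy model under stated assumptions, not Headroom code: `Mode`, `Config`, and `ConfiguredWrapper` are hypothetical stand-ins for `HeadroomMode`, `HeadroomConfig`, and the wrapper, and whether mutating a live `HeadroomConfig` is the intended way to switch modes should be checked against the library itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):          # hypothetical stand-in for HeadroomMode
    OPTIMIZE = "optimize"
    COMPACT = "compact"


@dataclass
class Config:              # hypothetical stand-in for HeadroomConfig
    mode: Mode = Mode.OPTIMIZE


class ConfiguredWrapper:
    def __init__(self, config: Config) -> None:
        self.config = config
        self.modes_seen: List[Mode] = []

    def invoke(self, prompt: str) -> str:
        # The config is read at call time, not copied at construction time.
        mode = self.config.mode
        self.modes_seen.append(mode)
        return f"[{mode.value}] {prompt}"


config = Config()
wrapper = ConfiguredWrapper(config)
first = wrapper.invoke("hello")
config.mode = Mode.COMPACT   # switch modes without rebuilding the wrapper
second = wrapper.invoke("hello")
```

Because the wrapper holds a reference to the config rather than a snapshot, the second call picks up the new mode even though the wrapper object is the same.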

## Summary

- **HeadroomChatModel** wraps any LangChain `BaseChatModel` and transparently applies context optimization through the `TransformPipeline`.
- Located in [`headroom/integrations/langchain/chat_model.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/chat_model.py), the wrapper supports `invoke` and `stream` along with their asynchronous counterparts, `ainvoke` and `astream`.
- **Optimization flow**: raw messages → `optimize_messages` → pipeline transforms → underlying LLM → response.
- **LCEL compatible**: Use `HeadroomRunnable` for functional composition in expression language chains.
- **Configurable**: Pass `HeadroomConfig` instances to control modes (`OPTIMIZE`, `COMPACT`, `SAFE`) and specific transforms.
- **Provider mapping**: Automatic detection of LLM providers via [`headroom/integrations/langchain/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/providers.py).

## Frequently Asked Questions

### What is the difference between HeadroomChatModel and HeadroomCallbackHandler?

**HeadroomChatModel** actively modifies messages before they reach the LLM by running them through the `TransformPipeline`, whereas **HeadroomCallbackHandler** only records metrics like token usage and latency. Due to LangChain's design, callbacks cannot modify message content, so the wrapper pattern is required for actual optimization.
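
The distinction can be shown with a small, self-contained sketch. None of these names come from Headroom or LangChain: `fake_llm`, `ObservingHandler`, `call_with_handler`, and `call_with_wrapper` are hypothetical, and the whitespace-stripping step stands in for a real transform.

```python
from typing import List


def fake_llm(messages: List[str]) -> str:
    """Stand-in model: reports how many characters it was sent."""
    return f"received {sum(len(m) for m in messages)} chars"


class ObservingHandler:
    """Callback-style observer: sees the messages, cannot replace them."""

    def __init__(self) -> None:
        self.chars_seen = 0

    def on_start(self, messages: List[str]) -> None:
        # Record a metric only; nothing returned here reaches the model.
        self.chars_seen = sum(len(m) for m in messages)


def call_with_handler(messages: List[str], handler: ObservingHandler) -> str:
    handler.on_start(messages)      # return value is ignored
    return fake_llm(messages)       # original messages are sent


def call_with_wrapper(messages: List[str]) -> str:
    trimmed = [m.strip() for m in messages]   # hypothetical transform
    return fake_llm(trimmed)                  # modified messages are sent


padded = ["   hello   "]
handler = ObservingHandler()
observed = call_with_handler(padded, handler)
wrapped = call_with_wrapper(padded)
```

The handler path measures the full padded input and the model still receives it in full, while the wrapper path sends the shorter, transformed input.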

### Can I use HeadroomChatModel with async LangChain operations?

Yes. `HeadroomChatModel` implements both `invoke` and `ainvoke` methods (as well as `stream` and `astream`), allowing it to handle asynchronous workflows natively. The async implementations follow the same optimization flow as their synchronous counterparts.
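
A minimal sketch of a wrapper whose sync and async paths share one optimization step, using only `asyncio` from the standard library. `optimize`, `EchoModel`, and `DualWrapper` are hypothetical names chosen for illustration and do not reflect how Headroom structures its code.

```python
import asyncio
from typing import List


def optimize(messages: List[str]) -> List[str]:
    """Hypothetical transform shared by the sync and async paths."""
    return [m.strip() for m in messages if m.strip()]


class EchoModel:
    """Stand-in model with both a sync and an async entry point."""

    def invoke(self, messages: List[str]) -> str:
        return " | ".join(messages)

    async def ainvoke(self, messages: List[str]) -> str:
        await asyncio.sleep(0)          # yield to the event loop
        return " | ".join(messages)


class DualWrapper:
    def __init__(self, inner: EchoModel) -> None:
        self.inner = inner

    def invoke(self, messages: List[str]) -> str:
        return self.inner.invoke(optimize(messages))

    async def ainvoke(self, messages: List[str]) -> str:
        return await self.inner.ainvoke(optimize(messages))


wrapper = DualWrapper(EchoModel())
raw = ["  first  ", "", "second "]
sync_result = wrapper.invoke(raw)
async_result = asyncio.run(wrapper.ainvoke(raw))
```

Keeping the transform in one plain function means both entry points produce the same optimized payload, so results do not depend on which one the caller picks.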

### How does HeadroomChatModel map LangChain models to Headroom providers?

The integration uses helper functions `get_headroom_provider` and `get_model_name_from_langchain` from [`headroom/integrations/langchain/providers.py`](https://github.com/chopratejas/headroom/blob/main/headroom/integrations/langchain/providers.py) to inspect the wrapped LangChain model instance. These functions map the model class (e.g., `ChatOpenAI`) to a `HeadroomProvider` instance and extract the specific model name (e.g., `gpt-4o`) required by the optimization pipeline.

### Is HeadroomChatModel compatible with tool calling and function calling?

Yes. Because `HeadroomChatModel` subclasses `BaseChatModel` and passes the transformed message list directly to the underlying model's native methods, all LangChain features including tool calling, function calling, and structured output parsing work without modification. The transforms preserve the message structure and metadata required for these features.
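
To make "preserving message structure" concrete, here is a toy transform that shortens long tool results while leaving roles and call identifiers alone. It is a sketch under assumptions, not one of Headroom's transforms: `Msg` and `truncate_tool_output` are hypothetical, and `tool_call_id` is used only as an example of metadata that links a tool result to the call that produced it.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Msg:
    role: str
    content: str
    tool_call_id: Optional[str] = None   # links a tool result to its call


def truncate_tool_output(messages: List[Msg], limit: int) -> List[Msg]:
    """Shorten long tool results; leave roles and IDs untouched."""
    out = []
    for m in messages:
        if m.role == "tool" and len(m.content) > limit:
            # replace() copies every field except the one being changed.
            m = replace(m, content=m.content[:limit] + "...")
        out.append(m)
    return out


history = [
    Msg("user", "What is in the log?"),
    Msg("tool", "x" * 50, tool_call_id="call_1"),
]
compressed = truncate_tool_output(history, limit=10)
```

Only the content of the tool message changes; the message count, order, roles, and identifiers all survive, which is what a downstream model needs in order to match tool results to their calls.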