How to Use Headroom with the Vercel AI SDK for Streaming Responses
Headroom compresses prompts before they reach the LLM while preserving the Vercel AI SDK's native streaming capabilities, enabling real-time token delivery with reduced context window usage.
The Headroom library reduces token costs by compressing conversation history before sending it to language models. When integrated with the Vercel AI SDK, it intercepts and compresses prompts without interfering with streaming APIs such as streamText or streamObject. This guide shows how to use Headroom (from the chopratejas/headroom repository) to keep streaming responses low-latency while reducing token consumption.
Prerequisites
Before integrating Headroom with the Vercel AI SDK, ensure you have both the proxy server and SDK packages installed.
Install and start the Headroom proxy (required for compression):
pip install "headroom-ai[proxy]" && headroom proxy
The proxy runs on http://localhost:8787 by default. Then install the TypeScript dependencies:
npm install ai @ai-sdk/openai headroom-ai
You can also install additional providers like @ai-sdk/anthropic or @ai-sdk/google as needed.
Integration Methods
The headroom-ai/vercel-ai package exposes three entry points, defined in sdk/typescript/src/adapters/vercel-ai.ts, for different integration patterns.
The withHeadroom Wrapper
withHeadroom() is the fastest way to enable compression. It returns a model instance that automatically compresses prompts before each LLM call.
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
// Wrap the model – compression is applied automatically
const model = withHeadroom(openai('gpt-4o'));
const result = streamText({
  model,
  messages: longConversation, // any length, will be compressed first
});
// Consume streaming chunks in real-time
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
Internally, this wrapper uses wrapLanguageModel from the ai package combined with Headroom's middleware implementation.
The headroomMiddleware Function
headroomMiddleware() provides granular control when composing multiple middlewares (e.g., logging, tracing, or custom logic).
import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';
const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(), // can be combined with other middlewares
});
This approach is ideal when you need to execute Headroom compression alongside existing middleware chains.
The compressVercelMessages Function
compressVercelMessages() is a pure function for custom pipelines that don't use model wrappers. It accepts an array of Vercel Message objects and returns compressed messages with token savings statistics.
import { compressVercelMessages } from 'headroom-ai/vercel-ai';
const { messages, tokensSaved } = await compressVercelMessages(
  longConversation,
  { model: 'gpt-4o' }
);
console.log(`Saved ${tokensSaved} tokens`);
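Since tokensSaved is reported per call, it can be useful to accumulate it across a session. The following is a minimal sketch of such a tracker; createSavingsTracker and the per-million-token price are hypothetical and not part of headroom-ai.

```typescript
// Hypothetical helper, not part of headroom-ai: accumulates the tokensSaved
// value reported by each compression call and estimates the input cost avoided.
interface SavingsSnapshot {
  calls: number;
  tokensSaved: number;
}

function createSavingsTracker(pricePerMillionInputTokens: number) {
  const state: SavingsSnapshot = { calls: 0, tokensSaved: 0 };
  return {
    // Record the tokensSaved figure from one call; negative values count as 0.
    record(tokensSaved: number): void {
      state.calls += 1;
      state.tokensSaved += Math.max(0, tokensSaved);
    },
    // Return a copy so callers cannot mutate the internal counters.
    snapshot(): SavingsSnapshot {
      return { ...state };
    },
    // Rough estimate only: assumes a flat price per million input tokens.
    estimatedDollarsSaved(): number {
      return (state.tokensSaved / 1_000_000) * pricePerMillionInputTokens;
    },
  };
}
```

After each compressVercelMessages call, pass its tokensSaved value to record(); the price argument should come from your provider's current rate card.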
How Streaming Works
The integration preserves streaming through a three-stage pipeline implemented in sdk/typescript/src/adapters/vercel-ai.ts:
- Format Conversion – The adapter converts Vercel-style messages to OpenAI format using vercelToOpenAI (defined in sdk/typescript/src/utils/format.ts)
- Compression – Messages are sent to the local Headroom proxy at /v1/compress (handled by headroom/proxy/server.py), then converted back via openAIToVercel
- Streaming Execution – The compressed prompt is passed to the underlying model, and streaming APIs like streamText or streamObject execute normally without protocol interference
As documented in docs/content/docs/vercel-ai-sdk.mdx: "All other model behavior (tool calling, structured output, streaming) is unchanged."
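The three stages described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the adapter's source: the message types, toOpenAILike, fromOpenAILike, and compressForModel are hypothetical names, and the compression step is injected as a function because the /v1/compress payload is not shown in this guide.

```typescript
// Illustrative only: simplified stand-ins for the real message types.
type Role = 'system' | 'user' | 'assistant';

// Vercel-style message: content is a string or a list of text parts.
interface VercelLikeMessage {
  role: Role;
  content: string | { type: 'text'; text: string }[];
}

// OpenAI-style chat message with plain string content.
interface OpenAILikeMessage {
  role: Role;
  content: string;
}

// Stands in for the HTTP call to the proxy's /v1/compress endpoint.
type CompressFn = (messages: OpenAILikeMessage[]) => Promise<OpenAILikeMessage[]>;

// Stage 1: flatten Vercel-style content into OpenAI-style strings.
function toOpenAILike(messages: VercelLikeMessage[]): OpenAILikeMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content:
      typeof m.content === 'string'
        ? m.content
        : m.content.map((part) => part.text).join(''),
  }));
}

// Second half of stage 2: map compressed messages back to the Vercel-style shape.
function fromOpenAILike(messages: OpenAILikeMessage[]): VercelLikeMessage[] {
  return messages.map((m) => ({ role: m.role, content: m.content }));
}

// Stages 1 and 2 together. Stage 3 is simply handing the returned prompt to
// the underlying model, so the response stream itself is never touched.
async function compressForModel(
  messages: VercelLikeMessage[],
  compress: CompressFn,
): Promise<VercelLikeMessage[]> {
  const openAIMessages = toOpenAILike(messages);
  const compressed = await compress(openAIMessages);
  return fromOpenAILike(compressed);
}
```

Everything in this sketch happens on the request path, which is why the response-side streaming behavior is left alone.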
Configuration Options
Override default settings by passing a configuration object to any integration method:
const model = withHeadroom(openai('gpt-4o'), {
  baseUrl: 'http://custom-proxy:8787', // Custom proxy endpoint
  model: 'gpt-4o-mini', // Target model for compression ratio calculation
  timeout: 5000, // Request timeout in milliseconds
});
The CompressOptions interface allows you to specify alternative endpoints if you're not running the proxy on localhost.
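When the proxy address differs between environments, one option is to build the configuration object from environment variables. The sketch below assumes hypothetical variable names (HEADROOM_BASE_URL, HEADROOM_MODEL, HEADROOM_TIMEOUT_MS) and an arbitrary 5000 ms fallback timeout; only the baseUrl, model, and timeout fields come from the example above.

```typescript
// Hypothetical helper: the environment variable names are assumptions,
// not something headroom-ai is known to read on its own.
interface HeadroomOptionsSketch {
  baseUrl: string;
  model?: string;
  timeout: number;
}

function headroomOptionsFromEnv(
  env: Record<string, string | undefined>,
): HeadroomOptionsSketch {
  const parsedTimeout = Number(env['HEADROOM_TIMEOUT_MS']);
  return {
    // Fall back to the proxy's documented default address.
    baseUrl: env['HEADROOM_BASE_URL'] || 'http://localhost:8787',
    // Left undefined when unset.
    model: env['HEADROOM_MODEL'] || undefined,
    // Ignore missing, non-numeric, or non-positive values (5000 is an arbitrary fallback).
    timeout: Number.isFinite(parsedTimeout) && parsedTimeout > 0 ? parsedTimeout : 5000,
  };
}
```

In a Node.js process this could be used as withHeadroom(openai('gpt-4o'), headroomOptionsFromEnv(process.env)), assuming the returned fields match what CompressOptions accepts.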
Summary
- Headroom integrates seamlessly with the Vercel AI SDK through the headroom-ai/vercel-ai package, providing three entry points: withHeadroom, headroomMiddleware, and compressVercelMessages
- Streaming remains untouched – The compression step occurs before the LLM call, leaving streamText, streamObject, and other streaming protocols unaffected
- Format conversion happens automatically via vercelToOpenAI and openAIToVercel in sdk/typescript/src/utils/format.ts
- The local proxy is required at http://localhost:8787 (or a custom URL) to handle the actual compression logic defined in headroom/proxy/server.py
Frequently Asked Questions
Does Headroom interfere with streaming responses?
No. Headroom compresses the prompt before it reaches the language model, and the streaming response from the LLM then flows through the Vercel AI SDK's native streaming protocols unchanged. Because the compression step completes before streamText begins emitting tokens, it does not slow token delivery once the stream has started; the only added cost is the round trip to the proxy before the request is sent.
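Because compression finishes before the first token is emitted, any time it takes shows up as time to first chunk rather than inside the stream. A minimal way to check this in your own setup is to time the first chunk with and without the wrapper; timeToFirstChunk below is a hypothetical helper, not part of either library.

```typescript
// Hypothetical measurement helper: times how long a text stream takes to
// produce its first chunk, then drains the rest of the stream.
interface FirstChunkTiming {
  firstChunkMs: number | null; // null if the stream produced no chunks
  chunks: number;
  text: string;
}

async function timeToFirstChunk(
  startStream: () => AsyncIterable<string>,
): Promise<FirstChunkTiming> {
  // Start the clock before the stream is created, so any work done up front
  // (such as prompt compression) is included in the measurement.
  const startedAt = Date.now();
  let firstChunkMs: number | null = null;
  let chunks = 0;
  let text = '';
  for await (const chunk of startStream()) {
    if (firstChunkMs === null) {
      firstChunkMs = Date.now() - startedAt;
    }
    chunks += 1;
    text += chunk;
  }
  return { firstChunkMs, chunks, text };
}
```

For example, timeToFirstChunk(() => streamText({ model, messages }).textStream) can be run once with a plain model and once with a withHeadroom-wrapped model to compare the two figures.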
Can I combine Headroom with other middleware?
Yes. Use headroomMiddleware() with the wrapLanguageModel function from the ai package to compose Headroom with logging, tracing, or custom middleware. This is the recommended approach when you need multiple middleware layers rather than the simple withHeadroom wrapper.
What message format conversions occur under the hood?
The adapter converts Vercel Message objects to OpenAI format using vercelToOpenAI, sends them to the Headroom proxy's /v1/compress endpoint, then transforms the compressed result back using openAIToVercel. These utilities are exported from sdk/typescript/src/utils/format.ts and handle system prompts, user messages, and assistant content automatically.
Is the Headroom proxy mandatory for SDK integration?
Yes. The proxy must be running because it executes the actual compression algorithms. The TypeScript SDK communicates with this local server (default http://localhost:8787) via the functions in sdk/typescript/src/compress.ts. Without the proxy, the compression calls will fail with connection errors.
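If an unreachable proxy should not take the whole request down, a custom pipeline can fall back to the uncompressed messages. The sketch below shows that pattern with an injected compression function; compressOrPassThrough is a hypothetical helper, and whether withHeadroom itself falls back in this way is not stated here.

```typescript
// Hypothetical fallback wrapper, not part of headroom-ai.
interface CompressionResult<M> {
  messages: M[];
  tokensSaved: number;
}

type Compressor<M> = (messages: M[]) => Promise<CompressionResult<M>>;

async function compressOrPassThrough<M>(
  messages: M[],
  compress: Compressor<M>,
  onError: (error: unknown) => void = () => {},
): Promise<CompressionResult<M> & { compressed: boolean }> {
  try {
    const result = await compress(messages);
    return { ...result, compressed: true };
  } catch (error) {
    // For example, a connection error because the proxy is not running.
    onError(error);
    // Send the original, uncompressed messages instead of failing the request.
    return { messages, tokensSaved: 0, compressed: false };
  }
}
```

With the pure function from earlier, this might be called as compressOrPassThrough(longConversation, (m) => compressVercelMessages(m, { model: 'gpt-4o' })), assuming its return value has the messages and tokensSaved fields shown above. The trade-off is that a silent fallback hides proxy outages, so the onError hook is worth wiring to logging.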