How to Use Headroom with the Vercel AI SDK for Streaming Responses

Headroom compresses prompts before they reach the LLM while preserving the Vercel AI SDK's native streaming capabilities, enabling real-time token delivery with reduced context window usage.

The Headroom library reduces token costs by compressing conversation history before sending it to language models. When integrated with the Vercel AI SDK, it intercepts and compresses prompts without interfering with streaming APIs like streamText or streamObject. This guide shows how to use Headroom (from the chopratejas/headroom repository) to keep streaming responses low-latency while reducing token consumption.

Prerequisites

Before integrating Headroom with the Vercel AI SDK, ensure you have both the proxy server and SDK packages installed.

Install and start the Headroom proxy (required for compression):

pip install "headroom-ai[proxy]" && headroom proxy

The proxy runs on http://localhost:8787 by default. Then install the TypeScript dependencies:

npm install ai @ai-sdk/openai headroom-ai

You can also install additional providers like @ai-sdk/anthropic or @ai-sdk/google as needed.

Integration Methods

The headroom-ai/vercel-ai entry point exposes three functions, implemented in sdk/typescript/src/adapters/vercel-ai.ts, for different integration patterns.

The withHeadroom Wrapper

withHeadroom() is the fastest way to enable compression. It returns a model instance that automatically compresses prompts before each LLM call.

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

// Wrap the model – compression is applied automatically
const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,   // your existing message array; compressed before the call
});

// Consume streaming chunks in real-time
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Internally, this wrapper uses wrapLanguageModel from the ai package combined with Headroom's middleware implementation.

The headroomMiddleware Function

headroomMiddleware() provides granular control when composing multiple middlewares (e.g., logging, tracing, or custom logic).

import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),   // can be combined with other middlewares
});

This approach is ideal when you need to execute Headroom compression alongside existing middleware chains.
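
To make the ordering of a middleware chain concrete, here is a minimal sketch that does not use the real SDK types. ChatMessage, CallParams, ParamsMiddleware, applyMiddlewares, keepRecent, and sizeLogger are all hypothetical names invented for this illustration, and keepRecent is only a stand-in for compression, not Headroom's algorithm. The hooks are synchronous here for brevity, whereas the AI SDK's middleware hooks are asynchronous. The sketch assumes each middleware's parameter transform runs in the order listed.

```typescript
// Simplified, synchronous stand-ins for illustration only. The real AI SDK
// middleware hooks are asynchronous and operate on richer call options.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type CallParams = { prompt: ChatMessage[] };
type ParamsMiddleware = {
  label: string;
  transformParams: (params: CallParams) => CallParams;
};

// Apply each middleware's transformParams in the order listed.
function applyMiddlewares(middlewares: ParamsMiddleware[], params: CallParams): CallParams {
  return middlewares.reduce((current, mw) => mw.transformParams(current), params);
}

// Hypothetical stand-in for compression: keep system messages plus the last N turns.
function keepRecent(maxTurns: number): ParamsMiddleware {
  return {
    label: 'keep-recent',
    transformParams: ({ prompt }) => {
      const system = prompt.filter((m) => m.role === 'system');
      const rest = prompt.filter((m) => m.role !== 'system');
      // slice(-0) would return everything, so handle zero explicitly.
      const recent = maxTurns > 0 ? rest.slice(-maxTurns) : [];
      return { prompt: [...system, ...recent] };
    },
  };
}

// Records the prompt size it observes without changing the params.
function sizeLogger(label: string, sink: string[]): ParamsMiddleware {
  return {
    label: `log-${label}`,
    transformParams: (params) => {
      sink.push(`${label}: ${params.prompt.length} messages`);
      return params;
    },
  };
}

const observed: string[] = [];
const originalParams: CallParams = {
  prompt: [
    { role: 'system', content: 'Be brief.' },
    { role: 'user', content: 'First question' },
    { role: 'assistant', content: 'First answer' },
    { role: 'user', content: 'Second question' },
  ],
};

const finalParams = applyMiddlewares(
  [sizeLogger('before', observed), keepRecent(1), sizeLogger('after', observed)],
  originalParams,
);
console.log(observed.join('\n'));
```

Placing a logger on each side of the compression step is one way to see how much the prompt shrinks; where Headroom belongs in your own chain depends on whether other middleware should see the original or the compressed prompt.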

The compressVercelMessages Function

compressVercelMessages() is a pure function for custom pipelines that don't use model wrappers. It accepts an array of Vercel Message objects and returns compressed messages with token savings statistics.

import { compressVercelMessages } from 'headroom-ai/vercel-ai';

const { messages, tokensSaved } = await compressVercelMessages(
  longConversation,
  { model: 'gpt-4o' }
);

console.log(`Saved ${tokensSaved} tokens`);

How Streaming Works

The integration preserves streaming through a three-stage pipeline implemented in sdk/typescript/src/adapters/vercel-ai.ts:

  1. Format Conversion – The adapter converts Vercel-style messages to OpenAI format using vercelToOpenAI (defined in sdk/typescript/src/utils/format.ts)
  2. Compression – Messages are sent to the local Headroom proxy at /v1/compress (handled by headroom/proxy/server.py), then converted back via openAIToVercel
  3. Streaming Execution – The compressed prompt is passed to the underlying model, and streaming APIs like streamText or streamObject execute normally without protocol interference

As documented in docs/content/docs/vercel-ai-sdk.mdx: "All other model behavior (tool calling, structured output, streaming) is unchanged."
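
The three stages can be sketched with simplified stand-in code. Everything below is an assumption made for illustration: toOpenAI and toVercel are reduced imitations of vercelToOpenAI and openAIToVercel (the real utilities handle more message shapes), the types are trimmed down, and collapseWhitespace is a hypothetical placeholder for the proxy, not Headroom's compression algorithm. In the real adapter the compression stage is an asynchronous HTTP call to /v1/compress; it is synchronous here so the sketch runs without a proxy.

```typescript
// Simplified stand-in types; the real SDK types carry more fields (tool calls, images, ...).
type Role = 'system' | 'user' | 'assistant';
type TextPart = { type: 'text'; text: string };
type VercelMessage = { role: Role; content: string | TextPart[] };
type OpenAIMessage = { role: Role; content: string };

// Stage 1: flatten Vercel-style content parts into OpenAI-style string content.
function toOpenAI(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map((m): OpenAIMessage => ({
    role: m.role,
    content: typeof m.content === 'string' ? m.content : m.content.map((p) => p.text).join(''),
  }));
}

// After stage 2: convert back, wrapping non-system content in a single text part.
function toVercel(messages: OpenAIMessage[]): VercelMessage[] {
  return messages.map((m): VercelMessage =>
    m.role === 'system'
      ? { role: m.role, content: m.content }
      : { role: m.role, content: [{ type: 'text', text: m.content }] },
  );
}

// Stage 2 is injected so the sketch stays independent of the proxy.
type Compress = (messages: OpenAIMessage[]) => OpenAIMessage[];

// Hypothetical placeholder for the proxy: collapses runs of whitespace.
const collapseWhitespace: Compress = (messages) =>
  messages.map((m) => ({ ...m, content: m.content.replace(/\s+/g, ' ').trim() }));

// Convert, compress, convert back. Stage 3 (streaming) would receive this result.
function compressPrompt(messages: VercelMessage[], compress: Compress): VercelMessage[] {
  return toVercel(compress(toOpenAI(messages)));
}

const compressedPrompt = compressPrompt(
  [
    { role: 'system', content: 'You are   helpful.' },
    { role: 'user', content: [{ type: 'text', text: 'Hello,\n\n  world' }] },
  ],
  collapseWhitespace,
);
console.log(JSON.stringify(compressedPrompt));
```

Because the model only ever sees the output of the last conversion, nothing in the streaming path needs to know that compression happened.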

Configuration Options

Override default settings by passing a configuration object to any integration method:

const model = withHeadroom(openai('gpt-4o'), {
  baseUrl: 'http://custom-proxy:8787',  // Custom proxy endpoint
  model: 'gpt-4o-mini',                 // Target model for compression ratio calculation
  timeout: 5000                         // Request timeout in milliseconds
});

The CompressOptions interface allows you to specify alternative endpoints if you're not running the proxy on localhost.
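
As a rough sketch of how such options might resolve to a concrete endpoint, the helper below merges user-supplied values with defaults. ProxyOptions and resolveProxyOptions are hypothetical names and the real CompressOptions interface may differ. The base URL default and the /v1/compress path come from this guide; the 5000 ms fallback timeout is an assumption, not a documented Headroom default.

```typescript
// Hypothetical option shape mirroring the fields shown above; the real
// CompressOptions interface may differ.
interface ProxyOptions {
  baseUrl?: string;
  model?: string;
  timeout?: number;
}

const DEFAULT_BASE_URL = 'http://localhost:8787'; // default proxy address from this guide
const ASSUMED_TIMEOUT_MS = 5000; // assumption: not a documented Headroom default

// Merge caller options with defaults and derive the compression endpoint.
function resolveProxyOptions(options: ProxyOptions = {}) {
  // Strip trailing slashes so the joined URL never contains "//v1".
  const baseUrl = (options.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, '');
  return {
    baseUrl,
    compressUrl: `${baseUrl}/v1/compress`,
    model: options.model,
    timeout: options.timeout ?? ASSUMED_TIMEOUT_MS,
  };
}

console.log(resolveProxyOptions().compressUrl);
console.log(resolveProxyOptions({ baseUrl: 'http://custom-proxy:8787/' }).compressUrl);
```

Resolving the options in one place makes it easier to point development, staging, and production at different proxy instances without touching the call sites.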

Summary

  • Headroom integrates seamlessly with the Vercel AI SDK through the headroom-ai/vercel-ai package, providing three entry points: withHeadroom, headroomMiddleware, and compressVercelMessages
  • Streaming remains untouched – the compression step occurs before the LLM call, leaving streamText, streamObject, and other streaming APIs unaffected
  • Format conversion happens automatically via vercelToOpenAI and openAIToVercel in sdk/typescript/src/utils/format.ts
  • The local proxy is required at http://localhost:8787 (or a custom URL) to handle the actual compression logic defined in headroom/proxy/server.py

Frequently Asked Questions

Does Headroom interfere with streaming responses?

No. Headroom compresses the prompt before it reaches the language model, and the response then streams through the Vercel AI SDK's native streaming protocols unchanged. Because compression completes before streamText begins emitting tokens, it adds a round trip to the proxy before the first token but does not slow the stream once it starts.
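
One way to check the effect on perceived latency in your own setup is to measure the time until the first chunk arrives. The helper below is a minimal sketch that works on any AsyncIterable of strings, which is the shape of result.textStream in the earlier example. timeToFirstChunk and fakeStream are hypothetical names, and fakeStream only simulates a delay before the first chunk so the sketch runs without a model or proxy.

```typescript
// Measures how long an async text stream takes to yield its first chunk.
async function timeToFirstChunk(stream: AsyncIterable<string>) {
  const start = Date.now();
  let firstChunkMs: number | null = null;
  let text = '';
  for await (const chunk of stream) {
    if (firstChunkMs === null) firstChunkMs = Date.now() - start;
    text += chunk;
  }
  return { firstChunkMs, totalMs: Date.now() - start, text };
}

// Hypothetical stand-in for result.textStream: waits before the first chunk
// (as prompt processing plus model latency would) and then emits two chunks.
async function* fakeStream(delayMs: number): AsyncGenerator<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  yield 'Hello';
  yield ', world';
}

timeToFirstChunk(fakeStream(50)).then(({ firstChunkMs, text }) => {
  console.log(`first chunk after ~${firstChunkMs} ms: ${text}`);
});
```

Running the same measurement with and without the Headroom wrapper on a representative conversation gives a direct comparison for your own workload.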

Can I combine Headroom with other middleware?

Yes. Use headroomMiddleware() with the wrapLanguageModel function from the ai package to compose Headroom with logging, tracing, or custom middleware. This is the recommended approach when you need multiple middleware layers rather than the simple withHeadroom wrapper.

What message format conversions occur under the hood?

The adapter converts Vercel Message objects to OpenAI format using vercelToOpenAI, sends them to the Headroom proxy's /v1/compress endpoint, then transforms the compressed result back using openAIToVercel. These utilities are exported from sdk/typescript/src/utils/format.ts and handle system prompts, user messages, and assistant content automatically.

Is the Headroom proxy mandatory for SDK integration?

Yes. The proxy must be running because it executes the actual compression algorithms. The TypeScript SDK communicates with this local server (default http://localhost:8787) via the functions in sdk/typescript/src/compress.ts. Without the proxy, the compression calls will fail with connection errors.
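
If you would rather degrade gracefully than fail when the proxy is down, you can wrap the compression call yourself. The sketch below is one possible approach, not a Headroom feature: compressOrPassThrough, CompressFn, and proxyDown are hypothetical names, the message type is simplified, and the result shape only mirrors the messages and tokensSaved fields shown earlier. The compression function is injected, so in practice you would pass a small wrapper around compressVercelMessages.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };
type CompressFn = (
  messages: ChatMessage[],
) => Promise<{ messages: ChatMessage[]; tokensSaved: number }>;

// Try to compress; if the call fails (for example, the proxy is not running),
// fall back to the original, uncompressed messages.
async function compressOrPassThrough(messages: ChatMessage[], compress: CompressFn) {
  try {
    const result = await compress(messages);
    return { ...result, compressed: true };
  } catch (error) {
    console.warn('Headroom compression unavailable, sending uncompressed prompt:', error);
    return { messages, tokensSaved: 0, compressed: false };
  }
}

// Hypothetical stand-in that fails the way a stopped proxy might.
const proxyDown: CompressFn = async () => {
  throw new Error('connect ECONNREFUSED 127.0.0.1:8787');
};

compressOrPassThrough([{ role: 'user', content: 'Hi' }], proxyDown).then((result) => {
  console.log(result.compressed, result.messages.length);
});
```

Falling back silently trades cost for availability: requests keep working, but an uncompressed prompt may exceed the context window that compression was keeping you under, so logging the fallback is worth keeping.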
