How to Integrate Headroom with Vercel AI SDK for Middleware Compression

Headroom integrates with the Vercel AI SDK through headroomMiddleware, a composable middleware that compresses LLM requests client-side before they reach the model, reducing token usage while preserving streaming, tool-calling, and structured output capabilities.

Headroom provides a compression-as-a-service layer that plugs into the Vercel AI SDK to reduce token costs on each request. With middleware compression, message payloads shrink automatically without changes to your application logic or loss of SDK features. This guide covers the three integration patterns available in the chopratejas/headroom repository, referencing the actual implementation in sdk/typescript/src/adapters/vercel-ai.ts.

Prerequisites: Start the Headroom Proxy

Before implementing the SDK integration, you must run the Headroom proxy locally. This proxy exposes the /v1/compress endpoint that handles the actual compression logic.

Install and start the proxy:

pip install "headroom-ai[proxy]"
headroom proxy

By default, the proxy serves on http://localhost:8787. All middleware configurations reference this URL.

Integration Patterns for Headroom Vercel AI SDK Middleware

The Headroom SDK provides three interchangeable ways to compress messages, each suited to different architectural needs. All implementations reside in sdk/typescript/src/adapters/vercel-ai.ts.

One-Liner Integration with withHeadroom()

The withHeadroom() function offers the fastest path to production. It wraps any Vercel AI SDK model, automatically injecting the compression middleware behind the scenes.

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [{ role: 'user', content: 'Summarize these results...' }],
});

Under the hood, withHeadroom() calls wrapLanguageModel() from the ai package and passes headroomMiddleware() as the middleware argument. This pattern is ideal when you want compression without managing middleware composition manually.

Composable Middleware with headroomMiddleware()

For applications requiring fine-grained control over the request pipeline, use headroomMiddleware() directly. This approach lets you stack Headroom compression alongside other middlewares like logging, rate-limiting, or authentication.

In sdk/typescript/src/adapters/vercel-ai.ts, the headroomMiddleware() function constructs a middleware object that conforms to the Vercel AI SDK middleware specification:

import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware({
    baseUrl: 'http://localhost:8787', // Headroom proxy address
    model: 'gpt-4o',                  // Optional: informs the proxy of target model
  }),
});

The baseUrl parameter must point to your running Headroom proxy instance. To chain multiple middlewares, pass an array as the middleware option of wrapLanguageModel().
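Where Headroom sits in the middleware array determines what the other middlewares see. The following is a minimal, self-contained model of that ordering, not the SDK's implementation. The ParamsMiddleware type, the applyInOrder helper, and the truncatingCompressor stand-in are hypothetical names introduced for illustration, and the sketch assumes the ordering described in the AI SDK documentation, where the first middleware in the array transforms the input first.

```typescript
// Simplified stand-ins for illustration only; not the real AI SDK types.
interface Message { role: string; content: string }
interface CallParams { prompt: Message[] }
type ParamsMiddleware = (params: CallParams) => CallParams;

// Apply middlewares in array order: the first entry sees the original params.
function applyInOrder(middlewares: ParamsMiddleware[], params: CallParams): CallParams {
  return middlewares.reduce((current, mw) => mw(current), params);
}

// Hypothetical stand-in for compression: truncates each message to 20 characters.
const truncatingCompressor: ParamsMiddleware = (params) => ({
  ...params,
  prompt: params.prompt.map((m) => ({ ...m, content: m.content.slice(0, 20) })),
});

// Records the total prompt length that this middleware observes.
const seenLengths: number[] = [];
const lengthLogger: ParamsMiddleware = (params) => {
  seenLengths.push(params.prompt.reduce((n, m) => n + m.content.length, 0));
  return params;
};

const original: CallParams = {
  prompt: [{ role: 'user', content: 'x'.repeat(100) }],
};

// Compressor first: the logger observes the already-shortened prompt.
applyInOrder([truncatingCompressor, lengthLogger], original);
// Logger first: the logger observes the original prompt.
applyInOrder([lengthLogger, truncatingCompressor], original);
```

In this model, a logging middleware placed after the compressor records the compressed prompt, while one placed before it records the original. Choose the order based on which version you want to observe.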

Standalone Compression with compressVercelMessages()

When you need compression statistics or want to preprocess messages without immediately calling a model, use compressVercelMessages(). This utility function from sdk/typescript/src/adapters/vercel-ai.ts compresses a batch of Vercel-format messages without invoking the LLM.

import { compressVercelMessages } from 'headroom-ai/vercel-ai';

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
});

console.log(`Saved ${result.tokensSaved} tokens`);
const compressed = result.messages; // Vercel-compatible message array

This pattern suits custom pipelines, pre-processing workflows, or scenarios where you only need the token-saving metrics.
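Because standalone compression is an explicit call, an application can decide what to do when the proxy is unreachable. The sketch below shows one possible guard, assuming a result shape with only the two fields used above (messages and tokensSaved). The compressOrPassThrough helper and the Compress type are hypothetical and not part of the Headroom SDK; the compression function is injected so the sketch does not depend on any SDK behavior.

```typescript
interface Message { role: string; content: string }

// Result shape assumed from the fields used above (messages, tokensSaved).
interface CompressResult { messages: Message[]; tokensSaved: number }

// Any function with this shape works, for example a thin wrapper
// around compressVercelMessages() with your options bound.
type Compress = (messages: Message[]) => Promise<CompressResult>;

// Hypothetical helper: try to compress, and fall back to the
// original messages if the compression call throws.
async function compressOrPassThrough(
  messages: Message[],
  compress: Compress,
): Promise<CompressResult> {
  try {
    return await compress(messages);
  } catch {
    // Proxy unreachable or returned an error: keep the uncompressed messages.
    return { messages, tokensSaved: 0 };
  }
}
```

Whether to fall back silently or surface the error is a design choice. Falling back keeps requests flowing at full token cost, so it may be worth logging the failure.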

How Headroom Middleware Compression Works

Understanding the request flow clarifies where compression occurs in the stack. According to the implementation in sdk/typescript/src/compress.ts and sdk/typescript/src/client.ts, the middleware follows this pipeline:

  1. Message conversion – Translates Vercel-format messages ({role, content}) into the OpenAI-compatible format Headroom expects.
  2. Proxy invocation – Sends the payload to POST /v1/compress on the local Headroom proxy via the HTTP client defined in sdk/typescript/src/client.ts.
  3. Compression execution – The proxy runs the full Headroom pipeline (SmartCrusher, ContentRouter) and returns compressed messages, token savings, and compression ratios.
  4. Format restoration – Converts compressed messages back to Vercel format before passing them down the middleware chain.
  5. Model execution – The underlying model receives the smaller prompt, unaware that compression occurred.

Compression happens before the request leaves your environment, so the LLM provider only ever receives the compressed prompt and bills for the smaller token count.
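The steps above can be sketched as a small function. This is a simplified illustration under stated assumptions, not the code in compress.ts or client.ts: the message shapes, the request body ({ model, messages }), the response field (messages), and the names toOpenAI, toVercel, compressForModel, and PostJson are all hypothetical. Only the /v1/compress path comes from the description above. The HTTP transport is injected so the sketch stays independent of any particular client.

```typescript
// Hypothetical, simplified shapes; the real SDK types and wire format may differ.
interface VercelMessage { role: string; content: string }
interface OpenAIMessage { role: string; content: string }
interface CompressResponse { messages: OpenAIMessage[] }

// Transport is injected so the sketch does not depend on an HTTP library.
type PostJson = (url: string, body: unknown) => Promise<CompressResponse>;

// Step 1: convert Vercel-format messages to the OpenAI-compatible format.
// With plain string content this is a field-by-field copy.
function toOpenAI(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map(({ role, content }) => ({ role, content }));
}

// Step 4: convert compressed messages back to Vercel format.
function toVercel(messages: OpenAIMessage[]): VercelMessage[] {
  return messages.map(({ role, content }) => ({ role, content }));
}

// Steps 1 to 4 in order; step 5 (model execution) is left to the caller.
async function compressForModel(
  messages: VercelMessage[],
  model: string,
  baseUrl: string,
  post: PostJson,
): Promise<VercelMessage[]> {
  const payload = { model, messages: toOpenAI(messages) }; // assumed request body
  // Step 2 is this call; step 3 runs inside the proxy.
  const response = await post(`${baseUrl}/v1/compress`, payload);
  return toVercel(response.messages);
}
```

Real Vercel AI SDK messages can carry structured content parts rather than plain strings, so the actual conversion steps are likely more involved than the copies shown here.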

Streaming and Advanced Features

Headroom middleware compression operates transparently before the request reaches the model, meaning advanced Vercel AI SDK features work unchanged.

Streaming Responses

Compression occurs during the request phase, so streaming responses flow back exactly as they would without middleware:

import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk); // Output streams as usual; only the input was compressed
}

Tool Calling and Structured Output

Because the middleware only transforms message content before invocation, tool definitions, function schemas, and structured output specifications pass through untouched. The compressed messages are meant to retain the information the model needs to generate valid tool calls or JSON responses.
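The pass-through behavior described here amounts to replacing one field and leaving the rest alone. A minimal sketch of that idea follows, using a simplified CallParams shape and a withCompressedPrompt helper that are hypothetical and not the AI SDK's or Headroom's real types.

```typescript
// Simplified stand-in for call parameters; not the real AI SDK type.
interface Message { role: string; content: string }
interface CallParams {
  prompt: Message[];
  tools?: unknown[];
  responseFormat?: unknown;
}

// Replace only the prompt; every other field is carried over by reference.
function withCompressedPrompt(params: CallParams, compressed: Message[]): CallParams {
  return { ...params, prompt: compressed };
}

const tools = [{ name: 'getWeather', parameters: { city: 'string' } }];
const params: CallParams = {
  prompt: [{ role: 'user', content: 'A very long message that would be compressed' }],
  tools,
  responseFormat: { type: 'json' },
};

const next = withCompressedPrompt(params, [{ role: 'user', content: 'Short message' }]);
// next.tools is the same array object as tools; only next.prompt differs.
```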

Summary

  • Start the proxy locally using headroom proxy before integrating the SDK.
  • Choose your pattern: Use withHeadroom() for quick setups, headroomMiddleware() for composable pipelines, or compressVercelMessages() for standalone preprocessing.
  • Reference the source: Implementation details live in sdk/typescript/src/adapters/vercel-ai.ts, with core compression logic in sdk/typescript/src/compress.ts and HTTP handling in sdk/typescript/src/client.ts.
  • Preserve functionality: Middleware compression happens client-side, maintaining full compatibility with streaming, tool-calling, and structured output while significantly reducing token consumption.

Frequently Asked Questions

Does Headroom middleware work with streaming responses?

Yes. headroomMiddleware() compresses messages during the request phase, before the LLM generates any response. Streaming works the same as with uncompressed requests because the middleware only transforms the outgoing payload, not the incoming stream. You can use streamText() or streamObject() without modification.

What is the difference between withHeadroom() and headroomMiddleware()?

withHeadroom() is a convenience wrapper that internally calls wrapLanguageModel() and injects headroomMiddleware() automatically. It requires less boilerplate but offers less flexibility. headroomMiddleware() exposes the underlying middleware object directly, allowing you to compose it with other middlewares like logging or caching via the standard wrapLanguageModel() API from the Vercel AI SDK.

Do I need to run the Headroom proxy locally?

Yes. The proxy must be running at the baseUrl specified in your middleware configuration (default http://localhost:8787). The Node.js client in sdk/typescript/src/client.ts POSTs messages to the proxy's /v1/compress endpoint to perform the actual compression. Without the proxy, the middleware cannot compress messages.

Can I use Headroom compression with other middlewares?

Yes. When using headroomMiddleware() directly with wrapLanguageModel(), you can pass an array of middlewares. When Headroom is listed first, it runs its compression step and the later middlewares in the chain receive the compressed message array. The middleware object that makes this composition possible is implemented in sdk/typescript/src/adapters/vercel-ai.ts.
