# How to Integrate Headroom with Vercel AI SDK for Middleware Compression

> Integrate Headroom with Vercel AI SDK using headroomMiddleware to compress LLM requests client-side. Reduce token usage, preserve streaming, tool-calling, and structured output.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-08

---

**Headroom integrates with the Vercel AI SDK through `headroomMiddleware`, a composable middleware that compresses LLM requests client-side before they reach the model, reducing token usage while preserving streaming, tool-calling, and structured output capabilities.**

Headroom is a compression layer that plugs into the Vercel AI SDK to reduce the token cost of each request. Because compression runs as middleware, message payloads shrink without changes to your application logic, and SDK features such as streaming and tool calling keep working. This guide covers the three integration patterns available in the `chopratejas/headroom` repository, referencing the actual implementation in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts).


## Prerequisites: Start the Headroom Proxy

Before implementing the SDK integration, you must run the Headroom proxy locally. This proxy exposes the `/v1/compress` endpoint that handles the actual compression logic.

Install and start the proxy:

```bash
pip install "headroom-ai[proxy]"   # install Headroom with the proxy extra
headroom proxy                     # start the proxy (http://localhost:8787 by default)
```

By default, the proxy listens on `http://localhost:8787`. The examples in this guide use that address.
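Every integration pattern below ultimately sends messages to `<baseUrl>/v1/compress`. The sketch below shows how a client might build that request. `buildCompressRequest` is a hypothetical helper, and the JSON body shape (`{ messages, model }`) is an assumption for illustration, not the documented wire format of the proxy.

```typescript
// Hypothetical helper: builds the POST request a client could send to the
// Headroom proxy. ASSUMPTION: the body shape { messages, model } is illustrative.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface CompressRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

const DEFAULT_BASE_URL = 'http://localhost:8787';

function buildCompressRequest(
  messages: ChatMessage[],
  model?: string,
  baseUrl: string = DEFAULT_BASE_URL,
): CompressRequest {
  // Strip trailing slashes so "http://host:8787/" and "http://host:8787" both work.
  const root = baseUrl.replace(/\/+$/, '');
  return {
    url: `${root}/v1/compress`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // JSON.stringify omits `model` when it is undefined.
      body: JSON.stringify({ messages, model }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`; keeping request construction separate from the network call makes it easy to test without a running proxy.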


## Integration Patterns for Headroom Vercel AI SDK Middleware

The Headroom SDK provides three ways to compress messages, each suited to different architectural needs. All implementations reside in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts).


### One-Liner Integration with `withHeadroom()`

The `withHeadroom()` function is the shortest integration path. It wraps any Vercel AI SDK model and injects the compression middleware for you.

```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [{ role: 'user', content: 'Summarize these results...' }],
});
```

Under the hood, `withHeadroom()` calls `wrapLanguageModel()` from the `ai` package and passes `headroomMiddleware()` as the middleware argument. This pattern is ideal when you want compression without managing middleware composition manually.
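To make that composition concrete, here is a self-contained toy model of it. It does not import the `ai` package: `Model`, `Middleware`, `wrapModel`, `toyCompressionMiddleware`, and `withCompression` are hypothetical stand-ins that mimic how a middleware can rewrite call parameters before the wrapped model runs. The "compressor" simply keeps the last few messages in place of a real call to the Headroom proxy.

```typescript
// Minimal, hypothetical stand-ins for the AI SDK types (heavily simplified).
interface Message { role: string; content: string }
interface CallParams { prompt: Message[] }
interface Model { generate(params: CallParams): Promise<string> }

// A middleware rewrites the call parameters before the model sees them.
interface Middleware {
  transformParams(params: CallParams): Promise<CallParams>;
}

// Analogue of wrapLanguageModel(): run the middleware, then call the model.
function wrapModel(model: Model, middleware: Middleware): Model {
  return {
    async generate(params) {
      const transformed = await middleware.transformParams(params);
      return model.generate(transformed);
    },
  };
}

// Toy "compression": keep only the last `keep` messages.
// A real implementation would POST the messages to the Headroom proxy instead.
function toyCompressionMiddleware(keep = 2): Middleware {
  return {
    async transformParams(params) {
      return { ...params, prompt: params.prompt.slice(-keep) };
    },
  };
}

// Analogue of withHeadroom(): one call that wraps the model with the middleware.
function withCompression(model: Model): Model {
  return wrapModel(model, toyCompressionMiddleware());
}
```

The caller's code is identical whether it holds the raw model or the wrapped one, which is what makes the one-liner pattern a drop-in change.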


### Composable Middleware with `headroomMiddleware()`

For applications requiring fine-grained control over the request pipeline, use `headroomMiddleware()` directly. This approach lets you stack Headroom compression alongside other middlewares like logging, rate-limiting, or authentication.

In [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts), the `headroomMiddleware()` function constructs a middleware object that conforms to the Vercel AI SDK middleware specification:

```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware({
    baseUrl: 'http://localhost:8787', // Headroom proxy address
    model: 'gpt-4o',                  // Optional: informs the proxy of target model
  }),
});
```

The `baseUrl` parameter must point to your running Headroom proxy instance. You can chain multiple middlewares by passing an array to `wrapLanguageModel()`.
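When you pass an array, order matters. The Vercel AI SDK documents `middleware: [first, second]` as behaving like `first(second(model))`, so the first entry's parameter transform runs first. The toy sketch below (hypothetical `composeMiddlewares`, `toyCompress`, and `makeLogger`, not SDK code) shows why a logging middleware listed after Headroom would observe the already-compressed prompt, while one listed before it would see the original.

```typescript
interface Message { role: string; content: string }
interface CallParams { prompt: Message[] }
type Transform = (params: CallParams) => Promise<CallParams>;

// Apply transforms in array order: index 0 runs first, mirroring how
// [first, second] behaves like first(second(model)) for parameter transforms.
async function composeMiddlewares(params: CallParams, transforms: Transform[]): Promise<CallParams> {
  let current = params;
  for (const transform of transforms) {
    current = await transform(current);
  }
  return current;
}

// Toy stand-in for compression: drop all but the last message.
const toyCompress: Transform = async (p) => ({ ...p, prompt: p.prompt.slice(-1) });

// Logger that records how many messages it observed, then passes params through.
function makeLogger(log: number[]): Transform {
  return async (p) => {
    log.push(p.prompt.length);
    return p;
  };
}
```

Place Headroom first if downstream middlewares (caching, logging) should work on the compressed prompt; place it last if they need the original.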


### Standalone Compression with `compressVercelMessages()`

When you need compression statistics or want to preprocess messages without immediately calling a model, use `compressVercelMessages()`. This utility function from [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts) compresses a batch of Vercel-format messages without invoking the LLM.

```typescript
import { compressVercelMessages } from 'headroom-ai/vercel-ai';

// Vercel AI SDK-format messages to compress
const messages = [
  { role: 'user' as const, content: 'Summarize these results...' },
];

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
});

console.log(`Saved ${result.tokensSaved} tokens`);
const compressed = result.messages; // Vercel-compatible message array
```

This pattern suits custom pipelines, pre-processing workflows, or scenarios where you only need the token-saving metrics.
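For batch preprocessing, you might run the compressor over many conversations and total the savings. The sketch below takes the compressor as a parameter so it can be exercised without a running proxy. `CompressFn`, `CompressResult`, and `compressAll` are hypothetical names, modeled only on the `messages` and `tokensSaved` fields shown above.

```typescript
interface Message { role: string; content: string }

// Shape modeled on the two result fields used above: messages and tokensSaved.
interface CompressResult { messages: Message[]; tokensSaved: number }
type CompressFn = (messages: Message[]) => Promise<CompressResult>;

// Compress each conversation sequentially and total the reported savings.
async function compressAll(
  conversations: Message[][],
  compress: CompressFn,
): Promise<{ compressed: Message[][]; totalSaved: number }> {
  const compressed: Message[][] = [];
  let totalSaved = 0;
  for (const conversation of conversations) {
    const result = await compress(conversation);
    compressed.push(result.messages);
    totalSaved += result.tokensSaved;
  }
  return { compressed, totalSaved };
}
```

In real code, `compress` would be a thin wrapper such as `(msgs) => compressVercelMessages(msgs, { model: 'gpt-4o', baseUrl: 'http://localhost:8787' })`. Running sequentially keeps load on a single local proxy predictable; parallelizing is a straightforward change if the proxy can handle it.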


## How Headroom Middleware Compression Works

Understanding the request flow clarifies where compression occurs in the stack. According to the implementation in [`sdk/typescript/src/compress.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/compress.ts) and [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts), the middleware follows this pipeline:

1. **Message conversion** – Translates Vercel-format messages (`{role, content}`) into the OpenAI-compatible format Headroom expects.
2. **Proxy invocation** – POSTs the payload to the `/v1/compress` endpoint of the local Headroom proxy via the HTTP client defined in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts).
3. **Compression execution** – The proxy runs the full Headroom pipeline (SmartCrusher, ContentRouter) and returns compressed messages, token savings, and compression ratios.
4. **Format restoration** – Converts compressed messages back to Vercel format before passing them down the middleware chain.
5. **Model execution** – The underlying model receives the smaller prompt, unaware that compression occurred.

All processing happens **client-side**, in your application and the local proxy, so the LLM provider only ever receives the compressed prompt. The way you call the model does not change.
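Steps 1 and 4 are plain data transformations. The sketch below assumes a simplified Vercel-style message whose `content` is either a string or an array of `{ type: 'text', text }` parts, and flattens it to the string content an OpenAI-style chat message uses. The function names are hypothetical, non-text parts (images, tool calls) are deliberately out of scope, and the actual adapter in `vercel-ai.ts` may handle more cases.

```typescript
// Simplified Vercel-style message: content may be a string or text parts.
type TextPart = { type: 'text'; text: string };
interface VercelMessage {
  role: 'system' | 'user' | 'assistant';
  content: string | TextPart[];
}

// OpenAI-style chat message with plain string content.
interface OpenAIMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Step 1 (sketch): flatten text parts into one string per message.
function toOpenAIFormat(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content: typeof m.content === 'string'
      ? m.content
      : m.content.map((part) => part.text).join(''),
  }));
}

// Step 4 (sketch): wrap compressed string content back into text parts.
function toVercelFormat(messages: OpenAIMessage[]): VercelMessage[] {
  return messages.map((m): VercelMessage => ({
    role: m.role,
    content: [{ type: 'text', text: m.content }],
  }));
}
```

Because roles pass through unchanged and only `content` is rewritten, the round trip preserves conversation structure even when the proxy shortens the text.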


## Streaming and Advanced Features

Headroom middleware compression operates transparently before the request reaches the model, meaning advanced Vercel AI SDK features work unchanged.

### Streaming Responses

Compression occurs during the request phase, so streaming responses flow back exactly as they would without middleware:

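```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

// A long message history: this is the input that gets compressed
const longConversation = [
  { role: 'user' as const, content: 'Here are the logs from last night...' },
  { role: 'assistant' as const, content: 'I see several timeouts. Anything else?' },
  { role: 'user' as const, content: 'Which service caused them?' },
];

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk); // Output streams normally; only the input prompt was compressed
}
```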
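The reason streaming is unaffected can be shown with a toy model: the wrapper rewrites only the request parameters, then forwards every chunk the model emits without touching it. `StreamingModel`, `Transform`, and `wrapStreaming` are simplified, hypothetical stand-ins, not the AI SDK's own middleware types.

```typescript
interface Message { role: string; content: string }
interface CallParams { prompt: Message[] }
interface StreamingModel {
  stream(params: CallParams): AsyncIterable<string>;
}
type Transform = (params: CallParams) => Promise<CallParams>;

// Transform the request once, then delegate to the model's stream:
// every chunk the model yields reaches the caller unchanged.
function wrapStreaming(model: StreamingModel, transform: Transform): StreamingModel {
  return {
    async *stream(params) {
      const transformed = await transform(params);
      yield* model.stream(transformed);
    },
  };
}
```

The only added latency is the one-time transform before the first chunk; after that, chunks flow at the model's own pace.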

### Tool Calling and Structured Output

Because the middleware only transforms message content before invocation, tool definitions, function schemas, and structured output specifications pass through untouched. Only the message history is shortened, so the model still receives the complete tool and schema definitions it needs to produce valid tool calls or JSON responses.
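A transform that replaces only the message list and spreads every other field through is what keeps tools and output schemas intact. The `CallParams` shape below, with its `tools` and `responseFormat` fields, is a simplified assumption rather than the AI SDK's exact parameter type, and `transformPromptOnly` is a hypothetical name.

```typescript
interface Message { role: string; content: string }
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}
interface CallParams {
  prompt: Message[];
  tools?: ToolDefinition[];
  responseFormat?: { type: 'json'; schema: Record<string, unknown> };
}

// Replace only `prompt`; the object spread copies tools and responseFormat
// by reference, so their definitions reach the model exactly as written.
async function transformPromptOnly(
  params: CallParams,
  compress: (prompt: Message[]) => Promise<Message[]>,
): Promise<CallParams> {
  return { ...params, prompt: await compress(params.prompt) };
}
```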


## Summary

- **Start the proxy** locally using `headroom proxy` before integrating the SDK.
- **Choose your pattern**: Use `withHeadroom()` for quick setups, `headroomMiddleware()` for composable pipelines, or `compressVercelMessages()` for standalone preprocessing.
- **Reference the source**: Implementation details live in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts), with core compression logic in [`sdk/typescript/src/compress.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/compress.ts) and HTTP handling in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts).
- **Preserve functionality**: Middleware compression happens client-side, maintaining full compatibility with streaming, tool-calling, and structured output while reducing the number of input tokens sent to the provider.


## Frequently Asked Questions

### Does Headroom middleware work with streaming responses?

Yes. `headroomMiddleware()` compresses messages during the request phase, before the LLM generates any response. Streaming works identically to uncompressed requests because the middleware only transforms the outgoing payload, not the incoming stream. You can use `streamText()` or `streamObject()` without modification.

### What is the difference between `withHeadroom()` and `headroomMiddleware()`?

**`withHeadroom()`** is a convenience wrapper that internally calls `wrapLanguageModel()` and injects `headroomMiddleware()` automatically. It requires less boilerplate but offers less flexibility. **`headroomMiddleware()`** exposes the underlying middleware object directly, allowing you to compose it with other middlewares like logging or caching via the standard `wrapLanguageModel()` API from the Vercel AI SDK.

### Do I need to run the Headroom proxy locally?

Yes. The proxy must be running at the `baseUrl` specified in your middleware configuration (default `http://localhost:8787`). The HTTP client in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts) POSTs messages to the proxy's `/v1/compress` endpoint to perform the actual compression. Without the proxy, the middleware cannot compress messages.
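This guide does not say how the middleware behaves when the proxy is unreachable. If you would rather send the uncompressed request than fail, one option is to wrap your own compression call in a fail-open guard like the hypothetical `compressOrPassThrough` below. It is a sketch of a design choice, not Headroom's documented behavior, and it trades higher token usage during an outage for availability.

```typescript
interface Message { role: string; content: string }
type Compress = (messages: Message[]) => Promise<Message[]>;

// Fail-open guard: if compression throws (for example, the proxy is down),
// report the error and fall back to the original, uncompressed messages.
async function compressOrPassThrough(
  messages: Message[],
  compress: Compress,
  onError: (err: unknown) => void = () => {},
): Promise<Message[]> {
  try {
    return await compress(messages);
  } catch (err) {
    onError(err);
    return messages;
  }
}
```

Passing an `onError` callback that logs or increments a metric keeps silent fallbacks visible, so a stopped proxy does not go unnoticed while costs rise.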

### Can I use Headroom compression with other middlewares?

Yes. When using `headroomMiddleware()` directly with `wrapLanguageModel()`, you can pass an array of middlewares. If Headroom comes first in the array, its compression step runs first and the middlewares listed after it receive the compressed message array. The Headroom middleware itself is implemented in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts).