# How to Use Headroom with the Vercel AI SDK for Streaming Responses

> Learn how to use Headroom with the Vercel AI SDK for streaming responses. Compress prompts while maintaining real-time token delivery and reducing context window usage.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-09

---

**Headroom compresses prompts before they reach the LLM while preserving the Vercel AI SDK's native streaming capabilities, enabling real-time token delivery with reduced context window usage.**

The Headroom library reduces token costs by compressing conversation history before sending it to language models. When integrated with the Vercel AI SDK, it intercepts and compresses prompts without interfering with streaming functions like `streamText` or `streamObject`. This guide shows how to use the Vercel AI SDK adapter from `chopratejas/headroom` to keep streaming responses low-latency while minimizing token consumption.

## Prerequisites

Before integrating Headroom with the Vercel AI SDK, ensure you have both the proxy server and SDK packages installed.

Install and start the Headroom proxy (required for compression):

```bash
pip install "headroom-ai[proxy]" && headroom proxy
```

The proxy runs on `http://localhost:8787` by default. Then install the TypeScript dependencies:

```bash
npm install ai @ai-sdk/openai headroom-ai
```

You can also install additional providers like `@ai-sdk/anthropic` or `@ai-sdk/google` as needed.

## Integration Methods

The **`headroom-ai/vercel-ai`** import path exposes three entry points, implemented in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts), for different integration patterns.

### The withHeadroom Wrapper

**`withHeadroom()`** is the fastest way to enable compression. It returns a model instance that automatically compresses prompts before each LLM call.

```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

// Wrap the model – compression is applied automatically
const model = withHeadroom(openai('gpt-4o'));

// longConversation is your existing message array, defined elsewhere
const result = streamText({
  model,
  messages: longConversation,   // any length, will be compressed first
});

// Consume streaming chunks in real-time
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

Internally, this wrapper uses `wrapLanguageModel` from the `ai` package combined with Headroom's middleware implementation.

### The headroomMiddleware Function

**`headroomMiddleware()`** provides granular control when composing multiple middlewares (e.g., logging, tracing, or custom logic).

```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),   // can be combined with other middlewares
});
```

This approach is ideal when you need to execute Headroom compression alongside existing middleware chains.
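
Where a step sits in the chain determines which version of the prompt it sees. The sketch below is a toy model of that idea, not the AI SDK's middleware interface: `PromptTransform`, `applyInOrder`, `logSize`, and `toyCompress` are hypothetical names, the real middleware hooks are asynchronous, and the sketch assumes transforms run in the order listed (check the `wrapLanguageModel` documentation for your SDK version).

```typescript
// Simplified model of prompt-transforming middleware. The real AI SDK
// middleware interface is richer and asynchronous.
type Prompt = string[];
type PromptTransform = { name: string; transform: (prompt: Prompt) => Prompt };

// Applies each transform to the output of the previous one.
// Assumption: the first entry in the chain runs first.
function applyInOrder(prompt: Prompt, chain: PromptTransform[]): Prompt {
  return chain.reduce((current, step) => step.transform(current), prompt);
}

// Records how many messages each logging step observed.
const seen: Record<string, number> = {};
const logSize = (name: string): PromptTransform => ({
  name,
  transform: (prompt) => {
    seen[name] = prompt.length;
    return prompt; // logging does not modify the prompt
  },
});

// Toy stand-in for compression (NOT Headroom's algorithm):
// keeps only the last two messages.
const toyCompress: PromptTransform = {
  name: 'compress',
  transform: (prompt) => prompt.slice(-2),
};

const conversation = ['m1', 'm2', 'm3', 'm4', 'm5'];
const finalPrompt = applyInOrder(conversation, [
  logSize('before'),
  toyCompress,
  logSize('after'),
]);
console.log(seen, finalPrompt);
```

In this toy chain the `before` step records five messages and the `after` step records two, so a logging or tracing middleware should be placed according to whether you want to observe the original or the compressed prompt.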

### The compressVercelMessages Function

**`compressVercelMessages()`** is a standalone async function for custom pipelines that don't use model wrappers. It accepts an array of Vercel `Message` objects and returns the compressed messages along with token savings statistics.

```typescript
import { compressVercelMessages } from 'headroom-ai/vercel-ai';

// longConversation is your existing message array, defined elsewhere
const { messages, tokensSaved } = await compressVercelMessages(
  longConversation,
  { model: 'gpt-4o' }
);

console.log(`Saved ${tokensSaved} tokens`);
```
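
To put `tokensSaved` in cost terms, you can multiply it by your provider's input-token price. The sketch below is plain arithmetic and not part of Headroom: `estimateInputSavingsUSD` is a hypothetical helper, and the price in the example is purely illustrative, so substitute the current rate for your model.

```typescript
// Hypothetical helper (not part of headroom-ai): converts a saved-token
// count into an estimated dollar amount at a given input-token price.
function estimateInputSavingsUSD(
  tokensSaved: number,
  pricePerMillionInputTokensUSD: number, // assumption: look up your provider's rate
): number {
  if (tokensSaved < 0 || pricePerMillionInputTokensUSD < 0) {
    throw new RangeError('tokensSaved and price must be non-negative');
  }
  return (tokensSaved / 1_000_000) * pricePerMillionInputTokensUSD;
}

// Example: 40,000 tokens saved at an illustrative $2.50 per million input tokens.
const estimated = estimateInputSavingsUSD(40_000, 2.5);
console.log(`Estimated savings: $${estimated.toFixed(2)} per request`);
```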

## How Streaming Works

The integration preserves streaming through a three-stage pipeline implemented in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts):

1. **Format Conversion** – The adapter converts Vercel-style messages to OpenAI format using `vercelToOpenAI` (defined in [`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts))
2. **Compression** – Messages are sent to the local Headroom proxy at `/v1/compress` (handled by [`headroom/proxy/server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/server.py)), then converted back via `openAIToVercel`
3. **Streaming Execution** – The compressed prompt is passed to the underlying model, and streaming APIs like `streamText` or `streamObject` execute normally without protocol interference

As documented in `docs/content/docs/vercel-ai-sdk.mdx`: "All other model behavior (tool calling, structured output, streaming) is unchanged."
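
The three stages above can be sketched as a single function. This is a minimal illustration under stated assumptions, not Headroom's implementation: the message types are text-only simplifications, `toWire` and `fromWire` are hypothetical stand-ins for `vercelToOpenAI` and `openAIToVercel`, `CompressFn` stands in for the request to `/v1/compress`, and `keepRecent` is a toy compressor that has nothing to do with Headroom's actual algorithm.

```typescript
// All types and helpers here are simplified stand-ins for illustration.
type Role = 'system' | 'user' | 'assistant';
type SdkMessage = { role: Role; content: string };  // Vercel-style, text only
type WireMessage = { role: Role; content: string }; // OpenAI-style, text only

// Stand-in for the request to the proxy's /v1/compress endpoint.
type CompressFn = (messages: WireMessage[]) => Promise<WireMessage[]>;

// Stage 1: convert to the wire format (stand-in for vercelToOpenAI).
const toWire = (messages: SdkMessage[]): WireMessage[] =>
  messages.map(({ role, content }) => ({ role, content }));

// Stage 2, return leg: convert back (stand-in for openAIToVercel).
const fromWire = (messages: WireMessage[]): SdkMessage[] =>
  messages.map(({ role, content }) => ({ role, content }));

// Runs stages 1 and 2. The returned messages are what the underlying
// model would receive in stage 3, where streaming proceeds as usual.
async function preparePrompt(
  messages: SdkMessage[],
  compress: CompressFn,
): Promise<SdkMessage[]> {
  const compressed = await compress(toWire(messages));
  return fromWire(compressed);
}

// Toy compressor for demonstration only: keeps system messages
// and the two most recent non-system messages.
const keepRecent: CompressFn = async (messages) => {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-2)];
};
```

Because the whole preparation step resolves before the model is called, nothing in it touches the response stream.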

## Configuration Options

Override default settings by passing a configuration object to any integration method:

```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';

const model = withHeadroom(openai('gpt-4o'), {
  baseUrl: 'http://custom-proxy:8787',  // Custom proxy endpoint
  model: 'gpt-4o-mini',                 // Target model for compression ratio calculation
  timeout: 5000                         // Request timeout in milliseconds
});
```

The `CompressOptions` interface allows you to specify alternative endpoints if you're not running the proxy on localhost.
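
If the proxy address differs between environments, one option is to derive these settings from environment variables. The helper below is a sketch: the `baseUrl` and `timeout` option names come from the example above, while `resolveProxyOptions`, the `ProxyOptions` type, and the `HEADROOM_BASE_URL` and `HEADROOM_TIMEOUT_MS` variable names are hypothetical and not defined by Headroom.

```typescript
// Subset of the options shown above. The env variable names are hypothetical.
interface ProxyOptions {
  baseUrl: string;
  timeout: number;
}

const DEFAULTS: ProxyOptions = {
  baseUrl: 'http://localhost:8787', // default proxy address from this guide
  timeout: 5000,                    // illustrative default, in milliseconds
};

// Builds options from an env-like record, falling back to the defaults
// when a variable is missing, blank, or not a positive number.
function resolveProxyOptions(
  env: Record<string, string | undefined>,
): ProxyOptions {
  const baseUrl = env['HEADROOM_BASE_URL']?.trim();
  const timeout = Number(env['HEADROOM_TIMEOUT_MS']);
  return {
    baseUrl: baseUrl ? baseUrl : DEFAULTS.baseUrl,
    timeout: Number.isFinite(timeout) && timeout > 0 ? timeout : DEFAULTS.timeout,
  };
}
```

In a Node.js application you could then call `withHeadroom(openai('gpt-4o'), resolveProxyOptions(process.env))`.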

## Summary

- **Headroom integrates with the Vercel AI SDK** through the `headroom-ai/vercel-ai` import path, which provides three entry points: `withHeadroom`, `headroomMiddleware`, and `compressVercelMessages`
- **Streaming remains untouched** – the compression step occurs before the LLM call, leaving `streamText`, `streamObject`, and other streaming functions unaffected
- **Format conversion happens automatically** via `vercelToOpenAI` and `openAIToVercel` in [`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts)
- **The local proxy is required** at `http://localhost:8787` (or a custom URL) to handle the actual compression logic defined in [`headroom/proxy/server.py`](https://github.com/chopratejas/headroom/blob/main/headroom/proxy/server.py)

## Frequently Asked Questions

### Does Headroom interfere with streaming responses?

No. Headroom compresses the prompt before it reaches the language model, and the streaming response from the LLM flows through the Vercel AI SDK's native streaming path unchanged. Because compression completes before the model call starts, it adds one round trip to the proxy ahead of the first token, but it does not alter how tokens are delivered once `streamText` begins emitting them.
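
To check this in your own setup, you can time how long the first chunk takes to arrive with and without the wrapper. The helper below is a sketch rather than part of Headroom or the AI SDK: `measureTimeToFirstChunk` is a hypothetical name, and it works with any `AsyncIterable<string>`, such as `result.textStream`. The clock starts when the helper is called, so call it immediately after `streamText(...)`.

```typescript
// Hypothetical helper: consumes a text stream and reports when the first
// chunk arrived, how long the whole stream took, and the full text.
async function measureTimeToFirstChunk(
  stream: AsyncIterable<string>,
): Promise<{ firstChunkMs: number | null; totalMs: number; text: string }> {
  const start = Date.now();
  let firstChunkMs: number | null = null; // stays null if the stream is empty
  let text = '';
  for await (const chunk of stream) {
    if (firstChunkMs === null) {
      firstChunkMs = Date.now() - start;
    }
    text += chunk;
  }
  return { firstChunkMs, totalMs: Date.now() - start, text };
}
```

Running it once against a model wrapped with `withHeadroom` and once against the plain model, on the same conversation, gives a rough comparison; repeat the measurement a few times, since network and provider latency vary between calls.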

### Can I combine Headroom with other middleware?

Yes. Use `headroomMiddleware()` with the `wrapLanguageModel` function from the `ai` package to compose Headroom with logging, tracing, or custom middleware. This is the recommended approach when you need multiple middleware layers rather than the simple `withHeadroom` wrapper.

### What message format conversions occur under the hood?

The adapter converts Vercel `Message` objects to OpenAI format using `vercelToOpenAI`, sends them to the Headroom proxy's `/v1/compress` endpoint, then transforms the compressed result back using `openAIToVercel`. These utilities are exported from [`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts) and handle system prompts, user messages, and assistant content automatically.

### Is the Headroom proxy mandatory for SDK integration?

Yes. The proxy must be running because it executes the actual compression algorithms. The TypeScript SDK communicates with this local server (default `http://localhost:8787`) via the functions in [`sdk/typescript/src/compress.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/compress.ts). Without the proxy, the compression calls will fail with connection errors.
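
If you would rather send the uncompressed prompt than fail the request when the proxy is unreachable, you can wrap the compression call in a fallback at the application level. This is a sketch of that pattern, not a Headroom feature: `compressOrFallback` and `Compressor` are hypothetical names, and `compress` stands in for whatever compression call you use, for example a thin wrapper around `compressVercelMessages`.

```typescript
// Hypothetical application-level helper, not part of headroom-ai.
type Compressor<M> = (messages: M[]) => Promise<M[]>;

async function compressOrFallback<M>(
  messages: M[],
  compress: Compressor<M>,
  onError: (error: unknown) => void = () => {},
): Promise<M[]> {
  try {
    return await compress(messages);
  } catch (error) {
    // Proxy unreachable or the request failed: report the error
    // and continue with the original, uncompressed messages.
    onError(error);
    return messages;
  }
}
```

The trade-off is that requests keep working during a proxy outage but use the full, uncompressed token count, so it is worth logging the error through `onError` to notice when that happens.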