# How to Use Headroom as a Proxy to Compress Requests to Any OpenAI-Compatible API

> Learn how to use Headroom as a proxy to compress requests to OpenAI compatible APIs. Reduce token usage and optimize your API calls effortlessly with this powerful tool.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-05

---

**Headroom acts as a drop-in proxy layer between your application and any OpenAI-compatible API, automatically sending request payloads to a local `/v1/compress` endpoint and returning reduced-token message lists in their original format.**

The `chopratejas/headroom` repository provides a TypeScript SDK that intercepts outgoing LLM requests and transparently compresses them before they reach your provider. You can use Headroom as a proxy to compress requests to any OpenAI-compatible API—such as OpenAI, Anthropic, Gemini, or the Vercel AI SDK—without rewriting existing client logic.

## How the Headroom Proxy Works

The proxy workflow is driven by the **`compress()`** function exported from [`sdk/typescript/src/compress.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/compress.ts). According to the `chopratejas/headroom` source code, this helper treats the OpenAI chat format as a lingua franca: it detects the incoming message schema, normalizes it, sends it to the local compression server, and maps the result back to the caller’s native shape.

The internal flow follows four concrete steps:

1. **`detectFormat(messages)`** identifies whether the payload originates from OpenAI, Anthropic, Vercel AI SDK, or Gemini.
2. **`toOpenAI(messages)`** converts the native payload into a generic OpenAI-compatible message list.
3. **`HeadroomClient.compress()`** POSTs the normalized payload to `/v1/compress` alongside an optional `model` and `tokenBudget`.
4. **`fromOpenAI(result.messages, inputFormat)`** transforms the compressed response back into the original SDK format.

## Inside the HeadroomClient HTTP Layer

The **`HeadroomClient`** class in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts) encapsulates the raw HTTP call to the proxy. Its constructor accepts an optional `baseUrl` and `apiKey`, and its `compress()` method sends a POST request to the `/v1/compress` endpoint with a JSON body containing `messages`, `model`, and `token_budget`.

```typescript
// sdk/typescript/src/client.ts
export class HeadroomClient {
  constructor(opts?: { baseUrl?: string; apiKey?: string }) { ... }

  async compress(
    messages: OpenAIMessage[],
    opts?: { model?: string; tokenBudget?: number }
  ): Promise<CompressResult> {
    const resp = await fetch(`${this.baseUrl}/v1/compress`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${this.apiKey}`
      },
      body: JSON.stringify({
        messages,
        model: opts?.model,
        token_budget: opts?.tokenBudget
      })
    });
    return resp.json();
  }
}

```

## Integration Patterns for OpenAI-Compatible APIs

You can integrate Headroom at three different levels depending on how much control you need over format conversion and transport logic.

### Option 1: High-Level `compress()` Helper

For most TypeScript projects, the **`compress()`** helper—re-exported from [`sdk/typescript/src/index.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/index.ts)—is the fastest path. It auto-detects the input format, calls the proxy, and returns metadata such as `tokensBefore`, `tokensAfter`, and the compressed `messages` array.

```typescript
import { compress } from "@headroom/sdk";

const messages = [
  { role: "user", content: "Explain quantum computing in simple terms." },
  { role: "assistant", content: "Quantum computing uses qubits ... (very long answer)" },
];

const result = await compress(messages, {
  model: "gpt-4o",
  tokenBudget: 2000,
});

console.log("Tokens before:", result.tokensBefore);
console.log("Tokens after :", result.tokensAfter);
console.log("Compressed messages:", result.messages);

```

### Option 2: Direct `HeadroomClient` Usage

If your messages are already in OpenAI chat format, bypass the conversion utilities and invoke **`HeadroomClient.compress()`** directly. This JavaScript example points the client at a local proxy running on port `8787`.

```javascript
const { HeadroomClient } = require("@headroom/sdk");

const client = new HeadroomClient({ baseUrl: "http://localhost:8787" });

const openAIMessages = [
  { role: "user", content: "Write a 2000-word essay on climate change." },
];

client.compress(openAIMessages, { model: "gpt-4o", tokenBudget: 1500 })
  .then(res => {
    console.log("Compressed token count:", res.tokensAfter);
    console.log("Compressed payload:", res.messages);
  })
  .catch(err => console.error("Compression failed:", err));

```

### Option 3: Vercel AI SDK Middleware

Headroom can be wired into existing Vercel AI SDK workflows without changing request-building code. The `headroomMiddleware` adapter—validated in `sdk/typescript/test/adapters/*`—intercepts `chat.completions.create` calls, runs `compress()` under the hood, and forwards the reduced payload.

```typescript
import { createAi } from "@vercel/ai";
import { headroomMiddleware } from "@headroom/sdk";

const ai = createAi({
  baseUrl: "http://localhost:8787",
  middlewares: [headroomMiddleware()],
});

async function ask(prompt: string) {
  const response = await ai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
  });
  console.log(response.choices[0].message.content);
}

```

## Extending Compression with Hooks and Format Utilities

The SDK exposes two optional extension points for customizing proxy behavior before and after the `/v1/compress` call. These utilities live in standalone modules that operate independently of the core `HeadroomClient` transport.

- **Format utilities** ([`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts)): The `detectFormat`, `toOpenAI`, and `fromOpenAI` functions handle schema translation for OpenAI, Anthropic, Gemini, and Vercel AI SDK payloads.
- **Hooks** ([`sdk/typescript/src/hooks.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/hooks.ts)): You can inject pre-compress, post-compress, and bias-computation logic to manipulate message lists before they reach the proxy or after they return.

## Summary

- Headroom proxies LLM requests through a local `/v1/compress` endpoint managed by `HeadroomClient` in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts).
- The `compress()` helper in [`sdk/typescript/src/compress.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/compress.ts) automatically detects provider-specific formats and translates them to and from OpenAI-compatible shapes.
- For native OpenAI message lists, `HeadroomClient.compress()` provides a thin, configurable HTTP wrapper with `baseUrl` and `apiKey` support.
- You can drop Headroom into existing Vercel AI SDK workflows using the built-in middleware adapter tested under `sdk/typescript/test/adapters/*`.
- Optional hooks in [`sdk/typescript/src/hooks.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/hooks.ts) and format utilities in [`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts) let you customize bias calculation and schema conversion without altering business logic.

## Frequently Asked Questions

### Which OpenAI-compatible APIs work with Headroom?

Headroom supports OpenAI chat, Anthropic, Gemini, and Vercel AI SDK message formats out of the box. The `detectFormat` and `toOpenAI` utilities in [`sdk/typescript/src/utils/format.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/utils/format.ts) normalize these schemas before compression.

### Where does the compression actually happen?

The runtime proxy server receives the POST request at `/v1/compress`. According to the `chopratejas/headroom` source, [`plugins/openclaw/src/proxy-manager.ts`](https://github.com/chopratejas/headroom/blob/main/plugins/openclaw/src/proxy-manager.ts) executes the compression model and returns the reduced-token message list back to the SDK.

### Can I set a custom base URL or API key for the Headroom proxy?

Yes. The `HeadroomClient` constructor in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts) accepts an options object with `baseUrl` and `apiKey` properties. This lets you point the SDK at any running Headroom instance and authenticate requests with a Bearer token.

### How do I adjust the token budget or compression model?

Pass the `model` and `tokenBudget` options to either the high-level `compress()` helper or the direct `client.compress()` method. The SDK forwards these values as `model` and `token_budget` in the JSON body of the `/v1/compress` request.