How to Use Headroom as a Proxy to Compress Requests to Any OpenAI-Compatible API
Headroom acts as a drop-in proxy layer between your application and any OpenAI-compatible API, automatically sending request payloads to a local /v1/compress endpoint and returning reduced-token message lists in their original format.
The chopratejas/headroom repository provides a TypeScript SDK that intercepts outgoing LLM requests and transparently compresses them before they reach your provider. Because the SDK understands the OpenAI, Anthropic, Gemini, and Vercel AI SDK message formats, you can put Headroom in front of any OpenAI-compatible API without rewriting existing client logic.
How the Headroom Proxy Works
The proxy workflow is driven by the compress() function exported from sdk/typescript/src/compress.ts. According to the chopratejas/headroom source code, this helper treats the OpenAI chat format as a lingua franca: it detects the incoming message schema, normalizes it, sends it to the local compression server, and maps the result back to the caller’s native shape.
The internal flow follows four concrete steps:
1. detectFormat(messages) identifies whether the payload originates from OpenAI, Anthropic, Vercel AI SDK, or Gemini.
2. toOpenAI(messages) converts the native payload into a generic OpenAI-compatible message list.
3. HeadroomClient.compress() POSTs the normalized payload to /v1/compress alongside an optional model and tokenBudget.
4. fromOpenAI(result.messages, inputFormat) transforms the compressed response back into the original SDK format.
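The four steps can be sketched end to end as follows. This is a minimal illustration, not the SDK's implementation: the function bodies are simplified stand-ins that only distinguish OpenAI-shaped from Gemini-shaped messages, and the compression step is an injected function (a hypothetical `Compressor` type) so the sketch runs without a server.

```typescript
// Simplified stand-ins for the SDK's format utilities. The real ones handle
// more providers and richer content (tool calls, images, and so on).
type OpenAIMessage = { role: string; content: string };
type GeminiMessage = { role: string; parts: { text: string }[] };
type AnyMessage = OpenAIMessage | GeminiMessage;
type Format = "openai" | "gemini";

// Step 1: guess the schema from the shape of the first message.
function detectFormat(messages: AnyMessage[]): Format {
  const first = messages[0];
  return first !== undefined && "parts" in first ? "gemini" : "openai";
}

// Step 2: normalize to OpenAI chat format (Gemini names the assistant "model").
function toOpenAI(messages: AnyMessage[]): OpenAIMessage[] {
  return messages.map((m) =>
    "parts" in m
      ? {
          role: m.role === "model" ? "assistant" : m.role,
          content: m.parts.map((p) => p.text).join(""),
        }
      : m
  );
}

// Step 4: map the compressed list back to the caller's original format.
function fromOpenAI(messages: OpenAIMessage[], format: Format): AnyMessage[] {
  if (format === "openai") return messages;
  return messages.map((m) => ({
    role: m.role === "assistant" ? "model" : m.role,
    parts: [{ text: m.content }],
  }));
}

// Step 3 is injected here; in the SDK it is the POST to /v1/compress.
type Compressor = (messages: OpenAIMessage[]) => OpenAIMessage[];

function compressPipeline(messages: AnyMessage[], compressor: Compressor): AnyMessage[] {
  const format = detectFormat(messages);
  const normalized = toOpenAI(messages);
  const compressed = compressor(normalized);
  return fromOpenAI(compressed, format);
}

// Toy compressor for demonstration: keep the first 20 characters of each message.
const truncate: Compressor = (msgs) =>
  msgs.map((m) => ({ ...m, content: m.content.slice(0, 20) }));

const geminiInput: GeminiMessage[] = [
  { role: "user", parts: [{ text: "Explain quantum computing in simple terms." }] },
  { role: "model", parts: [{ text: "Qubits can be 0 and 1 at once." }] },
];

console.log(JSON.stringify(compressPipeline(geminiInput, truncate)));
```

The caller passes in Gemini-shaped messages and gets Gemini-shaped messages back; the OpenAI format only exists between steps 2 and 4.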
Inside the HeadroomClient HTTP Layer
The HeadroomClient class in sdk/typescript/src/client.ts encapsulates the raw HTTP call to the proxy. Its constructor accepts an optional baseUrl and apiKey, and its compress() method sends a POST request to the /v1/compress endpoint with a JSON body containing messages, model, and token_budget.
// sdk/typescript/src/client.ts
export class HeadroomClient {
  constructor(opts?: { baseUrl?: string; apiKey?: string }) { ... }
  async compress(
    messages: OpenAIMessage[],
    opts?: { model?: string; tokenBudget?: number }
  ): Promise<CompressResult> {
    const resp = await fetch(`${this.baseUrl}/v1/compress`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${this.apiKey}`
      },
      body: JSON.stringify({
        messages,
        model: opts?.model,
        token_budget: opts?.tokenBudget
      })
    });
    return resp.json();
  }
}
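To make the wire format concrete, here is a small sketch of how such a request could be assembled. `buildCompressRequest` is a hypothetical helper, not part of the SDK; it performs no network I/O and only shows the camelCase `tokenBudget` option becoming `token_budget` in the body. As one possible design choice, it attaches the Authorization header only when a key is present, whereas the excerpt above always sends it.

```typescript
type OpenAIMessage = { role: string; content: string };

interface CompressOptions {
  model?: string;
  tokenBudget?: number;
}

interface CompressRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the URL and fetch options for a /v1/compress
// call without sending anything.
function buildCompressRequest(
  baseUrl: string,
  messages: OpenAIMessage[],
  opts: CompressOptions = {},
  apiKey?: string
): CompressRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Attach the bearer token only when a key is configured, so the header
  // never becomes the literal string "Bearer undefined".
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;

  return {
    // Trim trailing slashes so "http://host/" and "http://host" behave alike.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/compress`,
    method: "POST",
    headers,
    // JSON.stringify omits keys whose value is undefined.
    body: JSON.stringify({
      messages,
      model: opts.model,
      token_budget: opts.tokenBudget,
    }),
  };
}

const req = buildCompressRequest(
  "http://localhost:8787/",
  [{ role: "user", content: "Hello" }],
  { tokenBudget: 1500 }
);
console.log(req.url);
console.log(req.body);
```

The returned object can be passed to `fetch(req.url, req)`. A production wrapper would likely also check `resp.ok` before parsing the JSON response.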
Integration Patterns for OpenAI-Compatible APIs
You can integrate Headroom at three different levels depending on how much control you need over format conversion and transport logic.
Option 1: High-Level compress() Helper
For most TypeScript projects, the compress() helper—re-exported from sdk/typescript/src/index.ts—is the fastest path. It auto-detects the input format, calls the proxy, and returns metadata such as tokensBefore, tokensAfter, and the compressed messages array.
import { compress } from "@headroom/sdk";
const messages = [
  { role: "user", content: "Explain quantum computing in simple terms." },
  { role: "assistant", content: "Quantum computing uses qubits ... (very long answer)" },
];
const result = await compress(messages, {
  model: "gpt-4o",
  tokenBudget: 2000,
});
console.log("Tokens before:", result.tokensBefore);
console.log("Tokens after :", result.tokensAfter);
console.log("Compressed messages:", result.messages);
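The tokensBefore and tokensAfter fields are easier to act on as a savings figure. The sketch below assumes only those two fields from the result; `summarizeSavings` is a hypothetical helper, not an SDK export.

```typescript
// Only the two count fields named in this article; the real result object
// may carry more.
interface TokenCounts {
  tokensBefore: number;
  tokensAfter: number;
}

// Hypothetical helper: converts raw counts into tokens saved and a
// percentage rounded to one decimal place.
function summarizeSavings({ tokensBefore, tokensAfter }: TokenCounts): {
  saved: number;
  percent: number;
} {
  const saved = tokensBefore - tokensAfter;
  // Guard against division by zero when the input was empty.
  const percent = tokensBefore > 0 ? (saved * 100) / tokensBefore : 0;
  return { saved, percent: Math.round(percent * 10) / 10 };
}

const summary = summarizeSavings({ tokensBefore: 5000, tokensAfter: 2000 });
console.log(`Saved ${summary.saved} tokens (${summary.percent}%)`);
```

Logging this figure per request is one way to check whether a given tokenBudget is actually reducing payload size for your workload.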
Option 2: Direct HeadroomClient Usage
If your messages are already in OpenAI chat format, bypass the conversion utilities and invoke HeadroomClient.compress() directly. This JavaScript example points the client at a local proxy running on port 8787.
const { HeadroomClient } = require("@headroom/sdk");
const client = new HeadroomClient({ baseUrl: "http://localhost:8787" });
const openAIMessages = [
  { role: "user", content: "Write a 2000-word essay on climate change." },
];
client.compress(openAIMessages, { model: "gpt-4o", tokenBudget: 1500 })
  .then(res => {
    console.log("Compressed token count:", res.tokensAfter);
    console.log("Compressed payload:", res.messages);
  })
  .catch(err => console.error("Compression failed:", err));
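Because compression is an optimization rather than a requirement, one reasonable pattern is to fall back to the original messages when the proxy is unavailable. The sketch below is an assumption about how you might wrap the call, not SDK behavior: `compressOrPassThrough` and the `CompressFn` type are hypothetical, and a stub stands in for client.compress() so the example runs offline.

```typescript
type OpenAIMessage = { role: string; content: string };
type CompressFn = (messages: OpenAIMessage[]) => Promise<{ messages: OpenAIMessage[] }>;

// Hypothetical wrapper: if the compressor throws (proxy down, bad response),
// return the uncompressed messages instead of failing the whole request.
async function compressOrPassThrough(
  messages: OpenAIMessage[],
  compressFn: CompressFn
): Promise<{ messages: OpenAIMessage[]; compressed: boolean }> {
  try {
    const result = await compressFn(messages);
    return { messages: result.messages, compressed: true };
  } catch (err) {
    console.warn("Compression failed, sending original messages:", err);
    return { messages, compressed: false };
  }
}

// Stand-in for client.compress() that simulates a proxy outage.
const failingCompress: CompressFn = async () => {
  throw new Error("connect ECONNREFUSED 127.0.0.1:8787");
};

compressOrPassThrough([{ role: "user", content: "Hi" }], failingCompress).then((r) =>
  console.log("compressed:", r.compressed, "messages:", r.messages.length)
);
```

With a real client, you would pass `(msgs) => client.compress(msgs, { model, tokenBudget })` as the second argument.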
Option 3: Vercel AI SDK Middleware
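Headroom can be wired into existing Vercel AI SDK workflows without changing request-building code. The headroomMiddleware adapter (validated in sdk/typescript/test/adapters/*) intercepts outgoing chat calls, runs compress() under the hood, and forwards the reduced payload. The snippet below uses an OpenAI-style chat.completions.create call shape; treat the createAi factory and the @vercel/ai import as illustrative wiring and check them against your installed Vercel AI SDK version, which is published as the ai package and attaches middleware through wrapLanguageModel.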
import { createAi } from "@vercel/ai";
import { headroomMiddleware } from "@headroom/sdk";
const ai = createAi({
  baseUrl: "http://localhost:8787",
  middlewares: [headroomMiddleware()],
});
async function ask(prompt: string) {
  const response = await ai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: prompt }],
  });
  console.log(response.choices[0].message.content);
}
Extending Compression with Hooks and Format Utilities
The SDK exposes two optional extension points for customizing proxy behavior before and after the /v1/compress call. These utilities live in standalone modules that operate independently of the core HeadroomClient transport.
- Format utilities (sdk/typescript/src/utils/format.ts): The detectFormat, toOpenAI, and fromOpenAI functions handle schema translation for OpenAI, Anthropic, Gemini, and Vercel AI SDK payloads.
- Hooks (sdk/typescript/src/hooks.ts): You can inject pre-compress, post-compress, and bias-computation logic to manipulate message lists before they reach the proxy or after they return.
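The general idea of pre- and post-compress hooks can be sketched as below. The `SketchHooks` interface and `compressWithHooks` function are hypothetical names chosen for illustration; the actual hook interface is defined in the SDK's hooks module and may use different names and signatures. Bias computation is left out because its shape is not described here.

```typescript
type OpenAIMessage = { role: string; content: string };

// Hypothetical hook shape for illustration only.
interface SketchHooks {
  preCompress?: (messages: OpenAIMessage[]) => OpenAIMessage[];
  postCompress?: (messages: OpenAIMessage[]) => OpenAIMessage[];
}

type CompressFn = (messages: OpenAIMessage[]) => Promise<OpenAIMessage[]>;

// Runs the optional pre hook, then the compressor, then the optional post hook.
async function compressWithHooks(
  messages: OpenAIMessage[],
  compressFn: CompressFn,
  hooks: SketchHooks = {}
): Promise<OpenAIMessage[]> {
  const prepared = hooks.preCompress ? hooks.preCompress(messages) : messages;
  const compressed = await compressFn(prepared);
  return hooks.postCompress ? hooks.postCompress(compressed) : compressed;
}

// Example hooks: drop blank messages before compression, log the count after.
const exampleHooks: SketchHooks = {
  preCompress: (msgs) => msgs.filter((m) => m.content.trim() !== ""),
  postCompress: (msgs) => {
    console.log(`Received ${msgs.length} message(s) from the compressor`);
    return msgs;
  },
};

// Identity compressor stands in for the network call to /v1/compress.
const identityCompress: CompressFn = async (msgs) => msgs;

compressWithHooks(
  [
    { role: "user", content: "Hello" },
    { role: "user", content: "   " },
  ],
  identityCompress,
  exampleHooks
).then((out) => console.log(out));
```

Keeping hooks as plain functions over message lists means they can be unit-tested without a running proxy.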
Summary
- Headroom proxies LLM requests through a local /v1/compress endpoint managed by HeadroomClient in sdk/typescript/src/client.ts.
- The compress() helper in sdk/typescript/src/compress.ts automatically detects provider-specific formats and translates them to and from OpenAI-compatible shapes.
- For native OpenAI message lists, HeadroomClient.compress() provides a thin, configurable HTTP wrapper with baseUrl and apiKey support.
- You can drop Headroom into existing Vercel AI SDK workflows using the built-in middleware adapter tested under sdk/typescript/test/adapters/*.
- Optional hooks in sdk/typescript/src/hooks.ts and format utilities in sdk/typescript/src/utils/format.ts let you customize bias calculation and schema conversion without altering business logic.
Frequently Asked Questions
Which OpenAI-compatible APIs work with Headroom?
Headroom supports OpenAI chat, Anthropic, Gemini, and Vercel AI SDK message formats out of the box. The detectFormat and toOpenAI utilities in sdk/typescript/src/utils/format.ts normalize these schemas before compression.
Where does the compression actually happen?
The runtime proxy server receives the POST request at /v1/compress. According to the chopratejas/headroom source, plugins/openclaw/src/proxy-manager.ts executes the compression model and returns the reduced-token message list back to the SDK.
Can I set a custom base URL or API key for the Headroom proxy?
Yes. The HeadroomClient constructor in sdk/typescript/src/client.ts accepts an options object with baseUrl and apiKey properties. This lets you point the SDK at any running Headroom instance and authenticate requests with a Bearer token.
How do I adjust the token budget or compression model?
Pass the model and tokenBudget options to either the high-level compress() helper or the direct client.compress() method. The SDK forwards these values as model and token_budget in the JSON body of the /v1/compress request.