How Headroom's ContentRouter Detects and Routes Content to Different Compression Algorithms

Headroom's ContentRouter is a server-side decision-making layer that inspects normalized conversation context—including message length, tool calls, user query size, model type, and token budget—to select the most appropriate compression algorithm for each LLM payload.

The ContentRouter in chopratejas/headroom acts as the traffic controller for conversation compression, ensuring long or complex payloads are routed to algorithms like smart crusher before reaching the LLM. While the router itself runs on the Headroom proxy server, the TypeScript SDK prepares and normalizes all the metadata it needs to make an informed decision. Understanding how the client constructs this context and how the proxy interprets it is essential for leveraging Headroom's routing logic effectively.

How the TypeScript SDK Prepares Routing Context

The journey begins in sdk/typescript/src/compress.ts, where the compress() function orchestrates the client-side pipeline. Before the ContentRouter ever sees a payload, the function assembles a CompressContext object that captures the essential metadata the router will use to classify the conversation.

Building the CompressContext

At lines 31-38 of sdk/typescript/src/compress.ts, the compress() function instantiates a CompressContext that captures the target model, the extracted user query, the current turn count, any detected tool calls, and the provider name. These metadata fields are populated by helpers in sdk/typescript/src/hooks.ts—specifically extractUserQuery(), countTurns(), and extractToolCalls()—ensuring the server-side router receives a fully described conversation profile.
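To make the shape of this metadata concrete, here is a minimal sketch of how such a context could be assembled. The helper names come from the article, but their bodies, the field names on CompressContext, and the buildContext wrapper are assumptions for illustration and may not match the SDK's actual code.

```typescript
// Hypothetical shapes for illustration; the SDK's real types may differ.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content?: string | null;
  tool_calls?: ToolCall[];
}

interface CompressContext {
  model: string;
  userQuery: string;
  turnNumber: number;
  toolCalls: string[];
  provider: string;
}

// Return the text of the most recent user message, or "" if there is none.
function extractUserQuery(messages: Message[]): string {
  const lastUser = [...messages]
    .reverse()
    .find((m) => m.role === "user" && typeof m.content === "string");
  return typeof lastUser?.content === "string" ? lastUser.content : "";
}

// Count user messages as a simple proxy for conversation turns.
function countTurns(messages: Message[]): number {
  return messages.filter((m) => m.role === "user").length;
}

// Collect the name of every tool the assistant asked to call.
function extractToolCalls(messages: Message[]): string[] {
  return messages.flatMap((m) => (m.tool_calls ?? []).map((c) => c.function.name));
}

// Hypothetical wrapper that bundles the helpers into one context object.
function buildContext(messages: Message[], model: string, provider: string): CompressContext {
  return {
    model,
    userQuery: extractUserQuery(messages),
    turnNumber: countTurns(messages),
    toolCalls: extractToolCalls(messages),
    provider,
  };
}

const demoMessages: Message[] = [
  { role: "user", content: "Find recent papers on entanglement." },
  {
    role: "assistant",
    content: null,
    tool_calls: [{ id: "t1", function: { name: "search", arguments: "{}" } }],
  },
];
console.log(buildContext(demoMessages, "gpt-4o", "openai"));
```

Counting user messages is only one plausible definition of a turn; the real countTurns() may count differently.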

Running the Optional Pre-Compress Hook

If a consumer supplies a preCompress hook, it executes after context construction but before format detection, allowing messages to be rewritten or filtered. This hook is evaluated in sdk/typescript/src/compress.ts at lines 40-44, giving developers a chance to scrub debug logs or system prompts before the ContentRouter analyzes the payload.

Normalizing Diverse Message Formats

Because the proxy's internal pipeline expects a unified schema, the client first calls detectFormat() from sdk/typescript/src/utils/format.ts to identify whether the input follows OpenAI, Anthropic, Gemini, or Vercel-AI conventions. After detection, toOpenAI() converts every message into the standard {role, content} shape so that the ContentRouter can apply a single set of heuristics regardless of the original provider format.
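The detection and normalization step can be sketched as follows. This is a simplified stand-in that borrows the names detectFormat() and toOpenAI() from the article; the type definitions and heuristics are assumptions, it handles text-only content, and it omits the Vercel-AI case entirely.

```typescript
// Simplified, text-only message shapes (assumptions for illustration).
type OpenAIMessage = { role: string; content: string };
type AnthropicMessage = { role: string; content: { type: "text"; text: string }[] };
type GeminiMessage = { role: string; parts: { text: string }[] };
type AnyMessage = OpenAIMessage | AnthropicMessage | GeminiMessage;
type Format = "openai" | "anthropic" | "gemini";

// Guess the format from structural cues. Real OpenAI messages can also carry
// array content, so a production detector would need stricter checks.
function detectFormat(messages: AnyMessage[]): Format {
  for (const m of messages) {
    if ("parts" in m) return "gemini";
    if (Array.isArray(m.content)) return "anthropic";
  }
  return "openai";
}

// Flatten every message into the {role, content} shape.
function toOpenAI(messages: AnyMessage[]): OpenAIMessage[] {
  return messages.map((m) => {
    if ("parts" in m) {
      // Gemini uses the role "model" where OpenAI uses "assistant".
      const role = m.role === "model" ? "assistant" : m.role;
      return { role, content: m.parts.map((p) => p.text).join("") };
    }
    const content = m.content;
    if (Array.isArray(content)) {
      return { role: m.role, content: content.map((b) => b.text).join("") };
    }
    return { role: m.role, content };
  });
}

const anthropicInput: AnthropicMessage[] = [
  { role: "user", content: [{ type: "text", text: "Explain quantum entanglement." }] },
];
console.log(detectFormat(anthropicInput), toOpenAI(anthropicInput));
```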

Injecting Optional Bias Scores

For advanced use cases, consumers can supply a computeBiases hook. As shown in sdk/typescript/src/compress.ts at lines 54-56, this hook attaches per-token bias information to the context. The router can then factor these biases into its decision when determining how aggressively to compress the payload.

How the Server-Side ContentRouter Makes Routing Decisions

Once the client-side preparation is complete, HeadroomClient.compress() in sdk/typescript/src/client.ts transmits the normalized messages and the CompressContext to the Headroom proxy. It is here that the ContentRouter lives, and where the actual algorithm selection occurs.

Evaluating Message Length and Tool Output Volume

The proxy's ContentRouter inspects the total message length. According to test expectations in sdk/typescript/test/client.test.ts at lines 70-82, long assistant responses or bulky tool outputs trigger heavier compression strategies. When the router detects an oversized payload, it may apply smart_crusher at a ratio like 0.35, which the proxy reports in its transforms_applied field (e.g., ["router:smart_crusher:0.35"]) and the SDK surfaces as transformsApplied.
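Because each transform entry is a colon-separated string, a consumer may want to split it into parts for logging or alerting. The parser below is a sketch that assumes every entry has exactly three segments (source, algorithm, numeric ratio), which is inferred only from the example strings in this article; parseTransform and RouterTransform are hypothetical names, not SDK exports.

```typescript
// Hypothetical structured view of one transform entry.
interface RouterTransform {
  source: string;
  algorithm: string;
  ratio: number;
}

// Parse "source:algorithm:ratio"; return null for anything that does not fit.
function parseTransform(entry: string): RouterTransform | null {
  const parts = entry.split(":");
  if (parts.length !== 3) return null;
  const [source, algorithm, rawRatio] = parts;
  if (!source || !algorithm || rawRatio === undefined || rawRatio.trim() === "") return null;
  const ratio = Number(rawRatio);
  if (!Number.isFinite(ratio)) return null;
  return { source, algorithm, ratio };
}

console.log(parseTransform("router:smart_crusher:0.35"));
```

Returning null for unrecognized entries keeps the parser from throwing if the proxy ever emits a format this sketch does not anticipate.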

Preserving Tool Call Semantics

If the CompressContext indicates that toolCalls is non-empty, the router applies specialized logic to preserve tool semantics rather than blindly compressing function arguments or results. In practice, this can result in a tool_preserve transform at 1.00, ensuring that tool output remains intact for downstream LLM reasoning.

Respecting Model Constraints and Token Budgets

The router always considers the requested model and the tokenBudget supplied in the original compress() call. Short user prompts are typically left untouched, while the router focuses its compression effort on the assistant's response history that exceeds the available budget. This model-aware routing prevents unnecessary distortion of high-priority user queries.
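The signals described in this section can be combined into a toy decision function. This is purely illustrative and is not the proxy's actual routing code: the precedence of the checks, the four-characters-per-token estimate, the 0.1 floor, and the reading of the ratio as "budget divided by estimated size" are all assumptions.

```typescript
// Hypothetical input bundle; not the proxy's real request schema.
interface RoutingInput {
  messages: { role: string; content: string }[];
  toolCalls: string[];
  tokenBudget: number;
}

// Rough rule of thumb (about 4 characters per token), not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Toy router: returns a transform label, or null when nothing needs compressing.
function chooseTransform(input: RoutingInput): string | null {
  // Assumption: tool calls take precedence and are preserved as-is.
  if (input.toolCalls.length > 0) return "router:tool_preserve:1.00";

  const total = input.messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  if (total <= input.tokenBudget) return null; // already fits the budget

  // Assumption: the ratio is the fraction of content to keep, floored at 0.1.
  const ratio = Math.max(0.1, input.tokenBudget / total);
  return `router:smart_crusher:${ratio.toFixed(2)}`;
}

console.log(
  chooseTransform({
    messages: [
      { role: "user", content: "Explain quantum entanglement." },
      { role: "assistant", content: "x".repeat(20000) },
    ],
    toolCalls: [],
    tokenBudget: 2000,
  }),
);
```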

Returning Compressed Content to the Client

After the proxy applies the chosen algorithm, the SDK must complete the round-trip. It converts the compressed messages back to their original format and surfaces the results, including which transforms were applied.

Denormalizing Back to Original Formats

The proxy returns compressed messages in OpenAI format, so the client invokes fromOpenAI() in sdk/typescript/src/utils/format.ts to restore the original shape expected by Anthropic, Gemini, or Vercel-AI SDKs. This transparent reverse mapping means developers do not need to handle format translation manually.
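A reverse mapping of this kind can be sketched as below. The function borrows the name fromOpenAI() from the article, but its signature (taking the previously detected format as a second argument), the type definitions, and the text-only handling are assumptions made for illustration.

```typescript
// Simplified, text-only message shapes (assumptions for illustration).
type Format = "openai" | "anthropic" | "gemini";
type OpenAIMessage = { role: string; content: string };
type AnthropicMessage = { role: string; content: { type: "text"; text: string }[] };
type GeminiMessage = { role: string; parts: { text: string }[] };

// Convert compressed OpenAI-shaped messages back to the caller's format.
function fromOpenAI(
  messages: OpenAIMessage[],
  format: Format,
): Array<OpenAIMessage | AnthropicMessage | GeminiMessage> {
  switch (format) {
    case "anthropic":
      // Wrap each string in a single text block.
      return messages.map((m) => ({
        role: m.role,
        content: [{ type: "text" as const, text: m.content }],
      }));
    case "gemini":
      // Gemini uses the role "model" where OpenAI uses "assistant".
      return messages.map((m) => ({
        role: m.role === "assistant" ? "model" : m.role,
        parts: [{ text: m.content }],
      }));
    case "openai":
      return messages;
  }
}

const compressed: OpenAIMessage[] = [{ role: "assistant", content: "(compressed summary)" }];
console.log(JSON.stringify(fromOpenAI(compressed, "anthropic")));
```

Remembering the format detected before the request is what lets the caller receive messages in the shape it originally supplied.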

Observing Applied Transforms via Post-Compress Hooks

If a postCompress hook was provided, it receives a CompressEvent containing both the compression statistics and the list of router transforms applied (transformsApplied). As implemented in sdk/typescript/src/compress.ts at lines 70-84, this hook enables logging, metrics collection, or debugging of the ContentRouter's decisions directly inside the consumer's application.
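For metrics collection, a hook of this kind could feed a small in-memory aggregator. The sketch below assumes a CompressEvent with only the two fields used elsewhere in this article (tokensSaved and transformsApplied); the real event may carry more, and RoutingStats is a hypothetical helper, not part of the SDK.

```typescript
// Minimal event shape assumed for this sketch.
interface CompressEvent {
  tokensSaved: number;
  transformsApplied: string[];
}

// Hypothetical aggregator: tallies tokens saved and how often each transform appears.
class RoutingStats {
  private totalTokensSaved = 0;
  private readonly byTransform = new Map<string, number>();

  record(ev: CompressEvent): void {
    this.totalTokensSaved += ev.tokensSaved;
    for (const t of ev.transformsApplied) {
      this.byTransform.set(t, (this.byTransform.get(t) ?? 0) + 1);
    }
  }

  summary(): { totalTokensSaved: number; byTransform: Record<string, number> } {
    return {
      totalTokensSaved: this.totalTokensSaved,
      byTransform: Object.fromEntries(this.byTransform),
    };
  }
}

const stats = new RoutingStats();

// A hook with this shape could be passed as hooks.postCompress.
const postCompress = async (ev: CompressEvent): Promise<void> => {
  stats.record(ev);
};

void postCompress({ tokensSaved: 1200, transformsApplied: ["router:smart_crusher:0.35"] }).then(() =>
  console.log(stats.summary()),
);
```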

Routing in Practice: Code Examples

The following examples demonstrate how compress() behaves with different conversation shapes. Each snippet shows how the ContentRouter's algorithm choice surfaces in the transformsApplied response field.

Simple User-Assistant Exchange

In this scenario, the router identifies a long assistant response and selects the smart crusher algorithm.

import { compress } from "headroom";

// Example 1 – simple user‑assistant exchange
const msgs = [
  { role: "user", content: "Explain quantum entanglement." },
  { role: "assistant", content: "… (very long explanation) …" },
];

const result = await compress(msgs, {
  model: "gpt-4o",
  tokenBudget: 2000,
});
console.log(result.transformsApplied);
// → e.g. ["router:smart_crusher:0.35"]   // router chose the smart crusher algorithm

Preserving Tool Calls

When tool calls are present, the router may choose to preserve them rather than compressing their semantic structure.

// Example 2 – preserving tool calls
const msgsWithTool = [
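  // OpenAI-style tool calls carry type: "function"; the tool reply references the call via tool_call_id
  { role: "assistant", content: null, tool_calls: [{ id: "t1", type: "function", function: { name: "search", arguments: "{}" } }] },
  { role: "tool", tool_call_id: "t1", content: "Search results … (large) …" },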
];

const result2 = await compress(msgsWithTool, {
  model: "gpt-4o",
  tokenBudget: 1500,
});
console.log(result2.transformsApplied);
// → e.g. ["router:tool_preserve:1.00"]   // router kept the tool output intact

Custom Pre- and Post-Compress Hooks

Developers can intercept the pipeline to clean messages before routing and observe statistics afterward.

// Example 3 – custom hooks
import { HeadroomClient, compress } from "headroom";

const client = new HeadroomClient({ apiKey: "YOUR_KEY" });

await compress(msgs, {
  client,
  hooks: {
    preCompress: async (messages, ctx) => {
      // Drop any debug logs before compression; keep messages without string content (e.g. tool calls)
      return messages.filter(m => !(typeof m.content === "string" && m.content.includes("[debug]")));
    },
    postCompress: async (event) => {
      // transformsApplied may be empty when the router leaves the payload untouched
      console.log(`Saved ${event.tokensSaved} tokens via ${event.transformsApplied[0] ?? "no transform"}`);
    },
  },
});

Summary

  • The ContentRouter runs on the Headroom proxy server, but the TypeScript SDK in chopratejas/headroom assembles the CompressContext and normalizes messages via toOpenAI() before transmitting them.
  • Message length, tool call presence, user query size, and the model/token budget are the primary signals that drive the router's algorithm selection.
  • The transforms_applied field in the proxy response, surfaced by the SDK as transformsApplied, reveals which algorithm the ContentRouter chose, such as router:smart_crusher:0.35 or router:tool_preserve:1.00.
  • Pre-compress and post-compress hooks, along with optional computeBiases, allow consumers to customize input and observe routing outcomes without managing format translations manually.

Frequently Asked Questions

What inputs does Headroom's ContentRouter evaluate to pick a compression algorithm?

The ContentRouter evaluates the normalized message length, the presence and size of tool outputs, the extracted user query length, the target model name, and the requested tokenBudget. These values are packaged into a CompressContext by the client and sent to the proxy for evaluation.

Does the ContentRouter run on the client or the server?

The ContentRouter executes on the server-side Headroom proxy. However, the client SDK performs all the preparatory work—including detectFormat(), toOpenAI(), and building the CompressContext—so the router receives a consistent, predictable payload.

How does Headroom handle non-OpenAI message formats like Anthropic or Gemini?

The SDK uses detectFormat() and toOpenAI() in sdk/typescript/src/utils/format.ts to convert all incoming messages into the OpenAI schema before the request leaves the client. After the proxy returns compressed results, fromOpenAI() converts them back to the original provider format automatically.

Can developers override or inspect the ContentRouter's decisions?

Yes. Developers can supply a preCompress hook to rewrite messages before they reach the router, and a postCompress hook to receive a CompressEvent that includes the transformsApplied list and token savings. There is also an optional computeBiases hook for influencing compression aggressiveness with per-token signals.
