How to Implement Custom Compression Hooks for Specific Content Types in Headroom

You implement custom compression hooks in Headroom by subclassing CompressionHooks and overriding callbacks such as preCompress, computeBiases, or postCompress (pre_compress, compute_biases, and post_compress in Python) to inject logic, adjust compression intensity per message type, or log metrics.

Headroom’s compression pipeline is extensible: you can tailor compression behavior by content type, such as system prompts, tool calls, user queries, or custom metadata. Custom hooks let you preserve critical instructions, inject contextual hints, or monitor compression performance without modifying the core library. The hook system is defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py), and the TypeScript and Python SDKs share the same contract.

Understanding the Compression Hook Pipeline

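Headroom exposes four primary extension points in the compression lifecycle. Each hook receives specific context about the request and can mutate data or influence the compression algorithm. The names below use the Python snake_case form; the TypeScript examples later in this article use the camelCase equivalents (preCompress, computeBiases, postCompress).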

Core Hook Methods

  • pre_compress(messages, ctx): Runs before any transforms are applied. Receives the full message list and a CompressContext object. Use this to inject additional context, remove irrelevant messages, or reorder the conversation based on task phase.

  • compute_biases(messages, ctx): Runs during the bias-calculation step. Returns a mapping of {msg_index: bias} where values greater than 1.0 preserve more tokens and values less than 1.0 compress more aggressively. Use this for position-aware or content-type-aware compression budgets.

  • post_compress(event): Runs after compression completes. Receives a CompressEvent containing tokens_before, tokens_after, tokens_saved, compression_ratio, and applied transforms. Ideal for logging, analytics, or A/B testing.

  • on_pipeline_event(event): Optional hook for pipeline lifecycle events (e.g., start/stop of specific transforms). Receives a PipelineEvent for granular observability.
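To make the ordering of the four hooks concrete, here is a minimal, self-contained sketch. It does not use Headroom at all: SketchHooks, toy_compress, RecordingHooks, the dict-shaped events, and the word-based "token" count are all hypothetical stand-ins chosen for illustration, and the real pipeline may invoke the hooks differently.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


class SketchHooks:
    """Hypothetical stand-in for CompressionHooks: every method is a no-op."""

    def pre_compress(self, messages: List[Message], ctx: Dict[str, Any]) -> List[Message]:
        return messages

    def compute_biases(self, messages: List[Message], ctx: Dict[str, Any]) -> Dict[int, float]:
        return {}

    def post_compress(self, event: Dict[str, Any]) -> None:
        pass

    def on_pipeline_event(self, event: Dict[str, Any]) -> None:
        pass


def toy_compress(messages: List[Message], hooks: SketchHooks,
                 ctx: Dict[str, Any], base_words: int = 4) -> List[Message]:
    """Toy pipeline: keeps the first base_words * bias words of each message."""
    messages = hooks.pre_compress(list(messages), ctx)   # 1. edit the message list
    biases = hooks.compute_biases(messages, ctx)         # 2. per-index biases
    hooks.on_pipeline_event({"stage": "truncate", "phase": "start"})
    before = sum(len(m["content"].split()) for m in messages)
    out = []
    for i, m in enumerate(messages):
        budget = max(1, int(base_words * biases.get(i, 1.0)))
        out.append({**m, "content": " ".join(m["content"].split()[:budget])})
    hooks.on_pipeline_event({"stage": "truncate", "phase": "end"})
    after = sum(len(m["content"].split()) for m in out)
    hooks.post_compress({"tokens_before": before, "tokens_after": after,
                         "tokens_saved": before - after})  # 3. report results
    return out


class RecordingHooks(SketchHooks):
    """Records the order in which the toy pipeline invokes each hook."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def pre_compress(self, messages, ctx):
        self.calls.append("pre_compress")
        return messages

    def compute_biases(self, messages, ctx):
        self.calls.append("compute_biases")
        return {0: 2.0}  # double the word budget for message 0

    def post_compress(self, event):
        self.calls.append("post_compress")

    def on_pipeline_event(self, event):
        self.calls.append("on_pipeline_event")


hooks = RecordingHooks()
result = toy_compress(
    [{"role": "system", "content": "a b c d e f g h i j"},
     {"role": "user", "content": "k l m n o p q r"}],
    hooks,
    ctx={},
)
print(hooks.calls)
print(result)
```

In this toy version a bias of 2.0 simply doubles a per-message word budget. That is only one plausible way a bias could be applied; it is used here because it makes the effect easy to see.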

Data Structures

The hooks rely on two primary data structures defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py):

  • CompressContext: Contains request metadata including model, user_query, turn_number, tool_calls, and provider.
  • CompressEvent: Reports compression results including tokens_saved, compression_ratio, and ccr_hashes.
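As a rough mental model, the two structures can be pictured as plain data classes. The sketch below is not the library's definition: the field names come from the lists in this article, while the class names, types, defaults, and the `transforms` attribute name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchCompressContext:
    """Hypothetical stand-in for CompressContext (types and defaults assumed)."""
    model: str = ""
    user_query: str = ""
    turn_number: int = 0
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)
    provider: str = ""


@dataclass
class SketchCompressEvent:
    """Hypothetical stand-in for CompressEvent (types and defaults assumed)."""
    tokens_before: int = 0
    tokens_after: int = 0
    tokens_saved: int = 0
    compression_ratio: float = 0.0  # definition comes from the library; not derived here
    transforms: List[str] = field(default_factory=list)  # assumed attribute name
    ccr_hashes: List[str] = field(default_factory=list)


ctx = SketchCompressContext(model="gpt-4o", user_query="Explain TCP vs UDP", turn_number=1)
event = SketchCompressEvent(tokens_before=1200, tokens_after=800, tokens_saved=400)
print(f"{ctx.model}: saved {event.tokens_saved} of {event.tokens_before} tokens")
```

Stand-ins like these are mainly useful for unit-testing your own hook logic without running the full pipeline.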

Implementing Custom Hooks for Specific Content Types

To customize behavior for specific content types, create a subclass of CompressionHooks and implement the methods relevant to your use case.

Pre-Processing Messages with preCompress

Use preCompress (TypeScript) or pre_compress (Python) when you need to add, remove, or reorder messages before compression occurs. This is the appropriate hook for injecting system prompts based on existing content or filtering out messages that match specific patterns.

import { CompressionHooks } from "headroom-ai";
import type { CompressContext } from "headroom-ai";

class SecurityContextHooks extends CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext) {
    // Inject a security hint only when a system message exists
    const hasSystem = messages.some(m => m.role === "system");
    if (hasSystem) {
      messages.unshift({
        role: "system",
      content: "You are operating in a high-security context."
      });
    }
    return messages;
  }
}
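The filtering case mentioned above can be sketched in Python as well. This example assumes a hypothetical no-op base class (SketchHooks) in place of the real CompressionHooks so that it runs on its own; the noise pattern and the DropNoiseHooks name are invented for illustration. It returns a new list instead of mutating the input, which keeps the caller's messages intact.

```python
import re
from typing import Any, Dict, List

Message = Dict[str, Any]


class SketchHooks:
    """Hypothetical stand-in for CompressionHooks with a no-op pre_compress."""

    def pre_compress(self, messages: List[Message], ctx: Any) -> List[Message]:
        return messages


class DropNoiseHooks(SketchHooks):
    """Drops non-system messages whose text starts with a noise keyword."""

    # Illustrative pattern only; adapt it to whatever counts as noise for you.
    NOISE = re.compile(r"^\s*(heartbeat|ping|keepalive)\b", re.IGNORECASE)

    def pre_compress(self, messages: List[Message], ctx: Any) -> List[Message]:
        kept = []
        for msg in messages:
            content = msg.get("content")
            # Only string content is matched; lists or None are kept as-is.
            is_noise = isinstance(content, str) and bool(self.NOISE.search(content))
            if is_noise and msg.get("role") != "system":
                continue  # skip noisy non-system messages
            kept.append(msg)
        return kept


original = [
    {"role": "system", "content": "ping is a valid command here"},
    {"role": "user", "content": "Heartbeat 12:00:01"},
    {"role": "user", "content": "Explain the difference between TCP and UDP."},
]
filtered = DropNoiseHooks().pre_compress(original, ctx=None)
print([m["content"] for m in filtered])
```

System messages are always kept here, even when they match the pattern, on the assumption that instructions should never be dropped silently.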

Controlling Compression Aggressiveness with computeBiases

Use computeBiases when the message list should remain unchanged but certain content types need protection from aggressive compression. Return a map from message index to bias value: system messages might use 2.0 to preserve nearly all tokens, while transient context might use 0.5.

class PriorityBiasHooks extends CompressionHooks {
  computeBiases(messages: any[], _ctx: CompressContext) {
    const biases: Record<number, number> = {};
    for (let i = 0; i < messages.length; i++) {
      if (messages[i].role === "system") {
        biases[i] = 2.0;  // Keep system messages almost intact
      } else if (i === messages.length - 1 && messages[i].role === "user") {
        biases[i] = 1.5;  // Preserve the final user query
      }
    }
    return biases;
  }
}
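Role checks are one way to pick biases; another is to classify the content itself. The following Python sketch assumes a simple heuristic classifier (JSON that parses, code that contains a fence or starts with a Python keyword) and invented bias values. The classify helper, BIAS_BY_TYPE table, and ContentTypeBiasHooks class are hypothetical, and the class does not inherit from the real CompressionHooks so that the example runs standalone.

```python
import json
from typing import Any, Dict, List

FENCE = "`" * 3  # Markdown code fence marker, built indirectly to keep this snippet fence-safe

# Illustrative values only: above 1.0 preserves more, below 1.0 compresses more.
BIAS_BY_TYPE = {"json": 1.8, "code": 1.5, "prose": 1.0, "other": 1.0}


def classify(content: Any) -> str:
    """Very rough content-type guess for a single message body."""
    if not isinstance(content, str):
        return "other"
    text = content.strip()
    if text.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass  # looked like JSON but did not parse; fall through
    if FENCE in text or text.startswith(("def ", "class ", "import ")):
        return "code"
    return "prose"


class ContentTypeBiasHooks:
    """Hypothetical hook: biases depend on role first, then on content type."""

    def compute_biases(self, messages: List[Dict[str, Any]], ctx: Any) -> Dict[int, float]:
        biases: Dict[int, float] = {}
        for i, msg in enumerate(messages):
            if msg.get("role") == "system":
                biases[i] = 2.0
                continue
            bias = BIAS_BY_TYPE[classify(msg.get("content"))]
            if bias != 1.0:
                biases[i] = bias  # neutral messages are left out of the map
        return biases


messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": '{"a": 1}'},
    {"role": "tool", "content": "def f(): pass"},
    {"role": "user", "content": "hello"},
    {"role": "user", "content": "{not json"},
]
biases = ContentTypeBiasHooks().compute_biases(messages, ctx=None)
print(biases)
```

Leaving neutral messages out of the map mirrors the TypeScript example above, which only sets entries for the messages it wants to protect.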

Observing Results with postCompress

Use postCompress to capture metrics or trigger side effects after compression finishes. The CompressEvent parameter provides detailed telemetry.

import type { CompressEvent } from "headroom-ai";

class LoggingHooks extends CompressionHooks {
  postCompress(event: CompressEvent) {
    console.log(
      `[hook] Compression saved ${event.tokensSaved} tokens (${(
        event.compressionRatio * 100
      ).toFixed(1)}% reduction)`
    );
  }
}
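Beyond logging a single request, the same hook can accumulate totals across requests. The Python sketch below assumes the event exposes tokens_before and tokens_after as attributes, as listed earlier; MetricsHooks is a hypothetical class, and SimpleNamespace objects stand in for real CompressEvent instances so the example runs without Headroom.

```python
from types import SimpleNamespace


class MetricsHooks:
    """Hypothetical hook that aggregates token counts over many requests."""

    def __init__(self) -> None:
        self.requests = 0
        self.tokens_before = 0
        self.tokens_after = 0

    def post_compress(self, event) -> None:
        # Accumulate raw counts; percentages are derived on demand.
        self.requests += 1
        self.tokens_before += event.tokens_before
        self.tokens_after += event.tokens_after

    def savings_percent(self) -> float:
        """Share of input tokens removed across all recorded requests."""
        if self.tokens_before == 0:
            return 0.0
        saved = self.tokens_before - self.tokens_after
        return 100.0 * saved / self.tokens_before


metrics = MetricsHooks()
# Stand-in events; a real pipeline would pass CompressEvent objects.
metrics.post_compress(SimpleNamespace(tokens_before=1000, tokens_after=600))
metrics.post_compress(SimpleNamespace(tokens_before=500, tokens_after=300))
print(f"{metrics.requests} requests, {metrics.savings_percent():.1f}% saved")
```

Deriving the percentage from raw counts avoids depending on how compression_ratio is defined, which this article does not spell out.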

Complete TypeScript Example

The following example demonstrates a complete implementation that handles system messages and user queries differently, based on the reference implementation in [sdk/typescript/examples/hooks-custom-compression.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/hooks-custom-compression.ts):

import { compress, CompressionHooks } from "headroom-ai";
import type { CompressContext, CompressEvent } from "headroom-ai";

class MyCompressionHooks extends CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext) {
    const hasSystem = messages.some(m => m.role === "system");
    if (hasSystem) {
      messages.unshift({
        role: "system",
        content: "You are operating in a high-security context."
      });
    }
    return messages;
  }

  computeBiases(messages: any[], _ctx: CompressContext) {
    const biases: Record<number, number> = {};
    for (let i = 0; i < messages.length; i++) {
      if (messages[i].role === "system") {
        biases[i] = 2.0;
      } else if (i === messages.length - 1 && messages[i].role === "user") {
        biases[i] = 1.5;
      }
    }
    return biases;
  }

  postCompress(event: CompressEvent) {
    console.log(
      `[hook] Compression saved ${event.tokensSaved} tokens (${(
        event.compressionRatio * 100
      ).toFixed(1)}% reduction)`
    );
  }
}

// Usage
async function run() {
  const hooks = new MyCompressionHooks();
  const result = await compress(
    [
      { role: "system", content: "You are an assistant." },
      { role: "user", content: "Explain the difference between TCP and UDP." }
    ],
    { model: "gpt-4o", hooks }
  );
  console.log("Compressed messages:", result.messages);
}
run().catch(console.error);

Server-Side Proxy Configuration

When running the Headroom proxy server, you inject hooks via ProxyConfig. This applies your custom logic to every request processed by the proxy.

from headroom import ProxyConfig, CompressionHooks
from headroom.proxy import run_proxy

class MyPythonHooks(CompressionHooks):
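    def pre_compress(self, messages, ctx):
        # Flag tool mode with a leading system message when the request has tool calls
        if ctx.tool_calls:
            messages.insert(0, {"role": "system", "content": "Tool mode active"})
        return messages

    def compute_biases(self, messages, ctx):
        biases = {}
        for i, msg in enumerate(messages):
            content = msg.get("content")
            # content may be None or a list of parts, so check the type first
            if isinstance(content, str) and content.lstrip().startswith("{"):
                biases[i] = 1.8  # Preserve JSON-like payloads
        return biases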

config = ProxyConfig(hooks=MyPythonHooks())
run_proxy(config)

The Python base class follows the same contract as the TypeScript version, ensuring consistent behavior across language implementations as defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py).

When to Use Each Hook

Choose the appropriate hook based on whether you need to modify messages or influence compression intensity:

  • pre_compress: Use when you need to add, remove, or reorder entire messages. For example, injecting a security disclaimer when system prompts are detected or dropping irrelevant tool call history.

  • compute_biases: Use when the message list should remain static but specific content types (code snippets, JSON payloads, system instructions) require different compression budgets. Higher values preserve more content.

  • post_compress: Use for observability and analytics, such as logging token savings ratios or forwarding metrics to external monitoring systems.

Frequently Asked Questions

What is the CompressionHooks base class?

CompressionHooks is the abstract base class defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py) that provides no-op implementations of pre_compress, compute_biases, post_compress, and on_pipeline_event. You subclass it to override specific methods while inheriting default behavior for the others.

How do I preserve specific message types from compression?

Override compute_biases and return a dictionary mapping message indices to bias values. For messages you want to preserve, return values greater than 1.0 (e.g., 2.0 for system messages). For content you consider low-priority, return values less than 1.0.

Can I use hooks with the Headroom proxy server?

Yes. Instantiate your custom hook class and pass it to ProxyConfig(hooks=YourHooks()) when configuring the Python proxy server. The proxy will invoke your hooks on every request that passes through the compression pipeline.

How do I debug custom compression hooks?

Use the post_compress hook to log the CompressEvent object, which contains tokens_before, tokens_after, compression_ratio, and the list of applied transforms. You can also implement on_pipeline_event to trace individual pipeline steps for granular debugging.
