How to Implement Custom Compression Hooks for Specific Content Types in Headroom
You implement custom compression hooks in Headroom by subclassing CompressionHooks and overriding callbacks like preCompress, computeBiases, or postCompress to inject logic, adjust compression intensity per message type, or log metrics.
Headroom's compression pipeline is extensible: you can tailor compression behavior to content types such as system prompts, tool calls, user queries, or custom metadata. Custom hooks let you preserve critical instructions, inject contextual hints, or monitor compression performance without modifying the core library. The hook system is defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py) and provides a consistent contract across both the TypeScript and Python SDKs.
Understanding the Compression Hook Pipeline
Headroom exposes four primary extension points in the compression lifecycle. Each hook receives specific context about the request and can mutate data or influence the compression algorithm.
Core Hook Methods
- pre_compress(messages, ctx): Runs before any transforms are applied. Receives the full message list and a CompressContext object. Use this to inject additional context, remove irrelevant messages, or reorder the conversation based on task phase.
- compute_biases(messages, ctx): Runs during the bias-calculation step. Returns a mapping of {msg_index: bias} where values greater than 1.0 preserve more tokens and values less than 1.0 compress more aggressively. Use this for position-aware or content-type-aware compression budgets.
- post_compress(event): Runs after compression completes. Receives a CompressEvent containing tokens_before, tokens_after, tokens_saved, compression_ratio, and the applied transforms. Ideal for logging, analytics, or A/B testing.
- on_pipeline_event(event): Optional hook for pipeline lifecycle events (e.g., start/stop of specific transforms). Receives a PipelineEvent for granular observability.
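The order in which these hooks fire can be sketched in Python. This is a minimal sketch, not Headroom's implementation: HooksSketch is a stand-in for CompressionHooks so the example runs without Headroom installed, its no-op defaults are an assumption, run_lifecycle is a hypothetical driver that performs no real compression, and the event is a plain dict rather than a CompressEvent.

```python
class HooksSketch:
    """Stand-in for CompressionHooks; the no-op defaults are an assumption."""

    def pre_compress(self, messages, ctx):
        return messages

    def compute_biases(self, messages, ctx):
        return {}

    def post_compress(self, event):
        pass

    def on_pipeline_event(self, event):
        pass


class TracingHooks(HooksSketch):
    """Records the order in which the hooks fire."""

    def __init__(self):
        self.calls = []

    def pre_compress(self, messages, ctx):
        self.calls.append("pre_compress")
        return messages

    def compute_biases(self, messages, ctx):
        self.calls.append("compute_biases")
        # Protect the last message; leave every other index at the default.
        return {len(messages) - 1: 1.5} if messages else {}

    def post_compress(self, event):
        self.calls.append("post_compress")


def run_lifecycle(hooks, messages, ctx):
    """Hypothetical driver: calls the hooks in order, compresses nothing."""
    messages = hooks.pre_compress(messages, ctx)
    biases = hooks.compute_biases(messages, ctx)
    # A real pipeline would apply its transforms here, guided by the biases.
    hooks.post_compress({"tokens_before": 0, "tokens_after": 0})
    return messages, biases


hooks = TracingHooks()
_, biases = run_lifecycle(hooks, [{"role": "user", "content": "hi"}], ctx=None)
print(hooks.calls)
print(biases)
```

Overriding only the methods you need and inheriting the rest keeps each hook class focused on a single concern.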
Data Structures
The hooks rely on two primary data structures defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py):
- CompressContext: Contains request metadata including model, user_query, turn_number, tool_calls, and provider.
- CompressEvent: Reports compression results including tokens_saved, compression_ratio, and ccr_hashes.
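For orientation, the two structures might look roughly like the dataclasses below. The field names come from this article; the types, defaults, and the Sketch class names are assumptions made for illustration, so check headroom/hooks.py for the real definitions. The sketch also does not define how compression_ratio is computed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CompressContextSketch:
    """Request metadata passed to pre_compress and compute_biases (types assumed)."""
    model: str = ""
    user_query: str = ""
    turn_number: int = 0
    tool_calls: list[Any] = field(default_factory=list)
    provider: Optional[str] = None


@dataclass
class CompressEventSketch:
    """Compression results passed to post_compress (types assumed)."""
    tokens_before: int = 0
    tokens_after: int = 0
    tokens_saved: int = 0
    compression_ratio: float = 0.0
    ccr_hashes: list[str] = field(default_factory=list)


ctx = CompressContextSketch(model="gpt-4o", user_query="Explain TCP vs UDP", turn_number=1)
event = CompressEventSketch(tokens_before=1000, tokens_after=600, tokens_saved=400)
print(ctx.tool_calls, event.tokens_saved)
```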
Implementing Custom Hooks for Specific Content Types
To customize behavior for specific content types, create a subclass of CompressionHooks and implement the methods relevant to your use case.
Pre-Processing Messages with preCompress
Use preCompress (TypeScript) or pre_compress (Python) when you need to add, remove, or reorder messages before compression occurs. This is the appropriate hook for injecting system prompts based on existing content or filtering out messages that match specific patterns.
import { CompressionHooks } from "headroom-ai";
import type { CompressContext } from "headroom-ai";
class SecurityContextHooks extends CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext) {
    // Inject a security hint only when a system message exists
    const hasSystem = messages.some(m => m.role === "system");
    if (hasSystem) {
      messages.unshift({
        role: "system",
        content: "You are operating in a high-security context."
      });
    }
    return messages;
  }
}
Controlling Compression Aggressiveness with computeBiases
Use computeBiases when the message list should remain unchanged but you want to protect certain content types from aggressive compression. Return a bias map where indices map to float values; system messages might use 2.0 to preserve nearly all tokens, while transient context might use 0.5.
class PriorityBiasHooks extends CompressionHooks {
  computeBiases(messages: any[], _ctx: CompressContext) {
    const biases: Record<number, number> = {};
    for (let i = 0; i < messages.length; i++) {
      if (messages[i].role === "system") {
        biases[i] = 2.0; // Keep system messages almost intact
      } else if (i === messages.length - 1 && messages[i].role === "user") {
        biases[i] = 1.5; // Preserve the final user query
      }
    }
    return biases;
  }
}
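A bias function can also look at the shape of the content, not only the role. The Python sketch below is one way to do that, under stated assumptions: looks_like_json and content_type_biases are hypothetical helpers, treating "tool" role messages as transient context is an assumption about your conversation format, and the bias values simply reuse the examples from this article. In a real hook, the body of compute_biases would return the result of such a function.

```python
import json


def looks_like_json(text):
    """Return True when text parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped or stripped[0] not in "{[":
        return False
    try:
        json.loads(stripped)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def content_type_biases(messages):
    """Map message index -> bias; unlisted indices keep the default budget."""
    biases = {}
    last = len(messages) - 1
    for i, msg in enumerate(messages):
        content = msg.get("content")
        # Content may be None or a list of parts; only inspect plain strings.
        text = content if isinstance(content, str) else ""
        if msg.get("role") == "system":
            biases[i] = 2.0  # keep instructions almost intact
        elif i == last and msg.get("role") == "user":
            biases[i] = 1.5  # preserve the final user query
        elif looks_like_json(text):
            biases[i] = 1.8  # structured payloads break easily when trimmed
        elif msg.get("role") == "tool":
            biases[i] = 0.5  # assumed transient: compress harder
    return biases


messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "hi"},
    {"role": "tool", "content": '{"status": "ok"}'},
    {"role": "tool", "content": "long log output ..."},
    {"role": "user", "content": "What failed?"},
]
print(content_type_biases(messages))
```

Parsing with json.loads is stricter than a startswith("{") check, so text that merely begins with a brace is not given the JSON bias.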
Observing Results with postCompress
Use postCompress to capture metrics or trigger side effects after compression finishes. The CompressEvent parameter provides detailed telemetry.
import type { CompressEvent } from "headroom-ai";
class LoggingHooks extends CompressionHooks {
  postCompress(event: CompressEvent) {
    // Report the token savings for this request
    console.log(
      `[hook] Compression saved ${event.tokensSaved} tokens (${(
        event.compressionRatio * 100
      ).toFixed(1)}% reduction)`
    );
  }
}
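Logging one line per request is often enough, but aggregate numbers are easier to compare across runs. The following Python sketch accumulates totals across calls. SavingsTracker is a hypothetical class (in real code it would subclass CompressionHooks), SimpleNamespace stands in for CompressEvent, and the percentage is computed from tokens_saved and tokens_before rather than from compression_ratio, whose exact definition this article does not state.

```python
from types import SimpleNamespace


class SavingsTracker:
    """Accumulates token counts across post_compress calls."""

    def __init__(self):
        self.requests = 0
        self.tokens_before = 0
        self.tokens_saved = 0

    def post_compress(self, event):
        self.requests += 1
        self.tokens_before += event.tokens_before
        self.tokens_saved += event.tokens_saved

    def percent_saved(self):
        """Overall share of input tokens removed, as a percentage."""
        if self.tokens_before == 0:
            return 0.0
        return 100.0 * self.tokens_saved / self.tokens_before


tracker = SavingsTracker()
# SimpleNamespace stands in for CompressEvent in this sketch.
tracker.post_compress(SimpleNamespace(tokens_before=1000, tokens_saved=400))
tracker.post_compress(SimpleNamespace(tokens_before=500, tokens_saved=100))
print(f"{tracker.requests} requests, {tracker.percent_saved():.1f}% saved")
```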
Complete TypeScript Example
The following example demonstrates a complete implementation that handles system messages and user queries differently, based on the reference implementation in [sdk/typescript/examples/hooks-custom-compression.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/hooks-custom-compression.ts):
import { compress, CompressionHooks } from "headroom-ai";
import type { CompressContext, CompressEvent } from "headroom-ai";
class MyCompressionHooks extends CompressionHooks {
  preCompress(messages: any[], ctx: CompressContext) {
    const hasSystem = messages.some(m => m.role === "system");
    if (hasSystem) {
      messages.unshift({
        role: "system",
        content: "You are operating in a high-security context."
      });
    }
    return messages;
  }
  computeBiases(messages: any[], _ctx: CompressContext) {
    const biases: Record<number, number> = {};
    for (let i = 0; i < messages.length; i++) {
      if (messages[i].role === "system") {
        biases[i] = 2.0;
      } else if (i === messages.length - 1 && messages[i].role === "user") {
        biases[i] = 1.5;
      }
    }
    return biases;
  }
  postCompress(event: CompressEvent) {
    console.log(
      `[hook] Compression saved ${event.tokensSaved} tokens (${(
        event.compressionRatio * 100
      ).toFixed(1)}% reduction)`
    );
  }
}
// Usage
async function run() {
  const hooks = new MyCompressionHooks();
  const result = await compress(
    [
      { role: "system", content: "You are an assistant." },
      { role: "user", content: "Explain the difference between TCP and UDP." }
    ],
    { model: "gpt-4o", hooks }
  );
  console.log("Compressed messages:", result.messages);
}
run().catch(console.error);
Server-Side Proxy Configuration
When running the Headroom proxy server, you inject hooks via ProxyConfig. This applies your custom logic to every request processed by the proxy.
from headroom import ProxyConfig, CompressionHooks
from headroom.proxy import run_proxy
class MyPythonHooks(CompressionHooks):
    def pre_compress(self, messages, ctx):
        # Inject a system hint when the request carries tool calls
        if ctx.tool_calls:
            messages.insert(0, {"role": "system", "content": "Tool mode active"})
        return messages
    def compute_biases(self, messages, ctx):
        biases = {}
        for i, msg in enumerate(messages):
            content = msg.get("content")
            # Content may be None or a list of parts, so check the type first
            if isinstance(content, str) and content.lstrip().startswith("{"):
                biases[i] = 1.8  # Preserve JSON
        return biases
config = ProxyConfig(hooks=MyPythonHooks())
run_proxy(config)
The Python base class follows the same contract as the TypeScript version, ensuring consistent behavior across language implementations as defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py).
When to Use Each Hook
Choose the appropriate hook based on whether you need to modify messages or influence compression intensity:
- pre_compress: Use when you need to add, remove, or reorder entire messages. For example, injecting a security disclaimer when system prompts are detected or dropping irrelevant tool call history.
- compute_biases: Use when the message list should remain static but specific content types (code snippets, JSON payloads, system instructions) require different compression budgets. Higher values preserve more content.
- post_compress: Use for observability and analytics, such as logging token savings ratios or forwarding metrics to external monitoring systems.
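As an illustration of the first case, dropping stale tool history inside pre_compress could look like the Python sketch below. drop_stale_tool_messages and its keep_last parameter are hypothetical, and identifying tool output by role == "tool" is an assumption about the message format. Some provider APIs expect every tool call to have a matching result, so verify that your provider accepts the trimmed conversation before using this pattern.

```python
def drop_stale_tool_messages(messages, keep_last=2):
    """Return a new list that keeps only the most recent tool messages."""
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    if keep_last > 0:
        stale = set(tool_indices[:-keep_last])
    else:
        stale = set(tool_indices)  # a [:-0] slice would be empty, so handle 0 here
    return [m for i, m in enumerate(messages) if i not in stale]


history = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "tool", "content": "result A"},
    {"role": "user", "content": "next step?"},
    {"role": "tool", "content": "result B"},
    {"role": "tool", "content": "result C"},
    {"role": "user", "content": "summarize"},
]
trimmed = drop_stale_tool_messages(history, keep_last=2)
print([m["content"] for m in trimmed if m["role"] == "tool"])
```

The function returns a new list instead of mutating its input, so a pre_compress override can simply return the result.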
Summary
- Implement custom compression hooks by subclassing CompressionHooks from [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py).
- Use pre_compress to mutate messages before compression, compute_biases to adjust per-message compression intensity, and post_compress to observe results.
- Return bias values greater than 1.0 to preserve content and less than 1.0 to compress more aggressively.
- Wire hooks into the TypeScript SDK via the compress() options or into the Python proxy via ProxyConfig.
- Reference the complete example in [sdk/typescript/examples/hooks-custom-compression.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/examples/hooks-custom-compression.ts) for implementation patterns.
Frequently Asked Questions
What is the CompressionHooks base class?
CompressionHooks is the abstract base class defined in [headroom/hooks.py](https://github.com/chopratejas/headroom/blob/main/headroom/hooks.py) that provides no-op implementations of pre_compress, compute_biases, post_compress, and on_pipeline_event. You subclass it to override specific methods while inheriting default behavior for the others.
How do I preserve specific message types from compression?
Override compute_biases and return a dictionary mapping message indices to bias values. For messages you want to preserve, return values greater than 1.0 (e.g., 2.0 for system messages). For content you consider low-priority, return values less than 1.0.
Can I use hooks with the Headroom proxy server?
Yes. Instantiate your custom hook class and pass it to ProxyConfig(hooks=YourHooks()) when configuring the Python proxy server. The proxy will invoke your hooks on every request that passes through the compression pipeline.
How do I debug custom compression hooks?
Use the post_compress hook to log the CompressEvent object, which contains tokens_before, tokens_after, compression_ratio, and the list of applied transforms. You can also implement on_pipeline_event to trace individual pipeline steps for granular debugging.