
How to Debug Compression Issues with Headroom Performance: 7 Diagnostic Methods

June 10, 2026 chopratejas/headroom

To debug compression issues in Headroom, verify the active CompressionPolicy via resolve_policy, enable debug logging with HEADROOM_LOG_LEVEL=debug, and inspect the routing_log to identify exactly which transform skipped compression and why.

Debugging compression issues in the chopratejas/headroom repository requires tracing how the proxy decides to shrink request and response payloads. The system relies on a per-auth-mode policy combined with runtime compression ratio calculations to determine whether content gets compressed. By analyzing the decision flow through Python transforms and Rust observability metrics, you can pinpoint why specific traffic bypasses compression entirely.

Understanding Headroom's Compression Pipeline

Headroom's compression system follows a strict pipeline: policy resolution first, then transform execution, followed by ratio-based validation.

Policy Resolution Based on AuthMode

Every request begins by invoking headroom.transforms.compression_policy.resolve_policy to select a CompressionPolicy. This function evaluates the request's AuthMode against the environment flag HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT.

When enforcement is disabled, the pipeline falls back to aggressive PAYG defaults.
When enforcement is enabled, policy_for_mode returns mode-specific configurations (e.g., live_zone_only=True for Subscription mode).
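The resolution order described above can be modeled in a few lines. This is a simplified, self-contained sketch, not headroom's implementation: AuthMode and CompressionPolicy are stand-ins, the Subscription values are copied from the sample output in Example 1, and the PAYG values are placeholders (the article only implies live_zone_only=False for PAYG). It also assumes enforcement counts as enabled unless the flag is set to "disabled".

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FLAG = "HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT"


class AuthMode(Enum):  # hypothetical stand-in for headroom's AuthMode
    PAYG = "payg"
    SUBSCRIPTION = "subscription"


@dataclass(frozen=True)
class CompressionPolicy:  # hypothetical stand-in with the fields shown in Example 1
    live_zone_only: bool
    cache_aligner_enabled: bool
    volatile_token_threshold: int
    max_lossy_ratio: float
    toin_read_only: bool


# Values copied from the Example 1 output.
SUBSCRIPTION_POLICY = CompressionPolicy(True, False, 32, 0.25, True)
# Placeholder values: only live_zone_only=False is implied by the article.
PAYG_POLICY = CompressionPolicy(False, True, 32, 0.5, False)


def policy_for_mode(mode: Optional[AuthMode]) -> CompressionPolicy:
    # In this sketch, unclassified requests (None) are treated like PAYG.
    return SUBSCRIPTION_POLICY if mode is AuthMode.SUBSCRIPTION else PAYG_POLICY


def resolve_policy(mode: Optional[AuthMode]) -> CompressionPolicy:
    # Assumption: enforcement is on unless the flag is explicitly "disabled".
    enforced = os.environ.get(FLAG, "enabled").strip().lower() != "disabled"
    return policy_for_mode(mode) if enforced else PAYG_POLICY
```

With the flag set to "disabled", resolve_policy(AuthMode.SUBSCRIPTION) returns the PAYG placeholder, which is the behavior the "Force PAYG policy" debugging step relies on.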

Transform Entry Points

Two primary transforms handle the actual data shrinkage:

SmartCrusher (headroom/transforms/smart_crusher.py): Performs lossless semantic compression through AST manipulation.
KompressCompressor (headroom/transforms/kompress_compressor.py): Provides fallback token-budget compression when semantic compression is unsuitable.

Both transforms receive the policy via kwargs (internally stored as _runtime_compression_policy) and validate specific fields:

live_zone_only: Prevents mutations outside designated live zones.
cache_aligner_enabled: Determines whether CacheAligner maintains stable cache prefixes.
max_lossy_ratio: Propagates token loss limits to the Rust dispatcher (though primarily enforced on the Python side).
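The kwargs-to-attribute pattern can be illustrated with a toy transform. Everything here is hypothetical: ToyTransform, the Policy subset, and the "compression_policy" kwarg name are invented for illustration; only the _runtime_compression_policy attribute name and the field names come from the text above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Policy:  # hypothetical subset of CompressionPolicy fields
    live_zone_only: bool = False
    cache_aligner_enabled: bool = True
    max_lossy_ratio: float = 0.25


class ToyTransform:
    """Hypothetical transform showing the kwargs -> _runtime_compression_policy pattern."""

    def __init__(self, **kwargs: Any) -> None:
        # "compression_policy" is an assumed kwarg name; fall back to a default Policy.
        self._runtime_compression_policy = kwargs.get("compression_policy", Policy())

    def may_mutate(self, in_live_zone: bool) -> bool:
        # live_zone_only blocks every mutation outside the live zone.
        policy = self._runtime_compression_policy
        return in_live_zone or not policy.live_zone_only

    def runs_cache_aligner(self) -> bool:
        # cache_aligner_enabled decides whether the cache-aligner step runs at all.
        return self._runtime_compression_policy.cache_aligner_enabled
```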

Key Decision Points in the Compression Flow

Inside SmartCrusher.apply(), the transform executes a series of validation checks before committing to compression:

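# Simplified excerpt; `passthrough` stands for returning the messages unchanged.
policy = self._runtime_compression_policy

# Gate 1: with live_zone_only set, content outside the live zone is never mutated.
if policy.live_zone_only and not self._in_live_zone():
    return passthrough   # no compression for out-of-zone content

# Gate 2: skip the cache-aligner step when the policy disables it.
if policy.cache_aligner_enabled is False and self.is_cache_aligner():
    return passthrough   # skip cache-aligner step

# Run the actual compression algorithm …
compressed, ratio = self._compress(messages)

# Gate 3: compare the result against the policy's loss limit.
if ratio > policy.max_lossy_ratio:
    # Too aggressive: either fall back or keep the original
    return passthrough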

When a transform decides against compression, it records the decision via record_compression in headroom/transforms/content_router.py. These entries populate the request-level routing_log, capturing the exact compression_ratio and skip reason for each stage.
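To see how such entries relate to the ratio, a routing log of this shape can be mimicked in isolation. The sketch is hypothetical: RoutingLog and this record_compression signature are invented for illustration (the real function lives in headroom/transforms/content_router.py), and the ratio is taken as output size over input size, matching the FAQ's reading that 1.0 means no compression.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RoutingLog:
    """Hypothetical per-request log; entry keys follow the fields named in this article."""

    entries: List[Dict[str, object]] = field(default_factory=list)

    def record_compression(self, transform_name: str, original_tokens: int,
                           compressed_tokens: int, reason: Optional[str] = None) -> None:
        # Ratio = output size / input size, so 1.0 means nothing was removed.
        ratio = compressed_tokens / original_tokens if original_tokens else 1.0
        self.entries.append({
            "transform_name": transform_name,
            "compression_ratio": ratio,
            "reason": reason,  # None when the transform actually compressed
        })

    def skipped(self) -> List[Dict[str, object]]:
        # Entries with a reason are the passthrough decisions.
        return [e for e in self.entries if e["reason"] is not None]
```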

Observability: Python Logs and Rust Metrics

The Rust implementation mirrors this logic in crates/headroom-proxy/src/observability/compression_ratio.rs. When running the headroom-proxy binary, the system emits the final compression ratio as a Prometheus metric (headroom_compression_ratio) suitable for Grafana dashboards. Discrepancies between Python logs and Rust metrics often indicate desynchronization between the policy enforcement layers.
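To compare the two layers, you can read the metric from the proxy's Prometheus text output and check it against the ratio you derived from the Python logs. This sketch assumes headroom_compression_ratio is exported as a plain gauge sample; if it is actually a histogram or summary, you would read its _sum and _count series instead. The 0.05 tolerance is arbitrary.

```python
import re
from typing import Optional

# Matches a gauge-style sample line such as:
#   headroom_compression_ratio{job="headroom-proxy"} 0.97
_SAMPLE = re.compile(r'^headroom_compression_ratio(?:\{[^}]*\})?\s+(\S+)')


def read_metric(exposition: str) -> Optional[float]:
    """Return the first headroom_compression_ratio sample from Prometheus text format."""
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = _SAMPLE.match(line)
        if match:
            return float(match.group(1))
    return None


def drifted(python_ratio: float, rust_ratio: float, tolerance: float = 0.05) -> bool:
    """True when the Python-side and Rust-side ratios disagree by more than tolerance."""
    return abs(python_ratio - rust_ratio) > tolerance
```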

Step-by-Step Debugging Workflow

Follow these systematic steps to diagnose why content isn't compressing:

Verify the active policy: Import and call resolve_policy(auth_mode) to confirm you're using the intended mode (PAYG vs Subscription) and that enforcement flags match your expectations.
Enable verbose logging: Set HEADROOM_LOG_LEVEL=debug (or trace for maximum granularity) to capture every record_compression call with its computed ratio.
Inspect routing logs: After request completion, examine response.routing_log (JSON array). Each object contains transform_name, compression_ratio, and a reason field explaining passthrough decisions.
Force PAYG policy: Temporarily set HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled to apply the most permissive defaults. If compression suddenly occurs, your issue is policy-related.
Tune thresholds: Modify constants in headroom/transforms/compression_policy.py (_VOLATILE_TOKEN_THRESHOLD_*, _MAX_LOSSY_RATIO_*) to test whether default thresholds are excessively strict for your workload.
Check Rust metrics: Query Prometheus for headroom_compression_ratio{job="headroom-proxy"}. Values approaching 1.0 indicate ineffective compression; mismatches with Python logs suggest implementation drift.
Isolate payload size effects: Send a minimal synthetic request followed by your production payload. This reveals whether the volatile_token_threshold incorrectly classifies large payloads as stable cache candidates.
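The routing-log inspection step can be automated with a small helper that flags both explicit passthroughs and transforms that ran but barely shrank anything. It assumes only the entry fields listed above (transform_name, compression_ratio, an optional reason); the helper name and the 0.95 cutoff are illustrative, not headroom defaults.

```python
import json
from typing import List


def diagnose(routing_log_json: str, ineffective_at: float = 0.95) -> List[str]:
    """Turn a routing_log JSON array into human-readable findings."""
    findings = []
    for entry in json.loads(routing_log_json):
        name = entry["transform_name"]
        ratio = float(entry["compression_ratio"])
        reason = entry.get("reason")
        if reason:
            # Explicit passthrough: the transform recorded why it skipped.
            findings.append(f"{name}: passthrough ({reason})")
        elif ratio >= ineffective_at:
            # Ran, but output is nearly as large as input.
            findings.append(f"{name}: ran but ratio {ratio:.2f} is close to 1.0")
    return findings
```

Running it on the log from a minimal synthetic request and then on the log from a production payload makes the size-related difference described in the last step easy to spot.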

Practical Code Examples

Example 1: Print the Active Policy for a Request

from headroom.transforms.compression_policy import resolve_policy, AuthMode

policy = resolve_policy(AuthMode.SUBSCRIPTION)
print(policy)

# Output:
# CompressionPolicy(live_zone_only=True,
#                   cache_aligner_enabled=False,
#                   volatile_token_threshold=32,
#                   max_lossy_ratio=0.25,
#                   toin_read_only=True)

Example 2: Enable Debug Logging and Inspect Routing Logs

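import logging
import os

# Set the log level before importing headroom modules, in case it is read at import time.
os.environ["HEADROOM_LOG_LEVEL"] = "debug"
# Make DEBUG records visible on stderr, in case headroom logs through the stdlib logging module.
logging.basicConfig(level=logging.DEBUG)

from headroom.proxy.handlers.openai import handler as openai_handler

# Execute the request via a test client. `client` (a test client bound to the
# proxy app) and `payload` (a chat-completions request body) are assumed to be
# defined elsewhere.
response = client.post("/v1/chat/completions", json=payload)

# Print each entry of the routing log attached to the response
for entry in response.routing_log:
    print(f"{entry['transform_name']}: ratio={entry['compression_ratio']:.2f}, reason={entry.get('reason')}")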

Example 3: Force PAYG Policy to Bypass Subscription Restrictions

import os

os.environ["HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT"] = "disabled"

# All requests now use aggressive PAYG defaults regardless of auth mode

from headroom.transforms.compression_policy import resolve_policy

policy = resolve_policy(None)  # None => unclassified request

assert policy.live_zone_only is False
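Because the PAYG override is meant to be temporary, a context manager that restores the previous value keeps it from leaking into later tests. This is plain standard-library Python, and env_override is a name chosen here for illustration. Whether headroom honors a change made after startup depends on when it reads HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT, which this article does not establish, so if in doubt, enter the block before the proxy or policy module is loaded.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def env_override(name: str, value: str) -> Iterator[None]:
    """Set an environment variable for the duration of a block, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # variable did not exist before
        else:
            os.environ[name] = previous  # put the old value back


# Example (assumes headroom is installed):
# with env_override("HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT", "disabled"):
#     policy = resolve_policy(None)
```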

Summary

Verify policy configuration using resolve_policy to ensure the correct AuthMode and enforcement settings are active.
Enable debug logging via HEADROOM_LOG_LEVEL=debug to capture per-transform decisions in record_compression within content_router.py.
Inspect routing_log entries to view exact compression ratios and understand why specific transforms returned passthrough.
Compare Python and Rust metrics by checking Prometheus for headroom_compression_ratio to detect layer desynchronization.
Test with policy enforcement disabled to quickly determine if restrictions are blocking compression.
Adjust thresholds in compression_policy.py when default values are too conservative for your specific content patterns.

Frequently Asked Questions

Why is my content not being compressed even with aggressive settings?

Check if live_zone_only is enabled in your CompressionPolicy. If the content falls outside the live zone, both SmartCrusher and KompressCompressor return passthrough immediately. Additionally, verify that the calculated compression ratio does not exceed max_lossy_ratio, which causes transforms to preserve the original content to prevent excessive token loss.

How do I distinguish between Python and Rust compression metrics?

The Python side logs individual transform decisions via record_compression in headroom/transforms/content_router.py, showing per-stage ratios and skip reasons. The Rust side in crates/headroom-proxy/src/observability/compression_ratio.rs emits the final headroom_compression_ratio metric. If Python logs show successful compression but Rust reports a ratio near 1.0, investigate whether the Rust dispatcher is decompressing the payload or bypassing the compression layer entirely.

Can I debug compression issues without modifying source code?

Yes. Set the environment variable HEADROOM_LOG_LEVEL=debug to capture detailed routing logs without code changes. Additionally, use HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled to force the permissive PAYG policy at runtime. Together, these two settings let you tell whether a problem stems from policy configuration or from the content itself.

What does a compression ratio close to 1.0 indicate?

A ratio approaching 1.0 signifies that output size nearly equals input size, indicating minimal or no compression occurred. This typically results from content that is already optimized, a max_lossy_ratio threshold that is too restrictive, or the volatile_token_threshold classifying the payload as stable content unsuitable for mutation. Check the routing_log for the specific transform that produced this ratio to determine the root cause.
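As a worked example: if a 2,000-token request leaves a transform as 1,960 tokens, the ratio is 1,960 / 2,000 = 0.98, meaning only 2% of the tokens were removed. The helpers below just encode that arithmetic; their names are illustrative, not headroom APIs.

```python
def compression_ratio(input_tokens: int, output_tokens: int) -> float:
    """Output size divided by input size; 1.0 means nothing was removed."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return output_tokens / input_tokens


def tokens_saved(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the input that compression removed (1 - ratio)."""
    return 1.0 - compression_ratio(input_tokens, output_tokens)
```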
