How to Debug Compression Issues with Headroom Performance: 7 Diagnostic Methods
To debug compression issues with headroom perf, verify the active CompressionPolicy via resolve_policy, enable debug logging with HEADROOM_LOG_LEVEL=debug, and inspect the routing_log to identify exactly which transform skipped compression and why.
Debugging compression issues in the chopratejas/headroom repository requires tracing how the proxy decides to shrink request and response payloads. The system relies on a per-auth-mode policy combined with runtime compression ratio calculations to determine whether content gets compressed. By analyzing the decision flow through Python transforms and Rust observability metrics, you can pinpoint why specific traffic bypasses compression entirely.
Understanding Headroom's Compression Pipeline
Headroom's compression system follows a strict pipeline: policy resolution first, then transform execution, followed by ratio-based validation.
Policy Resolution Based on AuthMode
Every request begins by invoking headroom.transforms.compression_policy.resolve_policy to select a CompressionPolicy. This function evaluates the request's AuthMode against the environment flag HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT.
- When enforcement is disabled, the pipeline falls back to aggressive PAYG defaults.
- When enforcement is enabled, policy_for_mode returns mode-specific configurations (e.g., live_zone_only=True for Subscription mode).
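The resolution rule above can be modeled with a small self-contained sketch. It does not import headroom: AuthMode, CompressionPolicy, and resolve_policy_sketch below are hypothetical stand-ins, the "enabled" flag value and the unset-flag default are assumptions, and the PAYG cache_aligner_enabled value is a guess for illustration only.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class AuthMode(Enum):
    """Hypothetical stand-in for the library's auth-mode enum."""
    PAYG = "payg"
    SUBSCRIPTION = "subscription"


@dataclass(frozen=True)
class CompressionPolicy:
    """Hypothetical stand-in carrying only two of the policy fields."""
    live_zone_only: bool
    cache_aligner_enabled: bool


# Assumed values: only live_zone_only=False for PAYG is stated in the article.
PAYG_DEFAULT = CompressionPolicy(live_zone_only=False, cache_aligner_enabled=True)
MODE_POLICIES = {
    AuthMode.PAYG: PAYG_DEFAULT,
    AuthMode.SUBSCRIPTION: CompressionPolicy(
        live_zone_only=True, cache_aligner_enabled=False
    ),
}


def resolve_policy_sketch(
    auth_mode: Optional[AuthMode],
    env: Optional[Mapping[str, str]] = None,
) -> CompressionPolicy:
    """Pick a policy: PAYG defaults unless enforcement is on and a mode is known."""
    env = os.environ if env is None else env
    # Assumption: the flag takes "enabled"/"disabled" and defaults to disabled.
    flag = env.get("HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT", "disabled")
    enforcement_on = flag.strip().lower() == "enabled"
    if not enforcement_on or auth_mode is None:
        return PAYG_DEFAULT
    return MODE_POLICIES[auth_mode]


if __name__ == "__main__":
    on = {"HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT": "enabled"}
    print(resolve_policy_sketch(AuthMode.SUBSCRIPTION, on))
```

Passing the environment as a mapping keeps the sketch testable without mutating os.environ; the real resolve_policy may read the flag differently.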
Transform Entry Points
Two primary transforms handle the actual data shrinkage:
- SmartCrusher (headroom/transforms/smart_crusher.py): Performs lossless semantic compression through AST manipulation.
- KompressCompressor (headroom/transforms/kompress_compressor.py): Provides fallback token-budget compression when semantic compression is unsuitable.
Both transforms receive the policy via kwargs (internally stored as _runtime_compression_policy) and validate specific fields:
- live_zone_only: Prevents mutations outside designated live zones.
- cache_aligner_enabled: Determines whether CacheAligner maintains stable cache prefixes.
- max_lossy_ratio: Propagates token loss limits to the Rust dispatcher (though primarily enforced on the Python side).
Key Decision Points in the Compression Flow
Inside SmartCrusher.apply(), the transform executes a series of validation checks before committing to compression:
policy = self._runtime_compression_policy
if policy.live_zone_only and not self._in_live_zone():
    return passthrough  # no compression for out-of-zone content
if policy.cache_aligner_enabled is False and self.is_cache_aligner():
    return passthrough  # skip cache-aligner step
# Run the actual compression algorithm …
compressed, ratio = self._compress(messages)
if ratio > policy.max_lossy_ratio:
    # Too aggressive: either fallback or keep original
    return passthrough
When a transform decides against compression, it records the decision via record_compression in headroom/transforms/content_router.py. These entries populate the request-level routing_log, capturing the exact compression_ratio and skip reason for each stage.
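To make the skip decisions and the resulting log entries concrete, here is a minimal standalone model. It is a sketch under stated assumptions, not the repository's code: PolicySketch and decide are hypothetical, the reason strings are invented for illustration, and the value compared against max_lossy_ratio is assumed to be the fraction of tokens removed. Only the entry keys (transform_name, compression_ratio, reason) come from the article.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in with the two fields this sketch needs."""
    live_zone_only: bool
    max_lossy_ratio: float


def decide(
    policy: PolicySketch,
    in_live_zone: bool,
    tokens_before: int,
    tokens_after: int,
) -> Dict[str, Any]:
    """Return a routing-log style entry describing the decision."""
    entry: Dict[str, Any] = {
        "transform_name": "SmartCrusher",
        "compression_ratio": 1.0,  # 1.0 means the payload was left unchanged
        "reason": None,
    }
    if tokens_before <= 0:
        entry["reason"] = "empty_input"  # hypothetical reason string
        return entry
    if policy.live_zone_only and not in_live_zone:
        entry["reason"] = "outside_live_zone"  # hypothetical reason string
        return entry
    # Assumption: the lossy ratio is the share of tokens that would be dropped.
    loss = 1.0 - tokens_after / tokens_before
    if loss > policy.max_lossy_ratio:
        entry["reason"] = "exceeds_max_lossy_ratio"  # hypothetical reason string
        return entry
    entry["compression_ratio"] = tokens_after / tokens_before
    return entry


if __name__ == "__main__":
    policy = PolicySketch(live_zone_only=True, max_lossy_ratio=0.25)
    routing_log = [
        decide(policy, in_live_zone=False, tokens_before=100, tokens_after=50),
        decide(policy, in_live_zone=True, tokens_before=100, tokens_after=50),
        decide(policy, in_live_zone=True, tokens_before=100, tokens_after=80),
    ]
    for item in routing_log:
        print(item)
```

Under these assumptions, only the third call compresses: the first is out of zone and the second would drop half the tokens, which exceeds the 0.25 limit.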
Observability: Python Logs and Rust Metrics
The Rust implementation mirrors this logic in crates/headroom-proxy/src/observability/compression_ratio.rs. When running the headroom-proxy binary, the system emits the final compression ratio as a Prometheus metric (headroom_compression_ratio) suitable for Grafana dashboards. Discrepancies between Python logs and Rust metrics often indicate desynchronization between the policy enforcement layers.
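One way to check for such a discrepancy is to read the metric from the proxy's Prometheus text output and compare it with the ratio seen in the Python logs. The sketch below assumes the metric is exposed as plain sample lines (for example a gauge); the sample text, the helper names read_metric and ratios_disagree, and the 0.05 tolerance are all illustrative assumptions.

```python
from typing import List


def read_metric(exposition_text: str, name: str) -> List[float]:
    """Collect sample values for `name` from Prometheus text-format output."""
    values: List[float] = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        if "{" in line:
            # name{labels} value [timestamp]; split on the last closing brace
            head, _, tail = line.rpartition("}")
            base = head.split("{", 1)[0]
            fields = tail.split()
        else:
            parts = line.split()
            base, fields = parts[0], parts[1:]
        if base == name and fields:
            values.append(float(fields[0]))
    return values


def ratios_disagree(python_ratio: float, rust_ratio: float,
                    tolerance: float = 0.05) -> bool:
    """True when the two layers report ratios further apart than `tolerance`."""
    return abs(python_ratio - rust_ratio) > tolerance


if __name__ == "__main__":
    # Made-up scrape output for illustration only.
    sample = (
        "# TYPE headroom_compression_ratio gauge\n"
        'headroom_compression_ratio{job="headroom-proxy"} 0.62\n'
        "other_metric 3\n"
    )
    rust_values = read_metric(sample, "headroom_compression_ratio")
    print(rust_values, ratios_disagree(0.60, rust_values[0]))
```

If the metric turns out to be a histogram or summary, the sample names carry suffixes such as _sum and _count, and the comparison would need to use those instead.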
Step-by-Step Debugging Workflow
Follow these systematic steps to diagnose why content isn't compressing:
- Verify the active policy: Import and call resolve_policy(auth_mode) to confirm you're using the intended mode (PAYG vs Subscription) and that enforcement flags match your expectations.
- Enable verbose logging: Set HEADROOM_LOG_LEVEL=debug (or trace for maximum granularity) to capture every record_compression call with its computed ratio.
- Inspect routing logs: After request completion, examine response.routing_log (JSON array). Each object contains transform_name, compression_ratio, and a reason field explaining passthrough decisions.
- Force PAYG policy: Temporarily set HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled to apply the most permissive defaults. If compression suddenly occurs, your issue is policy-related.
- Tune thresholds: Modify constants in headroom/transforms/compression_policy.py (_VOLATILE_TOKEN_THRESHOLD_*, _MAX_LOSSY_RATIO_*) to test whether default thresholds are excessively strict for your workload.
- Check Rust metrics: Query Prometheus for headroom_compression_ratio{job="headroom-proxy"}. Values approaching 1.0 indicate ineffective compression; mismatches with Python logs suggest implementation drift.
- Isolate payload size effects: Send a minimal synthetic request followed by your production payload. This reveals whether the volatile_token_threshold incorrectly classifies large payloads as stable cache candidates.
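When a routing log has many entries, a small summarizer helps with the inspection step. The sketch below assumes each entry is a dict with the transform_name, compression_ratio, and reason keys described above and that the ratio means output size over input size; summarize_routing_log, the 0.95 cutoff, and the sample entries are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize_routing_log(
    routing_log: Iterable[Dict[str, Any]],
    ineffective_at: float = 0.95,
) -> Tuple[Counter, List[str]]:
    """Count skip reasons and list transforms whose ratio is close to 1.0."""
    reasons: Counter = Counter()
    ineffective: List[str] = []
    for entry in routing_log:
        reason = entry.get("reason")
        if reason:
            reasons[reason] += 1
        ratio = entry.get("compression_ratio")
        if ratio is not None and ratio >= ineffective_at:
            ineffective.append(entry.get("transform_name", "<unknown>"))
    return reasons, ineffective


if __name__ == "__main__":
    # Made-up entries; the reason string is illustrative, not a real log value.
    log = [
        {"transform_name": "SmartCrusher", "compression_ratio": 1.0,
         "reason": "outside_live_zone"},
        {"transform_name": "KompressCompressor", "compression_ratio": 0.7,
         "reason": None},
    ]
    reasons, ineffective = summarize_routing_log(log)
    print(dict(reasons), ineffective)
```

Running the same summarizer on the minimal synthetic request and on the production payload gives two reason counts that can be compared side by side.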
Practical Code Examples
Example 1: Print the Active Policy for a Request
from headroom.transforms.compression_policy import resolve_policy, AuthMode
policy = resolve_policy(AuthMode.SUBSCRIPTION)
print(policy)
# Output:
# CompressionPolicy(live_zone_only=True,
# cache_aligner_enabled=False,
# volatile_token_threshold=32,
# max_lossy_ratio=0.25,
# toin_read_only=True)
Example 2: Enable Debug Logging and Inspect Routing Logs
import os

# Set the level before importing headroom modules, in case it is read at import time.
os.environ["HEADROOM_LOG_LEVEL"] = "debug"

from headroom.proxy.handlers.openai import handler as openai_handler

# Execute request via test client.
# `client` and `payload` are not defined here: supply your own test client
# for the proxy app and a chat-completions request body.
response = client.post("/v1/chat/completions", json=payload)

# Analyze the routing log attached to the response
for entry in response.routing_log:
    print(f"{entry['transform_name']}: ratio={entry['compression_ratio']:.2f}, reason={entry.get('reason')}")
Example 3: Force PAYG Policy to Bypass Subscription Restrictions
import os
os.environ["HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT"] = "disabled"
# All requests now use aggressive PAYG defaults regardless of auth mode
from headroom.transforms.compression_policy import resolve_policy
policy = resolve_policy(None) # None => unclassified request
assert policy.live_zone_only is False
Summary
- Verify policy configuration using resolve_policy to ensure the correct AuthMode and enforcement settings are active.
- Enable debug logging via HEADROOM_LOG_LEVEL=debug to capture per-transform decisions in record_compression within content_router.py.
- Inspect routing_log entries to view exact compression ratios and understand why specific transforms returned passthrough.
- Compare Python and Rust metrics by checking Prometheus for headroom_compression_ratio to detect layer desynchronization.
- Test with policy enforcement disabled to quickly determine if restrictions are blocking compression.
- Adjust thresholds in compression_policy.py when default values are too conservative for your specific content patterns.
Frequently Asked Questions
Why is my content not being compressed even with aggressive settings?
Check if live_zone_only is enabled in your CompressionPolicy. If the content falls outside the live zone, both SmartCrusher and KompressCompressor return passthrough immediately. Additionally, verify that the calculated compression ratio does not exceed max_lossy_ratio, which causes transforms to preserve the original content to prevent excessive token loss.
How do I distinguish between Python and Rust compression metrics?
The Python side logs individual transform decisions via record_compression in headroom/transforms/content_router.py, showing per-stage ratios and skip reasons. The Rust side in crates/headroom-proxy/src/observability/compression_ratio.rs emits the final headroom_compression_ratio metric. If Python logs show successful compression but Rust reports a ratio near 1.0, investigate whether the Rust dispatcher is decompressing the payload or bypassing the compression layer entirely.
Can I debug compression issues without modifying source code?
Yes. Set the environment variable HEADROOM_LOG_LEVEL=debug to capture detailed routing logs without code changes. Additionally, use HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled to force the permissive PAYG policy at runtime. These flags allow full diagnostic visibility into whether issues stem from policy configuration or content characteristics.
What does a compression ratio close to 1.0 indicate?
A ratio approaching 1.0 signifies that output size nearly equals input size, indicating minimal or no compression occurred. This typically results from content that is already optimized, a max_lossy_ratio threshold that is too restrictive, or the volatile_token_threshold classifying the payload as stable content unsuitable for mutation. Check the routing_log for the specific transform that produced this ratio to determine the root cause.
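As a worked illustration of that reading, the sketch below treats the ratio as output tokens divided by input tokens, which is an assumption drawn from the answer above rather than from the source code. The helper names and token counts are made up.

```python
def compression_ratio(tokens_in: int, tokens_out: int) -> float:
    """Output size over input size; 1.0 means nothing was saved."""
    if tokens_in <= 0:
        raise ValueError("tokens_in must be positive")
    return tokens_out / tokens_in


def savings_percent(ratio: float) -> float:
    """Share of the original payload removed, as a percentage."""
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    # Hypothetical token counts: a barely compressed and a well compressed payload.
    for before, after in [(1000, 980), (1000, 400)]:
        ratio = compression_ratio(before, after)
        print(f"{before} -> {after}: ratio={ratio:.2f}, "
              f"saved={savings_percent(ratio):.1f}%")
```

With these numbers, 1000 tokens shrinking to 980 gives a ratio of 0.98 (about 2% saved), while shrinking to 400 gives 0.40 (about 60% saved).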