# How to Debug Compression Issues with Headroom Performance: 7 Diagnostic Methods

> Troubleshoot compression problems in Headroom. Use 7 diagnostic methods to find and fix issues fast by debugging policy, logging, and routing.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-10

---

**To debug compression issues in Headroom, verify the active `CompressionPolicy` via `resolve_policy`, enable debug logging with `HEADROOM_LOG_LEVEL=debug`, and inspect the `routing_log` to identify exactly which transform skipped compression and why.**

Debugging compression issues in the `chopratejas/headroom` repository requires tracing how the proxy decides to shrink request and response payloads. The system relies on a **per-auth-mode policy** combined with runtime **compression ratio** calculations to determine whether content gets compressed. By analyzing the decision flow through Python transforms and Rust observability metrics, you can pinpoint why specific traffic bypasses compression entirely.

## Understanding Headroom's Compression Pipeline

Headroom's compression system follows a strict pipeline: policy resolution first, then transform execution, followed by ratio-based validation.

### Policy Resolution Based on AuthMode

Every request begins by invoking `headroom.transforms.compression_policy.resolve_policy` to select a `CompressionPolicy`. This function evaluates the request's `AuthMode` against the environment flag `HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT`.

- When enforcement is **disabled**, the pipeline falls back to aggressive PAYG defaults.
- When enforcement is **enabled**, `policy_for_mode` returns mode-specific configurations (e.g., `live_zone_only=True` for Subscription mode).
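
The two branches above can be sketched in a few lines. This is a minimal stand-in, not the code in `compression_policy.py`: the names `sketch_resolve_policy`, `PAYG_DEFAULTS`, and `MODE_POLICIES` are hypothetical, the PAYG field values are illustrative, and treating a missing flag as "enabled" is an assumption.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class AuthMode(Enum):
    """Hypothetical stand-in for Headroom's AuthMode."""
    PAYG = "payg"
    SUBSCRIPTION = "subscription"


@dataclass(frozen=True)
class CompressionPolicy:
    """Reduced stand-in carrying only two of the policy fields."""
    live_zone_only: bool
    cache_aligner_enabled: bool


# Illustrative values only; the real defaults live in compression_policy.py.
PAYG_DEFAULTS = CompressionPolicy(live_zone_only=False, cache_aligner_enabled=True)
MODE_POLICIES = {
    AuthMode.PAYG: PAYG_DEFAULTS,
    AuthMode.SUBSCRIPTION: CompressionPolicy(live_zone_only=True, cache_aligner_enabled=False),
}


def sketch_resolve_policy(
    auth_mode: Optional[AuthMode],
    env: Optional[Mapping[str, str]] = None,
) -> CompressionPolicy:
    """Pick a policy from the auth mode and the enforcement flag."""
    env = os.environ if env is None else env
    # Assumption: an unset flag behaves like "enabled".
    flag = env.get("HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT", "enabled")
    if flag.lower() == "disabled" or auth_mode is None:
        return PAYG_DEFAULTS  # enforcement off: permissive PAYG defaults
    return MODE_POLICIES[auth_mode]  # enforcement on: mode-specific policy
```

Passing `env` explicitly keeps the sketch easy to test without mutating the process environment.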

### Transform Entry Points

Two primary transforms handle the actual data shrinkage:

- **`SmartCrusher`** ([`headroom/transforms/smart_crusher.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/smart_crusher.py)): Performs lossless semantic compression through AST manipulation.
- **`KompressCompressor`** ([`headroom/transforms/kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py)): Provides fallback token-budget compression when semantic compression is unsuitable.

Both transforms receive the policy via `kwargs` (internally stored as `_runtime_compression_policy`) and validate specific fields:
- `live_zone_only`: Prevents mutations outside designated live zones.
- `cache_aligner_enabled`: Determines whether `CacheAligner` maintains stable cache prefixes.
- `max_lossy_ratio`: Propagates token loss limits to the Rust dispatcher (though primarily enforced on the Python side).

## Key Decision Points in the Compression Flow

Inside `SmartCrusher.apply()`, the transform executes a series of validation checks before committing to compression:

```python
# Simplified view of the checks inside SmartCrusher.apply().
# `passthrough` stands for returning the messages unchanged.
policy = self._runtime_compression_policy

if policy.live_zone_only and not self._in_live_zone():
    return passthrough  # no compression for out-of-zone content

if policy.cache_aligner_enabled is False and self.is_cache_aligner():
    return passthrough  # skip the cache-aligner step

# Run the actual compression algorithm …
compressed, ratio = self._compress(messages)

if ratio > policy.max_lossy_ratio:
    # Too aggressive: fall back or keep the original
    return passthrough
```

When a transform decides against compression, it records the decision via `record_compression` in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py). These entries populate the request-level `routing_log`, capturing the exact `compression_ratio` and skip reason for each stage.
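
To make the flow concrete, here is a small runnable sketch that mirrors the checks above and appends one entry per decision to a list standing in for `routing_log`. The function `decide`, the `Policy` class, and the reason strings are hypothetical; the real `record_compression` signature and reason values may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in with the two fields used by this sketch."""
    live_zone_only: bool
    max_lossy_ratio: float


def decide(
    policy: Policy,
    in_live_zone: bool,
    ratio: float,
    routing_log: List[Dict[str, Any]],
    transform_name: str = "SmartCrusher",
) -> bool:
    """Return True when compression is kept; always log the decision."""
    if policy.live_zone_only and not in_live_zone:
        reason = "outside_live_zone"  # hypothetical reason string
    elif ratio > policy.max_lossy_ratio:
        reason = "exceeds_max_lossy_ratio"  # hypothetical reason string
    else:
        reason = None  # compression applied, nothing to explain
    routing_log.append(
        {"transform_name": transform_name, "compression_ratio": ratio, "reason": reason}
    )
    return reason is None


if __name__ == "__main__":
    log: List[Dict[str, Any]] = []
    decide(Policy(live_zone_only=True, max_lossy_ratio=0.25), False, 0.10, log)
    decide(Policy(live_zone_only=False, max_lossy_ratio=0.25), True, 0.10, log)
    for entry in log:
        print(entry)
```

Recording an entry on every path, including the successful one, means a missing entry in the log is itself a signal that the transform never ran.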

## Observability: Python Logs and Rust Metrics

The Rust implementation mirrors this logic in [`crates/headroom-proxy/src/observability/compression_ratio.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/observability/compression_ratio.rs). When running the `headroom-proxy` binary, the system emits the final compression ratio as a Prometheus metric (`headroom_compression_ratio`) suitable for Grafana dashboards. Discrepancies between Python logs and Rust metrics often indicate desynchronization between the policy enforcement layers.
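
One way to compare the two sides is to read the metric from a scrape of the proxy's metrics endpoint and check it against the ratio seen in the Python logs. The sketch below parses Prometheus text exposition output using only the standard library. The sample text, the label set, the `gauge` type, and the helper names `read_samples` and `drifted` are assumptions for illustration, as is the 0.05 tolerance.

```python
from typing import List

# Assumed shape of a scrape; the real metric type and labels may differ.
SAMPLE = """\
# HELP headroom_compression_ratio Final compression ratio per request
# TYPE headroom_compression_ratio gauge
headroom_compression_ratio{job="headroom-proxy"} 0.62
headroom_requests_total 17
"""


def read_samples(exposition_text: str, name: str) -> List[float]:
    """Collect the values of every series whose metric name equals `name`."""
    values: List[float] = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            cut = line.rindex("}") + 1  # end of the label block
            series, rest = line[:cut], line[cut:]
        else:
            series, _, rest = line.partition(" ")
        if series.split("{", 1)[0] != name:
            continue
        parts = rest.split()
        if parts:
            values.append(float(parts[0]))  # ignore an optional timestamp
    return values


def drifted(python_ratio: float, rust_ratio: float, tolerance: float = 0.05) -> bool:
    """Flag a mismatch larger than `tolerance` between the two layers."""
    return abs(python_ratio - rust_ratio) > tolerance


if __name__ == "__main__":
    rust_values = read_samples(SAMPLE, "headroom_compression_ratio")
    print(rust_values, drifted(0.60, rust_values[0]))
```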

## Step-by-Step Debugging Workflow

Follow these systematic steps to diagnose why content isn't compressing:

1. **Verify the active policy**: Import and call `resolve_policy(auth_mode)` to confirm you're using the intended mode (PAYG vs Subscription) and that enforcement flags match your expectations.

2. **Enable verbose logging**: Set `HEADROOM_LOG_LEVEL=debug` (or `trace` for maximum granularity) to capture every `record_compression` call with its computed ratio.

3. **Inspect routing logs**: After request completion, examine `response.routing_log` (JSON array). Each object contains `transform_name`, `compression_ratio`, and a `reason` field explaining passthrough decisions.

4. **Force PAYG policy**: Temporarily set `HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled` to apply the most permissive defaults. If compression suddenly occurs, your issue is policy-related.

5. **Tune thresholds**: Modify constants in [`headroom/transforms/compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py) (`_VOLATILE_TOKEN_THRESHOLD_*`, `_MAX_LOSSY_RATIO_*`) to test whether default thresholds are excessively strict for your workload.

6. **Check Rust metrics**: Query Prometheus for `headroom_compression_ratio{job="headroom-proxy"}`. Values approaching 1.0 indicate ineffective compression; mismatches with Python logs suggest implementation drift.

7. **Isolate payload size effects**: Send a minimal synthetic request followed by your production payload. This reveals whether the `volatile_token_threshold` incorrectly classifies large payloads as stable cache candidates.
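
For step 3, a short helper can collapse a `routing_log` into counts of skip reasons per transform, which makes repeated passthroughs easy to spot. This is a sketch that assumes the entry shape described above (`transform_name`, `compression_ratio`, `reason`); the sample entries and reason strings are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_skips(routing_log: List[Dict[str, Any]]) -> Counter:
    """Count (transform_name, reason) pairs for entries that were skipped."""
    skipped = [entry for entry in routing_log if entry.get("reason")]
    return Counter((entry["transform_name"], entry["reason"]) for entry in skipped)


if __name__ == "__main__":
    # Invented sample data; real reason strings may differ.
    sample_log = [
        {"transform_name": "SmartCrusher", "compression_ratio": 1.0, "reason": "outside_live_zone"},
        {"transform_name": "KompressCompressor", "compression_ratio": 0.7, "reason": None},
        {"transform_name": "SmartCrusher", "compression_ratio": 1.0, "reason": "outside_live_zone"},
    ]
    for (transform, reason), count in summarize_skips(sample_log).most_common():
        print(f"{transform}: {reason} x{count}")
```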

## Practical Code Examples

### Example 1: Print the Active Policy for a Request

```python
from headroom.transforms.compression_policy import resolve_policy, AuthMode

policy = resolve_policy(AuthMode.SUBSCRIPTION)
print(policy)

# Output:
# CompressionPolicy(live_zone_only=True,
#                   cache_aligner_enabled=False,
#                   volatile_token_threshold=32,
#                   max_lossy_ratio=0.25,
#                   toin_read_only=True)
```

### Example 2: Enable Debug Logging and Inspect Routing Logs

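```python
import os

# Set the log level before importing Headroom modules, in case the value
# is read at import time.
os.environ["HEADROOM_LOG_LEVEL"] = "debug"

from headroom.proxy.handlers.openai import handler as openai_handler

# Execute the request via a test client.
# `client` and `payload` are assumed to exist already: a test client wrapping
# the proxy app and a chat-completions request body. Neither is defined here.
response = client.post("/v1/chat/completions", json=payload)

# Analyze the routing log attached to the response
for entry in response.routing_log:
    print(f"{entry['transform_name']}: ratio={entry['compression_ratio']:.2f}, reason={entry.get('reason')}")
```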

### Example 3: Force PAYG Policy to Bypass Subscription Restrictions

```python
import os

os.environ["HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT"] = "disabled"

# All requests now use aggressive PAYG defaults regardless of auth mode
from headroom.transforms.compression_policy import resolve_policy

policy = resolve_policy(None)  # None => unclassified request
assert policy.live_zone_only is False
```

## Summary

- **Verify policy configuration** using `resolve_policy` to ensure the correct `AuthMode` and enforcement settings are active.
- **Enable debug logging** via `HEADROOM_LOG_LEVEL=debug` to capture the per-transform decisions recorded by `record_compression` in [`content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py).
- **Inspect `routing_log`** entries to view exact compression ratios and understand why specific transforms returned passthrough.
- **Compare Python and Rust metrics** by checking Prometheus for `headroom_compression_ratio` to detect layer desynchronization.
- **Test with policy enforcement disabled** to quickly determine if restrictions are blocking compression.
- **Adjust thresholds** in [`compression_policy.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/compression_policy.py) when default values are too conservative for your specific content patterns.

## Frequently Asked Questions

### Why is my content not being compressed even with aggressive settings?

Check whether `live_zone_only` is enabled in your `CompressionPolicy`. If the content falls outside the live zone, both `SmartCrusher` and `KompressCompressor` return passthrough immediately. Also check the calculated compression ratio against `max_lossy_ratio`: when the ratio exceeds that limit, the transform keeps the original content to prevent excessive token loss.

### How do I distinguish between Python and Rust compression metrics?

The Python side logs individual transform decisions via `record_compression` in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py), showing per-stage ratios and skip reasons. The Rust side in [`crates/headroom-proxy/src/observability/compression_ratio.rs`](https://github.com/chopratejas/headroom/blob/main/crates/headroom-proxy/src/observability/compression_ratio.rs) emits the final `headroom_compression_ratio` metric. If Python logs show successful compression but Rust reports a ratio near 1.0, investigate whether the Rust dispatcher is decompressing the payload or bypassing the compression layer entirely.

### Can I debug compression issues without modifying source code?

Yes. Set the environment variable `HEADROOM_LOG_LEVEL=debug` to capture detailed routing logs without code changes. Additionally, use `HEADROOM_PROXY_AUTH_MODE_POLICY_ENFORCEMENT=disabled` to force the permissive PAYG policy at runtime. These flags allow full diagnostic visibility into whether issues stem from policy configuration or content characteristics.

### What does a compression ratio close to 1.0 indicate?

A ratio approaching 1.0 signifies that output size nearly equals input size, indicating minimal or no compression occurred. This typically results from content that is already optimized, a `max_lossy_ratio` threshold that is too restrictive, or the `volatile_token_threshold` classifying the payload as stable content unsuitable for mutation. Check the `routing_log` for the specific transform that produced this ratio to determine the root cause.
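
As a worked illustration of that definition, the sketch below computes a ratio as output size divided by input size and flags values close to 1.0. The helper names and the 0.95 cutoff are hypothetical and chosen only for the example; Headroom's own calculation may count tokens differently.

```python
def compression_ratio(input_tokens: int, output_tokens: int) -> float:
    """Output size divided by input size; 1.0 means nothing was removed."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return output_tokens / input_tokens


def is_ineffective(ratio: float, cutoff: float = 0.95) -> bool:
    """Treat ratios at or above `cutoff` as effectively uncompressed."""
    return ratio >= cutoff


if __name__ == "__main__":
    for before, after in [(1000, 980), (1000, 400)]:
        ratio = compression_ratio(before, after)
        print(f"{before} -> {after}: ratio={ratio:.2f}, ineffective={is_ineffective(ratio)}")
```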