# Headroom OpenTelemetry Monitoring and Observability Metrics: Complete Reference

> A complete reference to Headroom's OpenTelemetry metrics: proxy request counters, token and cache counters, latency histograms, compression pipeline timing, and subscription-window gauges.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: api-reference
- Published: 2026-06-07

---

**Headroom exposes more than 28 production-grade OpenTelemetry metrics (this reference lists 30): counters for proxy requests, token savings, and compression pipelines; histograms for end-to-end latency and per-stage duration; and observable gauges for subscription-window utilization. All are defined in [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py) and emitted through the `HeadroomOtelMetrics` façade.**

The `chopratejas/headroom` repository ships a dedicated OpenTelemetry (OTel) metrics façade that records telemetry for every proxy request, compression pipeline run, and subscription-window update. `configure_otel_metrics()` sets up the façade, `get_otel_metrics()` returns it (or a no-op instance when OTel is disabled), and readings are exported via standard OTLP endpoints or the console.

## Proxy Request Metrics

The proxy layer emits counters and histograms that track request volume, cache behavior, token throughput, and end-to-end latency. Each metric is created inside the `HeadroomOtelMetrics` class constructor in [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py).

### Request Volume and Status Counters

- `headroom.proxy.requests` — Total proxy requests handled ([lines 38-41](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L38)).
- `headroom.proxy.requests.cached` — Requests served directly from a provider cache ([lines 44-46](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L44)).
- `headroom.proxy.requests.failed` — Requests that resulted in an error ([lines 48-50](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L48)).
- `headroom.proxy.requests.rate_limited` — Requests rejected by rate-limiting ([lines 52-54](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L52)).

### Token and Cache Counters

- `headroom.proxy.tokens.input` — Input tokens received by the proxy ([lines 58-60](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L58)).
- `headroom.proxy.tokens.output` — Output tokens returned by upstream providers ([lines 62-64](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L62)).
- `headroom.proxy.tokens.saved` — Tokens saved by Headroom’s compression logic ([lines 66-68](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L66)).
- `headroom.proxy.cache.read_tokens` — Provider-cache read tokens observed ([lines 70-72](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L70)).
- `headroom.proxy.cache.write_tokens` — Provider-cache write tokens observed ([lines 74-76](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L74)).
- `headroom.proxy.cache.write_ttl_tokens` — Cache-write tokens bucketed by TTL values such as `"5m"` and `"1h"` ([lines 78-84](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L78)).
- `headroom.proxy.cache.uncached_input_tokens` — Input tokens that missed the cache ([lines 86-88](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L86)).
- `headroom.proxy.cache.busts` — Requests that triggered a cache bust ([lines 90-92](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L90)).
- `headroom.proxy.cache.bust_tokens_lost` — Tokens lost because of a cache bust ([lines 94-96](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L94)).
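
The TTL bucketing behind `headroom.proxy.cache.write_ttl_tokens` can be pictured with a small stand-alone sketch. It assumes the per-TTL counts arrive as two integers, mirroring the `cache_write_5m_tokens` and `cache_write_1h_tokens` arguments of the `record_proxy_request()` example later in this reference, and that empty buckets are skipped. The function name and the use of `collections.Counter` in place of an OTel counter are hypothetical and are not taken from Headroom's source.

```python
from collections import Counter
from typing import Dict


def bucket_cache_write_tokens(cache_write_5m_tokens: int, cache_write_1h_tokens: int) -> Dict[str, int]:
    """Map per-TTL cache-write token counts to TTL labels, dropping empty buckets."""
    buckets = {"5m": cache_write_5m_tokens, "1h": cache_write_1h_tokens}
    return {ttl: tokens for ttl, tokens in buckets.items() if tokens > 0}


# Stand-in for a counter keyed by a TTL attribute (hypothetical; not the OTel API).
ttl_totals = Counter()
for ttl, tokens in bucket_cache_write_tokens(300, 0).items():
    ttl_totals[ttl] += tokens

print(dict(ttl_totals))  # {'5m': 300}
```

Keying one counter by TTL label, rather than creating one counter per TTL, keeps the metric name stable if further TTL values appear.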

### Latency and Duration Histograms

- `headroom.proxy.request.duration` — End-to-end request latency in seconds ([lines 98-100](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L98)).
- `headroom.proxy.overhead.duration` — Time spent inside Headroom’s optimization logic in seconds ([lines 102-104](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L102)).
- `headroom.proxy.ttfb.duration` — Time-to-first-byte from the upstream provider in seconds ([lines 106-108](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L106)).
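
These histograms are documented in seconds, while the `record_proxy_request()` example later in this reference passes `latency_ms`, `overhead_ms`, and `ttfb_ms`. The sketch below shows the unit conversion that this implies. Whether Headroom converts internally, and the pairing of each argument with a histogram name, are assumptions inferred from the names rather than confirmed from the source.

```python
def ms_to_seconds(duration_ms: float) -> float:
    """Convert a millisecond duration to seconds."""
    return duration_ms / 1000.0


# Millisecond values taken from the proxy request example in this reference.
# The argument-to-histogram pairing is an assumption based on naming.
durations_s = {
    "headroom.proxy.request.duration": ms_to_seconds(210.5),   # latency_ms
    "headroom.proxy.overhead.duration": ms_to_seconds(15.2),   # overhead_ms
    "headroom.proxy.ttfb.duration": ms_to_seconds(85.0),       # ttfb_ms
}

for name, seconds in durations_s.items():
    print(f"{name}: {seconds:.4f} s")
```

When building dashboards, keep the unit in mind: a 200 ms latency target corresponds to a 0.2 threshold on these histograms.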

## Compression Pipeline Metrics

The compression subsystem exposes counters for throughput and failures, histograms for stage timing, and metadata about individual transforms and waste detection. All of these live in the same [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py) module and are surfaced via `record_pipeline_run()` and `record_compression_failure()`.

### Execution and Failure Counters

- `headroom.compression.runs` — Number of compression pipeline executions ([lines 110-112](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L110)).
- `headroom.compression.failures` — Compression attempts that failed before producing a result ([lines 114-116](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L114)).

### Token Throughput Counters

- `headroom.compression.tokens.input` — Tokens fed into the compression pipeline ([lines 118-120](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L118)).
- `headroom.compression.tokens.output` — Tokens emitted by the compression pipeline ([lines 122-124](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L122)).
- `headroom.compression.tokens.saved` — Net tokens removed by compression, calculated as input minus output ([lines 126-128](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L126)).
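
The "input minus output" relationship can be checked by hand with a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the savings ratio is a derived figure for dashboards, not a metric that Headroom is documented to emit.

```python
from typing import Tuple


def compression_savings(tokens_before: int, tokens_after: int) -> Tuple[int, float]:
    """Return (tokens saved, fraction of input removed) for one pipeline run."""
    saved = tokens_before - tokens_after
    ratio = saved / tokens_before if tokens_before > 0 else 0.0
    return saved, ratio


# Same token counts as the compression pipeline example in this reference.
saved, ratio = compression_savings(1500, 900)
print(saved, f"{ratio:.0%}")  # 600 40%
```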

### Timing Histograms and Transform Counts

- `headroom.compression.pipeline.duration` — Total time spent running the compression pipeline in seconds ([lines 130-132](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L130)).
- `headroom.compression.stage.duration` — Per-stage timing inside the pipeline in seconds ([lines 134-136](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L134)).
- `headroom.compression.transforms` — Count of each transform applied during compression, such as `smart_crusher` or `diff_compressor` ([lines 138-140](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L138)).

### Waste Detection Counters

- `headroom.compression.waste.tokens` — Tokens identified as “waste” by waste-signal detectors like `empty_lines` ([lines 142-144](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py#L142)).

## Subscription Window and Overage Gauges

Headroom provides **observable gauges** that report Anthropic-specific rate-limit and billing data. These are defined in [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py) within the `HeadroomOtelMetrics` constructor and report values as percentages, durations, or USD.

- `headroom.subscription.5h_utilization_pct` — Anthropic 5-hour rate-limit window utilization, reported as a 0-100 percentage.
- `headroom.subscription.7d_utilization_pct` — Anthropic 7-day rate-limit window utilization, reported as a 0-100 percentage.
- `headroom.subscription.5h_seconds_to_reset` — Seconds until the 5-hour window resets.
- `headroom.subscription.7d_seconds_to_reset` — Seconds until the 7-day window resets.
- `headroom.subscription.overage_usd` — Anthropic extra-usage credits consumed, measured in USD.
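
In OpenTelemetry, observable gauges are read through callbacks at collection time instead of being updated inline. The sketch below shows one way that the latest window state could be held and turned into gauge readings. `WindowSnapshot`, `read_gauges`, and the idea of storing a reset timestamp are all hypothetical and only illustrate the pattern; they do not describe Headroom's implementation.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WindowSnapshot:
    """Latest known state of one rate-limit window (hypothetical structure)."""

    utilization_pct: float  # 0-100
    resets_at: float        # Unix timestamp of the next reset

    def seconds_to_reset(self, now: Optional[float] = None) -> float:
        """Seconds remaining until the window resets, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.resets_at - now)


def read_gauges(
    five_hour: WindowSnapshot,
    seven_day: WindowSnapshot,
    now: Optional[float] = None,
) -> Dict[str, float]:
    """Return the values a collection callback would report for each gauge."""
    return {
        "headroom.subscription.5h_utilization_pct": five_hour.utilization_pct,
        "headroom.subscription.7d_utilization_pct": seven_day.utilization_pct,
        "headroom.subscription.5h_seconds_to_reset": five_hour.seconds_to_reset(now),
        "headroom.subscription.7d_seconds_to_reset": seven_day.seconds_to_reset(now),
    }


readings = read_gauges(
    WindowSnapshot(utilization_pct=42.0, resets_at=1_000.0),
    WindowSnapshot(utilization_pct=12.5, resets_at=50_000.0),
    now=400.0,
)
print(readings["headroom.subscription.5h_seconds_to_reset"])  # 600.0
```

Computing the remaining seconds at read time, rather than when the snapshot is stored, keeps the gauge accurate between subscription-window updates.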

## Recording Metrics in Production

The `HeadroomOtelMetrics` class exposes helper methods such as `record_proxy_request()`, `record_pipeline_run()`, and `record_compression_failure()` that increment counters and record histogram values in a single call. The following examples show how to emit telemetry from application code.

### Proxy Request Example

```python
from headroom.observability.metrics import configure_otel_metrics, get_otel_metrics

# Enable OTel export (normally done once at process start).
configure_otel_metrics()

metrics = get_otel_metrics()

# Record a successful, uncached proxy request.
metrics.record_proxy_request(
    provider="openai",
    model="gpt-4o",
    input_tokens=1200,
    output_tokens=800,
    tokens_saved=400,
    latency_ms=210.5,
    cached=False,
    overhead_ms=15.2,
    ttfb_ms=85.0,
    cache_read_tokens=0,
    cache_write_tokens=0,
    cache_write_5m_tokens=0,
    cache_write_1h_tokens=0,
    uncached_input_tokens=1200,
)
```

### Compression Pipeline Example

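```python
from headroom.observability.metrics import get_otel_metrics

metrics = get_otel_metrics()

# Record a compression pipeline run that reduced 1500 tokens to 900.
metrics.record_pipeline_run(
    model="gpt-4o-mini",
    provider="openai",  # provider matches the model family
    tokens_before=1500,
    tokens_after=900,
    duration_ms=45.3,
    transforms_applied=["smart_crusher", "diff_compressor"],
    timing={"smart_crusher": 12.5, "diff_compressor": 30.1},  # per-stage timing
    waste_signals={"empty_lines": 10},
)
```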

### Compression Failure Example

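```python
from headroom.observability.metrics import get_otel_metrics

metrics = get_otel_metrics()

# Record a compression failure raised by a single transform.
metrics.record_compression_failure(
    model="gpt-3.5-turbo",
    operation="smart_crusher",
    error_type="TimeoutError",
)
```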

These methods update the counters and histograms defined in [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py), and the resulting readings are exported through the configured OTLP endpoint or console.

## Enabling and Configuring the Exporter

Export behavior is controlled by environment variables read during `configure_otel_metrics()`. The key variables include:

- `HEADROOM_OTEL_METRICS_ENABLED` — toggles the OTel metrics pipeline.
- `HEADROOM_OTEL_METRICS_EXPORTER` — selects the exporter backend, such as `console` or `otlp`.
- `HEADROOM_OTEL_METRICS_ENDPOINT` — sets the target OTLP endpoint URL.

When OTel is disabled, `get_otel_metrics()` returns a no-op instance that silently discards recordings so that production code does not require branching logic.
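
The no-op fallback is a common pattern and can be sketched without Headroom installed. Everything below is illustrative: `NoOpMetrics`, `RecordingMetrics`, and `metrics_from_env` are hypothetical names, and the set of truthy spellings is an assumption, since this reference does not state which values `HEADROOM_OTEL_METRICS_ENABLED` accepts.

```python
import os
from typing import Any, List, Mapping, Optional, Tuple

# Assumed truthy spellings; the values Headroom actually accepts are not documented here.
_TRUTHY = {"1", "true", "yes", "on"}


class NoOpMetrics:
    """Accepts the same calls as an enabled facade and discards them."""

    def record_proxy_request(self, **kwargs: Any) -> None:
        pass

    def record_pipeline_run(self, **kwargs: Any) -> None:
        pass

    def record_compression_failure(self, **kwargs: Any) -> None:
        pass


class RecordingMetrics:
    """Hypothetical stand-in for an enabled facade; it only remembers calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, dict]] = []

    def record_proxy_request(self, **kwargs: Any) -> None:
        self.calls.append(("proxy_request", kwargs))

    def record_pipeline_run(self, **kwargs: Any) -> None:
        self.calls.append(("pipeline_run", kwargs))

    def record_compression_failure(self, **kwargs: Any) -> None:
        self.calls.append(("compression_failure", kwargs))


def metrics_from_env(env: Optional[Mapping[str, str]] = None):
    """Pick the recording or no-op facade based on the enable flag."""
    env = os.environ if env is None else env
    flag = env.get("HEADROOM_OTEL_METRICS_ENABLED", "").strip().lower()
    return RecordingMetrics() if flag in _TRUTHY else NoOpMetrics()


# Call sites look identical whether or not metrics are enabled.
metrics = metrics_from_env({"HEADROOM_OTEL_METRICS_ENABLED": "false"})
metrics.record_proxy_request(provider="openai", model="gpt-4o")  # silently discarded
```

Because both classes expose the same method names, calling code never has to check whether metrics are enabled before recording.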

## Core Observability Source Files

The full monitoring surface is implemented across a small set of authoritative files:

- **[`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py)** — Core OTel metrics definitions and `HeadroomOtelMetrics` helpers ([view source](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py)).
- **[`headroom/observability/tracing.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/tracing.py)** — OTel tracing façade including spans and Langfuse integration ([view source](https://github.com/chopratejas/headroom/blob/main/headroom/observability/tracing.py)).
- **[`headroom/transforms/observability.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/observability.py)** — `CompressionObserver` protocol that downstream transforms call to emit per-event data consumed by the metrics façade ([view source](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/observability.py)).
- **[`tests/test_telemetry.py`](https://github.com/chopratejas/headroom/blob/main/tests/test_telemetry.py)** — Unit tests confirming metric emission behavior for the signals above ([view source](https://github.com/chopratejas/headroom/blob/main/tests/test_telemetry.py)).

## Summary

- **Headroom** exposes more than 28 OpenTelemetry counters, histograms, and observable gauges through the `HeadroomOtelMetrics` façade in [`headroom/observability/metrics.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability/metrics.py).
- **Proxy metrics** cover request volume, cache hits, token savings, and end-to-end latency.
- **Compression metrics** capture pipeline runs, per-stage timing, transform counts, and waste-signal detection.
- **Subscription gauges** surface Anthropic rate-limit utilization and overage costs.
- **`configure_otel_metrics()`** and `get_otel_metrics()` provide a zero-friction setup that falls back to a no-op implementation when OTel is disabled.

## Frequently Asked Questions

### How do I enable OpenTelemetry metrics in Headroom?

Call `configure_otel_metrics()` at process startup and set the environment variable `HEADROOM_OTEL_METRICS_ENABLED` to a truthy value. You can also set `HEADROOM_OTEL_METRICS_EXPORTER` and `HEADROOM_OTEL_METRICS_ENDPOINT` to route data to an OTLP backend rather than the console.

### What is the difference between `headroom.proxy.tokens.saved` and `headroom.compression.tokens.saved`?

`headroom.proxy.tokens.saved` measures net tokens saved at the proxy layer after all optimizations, while `headroom.compression.tokens.saved` isolates the reduction achieved strictly inside the compression pipeline by comparing input and output token counts. The proxy metric may include savings from caching or other upstream logic, whereas the compression metric is scoped to the pipeline alone.

### Which metric tracks upstream provider latency?

The **`headroom.proxy.ttfb.duration`** histogram records time-to-first-byte from the upstream provider in seconds. For the total request lifecycle—including Headroom’s own overhead—you should also monitor **`headroom.proxy.request.duration`**.

### How does Headroom report rate-limit utilization?

Headroom exposes two observable gauges, **`headroom.subscription.5h_utilization_pct`** and **`headroom.subscription.7d_utilization_pct`**, which report Anthropic window utilization as a 0-100 percentage. Complementary gauges such as **`headroom.subscription.5h_seconds_to_reset`** and **`headroom.subscription.7d_seconds_to_reset`** tell you exactly how long remains until each window refreshes.