# How to Use the Headroom compress() API: Single-Function LLM Message Compression

> Learn how to use the Headroom `compress()` API to shrink LLM message lists with a single function call and no boilerplate code.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-03

---

**The Headroom `compress()` API provides a single entry point that shrinks LLM message lists through an internal TransformPipeline without requiring proxy configuration or boilerplate code.**

The chopratejas/headroom repository is an open-source Python library for reducing token costs in LLM applications. The `compress()` function in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py) is its one-function entry point: it handles content routing, cache alignment, and model-specific compression through a lazily initialized singleton pipeline.

## How the compress() API Works Internally

When you invoke `compress()`, the function obtains a **singleton `TransformPipeline`** from the internal `_get_pipeline()` helper (lines 27‑42 in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py)), which builds the pipeline on the first call and reuses it on later calls. This pipeline wires together three core transformation stages:

| Stage | Purpose | Implementation Location |
|-------|---------|------------------------|
| **CacheAligner** | Keeps message prefixes stable so provider KV‑cache hits persist across repeated calls | Lines 40‑42 in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py) |
| **ContentRouter** | Detects each message's content type (JSON, code, or plain text) and routes it to the matching compressor | Lines 41‑44 in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py) |
| **Kompress / SmartCrusher / CodeCompressor** | Perform the actual token reduction for plain text, structured data, and source code, respectively | Selected by ContentRouter when `pipeline.apply()` runs |

The stages run in order: your input messages pass through alignment, routing, and compression, and `compress()` then returns a structured result containing the optimized message list and token statistics.
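
The lazy-singleton pattern behind `_get_pipeline()` can be sketched in plain Python. This is a minimal illustration, not Headroom's code: `_ToyPipeline`, `get_pipeline()`, and the two placeholder stages are hypothetical stand-ins for `TransformPipeline` and its real transforms, and the double-checked lock is an assumption about how a thread-safe singleton might guard first-time construction.

```python
import threading
from typing import Callable, List, Optional

# A stage takes a message list and returns a (possibly new) message list.
Stage = Callable[[List[dict]], List[dict]]


class _ToyPipeline:
    """Hypothetical stand-in for TransformPipeline: runs stages in order."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages

    def apply(self, messages: List[dict]) -> List[dict]:
        for stage in self.stages:
            messages = stage(messages)
        return messages


def _align(messages: List[dict]) -> List[dict]:
    # Placeholder for CacheAligner: returns messages unchanged.
    return messages


def _route_and_compress(messages: List[dict]) -> List[dict]:
    # Placeholder for ContentRouter plus compressors: collapses whitespace
    # runs, building new dicts so the caller's messages are not mutated.
    return [{**m, "content": " ".join(m["content"].split())} for m in messages]


_pipeline: Optional[_ToyPipeline] = None
_lock = threading.Lock()
build_count = 0  # how many times the pipeline was constructed


def get_pipeline() -> _ToyPipeline:
    """Build the pipeline on first use, then return the cached instance."""
    global _pipeline, build_count
    if _pipeline is None:              # fast path once built
        with _lock:
            if _pipeline is None:      # re-check inside the lock
                build_count += 1
                _pipeline = _ToyPipeline([_align, _route_and_compress])
    return _pipeline


msgs = [{"role": "tool", "content": "a   lot    of   spaces"}]
print(get_pipeline().apply(msgs))
print(get_pipeline() is get_pipeline(), build_count)
```

Building once and caching is what makes repeated `compress()` calls cheap: any expensive setup, such as loading a model, happens only on the first call.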

## Input Validation and Configuration

Before processing, the API runs an early-exit check: if the `messages` parameter is empty or the `optimize` flag is set to `False`, the function returns the original messages unchanged (lines 198‑199).

Configuration flows through the **`CompressConfig`** dataclass, whose defaults skip user messages and protect the four most recent conversation turns. You can override configuration fields at call time with keyword arguments such as `compress_user_messages`, `target_ratio`, or `protect_recent`; these values are merged into the config object at lines 202‑207.
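
The early exit and the keyword-override merge can be sketched with a frozen dataclass and `dataclasses.replace()`. The field names and the defaults for user messages and recent turns come from the description above; the `ToyCompressConfig` class, the `None` default for `target_ratio`, and the `prepare()` helper with its return shape are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyCompressConfig:
    """Hypothetical mirror of the documented CompressConfig defaults."""

    compress_user_messages: bool = False  # user turns are skipped by default
    target_ratio: Optional[float] = None  # assumption: None = no explicit target
    protect_recent: int = 4               # the last four turns stay untouched


def prepare(
    messages: List[dict],
    optimize: bool = True,
    config: Optional[ToyCompressConfig] = None,
    **overrides,
) -> Tuple[bool, ToyCompressConfig]:
    """Return (should_compress, effective_config)."""
    if config is None:
        config = ToyCompressConfig()
    # Early exit: nothing to do for an empty list or when optimize is False.
    if not messages or not optimize:
        return False, config
    # Merge call-time keyword arguments into a new config object;
    # replace() raises TypeError for names that are not config fields.
    return True, replace(config, **overrides)


ok, cfg = prepare([{"role": "user", "content": "hi"}], protect_recent=0)
print(ok, cfg)
```

Freezing the dataclass means a call-time override never mutates a shared default config; each call gets its own merged copy.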

## Pipeline Execution and Event Hooks

The compression workflow follows a deterministic sequence:

1. **Hook Execution** – If you supply a `hooks` object, `compress()` calls its `pre_compress`, `compute_biases`, and `post_compress` callbacks at the corresponding points in the run (starting at line 214).
2. **Query Extraction** – The helper `_extract_user_query()` from [`headroom/utils.py`](https://github.com/chopratejas/headroom/blob/main/headroom/utils.py) (line 33) isolates the latest user query, allowing compressors to prioritize content relevant to the current task.
3. **Transform Application** – The pipeline’s `apply()` method receives messages, model metadata, context limits, the extracted query, bias maps, and configuration flags, returning a `CompressionResult` with transformed content.
4. **Event Emission** – After routing and compression, the system fires **pipeline extension events** (`INPUT_ROUTED`, `INPUT_COMPRESSED`), enabling integrations to inspect or modify outputs mid-flow.
5. **Result Packaging** – The final `CompressResult` dataclass aggregates the compressed messages, token counts, the calculated `compression_ratio`, and a list of applied transforms.
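
The five steps can be traced end to end with stand-ins. Everything here is hypothetical: the hook method signatures, `extract_user_query()`, the word-based token counter, the truncating "compressor", and `ToyCompressResult` only mirror the shape described above. In particular, the sketch assumes `compression_ratio` means compressed tokens divided by original tokens, which the source does not confirm, and it models event emission (step 4) as appending names to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def count_tokens(messages: List[dict]) -> int:
    # Crude stand-in: counts whitespace-separated words, not real tokens.
    return sum(len(m["content"].split()) for m in messages)


def extract_user_query(messages: List[dict]) -> Optional[str]:
    # Hypothetical analogue of _extract_user_query(): the latest user message.
    for m in reversed(messages):
        if m["role"] == "user":
            return m["content"]
    return None


@dataclass
class ToyCompressResult:
    messages: List[dict]
    tokens_before: int
    tokens_after: int
    compression_ratio: float  # assumption: tokens_after / tokens_before
    transforms_applied: List[str] = field(default_factory=list)

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after


class LoggingHooks:
    """Hypothetical hooks object that records the order of its callbacks."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def pre_compress(self, messages: List[dict]) -> List[dict]:
        self.calls.append("pre_compress")
        return messages

    def compute_biases(self, messages: List[dict]) -> Dict[int, float]:
        self.calls.append("compute_biases")
        return {}

    def post_compress(self, result: ToyCompressResult) -> None:
        self.calls.append("post_compress")


def toy_compress(messages, hooks=None, max_words=5):
    events: List[str] = []
    biases: Dict[int, float] = {}
    if hooks is not None:                                  # step 1
        messages = hooks.pre_compress(messages)
        biases = hooks.compute_biases(messages)
    query = extract_user_query(messages)                   # step 2
    # Step 3: a real compressor would use `query` and `biases` to rank
    # content; this toy just truncates non-user messages to max_words words.
    out = [
        m if m["role"] == "user"
        else {**m, "content": " ".join(m["content"].split()[:max_words])}
        for m in messages
    ]
    events += ["INPUT_ROUTED", "INPUT_COMPRESSED"]         # step 4
    before, after = count_tokens(messages), count_tokens(out)
    ratio = after / before if before else 1.0
    result = ToyCompressResult(out, before, after, ratio, ["truncate"])  # step 5
    if hooks is not None:
        hooks.post_compress(result)
    return result, events


msgs = [
    {"role": "user", "content": "summarize the logs"},
    {"role": "tool", "content": "line one two three four five six seven eight"},
]
hooks = LoggingHooks()
result, events = toy_compress(msgs, hooks=hooks)
print(hooks.calls, events, result.tokens_saved, round(result.compression_ratio, 2))
```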

## Error Handling Behavior

If any exception occurs during processing, Headroom logs the failure, records a metric via `get_otel_metrics()` from [`headroom/observability.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability.py), and safely returns the original unmodified messages (lines 311‑324). This fail‑safe design ensures that production LLM calls continue uninterrupted even if compression encounters an edge case.
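
The fail-safe pattern is straightforward to reproduce around any compression callable. A minimal sketch, assuming a plain `collections.Counter` in place of the OpenTelemetry metrics that `get_otel_metrics()` provides; `safe_compress`, `FAILURES`, and the deliberately failing compressor are hypothetical.

```python
import logging
from collections import Counter

logger = logging.getLogger("toy_compress")
FAILURES = Counter()  # stand-in for an OpenTelemetry failure counter


def safe_compress(messages, compressor):
    """Run compressor(messages); on any exception, return the originals."""
    try:
        return compressor(messages)
    except Exception as exc:
        # Log with traceback, record a metric keyed by error type, fall back.
        logger.exception("compression failed; returning original messages")
        FAILURES[type(exc).__name__] += 1
        return messages


def broken(messages):
    raise ValueError("unexpected content shape")


msgs = [{"role": "tool", "content": "payload"}]
print(safe_compress(msgs, broken) is msgs, dict(FAILURES))
```

Catching `Exception` rather than `BaseException` keeps the fallback from swallowing `KeyboardInterrupt` or `SystemExit`.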

## Practical Code Examples

### Basic Usage with Any Provider

```python
from headroom import compress

messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."},
    {"role": "assistant", "content": "… very long answer …"},
]

# Compress for Claude Sonnet (default model)
result = compress(messages, model="claude-sonnet-4-5-20250929")

print("Compressed messages:", result.messages)
print("Tokens saved:", result.tokens_saved)
print("Compression ratio:", result.compression_ratio)
```

### Integration with Anthropic SDK

```python
from anthropic import Anthropic
from headroom import compress

client = Anthropic()
messages = [{"role": "user", "content": "Huge tool output ..."}]

compressed = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=compressed.messages,
)
```

### Integration with OpenAI SDK

```python
from openai import OpenAI
from headroom import compress

client = OpenAI()
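messages = [
    {"role": "user", "content": "Analyze this data"},
    # A "tool" message must answer a tool call from an earlier assistant turn.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "fetch_data", "arguments": "{}"},  # hypothetical tool
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "Very large JSON payload …"},
]

compressed = compress(messages, model="gpt-4o")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=compressed.messages,
)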
```

### Using LiteLLM for Bedrock Models

```python
import litellm
from headroom import compress

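messages = [{"role": "user", "content": "Summarize this long report ..."}]

# LiteLLM expects the full Bedrock model ID after the "bedrock/" prefix.
model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
compressed = compress(messages, model=model)
response = litellm.completion(model=model, messages=compressed.messages)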
```

### Direct HTTP Implementation

```python
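import os

import httpx
from headroom import compress

messages = [{"role": "user", "content": "Review this long diff ..."}]

compressed = compress(messages, model="claude-sonnet-4-5-20250929")

response = httpx.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "claude-sonnet-4-5-20250929",
        "max_tokens": 1024,  # required by the Messages API
        "messages": compressed.messages,
    },
    timeout=60.0,  # httpx's 5-second default is often too short for LLM calls
)
response.raise_for_status()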
```

### Advanced Configuration

```python
from headroom import compress

messages = [{"role": "user", "content": "Refactor this module ..."}]

result = compress(
    messages,
    model="claude-opus-4-20250514",
    compress_user_messages=True,  # also shrink user turns
    target_ratio=0.5,             # keep roughly 50% of tokens
    protect_recent=0,             # compress everything, even the last turn
)
```

## Key Source Files

Understanding these files deepens your ability to debug and extend the `compress()` API:

- **[`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py)** – Contains the public `compress()` function, `CompressConfig`, `CompressResult`, and the lazy `_get_pipeline()` singleton factory.
- **[`headroom/transforms/__init__.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/__init__.py)** – Exports `TransformPipeline`, the orchestrator that sequences CacheAligner, ContentRouter, and concrete compressors.
- **[`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py)** – Implements the logic that determines whether to invoke Kompress, SmartCrusher, or CodeCompressor based on content type detection.
- **[`headroom/transforms/kompress_compressor.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/kompress_compressor.py)** – Houses the ML‑based text compression engine used as the default for plain‑text messages.
- **[`headroom/utils.py`](https://github.com/chopratejas/headroom/blob/main/headroom/utils.py)** – Provides `_extract_user_query`, the utility that extracts user intent to guide relevance‑aware compression.
- **[`headroom/observability.py`](https://github.com/chopratejas/headroom/blob/main/headroom/observability.py)** – Supplies `get_otel_metrics()`, enabling instrumentation of compression success rates and failure modes.

## Summary

- The **`compress()`** function in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py) provides a zero‑configuration entry point for LLM message compression.
- Internally, it constructs a **singleton `TransformPipeline`** that sequences cache alignment, content routing, and model‑specific compression.
- The API accepts standard message dictionaries and returns a **`CompressResult`** containing optimized messages plus token statistics.
- Configuration occurs through **`CompressConfig`** or direct keyword arguments, with sensible defaults protecting recent conversation turns.
- Fail‑safe error handling ensures that exceptions return the original message list, maintaining application stability.

## Frequently Asked Questions

### What happens if the compress() API fails during execution?

If any exception occurs during the compression pipeline, Headroom catches the error at lines 311‑324 in [`headroom/compress.py`](https://github.com/chopratejas/headroom/blob/main/headroom/compress.py), logs the failure, records telemetry via `get_otel_metrics()`, and returns the original unmodified messages. This design ensures your LLM calls remain functional even when compression encounters unexpected inputs.

### Can I compress user messages or only assistant/tool content?

By default, Headroom skips user messages to preserve query intent, but you can change this. Pass `compress_user_messages=True` to the `compress()` function to include user turns. Separately, `protect_recent` controls how many of the most recent turns are left untouched; set `protect_recent=0` to remove that protection. To compress every message, including the latest user turn, pass both. These keyword arguments are merged into the underlying `CompressConfig` at lines 202‑207.
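
The interaction between the two settings can be sketched as an eligibility check. This is an illustration under assumptions, not Headroom's logic: it treats each message as one "turn" when counting `protect_recent`, and the `eligible_indices()` helper is hypothetical.

```python
def eligible_indices(messages, compress_user_messages=False, protect_recent=4):
    """Indices of messages a compressor may touch (hypothetical rule)."""
    cutoff = len(messages) - protect_recent  # indices >= cutoff are protected
    eligible = []
    for i, m in enumerate(messages):
        if i >= cutoff:
            continue  # one of the most recent, protected turns
        if m["role"] == "user" and not compress_user_messages:
            continue  # user turns are skipped unless explicitly enabled
        eligible.append(i)
    return eligible


msgs = [
    {"role": "user", "content": "q1"},
    {"role": "tool", "content": "big output"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
]
print(eligible_indices(msgs, protect_recent=2))        # [1]
print(eligible_indices(msgs, protect_recent=0))        # [1, 2]
print(eligible_indices(msgs, True, protect_recent=0))  # [0, 1, 2, 3]
```

The second call shows why the two settings are independent: `protect_recent=0` alone still leaves every user turn untouched.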

### How does Headroom decide which compression algorithm to apply?

The **ContentRouter** stage (implemented in [`headroom/transforms/content_router.py`](https://github.com/chopratejas/headroom/blob/main/headroom/transforms/content_router.py)) analyzes each message’s structure to detect JSON payloads, source code blocks, or plain text. Based on this classification, it delegates to **Kompress** for general text, **SmartCrusher** for structured data, or **CodeCompressor** for source code, so each format gets an appropriate token-reduction strategy.
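
A toy version of that routing decision might look like the following. The detection rules (a successful `json.loads` on text that starts like an object or array, or a few code-like line prefixes) are simple heuristics chosen for illustration; Headroom's ContentRouter likely uses different and more robust signals. `route()` and `CODE_HINTS` are hypothetical; only the three compressor names come from the text above.

```python
import json
import re

# Hypothetical heuristics: lines that commonly start source code.
CODE_HINTS = re.compile(
    r"^\s*(def |class |import |from \S+ import |function |#include|fn |package )",
    re.MULTILINE,
)


def route(content: str) -> str:
    """Return the name of the compressor a toy router would pick."""
    stripped = content.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "SmartCrusher"   # structured data
        except ValueError:          # JSONDecodeError subclasses ValueError
            pass                    # looked like JSON but was not
    if CODE_HINTS.search(content) or "```" in content:
        return "CodeCompressor"     # source code
    return "Kompress"               # default: plain text


print(route('{"rows": [1, 2, 3]}'))
print(route("def add(a, b):\n    return a + b"))
print(route("The quarterly report shows steady growth."))
```

Checking for valid JSON first and falling back to plain text last mirrors the ordering a router needs: the most specific format wins, and anything unrecognized still gets the general-purpose compressor.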

### Is proxy configuration required to use the compress() function?

No. `compress()` is a plain Python function that processes message dictionaries locally. It does not require an HTTP proxy or middleware: all transformations run in-process through the `TransformPipeline`, which makes it usable in serverless environments and alongside direct SDK integrations.