# Headroom Wrap vs Proxy: Understanding the Two Deployment Modes

> Confused by headroom wrap vs proxy deployment? Learn the key distinctions between these two modes to choose the right one for your needs. Understand headroom wrap's temporary shim and headroom proxy's persistent server.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: deep-dive
- Published: 2026-06-08

---

**Headroom wrap is an ephemeral CLI shim that transparently executes a single command through a temporary proxy, while headroom proxy is a persistent HTTP server that runs continuously with user-selectable optimization modes.**

The `chopratejas/headroom` repository provides two complementary approaches for deploying its AI optimization layer. When comparing **headroom wrap vs proxy**, the fundamental distinction is lifecycle and intended use: wrap is scoped to a single command, whereas proxy runs as a long-lived service endpoint.

## What is headroom wrap?

**Headroom wrap** functions as a session-oriented CLI shim designed for transient, command-specific execution. According to the specification in [`docs/spec/015-interfaces.md`](https://github.com/chopratejas/headroom/blob/main/docs/spec/015-interfaces.md#headroom-wrap), this mode transparently starts a temporary Headroom proxy, executes a single CLI command (such as `claude`, `codex`, or `cursor`), and terminates the proxy upon completion.

The wrapper automatically configures the target tool's environment variables (such as `ANTHROPIC_BASE_URL` or `OPENAI_BASE_URL`) to point to the ephemeral proxy instance. This eliminates manual configuration while ensuring compression applies only for the duration of the specific task. The TypeScript SDK implements an analogous pattern in [`sdk/typescript/src/adapters/vercel-ai.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts), where the `withHeadroom()` utility composes `wrapLanguageModel` with `headroomMiddleware` to provide single-invocation wrapping at the SDK level.

```bash
# Wrap Claude Code for a single invocation
headroom wrap claude -- \
    --model claude-sonnet-4-20250514 \
    "Write a function that sorts a list"
```
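
The lifecycle described above (start a temporary proxy, point the tool at it, clean up afterwards) can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not Headroom's implementation: `run_wrapped` is a hypothetical helper, `sleep 60` stands in for the temporary proxy process, and the URL shapes are assumed to follow the `http://localhost:<port>` and `http://localhost:<port>/v1` pattern.

```bash
# Hypothetical sketch of a wrap-style shim; not Headroom's actual code.
run_wrapped() {
  local port="$1"
  shift
  local rc=0

  # Stand-in for the temporary proxy. A real shim would start the proxy here.
  sleep 60 >/dev/null 2>&1 &
  local proxy_pid=$!

  # For an external command, these prefix assignments apply to that process
  # only, so the base URLs do not leak into the calling shell.
  ANTHROPIC_BASE_URL="http://localhost:${port}" \
  OPENAI_BASE_URL="http://localhost:${port}/v1" \
    "$@" || rc=$?

  # Tear the stand-in proxy down once the wrapped command has finished.
  kill "$proxy_pid" 2>/dev/null || true
  wait "$proxy_pid" 2>/dev/null || true

  # Propagate the wrapped command's exit status to the caller.
  return "$rc"
}

# Usage: the child process sees the proxy URL; the parent shell does not.
run_wrapped 8787 sh -c 'echo "OpenAI base URL: $OPENAI_BASE_URL"'
```

Scoping the variables to the child process is what makes this pattern "ephemeral": once the command exits, nothing in the surrounding shell still points at a proxy that no longer exists.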

## What is headroom proxy?

**Headroom proxy** operates as a stand-alone HTTP server that provides persistent API endpoints compatible with OpenAI and Anthropic specifications. Unlike the ephemeral nature of wrap, this mode continues running until manually stopped, making it suitable for containerized deployments and multi-client scenarios.

The proxy exposes critical configuration options including the `--mode` flag documented in [`docs/content/docs/proxy.mdx`](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/proxy.mdx#--mode), which allows selection between **`token`** mode (maximizing compression ratios) and **`cache`** mode (preserving provider-specific prefix-cache stability). Additional options include `--no-cache`, `--memory`, and worker process configuration.

```bash
# Start a long-running proxy in "token" mode (maximum compression)
headroom proxy --mode token --port 8787

# Or use "cache" mode for provider cache stability
headroom proxy --mode cache --port 8787
```

## Headroom Wrap vs Proxy: Key Differences

Understanding the operational distinctions between these deployment modes helps determine the appropriate choice for your workflow:

**Lifetime Management**
- **headroom wrap**: Creates ephemeral proxy processes that exist solely for the duration of the wrapped command.
- **headroom proxy**: Runs as a continuous daemon until explicitly terminated or managed by a process supervisor.

**Optimization Mode Control**
- **headroom wrap**: Inherits the optimization mode from any existing persistent deployment; does not expose the `--mode` flag directly.
- **headroom proxy**: Explicitly supports `--mode token` or `--mode cache` configuration via the CLI interface.

**Invocation Patterns**
- **headroom wrap**: Requires the syntax `headroom wrap [OPTIONS] -- <command> [args...]`, acting as a prefix to existing CLI tools. Supports wrapper-specific flags like `--no-context-tool` and `--no-rtk`.
- **headroom proxy**: Uses `headroom proxy [OPTIONS]` to start a server that clients must independently configure via environment variables like `HEADROOM_BASE_URL` or `OPENAI_BASE_URL`.

**Client Concurrency**
- **headroom wrap**: Optimized for single-user, single-command scenarios with immediate cleanup.
- **headroom proxy**: Handles concurrent connections from multiple clients, including those built on the `HeadroomClient` class defined in [`sdk/typescript/src/client.ts`](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/client.ts).

## Reusing Persistent Deployments with headroom wrap

Before spawning a new ephemeral proxy, `headroom wrap` checks whether a matching persistent deployment already exists on the requested port. If detected, the wrapper reuses the existing server rather than creating redundant processes, as detailed in [`docs/content/docs/persistent-installs.mdx`](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/persistent-installs.mdx).

This behavior allows seamless integration between temporary CLI usage and production-grade proxy deployments. When a persistent proxy is already running, `wrap` simply forwards the command through the existing infrastructure while inheriting its configured optimization mode and compression settings.

```bash
# First, install a persistent proxy running in the background
headroom install apply --preset persistent-service --providers auto

# Then use wrap - it discovers and reuses the running proxy
headroom wrap cursor -- \
    "Explain the code in this file"
```
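
The reuse check can be pictured as a probe followed by a branch. The sketch below is an illustration only: `port_is_open` and `proxy_action` are hypothetical names, the default probe merely tests TCP reachability through bash's `/dev/tcp` redirection, and it does not model how Headroom decides that a running deployment actually matches.

```bash
# Hypothetical helper: succeeds if something accepts TCP connections on the port.
# Relies on bash's /dev/tcp redirection, so the default probe requires bash.
port_is_open() {
  local port="$1"
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Decide between reusing a running proxy and starting a temporary one.
# The probe command is a parameter so the branch can be exercised
# without a live server.
proxy_action() {
  local port="$1"
  local probe="${2:-port_is_open}"
  if "$probe" "$port"; then
    echo "reuse"
  else
    echo "spawn"
  fi
}
```

Calling `proxy_action 8787` runs the real TCP probe, while `proxy_action 8787 true` and `proxy_action 8787 false` force the `reuse` and `spawn` branches respectively, which is convenient for checking the logic in isolation.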

## Summary

- **headroom wrap** provides ephemeral, command-centric proxying ideal for ad-hoc CLI tasks and single invocations.
- **headroom proxy** delivers persistent, configurable HTTP endpoints suitable for production workloads and multi-client scenarios.
- The **wrap** command automatically detects and reuses existing **proxy** deployments to avoid port conflicts and redundant processes.
- Only **headroom proxy** exposes the **`--mode`** flag for selecting between `token` (compression) and `cache` (stability) optimization strategies.
- Both modes ultimately utilize the same compression engine, differing primarily in lifecycle management and deployment architecture.

## Frequently Asked Questions

### Can headroom wrap work with an existing headroom proxy server?

Yes. According to the specification in [`docs/spec/015-interfaces.md`](https://github.com/chopratejas/headroom/blob/main/docs/spec/015-interfaces.md#headroom-wrap), `headroom wrap` checks for existing persistent deployments on the requested port before creating a new process. If a proxy is already running, wrap routes through it and skips the startup sequence, inheriting the server's existing configuration, including its selected optimization mode.

### Which optimization mode should I choose for headroom proxy?

Choose **`--mode token`** when maximizing request compression is your primary objective and you want to minimize token usage. Select **`--mode cache`** when you need to preserve provider-specific prefix-cache mechanisms, such as Anthropic's prompt caching or similar vendor optimizations that rely on exact prefix matching. The mode selection is only available when running `headroom proxy` directly, not when using `headroom wrap` with ephemeral instances.

### How do I configure a client to use headroom proxy?

Clients must point their base URL environment variables to the proxy endpoint. Set `OPENAI_BASE_URL=http://localhost:8787/v1` for OpenAI-compatible clients or `ANTHROPIC_BASE_URL=http://localhost:8787` for Anthropic tools. Unlike `headroom wrap`, which automatically manages these variables for a single command, persistent proxy deployments require manual client configuration as documented in [`docs/spec/011-deployment.md`](https://github.com/chopratejas/headroom/blob/main/docs/spec/011-deployment.md).
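
In a shell session, that manual step amounts to exporting both variables before launching your tools. A minimal sketch, assuming the proxy listens on `localhost:8787`; `proxy_port` is a throwaway variable name used only here:

```bash
# Assumes a persistent proxy is already listening on this port.
proxy_port=8787

# Exported variables are inherited by every tool started from this shell.
export OPENAI_BASE_URL="http://localhost:${proxy_port}/v1"
export ANTHROPIC_BASE_URL="http://localhost:${proxy_port}"
```

Because the variables are exported rather than scoped to one command, they stay in effect until the shell exits or you `unset` them.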

### Is headroom wrap suitable for production deployments?

No. **Headroom wrap** is designed for development workflows and ad-hoc command execution where ephemeral proxy behavior is desired. For production environments requiring stability, concurrent client support, and process management, deploy `headroom proxy` as a persistent service using Docker, systemd, or the `headroom install` command with appropriate presets as described in [`docs/content/docs/persistent-installs.mdx`](https://github.com/chopratejas/headroom/blob/main/docs/content/docs/persistent-installs.mdx).