Headroom Wrap vs Proxy: Understanding the Two Deployment Modes

Headroom wrap is an ephemeral CLI shim that transparently executes a single command through a temporary proxy, while headroom proxy is a persistent HTTP server that runs continuously with user-selectable optimization modes.

The chopratejas/headroom repository provides two complementary ways to deploy its AI optimization layer. The fundamental difference between headroom wrap and headroom proxy is lifecycle and intended use: wrap is a command-centered wrapper, whereas proxy is a long-running service endpoint.

What is headroom wrap?

Headroom wrap functions as a session-oriented CLI shim designed for transient, command-specific execution. According to the specification in [docs/spec/015-interfaces.md](https://github.com/chopratejas/headroom/blob/main/docs/spec/015-interfaces.md#headroom-wrap), this mode transparently starts a temporary Headroom proxy, executes a single CLI command (such as claude, codex, or cursor), and terminates the proxy upon completion.

The wrapper automatically configures the target tool's environment variables—such as ANTHROPIC_BASE_URL or OPENAI_BASE_URL—to point to the ephemeral proxy instance. This eliminates manual configuration while ensuring compression benefits apply only for the duration of the specific task. The TypeScript SDK implements an analogous pattern in [sdk/typescript/src/adapters/vercel-ai.ts](https://github.com/chopratejas/headroom/blob/main/sdk/typescript/src/adapters/vercel-ai.ts), where the withHeadroom() utility composes wrapLanguageModel with headroomMiddleware to provide single-invocation wrapping at the SDK level.


# Wrap Claude Code for a single invocation

headroom wrap claude -- \
    --model claude-sonnet-4-20250514 \
    "Write a function that sorts a list"

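The environment handling that wrap performs can be approximated by hand. The following is a minimal sketch, not the wrapper's implementation: the helper name with_proxy_env is hypothetical, it assumes a proxy is already listening on the given port, and it does not start or stop one. It only shows how the two base-URL variables can be scoped to a single child process.

```shell
#!/bin/sh
# Hypothetical helper, not part of the headroom CLI.
# Runs one command with the base-URL variables pointed at a local proxy.
# Usage: with_proxy_env PORT COMMAND [ARGS...]
with_proxy_env() {
    port="$1"
    shift
    # `env` applies the variables to the child process only, so the
    # calling shell keeps whatever values it had before.
    env \
        ANTHROPIC_BASE_URL="http://localhost:${port}" \
        OPENAI_BASE_URL="http://localhost:${port}/v1" \
        "$@"
}

# Example: show the value the child process sees.
with_proxy_env 8787 sh -c 'echo "$ANTHROPIC_BASE_URL"'   # prints http://localhost:8787
```

Because the variables are passed through env rather than exported, nothing needs to be cleaned up afterwards, which mirrors the "only for the duration of the task" behavior described for wrap.
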
What is headroom proxy?

Headroom proxy operates as a stand-alone HTTP server that provides persistent API endpoints compatible with OpenAI and Anthropic specifications. Unlike the ephemeral nature of wrap, this mode continues running until manually stopped, making it suitable for containerized deployments and multi-client scenarios.

The proxy exposes critical configuration options including the --mode flag documented in docs/content/docs/proxy.mdx, which allows selection between token mode (maximizing compression ratios) and cache mode (preserving provider-specific prefix-cache stability). Additional options include --no-cache, --memory, and worker process configuration.


# Start a long-running proxy in "token" mode (maximum compression)

headroom proxy --mode token --port 8787

# Or use "cache" mode for provider cache stability

headroom proxy --mode cache --port 8787
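
The two documented modes can be guarded in a script so that a typo fails early instead of reaching the CLI. This is a sketch under assumptions: headroom_proxy_cmd is a hypothetical helper, it only assembles a command line from the --mode and --port flags shown above, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper, not part of the headroom CLI.
# Validates the mode and prints the proxy command line it would run.
# Usage: headroom_proxy_cmd MODE [PORT]
headroom_proxy_cmd() {
    mode="$1"
    port="${2:-8787}"   # falls back to the port used in the examples above
    case "$mode" in
        token|cache) ;;   # the two modes named in the documentation
        *)
            echo "unknown mode: $mode (expected token or cache)" >&2
            return 1
            ;;
    esac
    echo "headroom proxy --mode $mode --port $port"
}

headroom_proxy_cmd cache 8787   # prints: headroom proxy --mode cache --port 8787
```

An unrecognized mode makes the function return a non-zero status, so a calling script can stop before anything is started.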

Headroom Wrap vs Proxy: Key Differences

Understanding the operational distinctions between these deployment modes helps determine the appropriate choice for your workflow:

Lifetime Management

  • headroom wrap: Creates ephemeral proxy processes that exist solely for the duration of the wrapped command.
  • headroom proxy: Runs as a continuous daemon until explicitly terminated or managed by a process supervisor.

Optimization Mode Control

  • headroom wrap: Inherits the optimization mode from any existing persistent deployment; does not expose the --mode flag directly.
  • headroom proxy: Explicitly supports --mode token or --mode cache configuration via the CLI interface.

Invocation Patterns

  • headroom wrap: Requires the syntax headroom wrap [OPTIONS] -- <command> [args...], acting as a prefix to existing CLI tools. Supports wrapper-specific flags like --no-context-tool and --no-rtk.
  • headroom proxy: Uses headroom proxy [OPTIONS] to start a server that clients must independently configure via environment variables like HEADROOM_BASE_URL or OPENAI_BASE_URL.

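Client Concurrency

  • headroom wrap: Serves the single wrapped command for which the ephemeral proxy was started.
  • headroom proxy: Serves any number of clients that point at its endpoint, which suits multi-client scenarios.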

Reusing Persistent Deployments with headroom wrap

Before spawning a new ephemeral proxy, headroom wrap checks whether a matching persistent deployment already exists on the requested port. If detected, the wrapper reuses the existing server rather than creating redundant processes, as detailed in docs/content/docs/persistent-installs.mdx.

This behavior allows seamless integration between temporary CLI usage and production-grade proxy deployments. When a persistent proxy is already running, wrap simply forwards the command through the existing infrastructure while inheriting its configured optimization mode and compression settings.


# First, install a persistent proxy running in the background

headroom install apply --preset persistent-service --providers auto

# Then use wrap - it discovers and reuses the running proxy

headroom wrap cursor -- \
    "Explain the code in this file"

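The reuse decision described in this section can be pictured as a probe followed by a branch. The sketch below is an illustration only: decide_proxy_action and port_answers_http are hypothetical names, and a plain HTTP probe against localhost is an assumption. The real check for a matching deployment is presumably stricter than seeing whether the port answers.

```shell
#!/bin/sh
# Hypothetical sketch of the reuse-or-spawn decision; not headroom's code.

# Assumed probe: succeeds if anything answers HTTP on the given local port.
# curl exits 0 on any HTTP response and non-zero if the connection fails.
port_answers_http() {
    curl --silent --output /dev/null --max-time 1 "http://localhost:$1/"
}

# Usage: decide_proxy_action PORT [PROBE_COMMAND]
# Prints "reuse" when the probe succeeds, otherwise "spawn".
decide_proxy_action() {
    port="$1"
    probe="${2:-port_answers_http}"
    if "$probe" "$port"; then
        echo "reuse"
    else
        echo "spawn"
    fi
}
```

Calling decide_proxy_action 8787 uses the curl-based probe; passing a different probe command as the second argument makes the branch easy to exercise without a running server.
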
Summary

  • headroom wrap provides ephemeral, command-centric proxying ideal for ad-hoc CLI tasks and single invocations.
  • headroom proxy delivers persistent, configurable HTTP endpoints suitable for production workloads and multi-client scenarios.
  • The wrap command automatically detects and reuses existing proxy deployments to avoid port conflicts and redundant processes.
  • Only headroom proxy exposes the --mode flag for selecting between token (compression) and cache (stability) optimization strategies.
  • Both modes ultimately utilize the same compression engine, differing primarily in lifecycle management and deployment architecture.

Frequently Asked Questions

Can headroom wrap work with an existing headroom proxy server?

Yes. According to the specification in [docs/spec/015-interfaces.md](https://github.com/chopratejas/headroom/blob/main/docs/spec/015-interfaces.md#headroom-wrap), headroom wrap checks for an existing persistent deployment on the requested port before creating a new process. If a proxy is already running, wrap skips the startup sequence and routes through it, inheriting the server's existing configuration, including its selected optimization mode.

Which optimization mode should I choose for headroom proxy?

Choose --mode token when maximizing request compression is your primary objective and you want to minimize token usage. Select --mode cache when you need to preserve provider-specific prefix-cache mechanisms, such as Anthropic's prompt caching or similar vendor optimizations that rely on exact prefix matching. The mode selection is only available when running headroom proxy directly, not when using headroom wrap with ephemeral instances.

How do I configure a client to use headroom proxy?

Clients must point their base URL environment variables to the proxy endpoint. Set OPENAI_BASE_URL=http://localhost:8787/v1 for OpenAI-compatible clients or ANTHROPIC_BASE_URL=http://localhost:8787 for Anthropic tools. Unlike headroom wrap, which automatically manages these variables for a single command, persistent proxy deployments require manual client configuration as documented in [docs/spec/011-deployment.md](https://github.com/chopratejas/headroom/blob/main/docs/spec/011-deployment.md).

Is headroom wrap suitable for production deployments?

No. Headroom wrap is designed for development workflows and ad-hoc command execution where ephemeral proxy behavior is desired. For production environments requiring stability, concurrent client support, and process management, deploy headroom proxy as a persistent service using Docker, systemd, or the headroom install command with appropriate presets as described in docs/content/docs/persistent-installs.mdx.
