How Anthropic cache_control Reduces Token Costs by 90% in Dexter

Dexter uses Anthropic's cache_control feature to mark system prompts as ephemeral, enabling the API to cache static instructions and reduce prompt token usage by approximately 90% during iterative agent loops.

The virattt/dexter repository implements an AI agent architecture that relies heavily on repeated calls to Anthropic's Claude API. By leveraging the cache_control parameter introduced by Anthropic, Dexter optimizes the cost and performance of these repeated interactions. This article explains how the cache_control mechanism works within Dexter's codebase, where it is implemented, and the concrete benefits it provides for production deployments.

What Is Anthropic cache_control in Dexter?

Anthropic's cache_control is a API parameter that allows developers to mark specific portions of a prompt as ephemeral. When content is marked with cache_control: { type: 'ephemeral' }, Anthropic's infrastructure caches that content for a short period, allowing subsequent API calls to reference the cached data rather than re-transmitting the full text.

In Dexter, this feature is applied specifically to the system prompt—the static instructions, tool descriptions, and skill metadata that remain constant across multiple turns of an agent conversation. By marking this system prompt as ephemeral in src/model/llm.ts, Dexter ensures that the bulky static context is only sent once and then reused.

How Dexter Implements cache_control

Marking the System Prompt as Ephemeral

The core implementation resides in src/model/llm.ts between lines 178 and 189. Here, Dexter constructs the message array for Anthropic's API:

// src/model/llm.ts
function buildAnthropicMessages(systemPrompt: string, userPrompt: string) {
  return [
    {
      role: 'system',
      content: systemPrompt,
      cache_control: { type: 'ephemeral' }, // ← enables caching
    },
    { role: 'user', content: userPrompt },
  ];
}

This construction explicitly adds the cache_control field to the system message object. The type: 'ephemeral' designation tells Anthropic that this content can be cached temporarily and discarded when no longer needed.

The Agent Loop Architecture

Dexter's agent loop, implemented in src/agent/agent.ts, leverages this caching to minimize token transmission. The architecture follows this pattern:

  1. Build once: The system prompt is constructed from src/agent/prompts.ts and includes tool definitions and skill metadata.
  2. Cache: The first API call transmits the full system prompt with cache_control enabled.
  3. Iterate: Subsequent calls in the loop only transmit the dynamic user input and accumulated tool results, while Anthropic reuses the cached system prompt.
// src/agent/agent.ts (conceptual excerpt)
const system = buildSystemPrompt(); // cached forever
for (let i = 0; i < maxIterations; i++) {
  const user = getCurrentUserInput(i);
  const messages = buildAnthropicMessages(system, user);
  const reply = await llmProvider.call({ messages });
  // ... handle tool calls, update scratchpad, etc.
}

This pattern ensures that the often-large system prompt (which can contain thousands of tokens of tool descriptions) is not repeatedly sent in long-running agent sessions.

Performance Benefits and Cost Savings

The implementation of cache_control in Dexter delivers measurable improvements in both token efficiency and operational cost.

Token Usage Reduction

According to comments in src/model/llm.ts at lines 217-218, Dexter achieves approximately 90% savings on prompt tokens when cache_control is enabled. This dramatic reduction occurs because:

  • The system prompt remains static across multiple turns
  • Only the dynamic user messages and tool results are transmitted in subsequent calls
  • Anthropic's cache hit rate approaches 100% for the system prompt in iterative loops

Cost Optimization

Since Anthropic's API pricing is token-based, reducing prompt token volume directly translates to cost savings. For agent architectures like Dexter that may involve dozens of LLM calls per task, the cumulative savings are substantial. The cache_control feature effectively eliminates the cost of repeatedly sending static context, making long-running agent sessions economically viable.

Code Implementation Details

For developers looking to implement similar optimizations, here is the complete pattern used in Dexter:

Message construction with ephemeral caching:

// src/model/llm.ts
function buildAnthropicMessages(systemPrompt: string, userPrompt: string) {
  return [
    {
      role: 'system',
      content: systemPrompt,
      cache_control: { type: 'ephemeral' },
    },
    { role: 'user', content: userPrompt },
  ];
}

Integration with the LLM provider:

const finalSystemPrompt = getSystemPrompt();   // static prompt with skill metadata
const prompt = getUserPrompt();                  // dynamic user query

const messages = buildAnthropicMessages(finalSystemPrompt, prompt);
const response = await anthropicChat.call({ messages });

This implementation follows Anthropic's official guidance for cache control, as documented in Dexter's AGENTS.md at line 51, ensuring compatibility with Claude's caching mechanisms.

Summary

  • Anthropic's cache_control allows Dexter to mark system prompts as ephemeral, enabling server-side caching of static content.
  • Implementation location: The feature is implemented in src/model/llm.ts (lines 178-189) where system messages receive cache_control: { type: 'ephemeral' }.
  • Primary benefit: Approximately 90% reduction in prompt token usage during iterative agent loops, as static system prompts are cached after the first call.
  • Cost impact: Significant cost savings on Anthropic API bills due to reduced token transmission in long-running sessions.
  • Architecture: The agent loop in src/agent/agent.ts leverages this by sending dynamic user content while reusing cached system context.

Frequently Asked Questions

What does the cache_control parameter do in Dexter?

The cache_control parameter marks the system prompt as ephemeral, instructing Anthropic's API to cache this content temporarily. In Dexter, this is set via cache_control: { type: 'ephemeral' } in src/model/llm.ts, allowing the static system instructions to be reused across multiple API calls without re-transmitting the tokens.

How much token savings does cache_control provide in Dexter?

According to the source code comments in src/model/llm.ts at lines 217-218, Dexter achieves approximately 90% savings on prompt tokens when cache_control is enabled. This occurs because the often-large system prompt (containing tool descriptions and skill metadata) is only sent once and then cached, while subsequent calls only transmit dynamic user input.

Is cache_control automatically applied to all prompts in Dexter?

No, cache_control is specifically applied only to the system prompt in Dexter's Anthropic integration. The implementation in src/model/llm.ts deliberately adds the cache_control field only to the system message object, while user messages and tool results are transmitted as normal dynamic content without caching directives.

Does using cache_control affect the accuracy of Dexter's agent responses?

No, using cache_control does not affect response accuracy. The cached system prompt contains static instructions, tool definitions, and skill metadata that remain constant across iterations. Dexter still transmits dynamic content—including user queries and accumulated tool results—to maintain full conversational context. The architecture follows Anthropic's recommended patterns for cache management while preserving the agent's ability to reason across multi-turn interactions.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →