# How Anthropic cache_control Reduces Token Costs by 90% in Dexter

> Discover how Anthropic cache_control in Dexter slashes token costs by 90%. Learn to cache static instructions and optimize AI agent loops efficiently.

- Repository: [Virat Singh/dexter](https://github.com/virattt/dexter)
- Tags: performance
- Published: 2026-02-16

---

**Dexter uses Anthropic's `cache_control` feature to mark its system prompt as ephemeral, letting the API cache the static instructions and cut the cost of those prompt tokens by approximately 90% during iterative agent loops.**

The `virattt/dexter` repository implements an AI agent architecture that relies heavily on repeated calls to Anthropic's Claude API. By leveraging the `cache_control` parameter introduced by Anthropic, Dexter optimizes the cost and performance of these repeated interactions. This article explains how the `cache_control` mechanism works within Dexter's codebase, where it is implemented, and the concrete benefits it provides for production deployments.

## What Is Anthropic cache_control in Dexter?

Anthropic's `cache_control` is an API parameter that allows developers to mark specific portions of a prompt as **ephemeral** (cacheable). When a content block carries `cache_control: { type: 'ephemeral' }`, Anthropic caches the prompt prefix up to and including that block for a short period. The client still sends the full prompt on every request, but when a later request starts with the identical prefix, Anthropic reads it from the cache instead of reprocessing it and bills those tokens at a reduced rate.
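For reference, the sketch below shows the body of a direct request to Anthropic's Messages API (`POST /v1/messages`) with a cached system prompt. In the raw API the system prompt goes in a top-level `system` field, which accepts an array of text blocks, and `cache_control` is set on a block rather than on a message. The helper `buildCachedRequest`, the placeholder model ID, and the prompt strings are hypothetical; this is not Dexter's code.

```typescript
// Minimal sketch (hypothetical helper, not Dexter's code): the JSON body of a
// direct Anthropic Messages API request with a cached system prompt.

interface TextBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: { role: 'user' | 'assistant'; content: string }[];
}

function buildCachedRequest(model: string, systemPrompt: string, userPrompt: string): MessagesRequest {
  return {
    model,
    max_tokens: 1024,
    // The system prompt is a top-level field, not a message with role 'system'.
    // The breakpoint marks the end of the prefix that Anthropic may cache.
    system: [{ type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } }],
    // Per-turn content comes after the cached prefix and is billed at the normal rate.
    messages: [{ role: 'user', content: userPrompt }],
  };
}

const body = buildCachedRequest(
  'MODEL_ID_PLACEHOLDER', // placeholder, not a real model ID
  'You are a financial research agent. Available tools: ...',
  'What was Apple revenue growth last quarter?',
);
console.log(JSON.stringify(body, null, 2));
```

Only the system block carries a breakpoint, so the user message after it is treated as fresh input on every call.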

In Dexter, this feature is applied specifically to the **system prompt**: the static instructions, tool descriptions, and skill metadata that remain constant across multiple turns of an agent conversation. By marking this system prompt as ephemeral in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts), Dexter ensures that the bulky static context is processed at full price only once; later calls resend the same text, and Anthropic serves it from the cache.

## How Dexter Implements cache_control

### Marking the System Prompt as Ephemeral

The core implementation resides in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts) between lines 178 and 189. Here, Dexter constructs the message array for Anthropic's API:

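```typescript
// src/model/llm.ts (simplified)
function buildAnthropicMessages(systemPrompt: string, userPrompt: string) {
  return [
    {
      role: 'system',
      // Anthropic reads cache_control from a content block, so the system
      // text is wrapped in a single text block that carries the breakpoint.
      content: [
        {
          type: 'text',
          text: systemPrompt,
          cache_control: { type: 'ephemeral' }, // ← enables caching
        },
      ],
    },
    { role: 'user', content: userPrompt },
  ];
}
```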

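This construction attaches the `cache_control` field to the system prompt only. The `type: 'ephemeral'` designation tells Anthropic to keep the cached prefix for a limited time: five minutes by default, with the lifetime refreshed each time a request reads from the cache. An agent loop that calls the model every few seconds therefore keeps the entry warm, while a session that sits idle longer than the lifetime pays to write the cache again.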
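The lifetime rule can be modeled in a few lines. This is a simplified sketch under stated assumptions: a five-minute lifetime that every cache read refreshes, call times given in ascending order, and a prompt prefix that never changes. `classifyCalls` is a hypothetical helper, and real cache behavior also depends on factors this model ignores, such as the minimum cacheable prompt length.

```typescript
// Simplified model (assumptions: 5-minute lifetime, refreshed on every read,
// ascending call times, unchanged prompt prefix). Not Anthropic's implementation.
const CACHE_TTL_MS = 5 * 60 * 1000;

type CacheEvent = 'write' | 'read';

function classifyCalls(callTimesMs: number[]): CacheEvent[] {
  const events: CacheEvent[] = [];
  let expiresAt = -Infinity; // no cache entry before the first call
  for (const t of callTimesMs) {
    // A call inside the lifetime reads the cache; otherwise it writes a new entry.
    events.push(t < expiresAt ? 'read' : 'write');
    // Both writes and reads start a fresh lifetime window.
    expiresAt = t + CACHE_TTL_MS;
  }
  return events;
}

// Three agent steps 30 s apart, then a 10-minute pause before the next task.
const times = [0, 30_000, 60_000, 60_000 + 10 * 60_000];
console.log(classifyCalls(times)); // [ 'write', 'read', 'read', 'write' ]
```

In this model an agent that calls the model every 30 seconds keeps hitting the cache indefinitely, while a ten-minute pause forces a new write.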

### The Agent Loop Architecture

Dexter's agent loop, implemented in [`src/agent/agent.ts`](https://github.com/virattt/dexter/blob/main/src/agent/agent.ts), relies on this caching to keep the cost of repeated prompt tokens low. The architecture follows this pattern:

1. **Build once**: The system prompt is constructed from [`src/agent/prompts.ts`](https://github.com/virattt/dexter/blob/main/src/agent/prompts.ts) and includes tool definitions and skill metadata.
2. **Cache**: The first API call sends the full system prompt with `cache_control` enabled, and Anthropic writes it to the cache.
3. **Iterate**: Subsequent calls in the loop resend the same system prompt along with the dynamic user input and accumulated tool results; Anthropic reads the system prompt from the cache, so only the new content is processed at the full input rate.

```typescript
// src/agent/agent.ts (conceptual excerpt)
const system = buildSystemPrompt(); // built once; resent unchanged so the cached prefix matches
for (let i = 0; i < maxIterations; i++) {
  const user = getCurrentUserInput(i); // query plus accumulated tool results
  const messages = buildAnthropicMessages(system, user);
  const reply = await llmProvider.call({ messages });
  // ... handle tool calls, update scratchpad, etc.
}
```

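This pattern ensures that the often-large system prompt (which can contain thousands of tokens of tool descriptions) is not reprocessed and billed at the full rate on every iteration of a long-running agent session. For the cache to hit, the system prompt must stay byte-for-byte identical between calls; any change, such as an embedded timestamp, creates a new prefix and a cache miss.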
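The requirement that the prefix stay identical can be checked with a small sketch. It builds one request per loop step, keeps the system blocks fixed, appends fake tool traffic to the history, and compares a serialized copy of the cached portion across steps. `buildStepRequest`, `cachedPrefixKey`, and the plain-string tool results are hypothetical simplifications (real tool results are structured content blocks), and comparing serialized JSON only approximates Anthropic's exact-match rule.

```typescript
// Hypothetical sketch: checks that the cached prefix (the system blocks) stays
// identical across agent-loop steps while the message history grows.

interface TextBlock {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
}
interface Msg {
  role: 'user' | 'assistant';
  content: string;
}

function buildStepRequest(system: TextBlock[], history: Msg[]) {
  // Copy the history so later pushes do not mutate an already-built request.
  return { system, messages: [...history] };
}

// Serializes only the portion covered by the breakpoint on the system block.
function cachedPrefixKey(req: { system: TextBlock[] }): string {
  return JSON.stringify(req.system);
}

const system: TextBlock[] = [
  { type: 'text', text: 'Static instructions and tool descriptions', cache_control: { type: 'ephemeral' } },
];
const history: Msg[] = [{ role: 'user', content: 'Compare AAPL and MSFT operating margins' }];

const keys: string[] = [];
for (let step = 0; step < 3; step++) {
  keys.push(cachedPrefixKey(buildStepRequest(system, history)));
  // Dynamic content grows after the prefix, so it does not change the key.
  history.push({ role: 'assistant', content: `tool call ${step}` });
  history.push({ role: 'user', content: `tool result ${step}` });
}
console.log(new Set(keys).size); // 1: every step reuses the same prefix

// Anti-pattern: embedding a changing value in the system prompt.
const volatileKeys = [1, 2].map((n) =>
  cachedPrefixKey({ system: [{ type: 'text', text: `Request number ${n}` }] }),
);
console.log(new Set(volatileKeys).size); // 2: each call would need a fresh cache write
```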

## Performance Benefits and Cost Savings

The implementation of `cache_control` in Dexter delivers measurable improvements in both token efficiency and operational cost.

### Token Usage Reduction

According to comments in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts) at lines 217-218, Dexter achieves approximately **90% savings** on prompt tokens when `cache_control` is enabled. The figure lines up with Anthropic's pricing, which bills cache reads at 10% of the base input-token price. The reduction holds because:

- The system prompt remains static across multiple turns
- Only the dynamic user messages and tool results are billed at the full input rate in subsequent calls
- Anthropic's cache hit rate for the system prompt approaches 100% in iterative loops, as long as calls arrive within the cache lifetime and the prompt text does not change
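The arithmetic behind the figure can be worked through with Anthropic's published prompt-caching multipliers for the default five-minute cache: cache writes cost 1.25 times the base input price and cache reads cost 0.1 times. The token counts below (a 10,000-token system prompt and 500 new tokens per call) are illustrative assumptions, not measurements from Dexter, and the multipliers should be checked against current pricing.

```typescript
// Worked example (assumed token counts): input cost of one call,
// in units of "base input price per token".
const CACHE_WRITE_MULTIPLIER = 1.25; // default 5-minute cache write
const CACHE_READ_MULTIPLIER = 0.1; // cache hit

function callInputCost(prefixTokens: number, freshTokens: number, cacheHit: boolean): number {
  const prefixRate = cacheHit ? CACHE_READ_MULTIPLIER : CACHE_WRITE_MULTIPLIER;
  return prefixTokens * prefixRate + freshTokens; // fresh tokens always pay 1x
}

const prefixTokens = 10_000; // assumed system prompt size
const freshTokens = 500; // assumed new user input and tool results per call

const withoutCache = prefixTokens + freshTokens; // 10500
const withCacheHit = callInputCost(prefixTokens, freshTokens, true); // 1500

console.log(`saving on the system prompt: ${((1 - CACHE_READ_MULTIPLIER) * 100).toFixed(0)}%`); // 90%
console.log(`saving on the whole call: ${((1 - withCacheHit / withoutCache) * 100).toFixed(1)}%`); // 85.7%
```

The whole-call saving is a little below 90% because the fresh tokens are billed at the full rate; the larger the static prompt relative to the per-call content, the closer a call gets to the 90% ceiling.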

### Cost Optimization

Since Anthropic's API pricing is token-based, a lower effective price per prompt token translates directly into cost savings. For agent architectures like Dexter that may involve dozens of LLM calls per task, the cumulative savings are substantial. Caching is not free on the first call: writing to the default five-minute cache is billed at 1.25 times the base input price. Every subsequent hit costs a tenth of the normal rate, however, so the premium is recovered by the second call, which is what makes long-running agent sessions economically viable.
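Because the first call pays a write premium (1.25 times the base input price for the default five-minute cache) and later hits pay 0.1 times, the savings depend on loop length. The sketch below totals the system-prompt cost over n calls that all land within the cache lifetime. The 8,000-token prompt is an assumption and the function names are hypothetical.

```typescript
// Hypothetical sketch: cost of the system prompt alone across n calls,
// in base-input-price token units, assuming every call after the first is a cache hit.
const WRITE = 1.25;
const READ = 0.1;

function promptCostCached(prefixTokens: number, n: number): number {
  if (n <= 0) return 0;
  return prefixTokens * WRITE + prefixTokens * READ * (n - 1);
}

function promptCostUncached(prefixTokens: number, n: number): number {
  return prefixTokens * n;
}

function savedFraction(prefixTokens: number, n: number): number {
  if (n <= 0) return 0; // avoid 0 / 0 when there are no calls
  return 1 - promptCostCached(prefixTokens, n) / promptCostUncached(prefixTokens, n);
}

for (const n of [1, 2, 10, 30]) {
  console.log(`${n} calls: ${(savedFraction(8_000, n) * 100).toFixed(1)}% saved`);
}
```

With these numbers a single call costs 25% more than no caching, two calls already save about a third, and the saving approaches (but never quite reaches) 90% as the loop grows.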

## Code Implementation Details

For developers looking to implement similar optimizations, here is the pattern used in Dexter, in simplified form:

**Message construction with ephemeral caching:**

```typescript
// src/model/llm.ts (simplified)
function buildAnthropicMessages(systemPrompt: string, userPrompt: string) {
  return [
    {
      role: 'system',
      // The breakpoint lives on the text block, not on the message itself.
      content: [
        {
          type: 'text',
          text: systemPrompt,
          cache_control: { type: 'ephemeral' },
        },
      ],
    },
    { role: 'user', content: userPrompt },
  ];
}
```

**Integration with the LLM provider:**

```typescript
const finalSystemPrompt = getSystemPrompt();   // static prompt with skill metadata
const prompt = getUserPrompt();                  // dynamic user query

const messages = buildAnthropicMessages(finalSystemPrompt, prompt);
const response = await anthropicChat.call({ messages });
```
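One way to confirm that caching is working is to inspect the `usage` object that Anthropic's Messages API returns: `cache_creation_input_tokens` counts tokens written to the cache, `cache_read_input_tokens` counts tokens served from it, and `input_tokens` counts the remaining uncached input. The helper below is a hypothetical sketch with illustrative numbers; if requests go through a client wrapper, the same counts may be exposed under different field names.

```typescript
// Sketch: share of input tokens served from the cache, computed from the
// `usage` fields of an Anthropic Messages API response (subset shown).
interface AnthropicUsage {
  input_tokens: number; // input tokens neither read from nor written to the cache
  output_tokens: number;
  cache_creation_input_tokens?: number | null; // tokens written to the cache
  cache_read_input_tokens?: number | null; // tokens served from the cache
}

function cacheReadShare(usage: AnthropicUsage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = read + written + usage.input_tokens;
  return total === 0 ? 0 : read / total;
}

// Illustrative numbers for a later loop iteration that hit the cache.
const laterCall: AnthropicUsage = {
  input_tokens: 600,
  output_tokens: 250,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 9_400,
};
console.log(cacheReadShare(laterCall)); // 0.94
```

A first call in a loop should show a large `cache_creation_input_tokens` value and zero reads; later calls within the cache lifetime should show the reverse.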

This implementation follows Anthropic's guidance for cache control; Dexter's [`AGENTS.md`](https://github.com/virattt/dexter/blob/main/AGENTS.md) documents the convention at line 51.

## Summary

- **Anthropic's `cache_control`** allows Dexter to mark system prompts as ephemeral, enabling server-side caching of static content.
- **Implementation location**: The feature is implemented in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts) (lines 178-189) where system messages receive `cache_control: { type: 'ephemeral' }`.
- **Primary benefit**: Approximately **90% reduction** in the cost of cached prompt tokens during iterative agent loops, since the static system prompt is cached after the first call and later reads are billed at 10% of the base input price.
- **Cost impact**: Significant savings on Anthropic API bills in long-running sessions, since discounted cache reads quickly outweigh the one-time cache-write premium.
- **Architecture**: The agent loop in [`src/agent/agent.ts`](https://github.com/virattt/dexter/blob/main/src/agent/agent.ts) benefits by resending an unchanged system prompt on every call, which Anthropic serves from cache, while only the dynamic user content and tool results are processed at full price.

## Frequently Asked Questions

### What does the `cache_control` parameter do in Dexter?

The `cache_control` parameter marks the system prompt as **ephemeral**, instructing Anthropic's API to cache this content temporarily. In Dexter, this is set via `cache_control: { type: 'ephemeral' }` in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts), allowing the static system instructions to be reused across multiple API calls without being reprocessed and billed at the full input rate each time.

### How much token savings does `cache_control` provide in Dexter?

According to the source code comments in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts) at lines 217-218, Dexter achieves approximately **90% savings** on prompt tokens when `cache_control` is enabled. The often-large system prompt (containing tool descriptions and skill metadata) is written to the cache on the first call. Subsequent calls resend it but are billed at the cache-read rate, which Anthropic prices at 10% of the base input rate, so only the dynamic user input is billed in full.

### Is `cache_control` automatically applied to all prompts in Dexter?

No, `cache_control` is specifically applied only to the **system prompt** in Dexter's Anthropic integration. The implementation in [`src/model/llm.ts`](https://github.com/virattt/dexter/blob/main/src/model/llm.ts) deliberately adds the `cache_control` field only to the system message object, while user messages and tool results are transmitted as normal dynamic content without caching directives.

### Does using `cache_control` affect the accuracy of Dexter's agent responses?

No, using `cache_control` does not affect response accuracy. Caching changes how Anthropic processes an identical prompt prefix, not the content the model receives, so the cached system prompt (static instructions, tool definitions, and skill metadata) is the same text Dexter would otherwise send uncached. Dexter still sends dynamic content, including user queries and accumulated tool results, on every call to maintain full conversational context. The architecture follows Anthropic's recommended patterns for cache management while preserving the agent's ability to reason across multi-turn interactions.