# How Dexter Manages Token Usage in LLM Context Windows: A Deep Dive into the Source Code

> Explore Dexter's source code to understand how it manages LLM token usage. Learn about its 100k token threshold, heuristic estimation, and audit trail for efficient context management.

- Repository: [Virat Singh/dexter](https://github.com/virattt/dexter)
- Tags: deep-dive
- Published: 2026-02-16

---

**Dexter guards against context overflow in three ways: it estimates token counts with a 3.5-characters-per-token heuristic, tracks cumulative usage across model calls, and clears old tool results from the active context once the estimate exceeds 100,000 tokens. A JSONL scratchpad keeps the complete audit trail.**

Managing token usage is critical for AI agents that interact with large language models (LLMs) over multiple turns. Dexter, an open-source AI agent framework developed by virattt, balances context window constraints against the need to preserve tool interaction history. This article examines the three mechanisms it uses to do so: token estimation, token budgets, and pruning of old tool results from the active context.

## Token Estimation with the 3.5 Character Heuristic

Dexter uses a lightweight heuristic to estimate token counts without invoking expensive tokenizer APIs. In [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts), the `estimateTokens()` function assumes approximately 3.5 characters per token:

```ts
// Rough token estimate: assumes about 3.5 characters per token.
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}
```

The agent runs this estimate before each iteration to check whether the accumulated context, including tool results, has grown past its safety threshold.
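To make the arithmetic concrete, here is a small sketch that applies the same formula to a few sample strings. The sample inputs and variable names are illustrative assumptions, not taken from the Dexter repository.

```ts
// Same heuristic as described above: roughly 3.5 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Hypothetical sample inputs, chosen only to illustrate the arithmetic.
const shortText = "Hello, world!";                             // 13 characters
const pangram = "The quick brown fox jumps over the lazy dog"; // 43 characters

console.log(estimateTokens(shortText)); // Math.ceil(13 / 3.5) = 4
console.log(estimateTokens(pangram));   // Math.ceil(43 / 3.5) = 13
console.log(estimateTokens(""));        // 0
```

Note that `text.length` counts UTF-16 code units, so the estimate is only a rough guide and can drift further from real token counts for text with many non-ASCII characters.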

## Global Token Budgets and Safety Thresholds

Dexter defines three critical constants in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts) to govern context size:

- **TOKEN_BUDGET (150,000)**: Maximum tokens reserved for the final answer generation phase
- **CONTEXT_THRESHOLD (100,000)**: Trigger point for clearing old tool results from active context
- **KEEP_TOOL_USES (5)**: Number of recent tool results preserved after clearing

```ts
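export const TOKEN_BUDGET = 150_000;      // budget for the final answer phase
export const CONTEXT_THRESHOLD = 100_000; // clear old tool results above this estimate
export const KEEP_TOOL_USES = 5;          // recent tool results kept after clearing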
```
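Because these limits are compared against heuristic estimates, it can help to translate them into character counts. The sketch below does that conversion; `CHARS_PER_TOKEN` and `approxCharsForTokens` are hypothetical names introduced here for illustration only.

```ts
// Values as listed in the article.
const TOKEN_BUDGET = 150_000;
const CONTEXT_THRESHOLD = 100_000;

// The ratio assumed by estimateTokens(); the constant name is hypothetical.
const CHARS_PER_TOKEN = 3.5;

// Hypothetical helper: how many characters the heuristic maps to a token limit.
function approxCharsForTokens(tokens: number): number {
  return Math.floor(tokens * CHARS_PER_TOKEN);
}

const thresholdChars = approxCharsForTokens(CONTEXT_THRESHOLD);
const budgetChars = approxCharsForTokens(TOKEN_BUDGET);

console.log(thresholdChars); // 350000
console.log(budgetChars);    // 525000
```

Under this heuristic, clearing begins once the combined prompt, query, and tool results pass roughly 350,000 characters.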

## Per-Run Token Accounting with TokenCounter

The `TokenCounter` class in [`src/agent/token-counter.ts`](https://github.com/virattt/dexter/blob/main/src/agent/token-counter.ts) accumulates actual token usage reported by the LLM SDK across all calls in a single run:

```ts
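export class TokenCounter {
  // Running totals for the current run
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

  add(usage?: TokenUsage) { … }                 // accumulate usage from one model call
  getUsage(): TokenUsage | undefined { … }      // totals recorded so far
  getTokensPerSecond(elapsedMs: number) { … }   // throughput over the elapsed time
}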
```

The counter is instantiated within the run context ([`src/agent/run-context.ts`](https://github.com/virattt/dexter/blob/main/src/agent/run-context.ts)) and updated after each model invocation in [`src/agent/agent.ts`](https://github.com/virattt/dexter/blob/main/src/agent/agent.ts):

```ts
// Call the model, then add the usage it reported to the per-run counter.
const { response, usage } = await this.callModel(currentPrompt);
ctx.tokenCounter.add(usage);
```
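The method bodies of `TokenCounter` are elided in the excerpt above, so the following is only a sketch of how such an accumulator might work, not Dexter's actual implementation. The class name `SketchTokenCounter`, the `TokenUsage` shape, and the choice to return `undefined` when nothing has been recorded are all assumptions.

```ts
// Assumed shape, inferred from the field names in the excerpt above.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical stand-in for TokenCounter; the real method bodies may differ.
class SketchTokenCounter {
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

  // Accumulate one call's usage; calls that report nothing are ignored.
  add(usage?: TokenUsage): void {
    if (!usage) return;
    this.usage.inputTokens += usage.inputTokens;
    this.usage.outputTokens += usage.outputTokens;
    this.usage.totalTokens += usage.totalTokens;
  }

  // Assumption: report undefined until at least one token has been recorded.
  getUsage(): TokenUsage | undefined {
    return this.usage.totalTokens > 0 ? { ...this.usage } : undefined;
  }

  // Assumption: throughput is total tokens divided by elapsed seconds.
  getTokensPerSecond(elapsedMs: number): number | undefined {
    if (elapsedMs <= 0 || this.usage.totalTokens === 0) return undefined;
    return this.usage.totalTokens / (elapsedMs / 1000);
  }
}

const counter = new SketchTokenCounter();
counter.add({ inputTokens: 1200, outputTokens: 300, totalTokens: 1500 });
counter.add(undefined); // a call that reported no usage
counter.add({ inputTokens: 800, outputTokens: 200, totalTokens: 1000 });

const total = counter.getUsage();
const rate = counter.getTokensPerSecond(5000);

console.log(total); // { inputTokens: 2000, outputTokens: 500, totalTokens: 2500 }
console.log(rate);  // 500
```

Accepting an optional argument in `add` matches the signature shown in the excerpt and lets the caller pass through whatever the SDK returned without a separate null check.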

## Context Overflow Prevention and Management

Dexter implements an Anthropic-style threshold mechanism to prevent context window overflow during iterative tool use.

### The manageContextThreshold Mechanism

Before each iteration, the agent estimates the total token count of the system prompt, user query, and all accumulated tool results:

```ts
// Estimate the size of everything that will be sent on the next call.
const fullToolResults = ctx.scratchpad.getToolResults();
const estimatedContextTokens = estimateTokens(this.systemPrompt + ctx.query + fullToolResults);
```

### Selective Clearing of Tool Results

When `estimatedContextTokens` exceeds `CONTEXT_THRESHOLD` (100,000), Dexter instructs the scratchpad to remove the oldest tool results while preserving the most recent `KEEP_TOOL_USES` (5) entries:

```ts
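if (estimatedContextTokens > CONTEXT_THRESHOLD) {
  // Drop the oldest tool results from the active context, keeping the newest KEEP_TOOL_USES.
  const clearedCount = ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    // Report the clearing only when something was actually removed.
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}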
```

This clearing affects only the in-memory context used for LLM calls. The persistent JSONL scratchpad file retains all tool interactions for debugging and final answer generation.
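The split between an append-only record and a prunable in-memory view can be sketched as follows. This is a simplified model under stated assumptions, not Dexter's scratchpad: `SketchScratchpad`, `ToolEntry`, and `getAuditLog` are hypothetical names, and a string array stands in for the JSONL file so the example stays self-contained.

```ts
// Hypothetical entry shape for illustration.
interface ToolEntry {
  tool: string;
  result: string;
}

class SketchScratchpad {
  // Stand-in for the JSONL file: one serialized line per tool call, never removed.
  private auditLog: string[] = [];
  // Tool results currently visible to the model during iteration.
  private active: ToolEntry[] = [];

  addToolResult(entry: ToolEntry): void {
    this.auditLog.push(JSON.stringify(entry));
    this.active.push(entry);
  }

  // Text that would be placed in the iterative prompt.
  getToolResults(): string {
    return this.active.map((e) => `${e.tool}: ${e.result}`).join("\n");
  }

  // Drop all but the newest keepCount active entries; return how many were dropped.
  clearOldestToolResults(keepCount: number): number {
    const dropCount = Math.max(0, this.active.length - keepCount);
    this.active = this.active.slice(dropCount);
    return dropCount;
  }

  // The audit trail is untouched by clearing.
  getAuditLog(): string[] {
    return this.auditLog.slice();
  }
}

const scratchpad = new SketchScratchpad();
for (let i = 1; i <= 8; i++) {
  scratchpad.addToolResult({ tool: `tool_${i}`, result: `result ${i}` });
}

const dropped = scratchpad.clearOldestToolResults(5);
const visible = scratchpad.getToolResults().split("\n");

console.log(dropped);                         // 3
console.log(visible[0]);                      // tool_4: result 4
console.log(scratchpad.getAuditLog().length); // 8
```

With eight recorded results and a keep count of 5, the three oldest disappear from the active view while all eight lines remain in the log.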

## Final Answer Generation with Full Context

When generating the final answer, Dexter bypasses the iterative context limits to include all tool results from the scratchpad:

```ts
// Build the final prompt from the full scratchpad, not the pruned iterative context.
const fullContext = buildFinalAnswerContext(ctx.scratchpad);
const finalPrompt = buildFinalAnswerPrompt(ctx.query, fullContext);
```

This final phase is governed by `TOKEN_BUDGET` (150,000) rather than `CONTEXT_THRESHOLD`. The larger budget is intended to let the full set of tool results fit within the model's context window while preserving the agent's reasoning chain.
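The article does not show how `TOKEN_BUDGET` is applied, so the following is only one plausible check under that assumption: estimate the size of the assembled text and compare it with the budget. `checkBudget` and `BudgetCheck` are hypothetical names and do not come from the Dexter source.

```ts
const TOKEN_BUDGET = 150_000;

// Same 3.5-characters-per-token heuristic used elsewhere in the article.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

interface BudgetCheck {
  estimated: number;
  withinBudget: boolean;
  headroom: number; // negative when the estimate is over budget
}

// Hypothetical helper; Dexter's actual enforcement may differ.
function checkBudget(text: string, budget: number = TOKEN_BUDGET): BudgetCheck {
  const estimated = estimateTokens(text);
  return { estimated, withinBudget: estimated <= budget, headroom: budget - estimated };
}

const sampleContext = "The quick brown fox jumps over the lazy dog"; // 43 characters
const againstDefault = checkBudget(sampleContext);
const againstTiny = checkBudget(sampleContext, 10);

console.log(againstDefault); // { estimated: 13, withinBudget: true, headroom: 149987 }
console.log(againstTiny);    // { estimated: 13, withinBudget: false, headroom: -3 }
```

Since the estimate is heuristic, a check like this bounds the prompt only approximately; the real token count depends on the model's tokenizer.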

## Summary

- Dexter estimates token counts using a lightweight 3.5-character heuristic in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts) to avoid expensive tokenizer calls.
- Global constants `TOKEN_BUDGET` (150,000) and `CONTEXT_THRESHOLD` (100,000) set the budget for final answer generation and the trigger point for context pruning, respectively.
- The `TokenCounter` class tracks actual LLM usage across all calls in a single run, providing metrics for performance analysis.
- When the estimated context exceeds 100,000 tokens, Dexter automatically clears the oldest tool results while keeping the 5 most recent, preventing context overflow while maintaining a complete audit trail in the JSONL scratchpad.

## Frequently Asked Questions

### How does Dexter estimate token count without using a tokenizer?

Dexter uses a simple character-based heuristic defined in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts) that divides the text length by 3.5 and rounds up. This gives a rough approximation of the characters-per-token ratio without the overhead of invoking a tokenizer API; the actual ratio varies by model and by content.

### What happens when Dexter's context exceeds the 100,000 token threshold?

When the estimated total of system prompt, user query, and tool results exceeds `CONTEXT_THRESHOLD` (100,000), Dexter invokes `clearOldestToolResults(KEEP_TOOL_USES)` on the scratchpad. This removes older tool results from the active context while preserving the 5 most recent entries. If at least one result was cleared, the agent yields a `context_cleared` event for monitoring.

### Does clearing tool results from context delete them from the scratchpad file?

No. The clearing operation affects only the in-memory context used for LLM calls during the iterative phase. The persistent JSONL scratchpad file retains all tool interactions, ensuring a complete audit trail remains available for debugging and for building the final answer context.

### How does Dexter ensure the final answer generation stays within token limits?

During the final answer phase, Dexter builds a prompt using `buildFinalAnswerContext()`, which includes all tool results from the scratchpad. This phase is governed by the `TOKEN_BUDGET` constant (150,000 tokens), which is intended to keep the full set of tool results within the model's context window while preserving the agent's reasoning chain.