
How Dexter Manages Token Usage in LLM Context Windows: A Deep Dive into the Source Code

February 16, 2026 · virattt/dexter

Dexter prevents context overflow in three ways: it estimates tokens at roughly 3.5 characters per token, tracks cumulative usage across model calls, and automatically clears old tool results once the estimated context exceeds a 100,000-token threshold. A JSONL scratchpad preserves the complete audit trail throughout.

Managing token usage is critical for AI agents that interact with large language models (LLMs) over multiple turns. Dexter, an open-source AI agent framework developed by virattt, balances context window constraints against the need to preserve tool interaction history. This article examines how Dexter manages token usage through three mechanisms: token estimation, budget constants, and threshold-based context pruning.

Token Estimation with the 3.5 Character Heuristic

Dexter uses a lightweight heuristic to estimate token counts without invoking expensive tokenizer APIs. In src/utils/tokens.ts, the estimateTokens() function assumes approximately 3.5 characters per token:

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

This estimation runs whenever the agent needs to assess whether appending new tool results would exceed safety limits.
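
As a quick illustration of the heuristic, the sketch below re-declares estimateTokens() locally (copied from the snippet above so the example runs on its own) and applies it to a few inputs. The sample strings are made up for this example and do not come from Dexter.

```typescript
// Local copy of the heuristic shown above, so this sketch is self-contained.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Sample inputs are illustrative only.
const samples: string[] = ["", "hello world", "a".repeat(7), "a".repeat(350_000)];

for (const sample of samples) {
  console.log(`${sample.length} chars -> ~${estimateTokens(sample)} tokens`);
}
```

Because the result is rounded up, any non-empty string counts as at least one token, and 350,000 characters land exactly on the 100,000-token mark.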

Global Token Budgets and Safety Thresholds

Dexter defines three critical constants in src/utils/tokens.ts to govern context size:

TOKEN_BUDGET (150,000): Maximum tokens reserved for the final answer generation phase
CONTEXT_THRESHOLD (100,000): Trigger point for clearing old tool results from active context
KEEP_TOOL_USES (5): Number of recent tool results preserved after clearing

export const TOKEN_BUDGET = 150_000;
export const CONTEXT_THRESHOLD = 100_000;
export const KEEP_TOOL_USES = 5;
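
To make these numbers concrete, the following back-of-the-envelope sketch converts the two token limits into approximate character counts using the same 3.5 characters-per-token ratio. The helper tokensToApproxChars and the CHARS_PER_TOKEN constant are hypothetical names introduced only for this illustration.

```typescript
// Values as quoted in the article's snippet from src/utils/tokens.ts.
const TOKEN_BUDGET = 150_000;
const CONTEXT_THRESHOLD = 100_000;

// Assumed ratio, matching the estimateTokens() heuristic.
const CHARS_PER_TOKEN = 3.5;

// Hypothetical helper: inverse of the heuristic, for rough sizing only.
function tokensToApproxChars(tokens: number): number {
  return Math.floor(tokens * CHARS_PER_TOKEN);
}

console.log(tokensToApproxChars(CONTEXT_THRESHOLD)); // 350000
console.log(tokensToApproxChars(TOKEN_BUDGET)); // 525000
console.log(TOKEN_BUDGET - CONTEXT_THRESHOLD); // 50000
```

Under this ratio, pruning would start at roughly 350,000 characters of combined prompt, query, and tool output, with about 50,000 tokens separating the pruning trigger from the final-answer budget.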

Per-Run Token Accounting with TokenCounter

The TokenCounter class in src/agent/token-counter.ts accumulates actual token usage reported by the LLM SDK across all calls in a single run:

export class TokenCounter {
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

  add(usage?: TokenUsage) { … }
  getUsage(): TokenUsage | undefined { … }
  getTokensPerSecond(elapsedMs: number) { … }
}

Instantiated within the run context (src/agent/run-context.ts), the counter updates after each model invocation in src/agent/agent.ts:

const { response, usage } = await this.callModel(currentPrompt);
ctx.tokenCounter.add(usage);
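
The method bodies are elided in the snippet above, so the following is only one plausible way such a counter could behave. It assumes that TokenUsage has the three numeric fields shown in the initializer, that add() ignores a missing usage report, that getUsage() returns undefined when nothing has been recorded, and that throughput is computed from totalTokens. The class name SketchTokenCounter and all of these behaviors are assumptions, not Dexter's implementation.

```typescript
// Assumed shape, inferred from the initializer in the snippet above.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical stand-in for TokenCounter; the method bodies are guesses.
class SketchTokenCounter {
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

  // Accumulate one call's usage; a missing report is ignored.
  add(usage?: TokenUsage): void {
    if (!usage) return;
    this.usage.inputTokens += usage.inputTokens;
    this.usage.outputTokens += usage.outputTokens;
    this.usage.totalTokens += usage.totalTokens;
  }

  // Return a copy of the totals, or undefined if nothing was recorded.
  getUsage(): TokenUsage | undefined {
    return this.usage.totalTokens > 0 ? { ...this.usage } : undefined;
  }

  // Throughput over the whole run; undefined when it cannot be computed.
  getTokensPerSecond(elapsedMs: number): number | undefined {
    if (elapsedMs <= 0 || this.usage.totalTokens === 0) return undefined;
    return this.usage.totalTokens / (elapsedMs / 1000);
  }
}

const counter = new SketchTokenCounter();
counter.add({ inputTokens: 1200, outputTokens: 300, totalTokens: 1500 });
counter.add(undefined); // e.g. a call for which no usage was reported
counter.add({ inputTokens: 2000, outputTokens: 500, totalTokens: 2500 });

console.log(counter.getUsage());
console.log(counter.getTokensPerSecond(8000)); // 500
```

Unlike estimateTokens(), this accounting relies on numbers reported after each call, so it measures what was actually consumed rather than predicting what the next prompt will cost.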

Context Overflow Prevention and Management

Dexter implements an Anthropic-style threshold mechanism to prevent context window overflow during iterative tool use.

The manageContextThreshold Mechanism

Before each iteration, the agent estimates the total token count of the system prompt, user query, and all accumulated tool results:

const fullToolResults = ctx.scratchpad.getToolResults();
const estimatedContextTokens = estimateTokens(this.systemPrompt + ctx.query + fullToolResults);

Selective Clearing of Tool Results

When estimatedContextTokens exceeds CONTEXT_THRESHOLD (100,000), Dexter instructs the scratchpad to remove the oldest tool results while preserving the most recent KEEP_TOOL_USES (5) entries:

if (estimatedContextTokens > CONTEXT_THRESHOLD) {
  const clearedCount = ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}

This clearing affects only the in-memory context used for LLM calls. The persistent JSONL scratchpad file retains all tool interactions for debugging and final answer generation.
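
The split between the active context and the full record can be sketched with a small in-memory class. This is a minimal illustration under stated assumptions: the entry shape, the class name SketchScratchpad, the addToolResult() and toJsonl() methods, and the use of a set of hidden indices are all hypothetical. Only the names clearOldestToolResults() and getToolResults() come from the snippets above, and their real implementations may differ. No file is written here; toJsonl() merely shows what a JSONL log would contain.

```typescript
// Assumed record shape; Dexter's actual scratchpad entries may differ.
interface ToolResultEntry {
  tool: string;
  result: string;
}

// Hypothetical in-memory stand-in for the scratchpad described above.
class SketchScratchpad {
  private entries: ToolResultEntry[] = []; // full log, never truncated
  private cleared = new Set<number>(); // indices hidden from the active context

  addToolResult(tool: string, result: string): void {
    this.entries.push({ tool, result });
  }

  // Hide all but the newest `keepCount` active entries; return how many were hidden.
  clearOldestToolResults(keepCount: number): number {
    const active = this.entries.map((_, i) => i).filter((i) => !this.cleared.has(i));
    const toClear = active.slice(0, Math.max(0, active.length - keepCount));
    toClear.forEach((i) => this.cleared.add(i));
    return toClear.length;
  }

  // Text that would be sent to the model during the iterative phase.
  getToolResults(): string {
    return this.entries
      .filter((_, i) => !this.cleared.has(i))
      .map((e) => `${e.tool}: ${e.result}`)
      .join("\n");
  }

  // One JSON object per line, mirroring what a JSONL file would contain.
  toJsonl(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

const pad = new SketchScratchpad();
for (let i = 1; i <= 8; i++) pad.addToolResult(`tool_${i}`, `result ${i}`);

const clearedCount = pad.clearOldestToolResults(5);
console.log(clearedCount); // 3
console.log(pad.getToolResults().split("\n").length); // 5
console.log(pad.toJsonl().split("\n").length); // 8
```

With eight recorded results and a keep count of five, three entries drop out of the active context while the log still holds all eight. Calling clearOldestToolResults(5) again at that point hides nothing further and returns 0, which is consistent with the clearedCount > 0 guard in the snippet above.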

Final Answer Generation with Full Context

When generating the final answer, Dexter bypasses the iterative context limits to include all tool results from the scratchpad:

const fullContext = buildFinalAnswerContext(ctx.scratchpad);
const finalPrompt = buildFinalAnswerPrompt(ctx.query, fullContext);

This final phase operates under the TOKEN_BUDGET (150,000) constraint, which is intended to keep the complete tool history within the model's context window while preserving the integrity of the agent's reasoning chain.
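
The article does not show how the budget is applied, so the following is only a sketch of one way a final prompt could be compared against it, reusing the estimation heuristic from earlier. The function checkFinalPromptBudget and its return shape are hypothetical, and the prompts are synthetic strings sized to fall on either side of the limit.

```typescript
const TOKEN_BUDGET = 150_000; // value quoted in the article

// Local copy of the article's heuristic, so this sketch is self-contained.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Hypothetical check: compare an estimated prompt size against the budget.
function checkFinalPromptBudget(prompt: string): { estimated: number; withinBudget: boolean } {
  const estimated = estimateTokens(prompt);
  return { estimated, withinBudget: estimated <= TOKEN_BUDGET };
}

const smallPrompt = "x".repeat(70_000); // about 20,000 estimated tokens
const largePrompt = "x".repeat(700_000); // about 200,000 estimated tokens

console.log(checkFinalPromptBudget(smallPrompt)); // { estimated: 20000, withinBudget: true }
console.log(checkFinalPromptBudget(largePrompt)); // { estimated: 200000, withinBudget: false }
```

What happens when the check fails (truncation, summarization, or an error) is a design decision that this sketch deliberately leaves open.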

Summary

Dexter estimates token counts using a lightweight 3.5-character heuristic in src/utils/tokens.ts to avoid expensive tokenizer calls.
Global constants TOKEN_BUDGET (150,000) and CONTEXT_THRESHOLD (100,000) set the budget for final answer generation and the trigger point for context pruning, respectively.
The TokenCounter class tracks actual LLM usage across all calls in a single run, providing metrics for performance analysis.
When the estimated context exceeds 100,000 tokens, Dexter automatically clears the oldest tool results while keeping the 5 most recent, preventing context overflow while maintaining a complete audit trail in the JSONL scratchpad.

Frequently Asked Questions

How does Dexter estimate token count without using a tokenizer?

Dexter uses a simple character-based heuristic defined in src/utils/tokens.ts that divides the text length by 3.5 and rounds up. This gives a rough approximation of the characters-per-token ratio without the overhead of invoking a tokenizer; the true ratio varies with the model's tokenizer and with the content being encoded.

What happens when Dexter's context exceeds the 100,000 token threshold?

When the estimated total of system prompt, user query, and tool results exceeds CONTEXT_THRESHOLD (100,000), Dexter invokes clearOldestToolResults(KEEP_TOOL_USES) on the scratchpad. This removes older tool results from the active context while preserving the 5 most recent entries, yielding a context_cleared event for monitoring.

Does clearing tool results from context delete them from the scratchpad file?

No. The clearing operation affects only the in-memory context used for LLM calls during the iterative phase. The persistent JSONL scratchpad file retains all tool interactions, ensuring a complete audit trail remains available for debugging and for building the final answer context.

How does Dexter ensure the final answer generation stays within token limits?

During the final answer phase, Dexter builds a prompt using buildFinalAnswerContext(), which includes all tool results from the scratchpad. This phase operates under the TOKEN_BUDGET constant (150,000 tokens), which is intended to keep the complete tool history within the model's context window while preserving reasoning integrity.
