How Dexter Manages Token Usage in LLM Context Windows: A Deep Dive into the Source Code
Dexter prevents context overflow by estimating tokens with a 3.5-character heuristic, tracking cumulative usage across calls, and automatically clearing old tool results when the context exceeds a 100,000-token threshold, while preserving a complete audit trail in a JSONL scratchpad.
Managing token usage is critical for AI agents that interact with large language models (LLMs) over multiple turns. Dexter, an open-source AI agent framework developed by virattt, implements a sophisticated token management strategy that balances context window constraints with the need to preserve tool interaction history. This article examines how Dexter manages token usage in its context through estimation algorithms, budget enforcement, and intelligent context pruning.
Token Estimation with the 3.5 Character Heuristic
Dexter uses a lightweight heuristic to estimate token counts without invoking expensive tokenizer APIs. In src/utils/tokens.ts, the estimateTokens() function assumes approximately 3.5 characters per token:
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}
This estimation runs whenever the agent needs to assess whether appending new tool results would exceed safety limits.
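To get a feel for the heuristic, the snippet below reproduces the function shown above and applies it to a few inputs. The sample strings are illustrative and are not taken from the Dexter repository.

```typescript
// Same heuristic as estimateTokens() above: ~3.5 characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Illustrative inputs (not from the Dexter source).
const shortText = "hello world";    // 11 characters
const longText = "a".repeat(7_000); // 7,000 characters

console.log(estimateTokens(shortText)); // 4 (11 / 3.5 ≈ 3.14, rounded up)
console.log(estimateTokens(longText));  // 2000
console.log(estimateTokens(""));        // 0
```

Note that JavaScript's string length counts UTF-16 code units, so characters outside the Basic Multilingual Plane (most emoji, for example) count as two. For such text the estimate drifts further from what a real tokenizer would report.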
Global Token Budgets and Safety Thresholds
Dexter defines three critical constants in src/utils/tokens.ts to govern context size:
- TOKEN_BUDGET (150,000): Maximum tokens reserved for the final answer generation phase
- CONTEXT_THRESHOLD (100,000): Trigger point for clearing old tool results from active context
- KEEP_TOOL_USES (5): Number of recent tool results preserved after clearing
export const TOKEN_BUDGET = 150_000;
export const CONTEXT_THRESHOLD = 100_000;
export const KEEP_TOOL_USES = 5;
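Because the estimator assumes 3.5 characters per token, these limits can be translated into approximate character counts. The arithmetic below is a back-of-the-envelope sketch; the helper name approxChars and the CHARS_PER_TOKEN constant are hypothetical and are not part of Dexter.

```typescript
const TOKEN_BUDGET = 150_000;
const CONTEXT_THRESHOLD = 100_000;
const CHARS_PER_TOKEN = 3.5; // ratio assumed by estimateTokens()

// Hypothetical helper: the largest character count whose estimate
// (Math.ceil(length / 3.5)) still stays within `tokens`.
function approxChars(tokens: number): number {
  return Math.floor(tokens * CHARS_PER_TOKEN);
}

console.log(approxChars(CONTEXT_THRESHOLD)); // 350000
console.log(approxChars(TOKEN_BUDGET));      // 525000
```

In other words, under this heuristic pruning starts once the combined prompt, query, and tool results pass roughly 350,000 characters.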
Per-Run Token Accounting with TokenCounter
The TokenCounter class in src/agent/token-counter.ts accumulates actual token usage reported by the LLM SDK across all calls in a single run:
export class TokenCounter {
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };
  add(usage?: TokenUsage) { … }
  getUsage(): TokenUsage | undefined { … }
  getTokensPerSecond(elapsedMs: number) { … }
}
Instantiated within the run context (src/agent/run-context.ts), the counter updates after each model invocation in src/agent/agent.ts:
const { response, usage } = await this.callModel(currentPrompt);
ctx.tokenCounter.add(usage);
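The method bodies are elided in the class excerpt above. As a rough illustration of how such an accumulator might behave, here is a minimal sketch. Everything inside the method bodies is an assumption: returning undefined when nothing has been recorded and measuring throughput in output tokens are guesses, and the class is named SketchTokenCounter to make clear that it is not Dexter's implementation.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical stand-in for TokenCounter; the real method bodies are not shown in the article.
class SketchTokenCounter {
  private usage: TokenUsage = { inputTokens: 0, outputTokens: 0, totalTokens: 0 };

  // Accumulate usage from one model call; ignore calls that report nothing.
  add(usage?: TokenUsage): void {
    if (!usage) return;
    this.usage.inputTokens += usage.inputTokens;
    this.usage.outputTokens += usage.outputTokens;
    this.usage.totalTokens += usage.totalTokens;
  }

  // Assumption: report undefined until at least one token has been counted.
  getUsage(): TokenUsage | undefined {
    return this.usage.totalTokens > 0 ? { ...this.usage } : undefined;
  }

  // Assumption: throughput is measured in output tokens per second.
  getTokensPerSecond(elapsedMs: number): number | undefined {
    if (elapsedMs <= 0 || this.usage.totalTokens === 0) return undefined;
    return this.usage.outputTokens / (elapsedMs / 1000);
  }
}

const counter = new SketchTokenCounter();
counter.add({ inputTokens: 1_200, outputTokens: 300, totalTokens: 1_500 });
counter.add(undefined); // a call that reported no usage
counter.add({ inputTokens: 800, outputTokens: 200, totalTokens: 1_000 });

console.log(counter.getUsage());                // inputTokens 2000, outputTokens 500, totalTokens 2500
console.log(counter.getTokensPerSecond(5_000)); // 100
```

Accepting an optional argument in add() means the call site does not need to check whether the SDK actually returned usage data.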
Context Overflow Prevention and Management
Dexter implements an Anthropic-style threshold mechanism to prevent context window overflow during iterative tool use.
The manageContextThreshold Mechanism
Before each iteration, the agent estimates the total token count of the system prompt, user query, and all accumulated tool results:
const fullToolResults = ctx.scratchpad.getToolResults();
const estimatedContextTokens = estimateTokens(this.systemPrompt + ctx.query + fullToolResults);
Selective Clearing of Tool Results
When estimatedContextTokens exceeds CONTEXT_THRESHOLD (100,000), Dexter instructs the scratchpad to remove the oldest tool results while preserving the most recent KEEP_TOOL_USES (5) entries:
if (estimatedContextTokens > CONTEXT_THRESHOLD) {
  const clearedCount = ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}
This clearing affects only the in-memory context used for LLM calls. The persistent JSONL scratchpad file retains all tool interactions for debugging and final answer generation.
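One way to hide results from the active context while keeping the full record is to track which entries are cleared rather than deleting them. The sketch below illustrates that idea with an in-memory array standing in for the JSONL file. The SketchScratchpad class, its fields, the addToolResult and getAllEntries methods, and the return-value semantics of clearOldestToolResults are all assumptions made for illustration, not Dexter's actual implementation.

```typescript
interface ToolResult {
  tool: string;
  result: string;
}

// Hypothetical scratchpad: `entries` stands in for the append-only JSONL log.
class SketchScratchpad {
  private entries: ToolResult[] = [];
  private clearedCount = 0; // entries [0, clearedCount) are hidden from the active context

  addToolResult(entry: ToolResult): void {
    this.entries.push(entry);
  }

  // Active context: only entries that have not been cleared.
  getToolResults(): string {
    return this.entries
      .slice(this.clearedCount)
      .map((e) => `${e.tool}: ${e.result}`)
      .join("\n");
  }

  // Hide all but the newest `keepCount` entries; return how many were newly cleared.
  clearOldestToolResults(keepCount: number): number {
    const target = Math.max(0, this.entries.length - keepCount);
    const newlyCleared = Math.max(0, target - this.clearedCount);
    this.clearedCount = Math.max(this.clearedCount, target);
    return newlyCleared;
  }

  // Full record, unaffected by clearing.
  getAllEntries(): readonly ToolResult[] {
    return this.entries;
  }
}

const pad = new SketchScratchpad();
for (let i = 1; i <= 8; i++) {
  pad.addToolResult({ tool: `tool_${i}`, result: `result ${i}` });
}

console.log(pad.clearOldestToolResults(5));           // 3
console.log(pad.getToolResults().split("\n").length); // 5
console.log(pad.getAllEntries().length);              // 8
console.log(pad.clearOldestToolResults(5));           // 0 (nothing new to clear)
```

Returning the number of newly cleared entries lets the caller skip emitting an event when a repeated call has nothing left to clear, which matches the clearedCount > 0 guard in the agent loop.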
Final Answer Generation with Full Context
When generating the final answer, Dexter bypasses the iterative context limits to include all tool results from the scratchpad:
const fullContext = buildFinalAnswerContext(ctx.scratchpad);
const finalPrompt = buildFinalAnswerPrompt(ctx.query, fullContext);
This final phase operates under the TOKEN_BUDGET (150,000) constraint rather than the 100,000-token pruning threshold. The larger budget is intended to let the complete tool history fit within the model's context window while preserving the integrity of the agent's reasoning chain.
Summary
- Dexter estimates token counts using a lightweight 3.5-character heuristic in src/utils/tokens.ts to avoid expensive tokenizer calls.
- Global constants TOKEN_BUDGET (150,000) and CONTEXT_THRESHOLD (100,000) define hard limits for answer generation and context pruning triggers.
- The TokenCounter class tracks actual LLM usage across all calls in a single run, providing metrics for performance analysis.
- When estimated context exceeds 100,000 tokens, Dexter automatically clears oldest tool results while keeping the 5 most recent, preventing context overflow while maintaining a complete audit trail in the JSONL scratchpad.
Frequently Asked Questions
How does Dexter estimate token count without using a tokenizer?
Dexter uses a simple character-based heuristic defined in src/utils/tokens.ts that divides the text length by 3.5 and rounds up. This gives a rough approximation of the characters-per-token ratio for typical English text, without the computational overhead of invoking a tokenizer API.
What happens when Dexter's context exceeds the 100,000 token threshold?
When the estimated total of system prompt, user query, and tool results exceeds CONTEXT_THRESHOLD (100,000), Dexter invokes clearOldestToolResults(KEEP_TOOL_USES) on the scratchpad. This removes older tool results from the active context while preserving the 5 most recent entries, yielding a context_cleared event for monitoring.
Does clearing tool results from context delete them from the scratchpad file?
No. The clearing operation affects only the in-memory context used for LLM calls during the iterative phase. The persistent JSONL scratchpad file retains all tool interactions, ensuring a complete audit trail remains available for debugging and for building the final answer context.
How does Dexter ensure the final answer generation stays within token limits?
During the final answer phase, Dexter builds a prompt using buildFinalAnswerContext(), which includes all tool results from the scratchpad. This phase operates under the TOKEN_BUDGET constant (150,000 tokens), which is intended to let the complete tool history fit within the model's context window while preserving reasoning integrity.