How Dexter's Scratchpad Management Works for LLM Context Control
Dexter uses an append-only scratchpad with automatic token threshold monitoring to manage LLM context by clearing oldest tool results in-memory while preserving a complete audit trail on disk.
Dexter's scratchpad management system addresses the problem of LLM context windows overflowing during long-running agent sessions. According to the virattt/dexter source code, the system records every tool invocation and reasoning step in an append-only JSONL file while dynamically pruning what gets sent to the model to stay within token limits.
The Append-Only Scratchpad Architecture
At the core of Dexter's context management is the Scratchpad class defined in src/agent/scratchpad.ts. This class maintains an append-only log where every interaction is recorded as a newline-delimited JSON entry.
Entry Types and Structure
The scratchpad recognizes three distinct entry types, each serving a specific purpose in the agent's lifecycle:
- init: Contains the content field with the original user query that started the session.
- thinking: Stores free-form reasoning in the content field, capturing the agent's internal monologue.
- tool_result: Records the complete output of tool executions with toolName, args, and result fields.
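The three entry types can be modelled as a discriminated union. The sketch below is built only from the field names listed above; the type names, the helpers `toJsonlLine` and `parseJsonl`, and the loose typing of `args` and `result` are assumptions for illustration, not definitions taken from the Dexter source.

```typescript
// Hypothetical types inferred from the field names in this article;
// the real definitions in src/agent/scratchpad.ts may differ.
type InitEntry = { type: 'init'; content: string };
type ThinkingEntry = { type: 'thinking'; content: string };
type ToolResultEntry = {
  type: 'tool_result';
  toolName: string;
  args: Record<string, unknown>;
  result: unknown;
};
type ScratchpadEntry = InitEntry | ThinkingEntry | ToolResultEntry;

// Serialize one entry as one line of newline-delimited JSON.
function toJsonlLine(entry: ScratchpadEntry): string {
  return JSON.stringify(entry) + '\n';
}

// Parse a JSONL document back into entries, skipping blank lines.
function parseJsonl(text: string): ScratchpadEntry[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ScratchpadEntry);
}
```

Because each entry occupies exactly one line, appending to the log never requires rewriting earlier entries, which is what makes the append-only design cheap.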
File Storage and Naming Convention
Physical scratchpad files reside in the .dexter/scratchpad/ directory. Each file follows a strict naming convention combining a timestamp with a hash of the query: 2026-01-21-153045_8a3f….jsonl. This ensures unique, sortable files while preventing collisions between similar queries.
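As an illustration of the naming scheme, the following sketch builds a filename of the same shape as the example above. It is not Dexter's code: the hash function (a small FNV-1a-style stand-in), the 8-character hash length, the use of UTC, and the names `fnv1aHex` and `scratchpadFileName` are all assumptions made for the example.

```typescript
// Stand-in 32-bit FNV-1a-style hash over UTF-16 code units.
// ASSUMPTION: the article does not say which hash Dexter uses or how
// many characters of it appear in the filename.
function fnv1aHex(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash.toString(16).padStart(8, '0');
}

const pad2 = (n: number): string => String(n).padStart(2, '0');

// Builds "YYYY-MM-DD-HHMMSS_<hash>.jsonl".
// ASSUMPTION: UTC is used here so the output is deterministic.
function scratchpadFileName(query: string, now: Date): string {
  const date = `${now.getUTCFullYear()}-${pad2(now.getUTCMonth() + 1)}-${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}${pad2(now.getUTCMinutes())}${pad2(now.getUTCSeconds())}`;
  return `${date}-${time}_${fnv1aHex(query)}.jsonl`;
}
```

Putting the timestamp first means a plain lexicographic sort of the directory listing is also a chronological sort, while the hash suffix separates sessions that start in the same second with different queries.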
Token Estimation and Context Thresholds
The agent loop in src/agent/agent.ts (lines 86-95) implements proactive token monitoring before every LLM call.
How Dexter Estimates Token Usage
Dexter uses a character-based heuristic defined in src/utils/tokens.ts. The estimateTokens function assumes approximately 3.5 characters per token, providing a fast, synchronous calculation without requiring external API calls:
const estimatedContextTokens = estimateTokens(
  this.systemPrompt + ctx.query + fullToolResults
);
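A character-based estimator of this kind fits in a few lines. The version below is a minimal sketch assuming the 3.5 characters-per-token ratio described above; the constant name `CHARS_PER_TOKEN` and the choice to round up are assumptions, and the actual function in src/utils/tokens.ts may differ.

```typescript
// ASSUMPTION: ratio taken from the article; the constant name is hypothetical.
const CHARS_PER_TOKEN = 3.5;

// Estimate the token count from string length alone.
// Rounding up is an assumption; it errs on the side of overestimating.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

Under this heuristic a 350,000-character prompt is estimated at 100,000 tokens. The estimate ignores how a real tokenizer treats code, whitespace, or non-English text, so it is best read as a coarse trigger rather than an exact count.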
The CONTEXT_THRESHOLD Trigger
When the estimated token count exceeds CONTEXT_THRESHOLD (set to 100,000 tokens in src/utils/tokens.ts lines 25-30), the agent triggers an automatic context-clearing step:
if (estimatedContextTokens > CONTEXT_THRESHOLD) {
  const clearedCount = ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}
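To show how a caller might consume the `context_cleared` event, the sketch below wraps the threshold check in a small generator and drives it with a stub. The helper `maybeClearContext`, the `PrunableScratchpad` interface, and the stub are hypothetical; only the constants, the method name, and the event shape come from the snippet above.

```typescript
// Values quoted in this article; the real constants live in src/utils/tokens.ts.
const CONTEXT_THRESHOLD = 100_000;
const KEEP_TOOL_USES = 5;

type ContextClearedEvent = {
  type: 'context_cleared';
  clearedCount: number;
  keptCount: number;
};

// Hypothetical minimal interface: only the one method this check needs.
interface PrunableScratchpad {
  clearOldestToolResults(keepCount: number): number;
}

// Hypothetical helper wrapping the threshold check.
function* maybeClearContext(
  estimatedContextTokens: number,
  scratchpad: PrunableScratchpad,
): Generator<ContextClearedEvent, void, undefined> {
  if (estimatedContextTokens <= CONTEXT_THRESHOLD) return;
  const clearedCount = scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}

// Stub scratchpad that pretends three old results were cleared.
const stubScratchpad: PrunableScratchpad = { clearOldestToolResults: () => 3 };

// A consumer (for example a UI or a logger) reacts to the event.
for (const event of maybeClearContext(120_000, stubScratchpad)) {
  console.log(`Cleared ${event.clearedCount} tool results, kept ${event.keptCount}`);
}
```

No event is emitted when the estimate is at or below the threshold, or when nothing was actually cleared, so consumers only hear about real changes to the prompt.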
Clearing Old Tool Results
The clearOldestToolResults method in src/agent/scratchpad.ts implements a surgical approach to context reduction that preserves the audit trail while freeing up prompt space.
The clearOldestToolResults Method
This method accepts a keepCount parameter (defaulting to KEEP_TOOL_USES, set to 5 in src/utils/tokens.ts). It iterates through the JSONL entries, identifies all tool_result types, and marks the oldest ones for exclusion from the next prompt:
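// Pseudocode based on source analysis
clearOldestToolResults(keepCount: number = KEEP_TOOL_USES): number {
  // Indices of tool_result entries that are still visible in the prompt
  const toolIndices = this.entries
    .map((e, i) => (e.type === 'tool_result' ? i : -1))
    .filter((i) => i !== -1 && !this.clearedToolIndices.has(i));
  // Clamp at zero so a keepCount above the number of results clears nothing
  const clearCount = Math.max(0, toolIndices.length - keepCount);
  const toClear = toolIndices.slice(0, clearCount);
  toClear.forEach((i) => this.clearedToolIndices.add(i));
  return toClear.length;
}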
In-Memory vs. On-Disk Persistence
Critically, clearing affects only the in-memory view used for prompt construction. The underlying .dexter/scratchpad/ file remains immutable, containing the complete history. When retrieving tool results via getToolResults(), the system skips indices marked in clearedToolIndices and inserts placeholders like [Tool result #3 cleared from context] to maintain positional awareness for the LLM.
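The placeholder behaviour can be sketched with a simplified in-memory stand-in. `MiniScratchpad` below is hypothetical and stores only tool results; the 1-based numbering in the placeholder and the `toolName: result` line format are assumptions, since the article shows only the placeholder text itself.

```typescript
type ToolResult = { toolName: string; result: string };

// Hypothetical, simplified stand-in for the Scratchpad class described above.
class MiniScratchpad {
  private toolResults: ToolResult[] = [];
  private clearedToolIndices = new Set<number>();

  addToolResult(entry: ToolResult): void {
    this.toolResults.push(entry); // the stored history is never truncated
  }

  // Mark all but the newest keepCount results as cleared;
  // returns how many were newly marked by this call.
  clearOldestToolResults(keepCount: number): number {
    const clearCount = Math.max(0, this.toolResults.length - keepCount);
    let newlyCleared = 0;
    for (let i = 0; i < clearCount; i++) {
      if (!this.clearedToolIndices.has(i)) {
        this.clearedToolIndices.add(i);
        newlyCleared++;
      }
    }
    return newlyCleared;
  }

  // Prompt view: cleared results are replaced by a placeholder line.
  getToolResults(): string {
    return this.toolResults
      .map((entry, i) =>
        this.clearedToolIndices.has(i)
          ? `[Tool result #${i + 1} cleared from context]`
          : `${entry.toolName}: ${entry.result}`,
      )
      .join('\n');
  }
}

const scratchpad = new MiniScratchpad();
for (let i = 1; i <= 7; i++) {
  scratchpad.addToolResult({ toolName: `tool${i}`, result: `output ${i}` });
}
scratchpad.clearOldestToolResults(5);
console.log(scratchpad.getToolResults());
```

With seven results and a keep count of five, the first two lines of the printed view are placeholders and the remaining five carry their original output, so the model still sees that earlier calls happened.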
Final Answer Generation with Full Context
Despite aggressive context pruning during the agent loop, Dexter guarantees complete knowledge retention for the final output. When the agent determines no further tool calls are necessary, it invokes buildFinalAnswerContext(ctx.scratchpad) which calls Scratchpad.getFullContexts() (lines 38-46 in src/agent/scratchpad.ts).
This method bypasses the clearedToolIndices filter, returning all tool results including those previously excluded from intermediate prompts. This ensures the final answer synthesizes the complete session history while intermediate steps operated within token constraints.
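The difference between the two views comes down to whether the cleared set is consulted. The following sketch contrasts them with two hypothetical free functions, `filteredContexts` and `fullContexts`; the names, the sample data, and the line format are assumptions, and the real `getFullContexts()` may return a different shape.

```typescript
type ToolContext = { toolName: string; result: string };

const formatContext = (c: ToolContext): string => `${c.toolName}: ${c.result}`;

// View for intermediate prompts: cleared indices become placeholders.
function filteredContexts(all: ToolContext[], cleared: ReadonlySet<number>): string[] {
  return all.map((c, i) =>
    cleared.has(i) ? `[Tool result #${i + 1} cleared from context]` : formatContext(c),
  );
}

// View for the final answer: the cleared set is never consulted.
function fullContexts(all: ToolContext[]): string[] {
  return all.map((c) => formatContext(c));
}

// Illustrative data only.
const sessionResults: ToolContext[] = [
  { toolName: 'search', result: 'first result' },
  { toolName: 'fetch', result: 'second result' },
];
const clearedIndices = new Set<number>([0]);

console.log(filteredContexts(sessionResults, clearedIndices)); // first line is a placeholder
console.log(fullContexts(sessionResults)); // both results appear in full
```

One consequence worth noting: because the final prompt contains everything, it can be much larger than any intermediate prompt, so the pruning keeps the loop cheap rather than capping the size of the last call.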
Summary
- Append-only architecture: Every tool call and thought persists in an immutable JSONL file under .dexter/scratchpad/.
- Proactive monitoring: The agent estimates tokens using a 3.5 characters-per-token heuristic before each LLM call.
- Automatic pruning: When exceeding CONTEXT_THRESHOLD (100,000 tokens), the system clears oldest tool results from memory while keeping the 5 most recent (KEEP_TOOL_USES).
- Audit preservation: Cleared results remain on disk; only the prompt view is filtered, with placeholders indicating omissions.
- Complete synthesis: Final answer generation retrieves the full unfiltered history via getFullContexts() to ensure comprehensive responses.
Frequently Asked Questions
How does Dexter prevent losing important tool results when clearing context?
Dexter retains all tool results in the immutable JSONL file on disk. The clearOldestToolResults method only affects the in-memory representation used for prompt construction, marking specific indices to skip during getToolResults(). For the final answer, getFullContexts() bypasses these filters entirely, ensuring the LLM receives the complete session history regardless of intermediate pruning.
What happens when the token estimate exceeds the 100,000 token threshold?
When estimateTokens returns a value greater than CONTEXT_THRESHOLD (100,000), the agent immediately invokes ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES). This removes the oldest tool results from the next prompt until only the 5 most recent remain. The system yields a context_cleared event to notify upstream consumers that pruning has occurred, allowing UI updates or logging.
Why does Dexter use 3.5 characters per token for estimation?
The estimateTokens function in src/utils/tokens.ts uses a character-based heuristic of approximately 3.5 characters per token as a fast, synchronous approximation of LLM tokenization. This avoids the latency and complexity of running an external tokenizer (such as tiktoken or Anthropic's tokenizer) inside the tight agent loop, while providing sufficient accuracy for threshold-based context management decisions.
Can developers adjust how many tool results are kept during clearing?
Yes, developers can modify the KEEP_TOOL_USES constant defined in src/utils/tokens.ts. The default value is 5, meaning the system retains the 5 most recent tool results when clearing context. Increasing this value preserves more context history but reduces the safety margin before hitting token limits, while decreasing it frees more tokens but risks losing relevant recent context.