# How Dexter's Scratchpad Management Works for LLM Context Control

> Dexter's scratchpad management for LLM context control uses an append-only scratchpad and token monitoring to clear old tool results from the prompt while keeping an audit trail.

- Repository: [Virat Singh/dexter](https://github.com/virattt/dexter)
- Tags: internals
- Published: 2026-02-16

---

**Dexter manages LLM context with an append-only scratchpad and automatic token-threshold monitoring: it clears the oldest tool results from the in-memory prompt view while preserving a complete audit trail on disk.**

Dexter's scratchpad management system addresses the problem of LLM context windows overflowing during long-running agent sessions. According to the `virattt/dexter` source code, the system records every tool invocation and reasoning step in an append-only JSONL file while dynamically pruning what gets sent to the model to stay within token limits.

## The Append-Only Scratchpad Architecture

At the core of Dexter's context management is the `Scratchpad` class defined in [`src/agent/scratchpad.ts`](https://github.com/virattt/dexter/blob/main/src/agent/scratchpad.ts). This class maintains an append-only log where every interaction is recorded as a newline-delimited JSON entry.

### Entry Types and Structure

The scratchpad recognizes three distinct entry types, each serving a specific purpose in the agent's lifecycle:

- **`init`**: Contains the `content` field with the original user query that started the session.
- **`thinking`**: Stores free-form reasoning in the `content` field, capturing the agent's internal monologue.
- **`tool_result`**: Records the complete output of tool executions with `toolName`, `args`, and `result` fields.
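
The three entry shapes above can be modeled as a discriminated union keyed on `type`. This is a minimal sketch, not the repository's actual definitions: the type names, the `toJsonlLine` helper, the field types, and the example tool name are assumptions, and the real entries may carry additional fields.

```typescript
// Hypothetical type names; only the `type` values and field names come from the article.
type InitEntry = { type: 'init'; content: string };
type ThinkingEntry = { type: 'thinking'; content: string };
type ToolResultEntry = {
  type: 'tool_result';
  toolName: string;
  args: Record<string, unknown>;
  result: string; // assumed to be a string; the real field may hold structured data
};
type ScratchpadEntry = InitEntry | ThinkingEntry | ToolResultEntry;

// Serialize one entry as a single JSONL line (one JSON object per line).
function toJsonlLine(entry: ScratchpadEntry): string {
  return JSON.stringify(entry) + '\n';
}

const entries: ScratchpadEntry[] = [
  { type: 'init', content: 'Compare revenue growth for two companies' },
  { type: 'thinking', content: 'I need income statements for both.' },
  {
    type: 'tool_result',
    toolName: 'get_income_statements', // illustrative tool name
    args: { ticker: 'AAPL' },
    result: '{"revenue": 1}',
  },
];

// Appending lines in order yields the newline-delimited log.
const jsonl = entries.map(toJsonlLine).join('');
```

Because each line is a complete JSON object, a reader can parse the file line by line and switch on `type` without loading any schema.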

### File Storage and Naming Convention

Physical scratchpad files reside in the `.dexter/scratchpad/` directory. Each file follows a strict naming convention combining a timestamp with a hash of the query: `2026-01-21-153045_8a3f….jsonl`. This ensures unique, sortable files while preventing collisions between similar queries.
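
As a rough illustration of the naming pattern, the sketch below builds a `YYYY-MM-DD-HHMMSS_<hash>.jsonl` name from a query and a timestamp. The function name `scratchpadFileName`, the use of UTC, the MD5 algorithm, and the 12-character hash prefix are all assumptions made for the example; the article does not state which hash or length Dexter uses.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper. UTC, MD5, and the 12-character prefix are illustrative choices.
function scratchpadFileName(query: string, now: Date): string {
  const pad = (n: number): string => String(n).padStart(2, '0');
  const date = `${now.getUTCFullYear()}-${pad(now.getUTCMonth() + 1)}-${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}${pad(now.getUTCSeconds())}`;
  // Hash the query so that two queries issued in the same second get different names.
  const hash = createHash('md5').update(query).digest('hex').slice(0, 12);
  return `${date}-${time}_${hash}.jsonl`;
}

const fileName = scratchpadFileName(
  'What was revenue last year?',
  new Date(Date.UTC(2026, 0, 21, 15, 30, 45))
);
```

Putting the timestamp first means a plain lexicographic sort of the directory listing is also a chronological sort.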

## Token Estimation and Context Thresholds

The agent loop in [`src/agent/agent.ts`](https://github.com/virattt/dexter/blob/main/src/agent/agent.ts) (lines 86-95) implements proactive token monitoring before every LLM call.

### How Dexter Estimates Token Usage

Dexter uses a character-based heuristic defined in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts). The `estimateTokens` function assumes approximately **3.5 characters per token**, providing a fast, synchronous calculation without requiring external API calls. The agent loop applies it to the concatenated system prompt, query, and tool results:

```typescript
const estimatedContextTokens = estimateTokens(
  this.systemPrompt + ctx.query + fullToolResults
);
```
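
A character-based estimator of this kind can be very small. The sketch below is one plausible shape rather than the repository's code: the constant name `CHARS_PER_TOKEN` and the choice to round up are assumptions.

```typescript
// Assumed constant name; the article only states the ratio of about 3.5.
const CHARS_PER_TOKEN = 3.5;

// Estimate tokens from string length alone. Rounding up is an assumption
// that errs on the side of overestimating.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

const shortEstimate = estimateTokens('abcd'); // 4 / 3.5 rounds up to 2
const exactEstimate = estimateTokens('a'.repeat(35)); // 35 / 3.5 is exactly 10
```

The estimate depends only on string length, so it costs one division per call but ignores how a real tokenizer treats whitespace, code, or non-English text.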

### The CONTEXT_THRESHOLD Trigger

When the estimated token count exceeds `CONTEXT_THRESHOLD` (set to **100,000 tokens** in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts) lines 25-30), the agent triggers an automatic context-clearing step:

```typescript
if (estimatedContextTokens > CONTEXT_THRESHOLD) {
  const clearedCount = ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
  if (clearedCount > 0) {
    yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
  }
}
```
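
With the 3.5 characters-per-token heuristic, the 100,000-token threshold corresponds to roughly 350,000 characters of prompt text. The sketch below wraps the same check in a standalone generator so the event flow can be observed in isolation. `StubScratchpad`, `checkContext`, and the `ContextClearedEvent` type name are hypothetical stand-ins, not names from the repository.

```typescript
const CONTEXT_THRESHOLD = 100000; // tokens, per the article
const KEEP_TOOL_USES = 5; // per the article

type ContextClearedEvent = { type: 'context_cleared'; clearedCount: number; keptCount: number };

// Hypothetical stand-in for the scratchpad: tracks only how many tool results are still visible.
class StubScratchpad {
  private visible: number;
  constructor(visible: number) {
    this.visible = visible;
  }
  clearOldestToolResults(keepCount: number): number {
    const cleared = Math.max(0, this.visible - keepCount);
    this.visible -= cleared;
    return cleared;
  }
}

// Emits an event only when the estimate is over the threshold and something was actually cleared.
function* checkContext(
  estimatedContextTokens: number,
  scratchpad: StubScratchpad
): Generator<ContextClearedEvent> {
  if (estimatedContextTokens > CONTEXT_THRESHOLD) {
    const clearedCount = scratchpad.clearOldestToolResults(KEEP_TOOL_USES);
    if (clearedCount > 0) {
      yield { type: 'context_cleared', clearedCount, keptCount: KEEP_TOOL_USES };
    }
  }
}

// Over the threshold with 8 visible results: 3 are cleared and one event is emitted.
const overThreshold = checkContext(120000, new StubScratchpad(8)).next();
// Under the threshold: the generator finishes without emitting anything.
const underThreshold = checkContext(50000, new StubScratchpad(8)).next();
```

A consumer such as a UI or logger can treat `context_cleared` as purely informational, since the event is emitted after the pruning has already happened.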

## Clearing Old Tool Results

The `clearOldestToolResults` method in [`src/agent/scratchpad.ts`](https://github.com/virattt/dexter/blob/main/src/agent/scratchpad.ts) implements a surgical approach to context reduction that preserves the audit trail while freeing up prompt space.

### The clearOldestToolResults Method

This method accepts a `keepCount` parameter (defaulting to `KEEP_TOOL_USES`, set to **5** in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts)). It iterates through the JSONL entries, identifies all `tool_result` types, and marks the oldest ones for exclusion from the next prompt:

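```typescript
// Pseudocode based on source analysis; a method of the Scratchpad class
clearOldestToolResults(keepCount: number): number {
  // Positions of every tool_result entry, oldest first
  const toolIndices = this.entries
    .map((e, i) => (e.type === 'tool_result' ? i : -1))
    .filter((i) => i !== -1);

  // Clamp at zero: a negative end index would make slice() count from the end
  // and clear results even when fewer than keepCount exist
  const clearCount = Math.max(0, toolIndices.length - keepCount);
  const toClear = toolIndices
    .slice(0, clearCount)
    .filter((i) => !this.clearedToolIndices.has(i)); // skip already-cleared entries

  toClear.forEach((i) => this.clearedToolIndices.add(i));
  return toClear.length; // number of results newly cleared by this call
}
```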

### In-Memory vs. On-Disk Persistence

Critically, clearing affects only the **in-memory view** used for prompt construction. The underlying file in `.dexter/scratchpad/` is never rewritten and still contains the complete history. When retrieving tool results via `getToolResults()`, the system skips indices marked in `clearedToolIndices` and inserts placeholders like `[Tool result #3 cleared from context]` to maintain positional awareness for the LLM.
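
The placeholder behavior can be sketched with a small in-memory view. This is an illustration under stated assumptions, not the repository's implementation: the class name `ScratchpadView`, the `add` and `markCleared` helpers, the string return type, and the 1-based numbering in the placeholder are all assumed, and disk persistence is omitted.

```typescript
type ToolResult = { toolName: string; result: string };

// Hypothetical minimal view over tool results; writing to disk is omitted.
class ScratchpadView {
  private results: ToolResult[] = [];
  private clearedToolIndices = new Set<number>();

  add(result: ToolResult): void {
    this.results.push(result);
  }

  markCleared(index: number): void {
    this.clearedToolIndices.add(index);
  }

  // Render the prompt view: cleared results become one-line placeholders,
  // so later results keep their original position in the sequence.
  getToolResults(): string {
    return this.results
      .map((r, i) =>
        this.clearedToolIndices.has(i)
          ? `[Tool result #${i + 1} cleared from context]` // 1-based numbering is an assumption
          : `${r.toolName}: ${r.result}`
      )
      .join('\n');
  }
}

const view = new ScratchpadView();
view.add({ toolName: 'tool_a', result: 'first output' });
view.add({ toolName: 'tool_b', result: 'second output' });
view.add({ toolName: 'tool_c', result: 'third output' });
view.markCleared(0);
const promptView = view.getToolResults();
```

Replacing a cleared result with a short placeholder, rather than dropping it silently, tells the model that a tool call happened even though its output is no longer visible.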

## Final Answer Generation with Full Context

Despite aggressive context pruning during the agent loop, Dexter retains everything it needs for the final output. When the agent determines that no further tool calls are necessary, it invokes `buildFinalAnswerContext(ctx.scratchpad)`, which in turn calls `Scratchpad.getFullContexts()` (lines 38-46 in [`src/agent/scratchpad.ts`](https://github.com/virattt/dexter/blob/main/src/agent/scratchpad.ts)).

This method bypasses the `clearedToolIndices` filter, returning **all** tool results including those previously excluded from intermediate prompts. This ensures the final answer synthesizes the complete session history while intermediate steps operated within token constraints.
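
The difference between the two views can be shown side by side. In this sketch the class name `DualViewScratchpad`, the `addToolResult` helper, and the array return types are assumptions, and the filtered view simply drops cleared results instead of inserting placeholders, to keep the example short.

```typescript
// Hypothetical minimal scratchpad holding tool results as plain strings.
class DualViewScratchpad {
  private toolResults: string[] = [];
  private clearedToolIndices = new Set<number>();

  addToolResult(result: string): void {
    this.toolResults.push(result);
  }

  // Mark all but the newest keepCount results as cleared; return how many were newly marked.
  clearOldestToolResults(keepCount: number): number {
    const clearCount = Math.max(0, this.toolResults.length - keepCount);
    let newlyCleared = 0;
    for (let i = 0; i < clearCount; i++) {
      if (!this.clearedToolIndices.has(i)) {
        this.clearedToolIndices.add(i);
        newlyCleared++;
      }
    }
    return newlyCleared;
  }

  // Filtered view for intermediate prompts (placeholders omitted for brevity).
  getToolResults(): string[] {
    return this.toolResults.filter((_, i) => !this.clearedToolIndices.has(i));
  }

  // Unfiltered view for the final answer: ignores clearedToolIndices entirely.
  getFullContexts(): string[] {
    return this.toolResults.slice();
  }
}

const scratchpad = new DualViewScratchpad();
['r1', 'r2', 'r3', 'r4'].forEach((r) => scratchpad.addToolResult(r));
scratchpad.clearOldestToolResults(2);

const intermediate = scratchpad.getToolResults(); // only the two newest results
const finalContext = scratchpad.getFullContexts(); // all four results
```

Clearing only records indices and never deletes data, which is what makes the unfiltered read possible at the end of the session.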

## Summary

- **Append-only architecture**: Every tool call and thought persists in a JSONL file under `.dexter/scratchpad/` whose existing entries are never modified.
- **Proactive monitoring**: The agent estimates tokens using a 3.5 characters-per-token heuristic before each LLM call.
- **Automatic pruning**: When the estimate exceeds `CONTEXT_THRESHOLD` (100,000 tokens), the system clears the oldest tool results from the in-memory prompt view while keeping the 5 most recent (`KEEP_TOOL_USES`).
- **Audit preservation**: Cleared results remain on disk; only the prompt view is filtered, with placeholders indicating omissions.
- **Complete synthesis**: Final answer generation retrieves the full unfiltered history via `getFullContexts()` to ensure comprehensive responses.

## Frequently Asked Questions

### How does Dexter prevent losing important tool results when clearing context?

Dexter retains all tool results in the append-only JSONL file on disk. The `clearOldestToolResults` method only affects the in-memory representation used for prompt construction, marking specific indices to skip during `getToolResults()`. For the final answer, `getFullContexts()` bypasses these filters entirely, ensuring the LLM receives the complete session history regardless of intermediate pruning.

### What happens when the token estimate exceeds the 100,000 token threshold?

When `estimateTokens` returns a value greater than `CONTEXT_THRESHOLD` (100,000), the agent immediately invokes `ctx.scratchpad.clearOldestToolResults(KEEP_TOOL_USES)`. This removes the oldest tool results from the next prompt until only the 5 most recent remain. The system yields a `context_cleared` event to notify upstream consumers that pruning has occurred, allowing UI updates or logging.

### Why does Dexter use 3.5 characters per token for estimation?

The `estimateTokens` function in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts) uses a character-based heuristic of approximately 3.5 characters per token as a fast, synchronous approximation of LLM tokenization. This avoids the latency and complexity of running an external tokenizer (such as tiktoken or Anthropic's tokenizer) inside the tight agent loop, while remaining accurate enough for threshold-based context management decisions.

### Can developers adjust how many tool results are kept during clearing?

Yes, developers can modify the `KEEP_TOOL_USES` constant defined in [`src/utils/tokens.ts`](https://github.com/virattt/dexter/blob/main/src/utils/tokens.ts). The default value is 5, meaning the system retains the 5 most recent tool results when clearing context. Increasing this value preserves more context history but reduces the safety margin before hitting token limits, while decreasing it frees more tokens but risks losing relevant recent context.