# Message Compaction for Optimizing Conversation History in Long-Running Flue Agents

> Optimize long-running Flue agents with message compaction. Automatically summarize conversation history, prevent memory growth, and reduce token costs for efficient AI interactions.

- Repository: [withastro/flue](https://github.com/withastro/flue)
- Tags: performance
- Published: 2026-05-11

---

**Flue's compaction layer checks the conversation history after each turn and summarizes it once it exceeds a configured token threshold, preventing unbounded memory growth and reducing token costs.**

Long-running Flue agents maintain stateful sessions that accumulate every interaction between the user and the LLM. Without intervention, these message histories grow indefinitely, leading to increased latency, higher API costs, and potential context window exhaustion. The withastro/flue repository addresses this with a **message compaction** system implemented in [`packages/sdk/src/compaction.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/compaction.ts) that runs after each turn to keep payloads lightweight.

## Why Message Compaction Matters for Long-Running Agents

### The Unbounded Growth Problem

Flue agents execute as stateful LLM-driven services that preserve a running log of every interaction within a **session**. When an agent operates over extended periods—whether as a persistent chatbot handling dozens of turns or a background task making periodic LLM calls—the session's message history expands without bound. Each turn adds both the user prompt and assistant response to the context, causing the payload size to grow linearly with conversation length.

### Performance and Cost Implications

Unchecked history growth creates three critical issues:

- **Increased latency**: Larger payloads take longer to transmit, and the model must process the entire prompt before it can begin responding.
- **Higher token costs**: Token-based pricing charges for every input token sent with a request. Because the full history is resent on each turn, cumulative cost grows faster than the conversation itself.
- **Provider limits**: Long-running agents risk hitting the maximum context length imposed by LLM providers.
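
To make the cost argument concrete, the following back-of-the-envelope sketch compares cumulative input tokens with and without a cap on history size. The function names, the 200-tokens-per-turn figure, and the 3,000-token cap are illustrative assumptions, not values taken from Flue, and the model ignores the cost of the summarization call itself.

```typescript
// Hypothetical cost model: every turn resends the whole history as input.
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // At turn N the request carries all N turns accumulated so far.
    total += turn * tokensPerTurn;
  }
  return total;
}

// Same model, but the history sent on each request is capped
// (a simplified stand-in for what compaction achieves).
function cumulativeInputTokensCapped(
  turns: number,
  tokensPerTurn: number,
  cap: number,
): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += Math.min(turn * tokensPerTurn, cap);
  }
  return total;
}

console.log(cumulativeInputTokens(50, 200)); // 255000
console.log(cumulativeInputTokensCapped(50, 200, 3000)); // 129000
```

Under these assumptions, uncapped input grows quadratically with the number of turns, while the capped variant grows linearly once the cap is reached.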

## How Flue Implements Message Compaction

### The Compaction Algorithm

The compaction logic in [`packages/sdk/src/compaction.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/compaction.ts) executes a three-step strategy after each turn:

1. **Identify a stable point**: The algorithm locates a message containing a complete, self-contained answer, typically the assistant's most recent response.
2. **Merge earlier messages**: All messages preceding the stable point fold into a single summary message that captures essential context. The summary is stored as a user-role message in the message list.
3. **Replace the original sequence**: The system substitutes the merged history with the summary plus the most recent few messages (5 by default), which can shrink a payload of dozens of kilobytes to a small fraction of its original size while preserving semantic continuity.
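
The three steps above can be sketched as follows. This is a minimal illustration and not the code in `compaction.ts`: the `Message` shape, `findStablePoint`, `compact`, and `naiveSummary` are hypothetical names, the stable point is assumed to be the nearest assistant message at or before the `keepLast` boundary, and the summary is produced by simple truncation where a real system would call an LLM.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface Message {
  role: Role;
  content: string;
}

// Placeholder summarizer (assumption): truncates and joins the old messages.
function naiveSummary(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content.slice(0, 40)}`).join(' | ');
}

// Step 1: find the cut index. Everything before it is summarized.
// Start at the keepLast boundary and walk back until the summarized
// part ends on an assistant message, so no exchange is split in half.
function findStablePoint(messages: Message[], keepLast: number): number {
  let cut = messages.length - keepLast;
  while (cut > 0 && messages[cut - 1]?.role !== 'assistant') {
    cut--;
  }
  return cut; // zero or negative means there is nothing to summarize
}

// Steps 2 and 3: merge the older messages into one user-role summary
// and return it followed by the untouched recent messages.
function compact(
  messages: Message[],
  keepLast = 5,
  summarize: (older: Message[]) => string = naiveSummary,
): Message[] {
  const cut = findStablePoint(messages, keepLast);
  if (cut <= 0) return messages;
  const summary: Message = {
    role: 'user',
    content: `Summary of earlier conversation: ${summarize(messages.slice(0, cut))}`,
  };
  return [summary, ...messages.slice(cut)];
}

// Usage: twelve alternating messages (user at even indexes).
const history: Message[] = [];
for (let i = 0; i < 12; i++) {
  history.push({ role: i % 2 === 0 ? 'user' : 'assistant', content: `message ${i}` });
}
const compacted = compact(history);
console.log(compacted.length); // 7: one summary plus messages 6 through 11
```

In this sketch the tail holds at least `keepLast` messages. It can hold one more when the boundary would otherwise separate a user prompt from the assistant reply that answers it.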

### Integration with the Session Lifecycle

The compaction system hooks into Flue's **Session → Run → Turn** architecture. Every turn originates from `session.prompt` (or `session.skill`), which records the interaction via `runStore.record` in [`run-store.ts`](https://github.com/withastro/flue/blob/main/run-store.ts) / [`node/run-store.ts`](https://github.com/withastro/flue/blob/main/node/run-store.ts). After the LLM response returns, `session.prompt` invokes `compactIfNeeded()` to check whether the message list exceeds the configured `maxTokens` threshold. If compaction triggers, the shortened list immediately replaces the session's stored messages, so subsequent turns operate on the compacted history.
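
The check-then-replace flow can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `SketchSession`, `compactIfNeededSketch`, and `estimateTokens` are illustrative names rather than Flue's API, and the four-characters-per-token estimate is a rough heuristic that a real implementation would likely replace with a proper tokenizer.

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

type CompactFn = (messages: Message[]) => Message[];

// Rough heuristic (an assumption): about four characters per token.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Returns the original list when under the threshold, otherwise the compacted one.
function compactIfNeededSketch(
  messages: Message[],
  maxTokens: number,
  compactFn: CompactFn,
): Message[] {
  return estimateTokens(messages) > maxTokens ? compactFn(messages) : messages;
}

class SketchSession {
  private messages: Message[] = [];
  private readonly maxTokens: number;
  private readonly compactFn: CompactFn;

  constructor(maxTokens: number, compactFn: CompactFn) {
    this.maxTokens = maxTokens;
    this.compactFn = compactFn;
  }

  // Record one turn, then run the threshold check before the next turn starts.
  recordTurn(userText: string, assistantText: string): void {
    this.messages.push(
      { role: 'user', content: userText },
      { role: 'assistant', content: assistantText },
    );
    this.messages = compactIfNeededSketch(this.messages, this.maxTokens, this.compactFn);
  }

  get history(): readonly Message[] {
    return this.messages;
  }
}
```

Running the check at the end of the turn, rather than at the start of the next one, means the stored history is already within budget whenever a new request is assembled.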

### Provider Abstraction for Summaries

Flue delegates summarization to the provider layer defined in [`packages/sdk/src/runtime/providers.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/runtime/providers.ts). By default, the system uses the configured LLM provider (e.g., Anthropic or OpenAI) with a lightweight summarization prompt. Developers may alternatively supply a custom `summarizer` function to the compaction configuration for specialized summarization strategies or to use a cheaper model for the condensation step.
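
One way to picture this fallback is the sketch below. It is an assumption about the general shape, not Flue's provider API: `Summarizer`, `CompactionOptions`, `CompleteFn`, `defaultSummarizer`, and `resolveSummarizer` are hypothetical names, and `CompleteFn` stands in for whatever call the provider layer actually exposes.

```typescript
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

type Summarizer = (messages: Message[]) => Promise<string>;

// Hypothetical option shape mirroring the fields named in this article.
interface CompactionOptions {
  maxTokens?: number;
  keepLast?: number;
  summarizer?: Summarizer;
}

// Stand-in for a provider call: takes a prompt, resolves to the model's text.
type CompleteFn = (prompt: string) => Promise<string>;

// Default path: build a short summarization prompt and send it to the provider.
function defaultSummarizer(complete: CompleteFn): Summarizer {
  return async (messages) => {
    const transcript = messages.map((m) => `${m.role}: ${m.content}`).join('\n');
    return complete(`Summarize the following conversation briefly:\n${transcript}`);
  };
}

// A custom summarizer, when supplied, takes precedence over the default.
function resolveSummarizer(options: CompactionOptions, complete: CompleteFn): Summarizer {
  return options.summarizer ?? defaultSummarizer(complete);
}
```

Keeping the summarizer behind a single function type is what makes it possible to swap in a cheaper model, or a non-LLM strategy, without touching the compaction logic.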

## Configuring Message Compaction in Flue

Enable compaction through the `init()` function by providing a `compaction` configuration object. The following example sets a custom token limit, keeps the last five messages unsummarized, and supplies a custom summarizer that uses a cheaper model:

```typescript
import { init } from '@flue/sdk';

// Enable compaction with a custom token limit (default ≈ 4k tokens)
const harness = init({
  model: 'anthropic/claude-sonnet-4-6',
  compaction: {
    // Compact when the total token count exceeds 3000
    maxTokens: 3000,
    // Keep the last N messages un-summarized (default 5)
    keepLast: 5,
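    // Optional custom summarizer using a cheaper model.
    // The callback runs only when compaction triggers, after init() has returned,
    // so referencing `harness` inside it is safe at runtime.
    summarizer: async (messages) => {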
      const summary = await harness.provider.run({
        model: 'anthropic/claude-haiku-4-5',
        prompt: `Summarize the following conversation in <200 tokens:\n${messages.map(m => `${m.role}: ${m.content}`).join('\n')}`,
      });
      return summary;
    },
  },
});
```

For a complete agent implementation that benefits from automatic compaction, refer to [`examples/hello-world/.flue/agents/with-compaction.ts`](https://github.com/withastro/flue/blob/main/examples/hello-world/.flue/agents/with-compaction.ts):

```typescript
import { init } from '@flue/sdk';

export default async function (ctx) {
  const harness = init({
    model: 'openai/gpt-4o-mini',
    compaction: { maxTokens: 2500 },
  });

  // Simple echo loop – each user message triggers a response
  while (true) {
    const userMsg = await ctx.prompt('User says:');
    const reply = await harness.prompt({
      role: 'assistant',
      content: `You said: ${userMsg}`,
    });
    ctx.output(reply);
  }
}
```

Once the accumulated history exceeds the configured 2,500-token threshold, the session's internal message list compacts automatically, keeping requests responsive and token spend low.

## Summary

- **Message compaction** in Flue triggers automatically after each turn when the `maxTokens` threshold is exceeded, preventing unbounded context growth in long-running agents.
- The compaction algorithm in [`packages/sdk/src/compaction.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/compaction.ts) identifies stable points, merges historical messages into a single user-role summary, and retains the most recent messages (default 5) unsummarized.
- Configuration occurs through the `init()` function's `compaction` option, supporting custom `maxTokens` limits, `keepLast` counts, and optional custom `summarizer` functions.
- The system integrates with Flue's session lifecycle via `compactIfNeeded()` and persists compacted state through [`run-store.ts`](https://github.com/withastro/flue/blob/main/run-store.ts), reducing latency, token costs, and memory usage.

## Frequently Asked Questions

### When does compaction trigger in Flue?

Compaction is evaluated **after each turn**, when `session.prompt` calls `compactIfNeeded()` in [`packages/sdk/src/compaction.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/compaction.ts). The system checks whether the total token count of the current message list exceeds the configurable `maxTokens` threshold (approximately 4,000 tokens by default). If the history exceeds this limit, compaction runs immediately, so the next LLM call already uses the shortened history.

### How does Flue's compaction preserve conversation context?

The algorithm preserves context by identifying a **stable point**—typically the most recent complete assistant response—and collapsing only the messages preceding it into a summary. This summary captures the essential semantic content of the removed history, while the recent messages (controlled by the `keepLast` parameter, defaulting to 5) remain untouched in their original form, ensuring the LLM retains immediate conversational context.

### Can I use a custom LLM for summarization?

Yes. The `compaction` configuration accepts an optional `summarizer` function that receives the array of messages to condense. You can implement this function to call any provider or model available through [`packages/sdk/src/runtime/providers.ts`](https://github.com/withastro/flue/blob/main/packages/sdk/src/runtime/providers.ts). For example, a smaller, faster model such as Claude Haiku can handle summarization cheaply while the primary model is reserved for the main agent logic.

### What happens to the original messages after compaction?

After compaction completes, the original messages that were summarized are **removed from the active session state** and replaced by the single summary message. This updated message list immediately persists to the session store ([`run-store.ts`](https://github.com/withastro/flue/blob/main/run-store.ts) / [`node/run-store.ts`](https://github.com/withastro/flue/blob/main/node/run-store.ts)), meaning future turns retrieve only the compacted history. The compaction is destructive to the original message array but preserves the semantic content through the summary.