# How to Implement Token Budget Management with Compaction Strategies in Agent Framework

> Master token budget management in Agent Framework using compaction strategies. Learn to enforce token limits with SlidingWindowStrategy and SummarizationStrategy for efficient processing.

- Repository: [Microsoft/agent-framework](https://github.com/microsoft/agent-framework)
- Tags: how-to-guide
- Published: 2026-04-05

---

**Implement token budget management in Microsoft Agent Framework by composing compaction strategies—such as `SlidingWindowStrategy` or `SummarizationStrategy`—inside a `TokenBudgetComposedStrategy` that automatically enforces your token limit through sequential processing and deterministic fallback exclusion.**

Token budget management prevents LLM context window overflows by trimming or summarizing conversation history before it reaches the model. Microsoft Agent Framework provides a modular compaction system in [`python/packages/core/agent_framework/_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/agent_framework/_compaction.py) where strategies mutate message annotations to exclude content or insert summaries while preserving critical system instructions. This guide demonstrates how to configure, compose, and apply these strategies using the actual source implementation.

## Understanding Agent Framework Compaction Architecture

The framework represents conversations as lists of **`Message`** objects. Each message carries annotations for **group identifiers** (e.g., `"system"`, `"user"`, `"tool_call"`) and optional **token counts**. Compaction strategies manipulate these annotations to reduce the total token count without altering the original message objects directly.

### Message Annotations and Grouping

Before applying any strategy, messages must be annotated. The `annotate_message_groups` function assigns a `_group` identifier to each message based on its role and conversation flow. System messages receive the `"system"` kind, while tool interactions receive `"tool_call"` identifiers.

```python
from agent_framework import annotate_message_groups, Message

messages = [
    Message(role="system", contents=["You are a helpful assistant."]),
    Message(role="user", contents=["Hello"]),
    Message(role="assistant", contents=["Hi there!"]),
]

annotate_message_groups(messages)

# Messages now contain _group annotations: ['system', 'g1', 'g2']

```

When a tokenizer is provided, `annotate_token_counts` adds a `"token_count"` entry to each message's annotations, enabling budget calculations via `included_token_count(messages)`, which sums tokens for messages not marked with `EXCLUDED_KEY`.

### The Compaction Strategy Interface

All strategies implement the callable interface defined in [`_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/_compaction.py): `CompactionStrategy.__call__(messages: list[Message]) -> bool`. The strategy mutates message `additional_properties` to set `EXCLUDED_KEY=True` or inserts new summary messages. The `TokenBudgetComposedStrategy` orchestrates multiple strategies and implements a strict fallback loop (lines 1111-1133 in [`_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/_compaction.py)) that excludes oldest non-system groups first, then system groups if necessary.

## Core Compaction Strategies Explained

Agent Framework ships with six distinct compaction strategies. Each targets specific patterns of token consumption:

- **`SlidingWindowStrategy`** – Retains only the most recent *N* non-system message groups. Ideal for simple recency-based trimming.
- **`SelectiveToolCallCompactionStrategy`** – Removes old tool-call groups while preserving the newest *N* tool interactions. Use this when tool chatter dominates token usage.
- **`ToolResultCompactionStrategy`** – Collapses old tool-call groups into a concise summary message listing tool results, preserving a readable trace.
- **`SummarizationStrategy`** – Sends message subsets to a summarizer client and replaces them with a compact semantic summary.
- **`TruncationStrategy`** – Performs coarse-grained removal of oldest groups until a target message count is reached.
- **`TokenBudgetComposedStrategy`** – Chains multiple strategies in order, then falls back to deterministic exclusion if the budget remains unsatisfied.

## Step-by-Step Implementation Guide

### Step 1: Annotate Messages with Token Counts

Initialize a tokenizer and annotate your conversation history before applying any budget constraints. The `CharacterEstimatorTokenizer` provides a baseline for testing, though production environments should use model-specific tokenizers.

```python
from agent_framework import (
    CharacterEstimatorTokenizer,
    annotate_message_groups,
    annotate_token_counts,
    included_token_count,
    Message,
)

messages = [
    Message(role="system", contents=["You are a migration copilot."]),
    Message(role="user", contents=["How do I deploy to Azure?"]),
]

tokenizer = CharacterEstimatorTokenizer()
annotate_message_groups(messages)
annotate_token_counts(messages, tokenizer=tokenizer)

current_tokens = included_token_count(messages)
print(f"Current token count: {current_tokens}")

```

### Step 2: Configure Individual Strategies

Define strategies based on your retention requirements. For example, preserve the last four conversation turns using `SlidingWindowStrategy`, or aggressively remove tool calls with `SelectiveToolCallCompactionStrategy`.

```python
from agent_framework import (
    SlidingWindowStrategy,
    SelectiveToolCallCompactionStrategy,
)

# Keep only the 4 most recent non-system groups

window_strategy = SlidingWindowStrategy(keep_last_groups=4)

# Remove all tool-call groups except the most recent

tool_strategy = SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=1)

```

### Step 3: Compose Strategies with TokenBudgetComposedStrategy

Wrap your strategies in a `TokenBudgetComposedStrategy` to enforce a hard token limit. The composer executes strategies sequentially, refreshes token counts after each step, and applies a fallback exclusion loop if the budget is still exceeded.

```python
from agent_framework import TokenBudgetComposedStrategy

composed = TokenBudgetComposedStrategy(
    token_budget=300,
    tokenizer=tokenizer,
    strategies=[
        tool_strategy,      # First, try dropping old tool calls

        window_strategy,    # Then apply sliding window

    ],
)

```

### Step 4: Apply Compaction to Your Conversation

Use `apply_compaction` to execute the strategy and retrieve the filtered message list. This function re-annotates groups, invokes the strategy, and returns `project_included_messages(messages)`—the exact list to send to your LLM.

```python
from agent_framework import apply_compaction

async def trim_conversation(messages, strategy, tokenizer):
    projected = await apply_compaction(
        messages,
        strategy=strategy,
        tokenizer=tokenizer,
    )
    print(f"Tokens after compaction: {included_token_count(projected)}")
    return projected

```

## Advanced Example: Combining Summarization and Tool-Call Compaction

For complex agent scenarios with extensive tool usage, combine `SummarizationStrategy` with tool compaction. This example demonstrates collapsing tool results, summarizing the remaining bulk, and finally applying a sliding window, all within a 250-token budget.

```python
import asyncio
from typing import Any
from agent_framework import (
    CharacterEstimatorTokenizer,
    TokenBudgetComposedStrategy,
    SlidingWindowStrategy,
    SummarizationStrategy,
    SelectiveToolCallCompactionStrategy,
    annotate_message_groups,
    apply_compaction,
    included_token_count,
    Message,
    ChatResponse,
)

class SimpleSummarizer:
    async def get_response(
        self,
        messages: list[Message],
        *,
        stream: bool = False,
        options: dict[str, Any] | None = None,
        **_: Any
    ) -> ChatResponse:
        summary = f"[Summary of {len(messages)} messages]"
        return ChatResponse(
            messages=[Message(role="assistant", contents=[summary])]
        )

async def main():
    # Build synthetic history with tool calls

    messages = [Message(role="system", contents=["You are a migration copilot."])]
    for i in range(1, 6):
        messages.append(Message(role="user", contents=[f"Step {i}"]))
        messages.append(Message(
            role="assistant",
            contents=[Message.Content.from_function_call(
                call_id=f"call{i}", name="search_docs", arguments="{}"
            )]
        ))
        messages.append(Message(role="tool", contents=[f"Result {i}"]))

    tokenizer = CharacterEstimatorTokenizer()
    annotate_message_groups(messages, tokenizer=tokenizer)

    print(f"Tokens before: {included_token_count(messages)}")

    # Compose a three-phase strategy

    composed = TokenBudgetComposedStrategy(
        token_budget=250,
        tokenizer=tokenizer,
        strategies=[
            SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=0),
            SummarizationStrategy(
                client=SimpleSummarizer(),
                target_count=2,
                threshold=3,
            ),
            SlidingWindowStrategy(keep_last_groups=3),
        ],
    )

    projected = await apply_compaction(messages, strategy=composed, tokenizer=tokenizer)
    print(f"Tokens after: {included_token_count(projected)}")
    for m in projected:
        print(f"- [{m.role}] {m.text}")

if __name__ == "__main__":
    asyncio.run(main())

```

## Integration with Agent Provider API

For automatic per-turn enforcement, embed compaction into an `Agent` via `CompactionProvider`. This provider invokes `before_strategy` before the model sees history and `after_strategy` after each turn to trim stored state.

```python
from agent_framework import Agent, CompactionProvider, InMemoryHistoryProvider

async def run_agent():
    history = InMemoryHistoryProvider()
    
    compaction = CompactionProvider(
        before_strategy=TokenBudgetComposedStrategy(
            token_budget=400,
            tokenizer=CharacterEstimatorTokenizer(),
            strategies=[SlidingWindowStrategy(keep_last_groups=5)],
        ),
        after_strategy=SlidingWindowStrategy(keep_last_groups=10),
        history_source_id=history.source_id,
    )

    agent = Agent(
        client=your_chat_client,
        name="assistant",
        context_providers=[history, compaction],
    )

    session = agent.create_session()
    await agent.run("Explain Azure Functions vs Logic Apps", session=session)

if __name__ == "__main__":
    import asyncio
    asyncio.run(run_agent())

```

## Key Files and Reference

- **[`python/packages/core/agent_framework/_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/agent_framework/_compaction.py)** – Core algorithms for annotation (`annotate_message_groups`, `annotate_token_counts`), all strategy implementations, and `TokenBudgetComposedStrategy` fallback logic (lines 1111-1133).
- **[`python/packages/core/agent_framework/__init__.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/agent_framework/__init__.py)** – Public API exports including `TokenBudgetComposedStrategy` and `SlidingWindowStrategy`.
- **[`python/packages/core/tests/core/test_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/tests/core/test_compaction.py)** – Comprehensive test suite validating strategy behavior and budget enforcement.
- **[`python/samples/02-agents/compaction/advanced.py`](https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/compaction/advanced.py)** – End-to-end sample demonstrating multi-strategy composition with tool-call handling and summarization.
- **[`python/samples/02-agents/compaction/basics.py`](https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/compaction/basics.py)** – Minimal example of `SlidingWindowStrategy` with `TokenBudgetComposedStrategy`.

## Summary

- **Annotate first**: Use `annotate_message_groups` and `annotate_token_counts` to prepare message metadata before compaction.
- **Choose strategies**: Select `SlidingWindowStrategy` for recency, `SelectiveToolCallCompactionStrategy` for tool noise, or `SummarizationStrategy` for semantic compression.
- **Compose for safety**: Wrap strategies in `TokenBudgetComposedStrategy` to guarantee budget compliance through sequential execution and deterministic fallback exclusion.
- **Automate with providers**: Use `CompactionProvider` to enforce budgets automatically within `Agent` runs.

## Frequently Asked Questions

### How does TokenBudgetComposedStrategy handle cases where individual strategies fail to meet the budget?

`TokenBudgetComposedStrategy` executes composed strategies in order, refreshing token counts after each execution. If the token budget remains exceeded, it falls back to a deterministic exclusion loop (implemented in [`_compaction.py`](https://github.com/microsoft/agent-framework/blob/main/_compaction.py) lines 1111-1133) that iterates through ordered groups—excluding oldest non-system groups first, then system groups if necessary—until the `included_token_count` satisfies the budget.

### What is the difference between ToolResultCompactionStrategy and SelectiveToolCallCompactionStrategy?

`SelectiveToolCallCompactionStrategy` marks entire tool-call groups as excluded based on age, effectively removing them from the context entirely. `ToolResultCompactionStrategy` preserves the conversation structure by replacing old tool-call groups with a new summary message containing the tool results, maintaining a readable trace while freeing tokens.

### Can I use custom tokenizers with the compaction system?

Yes. The `annotate_token_counts` function accepts any tokenizer implementing the framework's tokenizer interface. While `CharacterEstimatorTokenizer` provides a simple character-count fallback suitable for demos, production implementations should pass model-specific tokenizers (such as tiktoken for OpenAI models) to ensure accurate budget calculations against actual LLM token limits.

### How do I preserve system messages while aggressively trimming user history?

Configure `SlidingWindowStrategy` with `keep_last_groups` set to your desired context window size; it automatically preserves all groups annotated with `"system"` kind. Additionally, `TokenBudgetComposedStrategy`'s fallback exclusion loop (lines 1111-1133) explicitly excludes non-system groups before touching system messages, ensuring critical instructions remain intact until absolutely necessary.