How to Implement Token Budget Management with Compaction Strategies in Agent Framework

Implement token budget management in Microsoft Agent Framework by composing compaction strategies—such as SlidingWindowStrategy or SummarizationStrategy—inside a TokenBudgetComposedStrategy that automatically enforces your token limit through sequential processing and deterministic fallback exclusion.

Token budget management prevents LLM context window overflows by trimming or summarizing conversation history before it reaches the model. Microsoft Agent Framework provides a modular compaction system in python/packages/core/agent_framework/_compaction.py where strategies mutate message annotations to exclude content or insert summaries while preserving critical system instructions. This guide demonstrates how to configure, compose, and apply these strategies using the actual source implementation.

Understanding Agent Framework Compaction Architecture

The framework represents conversations as lists of Message objects. Each message carries annotations for group identifiers (e.g., "system", "user", "tool_call") and optional token counts. Compaction strategies manipulate these annotations to reduce the total token count without altering the original message objects directly.

Message Annotations and Grouping

Before applying any strategy, messages must be annotated. The annotate_message_groups function assigns a _group identifier to each message based on its role and conversation flow. System messages receive the "system" kind, while tool interactions receive "tool_call" identifiers.

from agent_framework import annotate_message_groups, Message

messages = [
    Message(role="system", contents=["You are a helpful assistant."]),
    Message(role="user", contents=["Hello"]),
    Message(role="assistant", contents=["Hi there!"]),
]

annotate_message_groups(messages)

# Messages now contain _group annotations: ['system', 'g1', 'g2']

When a tokenizer is provided, annotate_token_counts adds a "token_count" entry to each message's annotations, enabling budget calculations via included_token_count(messages), which sums tokens for messages not marked with EXCLUDED_KEY.

The Compaction Strategy Interface

All strategies implement the callable interface defined in _compaction.py: CompactionStrategy.__call__(messages: list[Message]) -> bool. The strategy mutates message additional_properties to set EXCLUDED_KEY=True or inserts new summary messages. The TokenBudgetComposedStrategy orchestrates multiple strategies and implements a strict fallback loop (lines 1111-1133 in _compaction.py) that excludes oldest non-system groups first, then system groups if necessary.

Core Compaction Strategies Explained

Agent Framework ships with six distinct compaction strategies. Each targets specific patterns of token consumption:

  • SlidingWindowStrategy – Retains only the most recent N non-system message groups. Ideal for simple recency-based trimming.
  • SelectiveToolCallCompactionStrategy – Removes old tool-call groups while preserving the newest N tool interactions. Use this when tool chatter dominates token usage.
  • ToolResultCompactionStrategy – Collapses old tool-call groups into a concise summary message listing tool results, preserving a readable trace.
  • SummarizationStrategy – Sends message subsets to a summarizer client and replaces them with a compact semantic summary.
  • TruncationStrategy – Performs coarse-grained removal of oldest groups until a target message count is reached.
  • TokenBudgetComposedStrategy – Chains multiple strategies in order, then falls back to deterministic exclusion if the budget remains unsatisfied.

Step-by-Step Implementation Guide

Step 1: Annotate Messages with Token Counts

Initialize a tokenizer and annotate your conversation history before applying any budget constraints. The CharacterEstimatorTokenizer provides a baseline for testing, though production environments should use model-specific tokenizers.

from agent_framework import (
    CharacterEstimatorTokenizer,
    annotate_message_groups,
    annotate_token_counts,
    included_token_count,
    Message,
)

messages = [
    Message(role="system", contents=["You are a migration copilot."]),
    Message(role="user", contents=["How do I deploy to Azure?"]),
]

tokenizer = CharacterEstimatorTokenizer()
annotate_message_groups(messages)
annotate_token_counts(messages, tokenizer=tokenizer)

current_tokens = included_token_count(messages)
print(f"Current token count: {current_tokens}")

Step 2: Configure Individual Strategies

Define strategies based on your retention requirements. For example, preserve the last four conversation turns using SlidingWindowStrategy, or aggressively remove tool calls with SelectiveToolCallCompactionStrategy.

from agent_framework import (
    SlidingWindowStrategy,
    SelectiveToolCallCompactionStrategy,
)

# Keep only the 4 most recent non-system groups

window_strategy = SlidingWindowStrategy(keep_last_groups=4)

# Remove all tool-call groups except the most recent

tool_strategy = SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=1)

Step 3: Compose Strategies with TokenBudgetComposedStrategy

Wrap your strategies in a TokenBudgetComposedStrategy to enforce a hard token limit. The composer executes strategies sequentially, refreshes token counts after each step, and applies a fallback exclusion loop if the budget is still exceeded.

from agent_framework import TokenBudgetComposedStrategy

composed = TokenBudgetComposedStrategy(
    token_budget=300,
    tokenizer=tokenizer,
    strategies=[
        tool_strategy,      # First, try dropping old tool calls

        window_strategy,    # Then apply sliding window

    ],
)

Step 4: Apply Compaction to Your Conversation

Use apply_compaction to execute the strategy and retrieve the filtered message list. This function re-annotates groups, invokes the strategy, and returns project_included_messages(messages)—the exact list to send to your LLM.

from agent_framework import apply_compaction

async def trim_conversation(messages, strategy, tokenizer):
    projected = await apply_compaction(
        messages,
        strategy=strategy,
        tokenizer=tokenizer,
    )
    print(f"Tokens after compaction: {included_token_count(projected)}")
    return projected

Advanced Example: Combining Summarization and Tool-Call Compaction

For complex agent scenarios with extensive tool usage, combine SummarizationStrategy with tool compaction. This example demonstrates collapsing tool results, summarizing the remaining bulk, and finally applying a sliding window, all within a 250-token budget.

import asyncio
from typing import Any
from agent_framework import (
    CharacterEstimatorTokenizer,
    TokenBudgetComposedStrategy,
    SlidingWindowStrategy,
    SummarizationStrategy,
    SelectiveToolCallCompactionStrategy,
    annotate_message_groups,
    apply_compaction,
    included_token_count,
    Message,
    ChatResponse,
)

class SimpleSummarizer:
    async def get_response(
        self,
        messages: list[Message],
        *,
        stream: bool = False,
        options: dict[str, Any] | None = None,
        **_: Any
    ) -> ChatResponse:
        summary = f"[Summary of {len(messages)} messages]"
        return ChatResponse(
            messages=[Message(role="assistant", contents=[summary])]
        )

async def main():
    # Build synthetic history with tool calls

    messages = [Message(role="system", contents=["You are a migration copilot."])]
    for i in range(1, 6):
        messages.append(Message(role="user", contents=[f"Step {i}"]))
        messages.append(Message(
            role="assistant",
            contents=[Message.Content.from_function_call(
                call_id=f"call{i}", name="search_docs", arguments="{}"
            )]
        ))
        messages.append(Message(role="tool", contents=[f"Result {i}"]))

    tokenizer = CharacterEstimatorTokenizer()
    annotate_message_groups(messages, tokenizer=tokenizer)

    print(f"Tokens before: {included_token_count(messages)}")

    # Compose a three-phase strategy

    composed = TokenBudgetComposedStrategy(
        token_budget=250,
        tokenizer=tokenizer,
        strategies=[
            SelectiveToolCallCompactionStrategy(keep_last_tool_call_groups=0),
            SummarizationStrategy(
                client=SimpleSummarizer(),
                target_count=2,
                threshold=3,
            ),
            SlidingWindowStrategy(keep_last_groups=3),
        ],
    )

    projected = await apply_compaction(messages, strategy=composed, tokenizer=tokenizer)
    print(f"Tokens after: {included_token_count(projected)}")
    for m in projected:
        print(f"- [{m.role}] {m.text}")

if __name__ == "__main__":
    asyncio.run(main())

Integration with Agent Provider API

For automatic per-turn enforcement, embed compaction into an Agent via CompactionProvider. This provider invokes before_strategy before the model sees history and after_strategy after each turn to trim stored state.

from agent_framework import Agent, CompactionProvider, InMemoryHistoryProvider

async def run_agent():
    history = InMemoryHistoryProvider()
    
    compaction = CompactionProvider(
        before_strategy=TokenBudgetComposedStrategy(
            token_budget=400,
            tokenizer=CharacterEstimatorTokenizer(),
            strategies=[SlidingWindowStrategy(keep_last_groups=5)],
        ),
        after_strategy=SlidingWindowStrategy(keep_last_groups=10),
        history_source_id=history.source_id,
    )

    agent = Agent(
        client=your_chat_client,
        name="assistant",
        context_providers=[history, compaction],
    )

    session = agent.create_session()
    await agent.run("Explain Azure Functions vs Logic Apps", session=session)

if __name__ == "__main__":
    import asyncio
    asyncio.run(run_agent())

Key Files and Reference

Summary

  • Annotate first: Use annotate_message_groups and annotate_token_counts to prepare message metadata before compaction.
  • Choose strategies: Select SlidingWindowStrategy for recency, SelectiveToolCallCompactionStrategy for tool noise, or SummarizationStrategy for semantic compression.
  • Compose for safety: Wrap strategies in TokenBudgetComposedStrategy to guarantee budget compliance through sequential execution and deterministic fallback exclusion.
  • Automate with providers: Use CompactionProvider to enforce budgets automatically within Agent runs.

Frequently Asked Questions

How does TokenBudgetComposedStrategy handle cases where individual strategies fail to meet the budget?

TokenBudgetComposedStrategy executes composed strategies in order, refreshing token counts after each execution. If the token budget remains exceeded, it falls back to a deterministic exclusion loop (implemented in _compaction.py lines 1111-1133) that iterates through ordered groups—excluding oldest non-system groups first, then system groups if necessary—until the included_token_count satisfies the budget.

What is the difference between ToolResultCompactionStrategy and SelectiveToolCallCompactionStrategy?

SelectiveToolCallCompactionStrategy marks entire tool-call groups as excluded based on age, effectively removing them from the context entirely. ToolResultCompactionStrategy preserves the conversation structure by replacing old tool-call groups with a new summary message containing the tool results, maintaining a readable trace while freeing tokens.

Can I use custom tokenizers with the compaction system?

Yes. The annotate_token_counts function accepts any tokenizer implementing the framework's tokenizer interface. While CharacterEstimatorTokenizer provides a simple character-count fallback suitable for demos, production implementations should pass model-specific tokenizers (such as tiktoken for OpenAI models) to ensure accurate budget calculations against actual LLM token limits.

How do I preserve system messages while aggressively trimming user history?

Configure SlidingWindowStrategy with keep_last_groups set to your desired context window size; it automatically preserves all groups annotated with "system" kind. Additionally, TokenBudgetComposedStrategy's fallback exclusion loop (lines 1111-1133) explicitly excludes non-system groups before touching system messages, ensuring critical instructions remain intact until absolutely necessary.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →