In-Memory vs File-Based Checkpoint Storage in Agent Framework Workflows

Agent Framework workflows support two interchangeable checkpoint storage backends: InMemoryCheckpointStorage for fast, ephemeral testing and FileCheckpointStorage for durable persistence that survives process restarts. Both implement the same CheckpointStorage protocol defined in the core workflow engine.

The microsoft/agent-framework repository manages workflow state through checkpoints. Knowing how the in-memory and file-based backends differ helps developers choose the right one for testing ephemeral agents or for running production orchestrations that must survive restarts.

Understanding Checkpoint Storage Backends

The checkpointing system centers on the WorkflowCheckpoint dataclass defined in python/packages/core/agent_framework/_workflows/_checkpoint.py. This data model captures workflow name, graph signature, messages, state, pending events, iteration count, metadata, and a UUID checkpoint ID. The class provides to_dict and from_dict methods (lines 90-100) for serialization, enabling both storage implementations to handle state consistently.
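As a rough illustration of the data model described above, the following sketch defines a simplified stand-in dataclass with a dictionary round trip. The class name SketchCheckpoint, its field names, and its defaults are assumptions made for this example; the real WorkflowCheckpoint may name or type its fields differently.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SketchCheckpoint:
    """Hypothetical stand-in for WorkflowCheckpoint; field names are assumptions."""

    workflow_name: str
    graph_signature: str
    messages: dict[str, list[Any]] = field(default_factory=dict)
    state: dict[str, Any] = field(default_factory=dict)
    pending_events: list[Any] = field(default_factory=list)
    iteration_count: int = 0
    metadata: dict[str, Any] = field(default_factory=dict)
    # Each checkpoint gets a fresh UUID string unless one is supplied.
    checkpoint_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_dict(self) -> dict[str, Any]:
        # asdict() recursively copies nested containers into plain dicts and lists.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchCheckpoint":
        # Rebuild the dataclass from the keys produced by to_dict().
        return cls(**data)


original = SketchCheckpoint(
    workflow_name="demo",
    graph_signature="abc123",
    state={"step": 2},
)
restored = SketchCheckpoint.from_dict(original.to_dict())
```

Because both directions go through a plain dictionary, any backend that can store a dictionary (in memory, as JSON, or elsewhere) can store a checkpoint.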

Both storage classes implement the CheckpointStorage protocol, exposing identical methods: save, load, list_checkpoints, delete, get_latest, and list_checkpoint_ids. This shared interface allows runtime swapping without modifying workflow logic.
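The shared interface can be pictured as a typing.Protocol, as in the sketch below. The six method names come from the article; the parameter lists, return types, and the names CheckpointStorageSketch, DictBackend, and count_checkpoints are assumptions for illustration, and checkpoints are modeled as plain dictionaries rather than WorkflowCheckpoint objects.

```python
import asyncio
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class CheckpointStorageSketch(Protocol):
    """Assumed shape of the six-method interface; real signatures may differ."""

    async def save(self, checkpoint: Any) -> str: ...
    async def load(self, checkpoint_id: str) -> Any: ...
    async def list_checkpoints(self, *, workflow_name: str) -> list[Any]: ...
    async def delete(self, checkpoint_id: str) -> bool: ...
    async def get_latest(self, *, workflow_name: str) -> Optional[Any]: ...
    async def list_checkpoint_ids(self, *, workflow_name: str) -> list[str]: ...


class DictBackend:
    """Toy backend that satisfies the protocol structurally (no inheritance)."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    async def save(self, checkpoint: dict[str, Any]) -> str:
        self._items[checkpoint["checkpoint_id"]] = checkpoint
        return checkpoint["checkpoint_id"]

    async def load(self, checkpoint_id: str) -> dict[str, Any]:
        return self._items[checkpoint_id]

    async def list_checkpoints(self, *, workflow_name: str) -> list[dict[str, Any]]:
        return [c for c in self._items.values() if c["workflow_name"] == workflow_name]

    async def delete(self, checkpoint_id: str) -> bool:
        return self._items.pop(checkpoint_id, None) is not None

    async def get_latest(self, *, workflow_name: str) -> Optional[dict[str, Any]]:
        items = await self.list_checkpoints(workflow_name=workflow_name)
        return items[-1] if items else None

    async def list_checkpoint_ids(self, *, workflow_name: str) -> list[str]:
        return [c["checkpoint_id"] for c in await self.list_checkpoints(workflow_name=workflow_name)]


async def count_checkpoints(storage: CheckpointStorageSketch, workflow_name: str) -> int:
    # Depends only on the protocol, so any conforming backend can be passed in.
    return len(await storage.list_checkpoints(workflow_name=workflow_name))


async def demo() -> int:
    backend = DictBackend()
    await backend.save({"checkpoint_id": "cp-1", "workflow_name": "demo"})
    await backend.save({"checkpoint_id": "cp-2", "workflow_name": "other"})
    return await count_checkpoints(backend, "demo")


demo_count = asyncio.run(demo())
```

Code written against the protocol, like count_checkpoints here, never needs to know which backend it received, which is what makes swapping possible.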

In-Memory Checkpoint Storage

Implementation and Performance Characteristics

The InMemoryCheckpointStorage class (line 192 of _checkpoint.py) maintains checkpoints in a Python dictionary. When saving, it performs a deep copy of the WorkflowCheckpoint into self._checkpoints, ensuring complete isolation between stored states. All lookup operations—including load, get_latest, and list_checkpoints—execute against this in-memory hash map.
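To show why the deep copy matters, here is a minimal sketch of a dictionary-backed store. SketchInMemoryStorage and its method signatures are hypothetical, checkpoints are plain dictionaries, and copying again on load is an assumption of this example rather than something the article states about the real class.

```python
import asyncio
import copy
from typing import Any


class SketchInMemoryStorage:
    """Hypothetical dictionary-backed store; not the framework's class."""

    def __init__(self) -> None:
        self._checkpoints: dict[str, dict[str, Any]] = {}

    async def save(self, checkpoint_id: str, checkpoint: dict[str, Any]) -> None:
        # Deep copy so later mutations of the caller's object cannot
        # change the stored snapshot.
        self._checkpoints[checkpoint_id] = copy.deepcopy(checkpoint)

    async def load(self, checkpoint_id: str) -> dict[str, Any]:
        # Assumption: also copy on the way out to protect the stored snapshot.
        return copy.deepcopy(self._checkpoints[checkpoint_id])


async def demo() -> dict[str, Any]:
    storage = SketchInMemoryStorage()
    live_state = {"state": {"items": [1, 2]}}
    await storage.save("cp-1", live_state)
    live_state["state"]["items"].append(3)  # mutate the live object after saving
    return await storage.load("cp-1")


snapshot = asyncio.run(demo())
```

Without the deep copy, the stored checkpoint would share the nested list with the live workflow state and would silently pick up the appended item.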

This approach performs no disk I/O, which makes it the faster of the two backends; the main cost of a save is the deep copy. However, checkpoints exist only for the lifetime of the Python process and are lost on exit or restart.

When to Use Volatile Storage

Choose InMemoryCheckpointStorage for:

  • Unit tests requiring fast setup and teardown without filesystem cleanup
  • Interactive demos and proof-of-concept workflows
  • Short-lived processes where state persistence is unnecessary

File-Based Checkpoint Storage

Persistence Mechanism and Security

The FileCheckpointStorage class (line 239 of _checkpoint.py) persists workflow state to disk. Each checkpoint becomes a JSON file containing metadata alongside the actual workflow state, which is pickled and base64-encoded within the JSON structure. The implementation validates checkpoint IDs against the configured directory to prevent path-traversal attacks, and it writes files atomically: data goes to a .json.tmp intermediate file that is renamed to the final name only after successful serialization.
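The two safeguards described above, ID validation and write-then-rename, can be sketched with the standard library alone. The function names resolve_checkpoint_path and save_atomically, the payload keys, and the exact validation rule are assumptions for illustration and are not taken from the framework's source.

```python
import base64
import json
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any


def resolve_checkpoint_path(base_dir: Path, checkpoint_id: str) -> Path:
    """Return the file path for an ID, rejecting IDs that escape base_dir."""
    base = base_dir.resolve()
    candidate = (base / f"{checkpoint_id}.json").resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"checkpoint ID escapes storage directory: {checkpoint_id!r}")
    return candidate


def save_atomically(base_dir: Path, checkpoint_id: str, state: dict[str, Any]) -> Path:
    """Write to <id>.json.tmp first, then rename over the final file."""
    final_path = resolve_checkpoint_path(base_dir, checkpoint_id)
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    payload = {
        "checkpoint_id": checkpoint_id,
        # Pickle the state, then base64-encode it so it fits inside JSON.
        "state": base64.b64encode(pickle.dumps(state)).decode("ascii"),
    }
    tmp_path.write_text(json.dumps(payload), encoding="utf-8")
    # os.replace swaps the file in one step, so readers never see a partial file.
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    written = save_atomically(base, "cp-1", {"step": 2})
    files_after_save = sorted(p.name for p in base.iterdir())
    stored = json.loads(written.read_text(encoding="utf-8"))
    round_tripped = pickle.loads(base64.b64decode(stored["state"]))
    try:
        resolve_checkpoint_path(base, "../../../etc/passwd")
        traversal_blocked = False
    except ValueError:
        traversal_blocked = True
```

If serialization fails partway, only the temporary file is affected and any previous checkpoint with the same ID remains intact.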

Production Durability and Restart Capabilities

File-based storage enables workflows to survive process restarts, making it suitable for production environments and long-running jobs. The load method reads the JSON, decodes the pickled state via _checkpoint_encoding.decode_checkpoint_value, and reconstructs the WorkflowCheckpoint object. The list_checkpoints method walks the directory, decodes each file, and filters by workflow name to return complete checkpoint histories.
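The load-and-filter flow can be sketched as follows. This example substitutes plain pickle and base64 calls for the framework's decode_checkpoint_value helper, and the file layout, key names, and the functions write_sketch, load_sketch, and list_for_workflow are all assumptions. Since unpickling executes code embedded in the data, a checkpoint directory should only contain files from a trusted source.

```python
import base64
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any


def write_sketch(directory: Path, checkpoint_id: str, workflow_name: str, state: Any) -> None:
    """Write one checkpoint file in the assumed JSON layout."""
    payload = {
        "checkpoint_id": checkpoint_id,
        "workflow_name": workflow_name,
        "state": base64.b64encode(pickle.dumps(state)).decode("ascii"),
    }
    (directory / f"{checkpoint_id}.json").write_text(json.dumps(payload), encoding="utf-8")


def load_sketch(path: Path) -> dict[str, Any]:
    """Read the JSON, then decode the base64 text back into the pickled object."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    raw["state"] = pickle.loads(base64.b64decode(raw["state"]))  # trusted files only
    return raw


def list_for_workflow(directory: Path, workflow_name: str) -> list[dict[str, Any]]:
    """Walk the directory, decode every file, and keep matching workflows."""
    decoded = [load_sketch(p) for p in sorted(directory.glob("*.json"))]
    return [c for c in decoded if c["workflow_name"] == workflow_name]


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    write_sketch(directory, "cp-1", "orders", {"step": 1})
    write_sketch(directory, "cp-2", "billing", {"step": 7})
    write_sketch(directory, "cp-3", "orders", {"step": 2})
    orders = list_for_workflow(directory, "orders")
    order_ids = [c["checkpoint_id"] for c in orders]
    order_states = [c["state"] for c in orders]
```

Note that this listing approach decodes every file in the directory, so its cost grows with the total number of checkpoints, not just those of the requested workflow.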

Storage Configuration Examples

Configuring In-Memory Storage for Testing

The following pattern demonstrates ephemeral checkpointing using the in-memory backend:

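from agent_framework import WorkflowBuilder, InMemoryCheckpointStorage

# Build the workflow as usual; the builder arguments are elided here.
workflow = WorkflowBuilder(...).build()
# Checkpoints live only in this object, so they vanish when the process exits.
checkpoint_storage = InMemoryCheckpointStorage()

# `await` needs an async context, e.g. an `async def main()` run via asyncio.run().
await workflow.run(checkpoint_storage=checkpoint_storage, stream=True)

# List every checkpoint recorded for this workflow during the run.
checkpoints = await checkpoint_storage.list_checkpoints(workflow_name=workflow.name)
print(f"Saved {len(checkpoints)} in-memory checkpoints")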

This approach appears in the sample at samples/03-workflows/checkpoint/workflow_as_agent_checkpoint.py (lines 70-80).

Setting Up Persistent File-Based Storage

For production deployments requiring restart capability:

from pathlib import Path
from agent_framework import WorkflowBuilder, FileCheckpointStorage

# Directory that will hold one JSON file per checkpoint.
ckpt_dir = Path("./my_checkpoints")
checkpoint_storage = FileCheckpointStorage(ckpt_dir)

# Build the workflow as usual; the builder arguments are elided here.
workflow = WorkflowBuilder(...).build()
# `await` needs an async context, e.g. an `async def main()` run via asyncio.run().
await workflow.run(checkpoint_storage=checkpoint_storage, stream=True)

# Resume from the latest checkpoint after a restart.
latest = await checkpoint_storage.get_latest(workflow_name=workflow.name)
if latest:
    await workflow.run(
        checkpoint_id=latest.checkpoint_id,
        checkpoint_storage=checkpoint_storage,
        stream=True,
    )

Reference implementation appears in samples/03-workflows/orchestrations/magentic_checkpoint.py (lines 55-65). The test suite in python/packages/core/tests/workflow/test_checkpoint.py provides additional validation examples for both backends.

Summary

  • Two interchangeable backends: InMemoryCheckpointStorage (volatile, fast) and FileCheckpointStorage (persistent, durable) both implement the CheckpointStorage protocol.
  • Shared interface: Both classes provide save, load, list_checkpoints, delete, get_latest, and list_checkpoint_ids methods, enabling transparent swapping.
  • Data model: Both use the WorkflowCheckpoint dataclass defined in python/packages/core/agent_framework/_workflows/_checkpoint.py, with to_dict/from_dict serialization (lines 90-100).
  • Security considerations: File-based storage validates checkpoint IDs to prevent directory traversal and uses atomic writes to prevent corruption.
  • Performance trade-offs: In-memory storage performs no disk I/O but loses data on exit; file-based storage survives restarts but pays for serialization and filesystem access on every save and load.

Frequently Asked Questions

What is the primary difference between InMemoryCheckpointStorage and FileCheckpointStorage?

InMemoryCheckpointStorage stores checkpoints in a Python dictionary that exists only during the process lifetime, providing maximum speed but no persistence. FileCheckpointStorage writes checkpoints as JSON files with embedded pickled state to a specified directory, allowing workflows to resume after process restarts.

How do I migrate a workflow from in-memory to file-based checkpoint storage?

Migration requires only changing the constructor call. Replace InMemoryCheckpointStorage() with FileCheckpointStorage(Path("./checkpoints")) and pass the same checkpoint_storage parameter to workflow.run(). Because both classes implement the identical CheckpointStorage protocol, no other workflow code changes are necessary.

Are file-based checkpoints secure against path traversal attacks?

Yes. The FileCheckpointStorage implementation validates every checkpoint ID against the configured base directory before constructing file paths, preventing escape attempts like ../../../etc/passwd. Additionally, writes go to a temporary file (.json.tmp) that is renamed to the final name only after successful serialization, which prevents partially written checkpoints.

Can I implement a custom checkpoint storage backend?

Yes. Any class implementing the CheckpointStorage protocol methods—save, load, list_checkpoints, delete, get_latest, and list_checkpoint_ids—can serve as a backend. The protocol expects handling of WorkflowCheckpoint objects, which convert to dictionaries via to_dict() and reconstruct via from_dict().
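As one possible shape for a custom backend, the sketch below stores checkpoints in SQLite using only the standard library. The class name SqliteCheckpointStorageSketch, the table layout, and all method signatures are assumptions; checkpoints are plain dictionaries here, whereas a real backend would accept WorkflowCheckpoint objects and convert them with to_dict() and from_dict(). The sqlite3 calls block the event loop, which is acceptable for a sketch but worth revisiting for real workloads.

```python
import asyncio
import json
import sqlite3
from typing import Any, Optional


class SqliteCheckpointStorageSketch:
    """Hypothetical SQLite-backed store exposing the six protocol method names."""

    def __init__(self, database: str = ":memory:") -> None:
        self._conn = sqlite3.connect(database)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "checkpoint_id TEXT UNIQUE, workflow_name TEXT, body TEXT)"
        )

    async def save(self, checkpoint: dict[str, Any]) -> str:
        self._conn.execute(
            "INSERT OR REPLACE INTO checkpoints (checkpoint_id, workflow_name, body) "
            "VALUES (?, ?, ?)",
            (checkpoint["checkpoint_id"], checkpoint["workflow_name"], json.dumps(checkpoint)),
        )
        self._conn.commit()
        return checkpoint["checkpoint_id"]

    async def load(self, checkpoint_id: str) -> dict[str, Any]:
        row = self._conn.execute(
            "SELECT body FROM checkpoints WHERE checkpoint_id = ?", (checkpoint_id,)
        ).fetchone()
        if row is None:
            raise KeyError(checkpoint_id)
        return json.loads(row[0])

    async def list_checkpoints(self, *, workflow_name: str) -> list[dict[str, Any]]:
        # seq preserves insertion order, so the last row is the newest save.
        rows = self._conn.execute(
            "SELECT body FROM checkpoints WHERE workflow_name = ? ORDER BY seq",
            (workflow_name,),
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    async def delete(self, checkpoint_id: str) -> bool:
        cursor = self._conn.execute(
            "DELETE FROM checkpoints WHERE checkpoint_id = ?", (checkpoint_id,)
        )
        self._conn.commit()
        return cursor.rowcount > 0

    async def get_latest(self, *, workflow_name: str) -> Optional[dict[str, Any]]:
        items = await self.list_checkpoints(workflow_name=workflow_name)
        return items[-1] if items else None

    async def list_checkpoint_ids(self, *, workflow_name: str) -> list[str]:
        items = await self.list_checkpoints(workflow_name=workflow_name)
        return [item["checkpoint_id"] for item in items]


async def demo() -> dict[str, Any]:
    storage = SqliteCheckpointStorageSketch()
    await storage.save({"checkpoint_id": "a1", "workflow_name": "A", "state": {"n": 1}})
    await storage.save({"checkpoint_id": "b1", "workflow_name": "B", "state": {"n": 9}})
    await storage.save({"checkpoint_id": "a2", "workflow_name": "A", "state": {"n": 2}})
    ids_before = await storage.list_checkpoint_ids(workflow_name="A")
    latest = await storage.get_latest(workflow_name="A")
    deleted = await storage.delete("a1")
    ids_after = await storage.list_checkpoint_ids(workflow_name="A")
    return {"ids_before": ids_before, "latest": latest, "deleted": deleted, "ids_after": ids_after}


result = asyncio.run(demo())
```

Passing a file path instead of ":memory:" would make this sketch durable across restarts, in the same spirit as the file-based backend.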
