# In-Memory vs File-Based Checkpoint Storage in Agent Framework Workflows

> Explore in-memory vs file-based checkpoint storage in Agent Framework workflows. Choose high-speed testing with InMemoryCheckpointStorage or durable persistence with FileCheckpointStorage.

- Repository: [Microsoft/agent-framework](https://github.com/microsoft/agent-framework)
- Tags: performance
- Published: 2026-04-05

---

**Agent Framework workflows support two interchangeable checkpoint storage backends—`InMemoryCheckpointStorage` for ephemeral, high-speed testing and `FileCheckpointStorage` for durable, process-surviving persistence—both implementing the same `CheckpointStorage` protocol defined in the core workflow engine.**

The microsoft/agent-framework repository provides robust workflow state management through checkpoints. Understanding the differences between **in-memory vs file-based checkpoint storage in Agent Framework workflows** allows developers to choose the right backend for testing ephemeral agents or running production orchestrations that must survive restarts.

## Understanding Checkpoint Storage Backends

The checkpointing system centers on the `WorkflowCheckpoint` dataclass defined in [`python/packages/core/agent_framework/_workflows/_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/agent_framework/_workflows/_checkpoint.py). This data model captures workflow name, graph signature, messages, state, pending events, iteration count, metadata, and a UUID checkpoint ID. The class provides `to_dict` and `from_dict` methods (lines 90-100) for serialization, enabling both storage implementations to handle state consistently.

Both storage classes implement the **`CheckpointStorage` protocol**, exposing identical methods: `save`, `load`, `list_checkpoints`, `delete`, `get_latest`, and `list_checkpoint_ids`. This shared interface allows runtime swapping without modifying workflow logic.

## In-Memory Checkpoint Storage

### Implementation and Performance Characteristics

The `InMemoryCheckpointStorage` class (line 192 of [`_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/_checkpoint.py)) maintains checkpoints in a Python dictionary. When saving, it performs a deep copy of the `WorkflowCheckpoint` into `self._checkpoints`, ensuring complete isolation between stored states. All lookup operations—including `load`, `get_latest`, and `list_checkpoints`—execute against this in-memory hash map.

This approach delivers **zero I/O overhead**, making it the fastest option available. However, checkpoints exist only for the lifetime of the Python process; they disappear entirely upon exit or restart.

### When to Use Volatile Storage

Choose `InMemoryCheckpointStorage` for:
- Unit tests requiring fast setup and teardown without filesystem cleanup
- Interactive demos and proof-of-concept workflows
- Short-lived processes where state persistence is unnecessary

## File-Based Checkpoint Storage

### Persistence Mechanism and Security

The `FileCheckpointStorage` class (line 239 of [`_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/_checkpoint.py)) persists workflow state to disk. Each checkpoint becomes a JSON file containing metadata alongside the actual workflow state, which is pickled and base64-encoded within the JSON structure. The implementation validates checkpoint IDs against the configured directory to prevent path-traversal attacks, and writes files atomically using a `.json.tmp` intermediate file that renames to the final name only after successful serialization.

### Production Durability and Restart Capabilities

File-based storage enables workflows to survive process restarts, making it suitable for production environments and long-running jobs. The `load` method reads the JSON, decodes the pickled state via `_checkpoint_encoding.decode_checkpoint_value`, and reconstructs the `WorkflowCheckpoint` object. The `list_checkpoints` method walks the directory, decodes each file, and filters by workflow name to return complete checkpoint histories.

## Storage Configuration Examples

### Configuring In-Memory Storage for Testing

The following pattern demonstrates ephemeral checkpointing using the in-memory backend:

```python
from agent_framework import WorkflowBuilder, InMemoryCheckpointStorage

workflow = WorkflowBuilder(...).build()
checkpoint_storage = InMemoryCheckpointStorage()

await workflow.run(checkpoint_storage=checkpoint_storage, stream=True)

checkpoints = await checkpoint_storage.list_checkpoints(workflow_name=workflow.name)
print(f"Saved {len(checkpoints)} in-memory checkpoints")

```

This approach appears in the sample at [`samples/03-workflows/checkpoint/workflow_as_agent_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/samples/03-workflows/checkpoint/workflow_as_agent_checkpoint.py) (lines 70-80).

### Setting Up Persistent File-Based Storage

For production deployments requiring restart capability:

```python
from pathlib import Path
from agent_framework import WorkflowBuilder, FileCheckpointStorage

ckpt_dir = Path("./my_checkpoints")
checkpoint_storage = FileCheckpointStorage(ckpt_dir)

workflow = WorkflowBuilder(...).build()
await workflow.run(checkpoint_storage=checkpoint_storage, stream=True)

# Resume from latest checkpoint after restart

latest = await checkpoint_storage.get_latest(workflow_name=workflow.name)
if latest:
    await workflow.run(
        checkpoint_id=latest.checkpoint_id,
        checkpoint_storage=checkpoint_storage,
        stream=True,
    )

```

Reference implementation appears in [`samples/03-workflows/orchestrations/magentic_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/samples/03-workflows/orchestrations/magentic_checkpoint.py) (lines 55-65). The test suite in [`python/packages/core/tests/workflow/test_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/tests/workflow/test_checkpoint.py) provides additional validation examples for both backends.

## Summary

- **Two interchangeable backends**: `InMemoryCheckpointStorage` (volatile, fast) and `FileCheckpointStorage` (persistent, durable) both implement the `CheckpointStorage` protocol.
- **Shared interface**: Both classes provide `save`, `load`, `list_checkpoints`, `delete`, `get_latest`, and `list_checkpoint_ids` methods, enabling transparent swapping.
- **Data model**: Both use the `WorkflowCheckpoint` dataclass defined in [`python/packages/core/agent_framework/_workflows/_checkpoint.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/core/agent_framework/_workflows/_checkpoint.py) (lines 90-100) with `to_dict`/`from_dict` serialization.
- **Security considerations**: File-based storage validates checkpoint IDs to prevent directory traversal and uses atomic writes to prevent corruption.
- **Performance trade-offs**: In-memory storage offers zero I/O overhead but loses data on exit; file-based storage survives restarts but incurs filesystem overhead.

## Frequently Asked Questions

### What is the primary difference between InMemoryCheckpointStorage and FileCheckpointStorage?

`InMemoryCheckpointStorage` stores checkpoints in a Python dictionary that exists only during the process lifetime, providing maximum speed but no persistence. `FileCheckpointStorage` writes checkpoints as JSON files with embedded pickled state to a specified directory, allowing workflows to resume after process restarts.

### How do I migrate a workflow from in-memory to file-based checkpoint storage?

Migration requires only changing the constructor call. Replace `InMemoryCheckpointStorage()` with `FileCheckpointStorage(Path("./checkpoints"))` and pass the same `checkpoint_storage` parameter to `workflow.run()`. Because both classes implement the identical `CheckpointStorage` protocol, no other workflow code changes are necessary.

### Are file-based checkpoints secure against path traversal attacks?

Yes. The `FileCheckpointStorage` implementation validates every checkpoint ID against the configured base directory before constructing file paths, preventing escape attempts like `../../../etc/passwd`. Additionally, writes use atomic temporary files (`.json.tmp`) that rename to final names only after successful serialization, preventing partial writes.

### Can I implement a custom checkpoint storage backend?

Yes. Any class implementing the `CheckpointStorage` protocol methods—`save`, `load`, `list_checkpoints`, `delete`, `get_latest`, and `list_checkpoint_ids`—can serve as a backend. The protocol expects handling of `WorkflowCheckpoint` objects, which convert to dictionaries via `to_dict()` and reconstruct via `from_dict()`.