# How to Configure the Hierarchical Memory System for Long-Running Agent Sessions in Headroom

> Learn to configure Headroom's hierarchical memory system for long-running agent sessions. Set up persistent storage, a vector index, and an embedder, then tune caching and bubbling so your agent stays stable across extended interactions and process restarts.

- Repository: [Tejas Chopra/headroom](https://github.com/chopratejas/headroom)
- Tags: how-to-guide
- Published: 2026-06-09

---

**To configure the hierarchical memory system in Headroom for long-running agent sessions, create a `MemoryConfig` dataclass with persistent storage, a vector index, and an embedder backend, then pass it to the `with_memory` wrapper along with a stable `session_id` and optional `agent_id`.**

Headroom implements its memory layer as a hierarchical, plug-in-based system defined in [`headroom/memory/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/config.py) and orchestrated through [`headroom/memory/core.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/core.py). By tuning the `MemoryConfig` dataclass and scoping every read and write to a stable `session_id`, you can build long-running agent sessions that retain context across arbitrarily many interactions and process restarts.

## Define the Memory Hierarchy in MemoryConfig

The `MemoryConfig` dataclass in [`headroom/memory/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/config.py) controls persistence, vector indexing, full-text search, embedding, caching, and bubbling. The following configuration is tuned for a long-running agent that must survive restarts and bound its memory usage.

```python
from pathlib import Path
from headroom.memory.config import MemoryConfig, StoreBackend, VectorBackend, TextBackend, EmbedderBackend

memory_cfg = MemoryConfig(
    store_backend=StoreBackend.SQLITE,
    db_path=Path("./my_agent_memory.db"),
    vector_backend=VectorBackend.AUTO,
    vector_dimension=384,
    vector_cache_size_kb=8192,
    text_backend=TextBackend.FTS5,
    embedder_backend=EmbedderBackend.LOCAL,
    embedder_model="sentence-transformers/all-MiniLM-L6-v2",
    cache_enabled=True,
    cache_max_size=2000,
    auto_bubble=True,
    bubble_threshold=0.75,
)
```

Key fields for long-running sessions include:

- **`store_backend`** – Set to `StoreBackend.SQLITE` to write memories to disk at `db_path`. This ensures state survives process restarts.
- **`vector_backend`** – `VectorBackend.AUTO` selects SQLite-Vec if the optional dependency is installed and otherwise falls back to HNSW. SQLite-Vec keeps the index on disk and bounds its RAM use with a page cache sized by `vector_cache_size_kb`. HNSW holds the whole index in memory, so it grows without limit unless `hnsw_max_entries` is set.
- **`cache_enabled` and `cache_max_size`** – When `cache_enabled` is `True`, Headroom keeps up to `cache_max_size` recently used memories in an in-memory LRU cache, so repeated retrievals of hot memories skip disk I/O.
- **`auto_bubble` and `bubble_threshold`** – These control promotion of memories up the hierarchy from turn to session to agent scope. A threshold of `0.75` means only memories with importance above that value bubble to broader scopes.
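
The `VectorBackend.AUTO` fallback can be pictured as a small resolution step. The sketch below is a minimal illustration, not Headroom's implementation: `VectorChoice` and `resolve_vector_backend` are hypothetical names, and it assumes the optional dependency is importable as `sqlite_vec`.

```python
import importlib.util
from enum import Enum


class VectorChoice(Enum):
    # Hypothetical stand-in for Headroom's VectorBackend values.
    AUTO = "auto"
    SQLITE_VEC = "sqlite_vec"
    HNSW = "hnsw"


def resolve_vector_backend(requested, module_name="sqlite_vec"):
    """Resolve AUTO to a concrete backend (assumed import name: sqlite_vec)."""
    if requested is not VectorChoice.AUTO:
        return requested  # an explicit choice is used as-is
    # find_spec returns None when the top-level module cannot be found.
    if importlib.util.find_spec(module_name) is not None:
        return VectorChoice.SQLITE_VEC
    return VectorChoice.HNSW


print(resolve_vector_backend(VectorChoice.AUTO))
```

Naming a backend explicitly skips the availability check, which is why bounded-memory deployments are better served by requesting `VectorBackend.SQLITE_VEC` directly than by relying on `AUTO`.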

## Wrap an LLM Client with `with_memory`

Pass the config and stable identifiers to `with_memory`. In [`headroom/memory/wrapper.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/wrapper.py), this function returns a `MemoryWrapper` that intercepts every `chat.completions.create` call.

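```python
from openai import OpenAI

from headroom import with_memory
from headroom.memory.config import EmbedderBackend

# Stable identifiers: reuse the same values on every run so memories are found again.
SESSION_ID = "my-long-running-session"
AGENT_ID = "code-assistant"

client = OpenAI()
memory_client = with_memory(
    client,
    user_id="alice",
    db_path="my_agent_memory.db",  # keep in sync with memory_cfg.db_path
    top_k=8,  # number of relevant memories injected per request
    session_id=SESSION_ID,
    agent_id=AGENT_ID,
    embedder_backend=EmbedderBackend.LOCAL,  # keep in sync with memory_cfg.embedder_backend
    config=memory_cfg,  # the MemoryConfig built in the previous section
)
```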

Under the hood, `with_memory` instantiates a `MemoryWrapper`. The wrapper lazily builds a `HierarchicalMemory` via `HierarchicalMemory.create(memory_cfg)` in [`headroom/memory/core.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/core.py). The factory `create_memory_system` in [`headroom/memory/factory.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/factory.py) wires the SQLite store, vector index, text index, embedder, and optional cache into a single orchestrator.
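
Lazy construction means wrapping a client is cheap: the database file and embedder are only opened when memory is first needed. A minimal sketch of that pattern with `functools.cached_property` follows. `LazyMemoryWrapper` and its `build_memory` parameter are hypothetical; the factory callable stands in for `HierarchicalMemory.create` so the example runs without Headroom installed, and the real factory's signature (including whether it is asynchronous) may differ.

```python
from functools import cached_property


class LazyMemoryWrapper:
    """Hypothetical sketch: defer building the memory system until first use."""

    def __init__(self, client, config, build_memory):
        self.client = client
        self.config = config
        self._build_memory = build_memory  # stand-in for HierarchicalMemory.create
        self.build_count = 0

    @cached_property
    def memory(self):
        # Executed on first access only; the result is cached on the instance.
        self.build_count += 1
        return self._build_memory(self.config)


wrapper = LazyMemoryWrapper(
    client=None,
    config={"db_path": "my_agent_memory.db"},
    build_memory=lambda cfg: {"opened": cfg["db_path"]},
)
print(wrapper.build_count)  # 0: nothing is built at wrap time
wrapper.memory
wrapper.memory
print(wrapper.build_count)  # 1: built once, then reused
```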

## How Memory Injection and Extraction Work

Every call to `memory_client.chat.completions.create` routes through the wrapper's `_WrappedCompletions.create` method. According to [`headroom/memory/wrapper.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/wrapper.py), the wrapper performs three synchronous steps:

1. **Injects context** – `MemoryWrapper._inject_memories` runs a vector search scoped to the current `session_id` and `agent_id`, then inserts the top-k relevant memories into the first user message.
2. **Extracts facts** – A hidden memory-extraction instruction is appended to the system prompt. The LLM response is parsed for new memories.
3. **Persists memories** – `MemoryWrapper._store_memories` calls `HierarchicalMemory.add` to store newly extracted facts with a default importance of `0.7`.

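Because memories are persisted by the store and every read and write is scoped to the same `session_id` and `agent_id`, a fact stored during one call can be retrieved by any later call that uses those identifiers, however many calls come in between.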
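
The three steps can be sketched end to end with a toy store. This is a self-contained illustration under stated assumptions, not Headroom's wrapper: `ScopedStore`, `chat_with_memory`, the `MEMORY:` marker, and `fake_llm` are hypothetical, and a keyword-overlap score stands in for the real vector search. It does reproduce the two properties described above: facts are only retrieved under the same `session_id` and `agent_id`, and extracted facts are stored with importance `0.7`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    importance: float
    session_id: str
    agent_id: str


def _words(text):
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class ScopedStore:
    """Toy stand-in for HierarchicalMemory; keyword overlap replaces vector search."""

    memories: list = field(default_factory=list)

    def add(self, content, session_id, agent_id, importance=0.7):
        self.memories.append(Memory(content, importance, session_id, agent_id))

    def search(self, query, session_id, agent_id, top_k):
        query_words = _words(query)
        scored = [
            (len(query_words & _words(m.content)), m)
            for m in self.memories
            if m.session_id == session_id and m.agent_id == agent_id  # scope filter
        ]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        return [m for _, m in ranked[:top_k]]


MARKER = "MEMORY:"  # hypothetical extraction marker


def chat_with_memory(llm, store, messages, session_id, agent_id, top_k=8):
    # 1. Inject: prepend matching memories to the first user message.
    idx = next(i for i, m in enumerate(messages) if m["role"] == "user")
    user_text = messages[idx]["content"]
    hits = store.search(user_text, session_id, agent_id, top_k)
    if hits:
        facts = "\n".join(f"- {m.content}" for m in hits)
        messages = list(messages)
        messages[idx] = {"role": "user", "content": f"Known facts:\n{facts}\n\n{user_text}"}
    # 2. Extract: instruct the model to tag new facts, then parse its reply.
    system = {"role": "system", "content": f"Prefix each new durable fact with {MARKER}"}
    reply = llm([system] + messages)
    answer = []
    for line in reply.splitlines():
        if line.startswith(MARKER):
            # 3. Persist: store the fact with the default importance of 0.7.
            store.add(line[len(MARKER):].strip(), session_id, agent_id)
        else:
            answer.append(line)
    return "\n".join(answer)


def fake_llm(messages):
    """Deterministic stand-in for a real completion call."""
    prompt = messages[-1]["content"]
    if prompt.startswith("Known facts:"):
        return "Based on memory:\n" + prompt.splitlines()[1]
    if "favorite editor" in prompt:
        return f"Noted!\n{MARKER} User's favorite editor is Vim"
    return "I don't know yet."


store = ScopedStore()


def ask(text, agent_id):
    messages = [{"role": "user", "content": text}]
    return chat_with_memory(fake_llm, store, messages, "my-long-running-session", agent_id)


print(ask("My favorite editor is Vim.", "code-assistant"))  # fact extracted and stored
print(ask("What editor do I like?", "code-assistant"))  # same scope: fact injected
print(ask("What editor do I like?", "other-agent"))  # other agent_id: nothing retrieved
```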

### Example Interaction Across Calls

```python
# First request – the fact is extracted and stored automatically.
resp1 = memory_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My favorite editor is Vim."}],
)

# Second request – the stored fact is injected into context.
resp2 = memory_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What editor do I like?"}],
)
```

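The injection runs synchronously before the LLM request, so its cost (the scoped vector search plus prompt assembly) adds to the total latency of each call. It does not slow down the model's generation of the completion itself.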

## Tune the Hierarchical Memory System for Longevity

Long-running agents require predictable resource usage and clean separation of concerns. Tune these settings based on the adapter implementations in [`headroom/memory/adapters/sqlite_vector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/sqlite_vector.py) and [`headroom/memory/adapters/hnsw.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/hnsw.py):

- **Persist across restarts** – Use `store_backend=StoreBackend.SQLITE` with a stable `db_path`. The SQLite store writes all memories, vector metadata, and FTS5 indexes to a single file that can be reopened by any subsequent process.
- **Bound vector index growth** – Choose `vector_backend=VectorBackend.SQLITE_VEC` with `vector_cache_size_kb` to keep the vector index on disk with a fixed page cache. The HNSW adapter offers high recall but resides in memory, so it grows unbounded unless `hnsw_max_entries` is configured.
- **Reduce lookup latency** – Enable `cache_enabled=True` and size `cache_max_size` to your working set, for example `2000`. Frequently accessed memories stay in RAM instead of hitting SQLite.
- **Promote only high-value facts** – Raise `bubble_threshold` to `0.9` if you want only critical facts to bubble from session scope to agent scope. Lower thresholds increase recall but risk polluting long-term memory with transient details.
- **Isolate multiple agents** – Assign a unique `agent_id` to each logical bot. The `HierarchicalMemory` scoping logic ensures memories written for one `agent_id` are never retrieved for another.
- **Lower embedding overhead** – Select `embedder_backend=EmbedderBackend.ONNX` instead of `LOCAL` to avoid loading heavy PyTorch dependencies. The ONNX runtime backend in [`headroom/memory/adapters/embedders.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/embedders.py) is optimized for constrained environments.
- **Plug in custom embedders** – Set `embedder_backend="external"` and register a provider via the `headroom.memory_vector` entry point to use proprietary embedding services.
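
The effect of `cache_max_size` on lookup latency comes down to how much of the working set fits in the cache. The sketch below is a minimal LRU cache built on `collections.OrderedDict`, assuming entries are keyed by memory ID; `LRUMemoryCache` is hypothetical and Headroom's cache internals may differ.

```python
from collections import OrderedDict


class LRUMemoryCache:
    """Hypothetical LRU cache sized like cache_max_size; keyed by memory ID."""

    def __init__(self, max_size=2000):
        self.max_size = max_size
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, memory_id, load):
        if memory_id in self._items:
            self.hits += 1
            self._items.move_to_end(memory_id)  # now the most recently used entry
            return self._items[memory_id]
        self.misses += 1
        value = load(memory_id)  # stand-in for a SQLite read on a cache miss
        self._items[memory_id] = value
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


cache = LRUMemoryCache(max_size=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, load=str.upper)
print(cache.hits, cache.misses)  # 1 4
```

With `max_size=3`, the same access pattern fits entirely in the cache and yields two hits and three misses instead of one hit and four misses, which is the effect of sizing `cache_max_size` to your working set.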

## Direct Memory Operations

For debugging or non-LLM events, access the underlying `HierarchicalMemory` through the wrapper's `.memory` attribute.

```python
wrapper = memory_client  # the object returned by with_memory

# Manually add a memory
wrapper.memory.add(content="User prefers dark theme", importance=0.85)

# Search stored memories
matches = wrapper.memory.search(query="dark theme", top_k=5)
for mem in matches:
    print(mem.content, mem.importance)

# Retrieve statistics before clearing, so the counts reflect stored memories
stats = wrapper.memory.stats()
print(stats)  # e.g., {"total": 123}

# Clear all memories for the current user scope; run this last
deleted = wrapper.memory.clear()
print(f"Removed {deleted} memories")
```

## Summary

- **`MemoryConfig`** in [`headroom/memory/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/config.py) is the single source of truth for persistence, indexing, caching, and bubbling behavior in the hierarchical memory system.
- **`with_memory`** in [`headroom/memory/wrapper.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/wrapper.py) binds a config and stable `session_id` to an LLM client, creating a `MemoryWrapper` that manages memory injection and extraction automatically.
- **SQLite persistence** plus a disk-backed vector index such as SQLite-Vec, whose page cache is bounded by `vector_cache_size_kb`, lets long-running agents survive restarts without unbounded memory growth.
- **Caching** (`cache_enabled`, `cache_max_size`) and **bubbling thresholds** (`auto_bubble`, `bubble_threshold`) let you optimize retrieval speed and long-term memory quality.

## Frequently Asked Questions

### What file defines the memory configuration options in Headroom?

The `MemoryConfig` dataclass is defined in [`headroom/memory/config.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/config.py). It declares every configurable field, including `store_backend`, `vector_backend`, `embedder_backend`, `cache_enabled`, and `bubble_threshold`, as strongly typed enum or primitive values.

### How does Headroom keep memories alive across process restarts?

Headroom persists memories through the SQLite store implemented in [`headroom/memory/adapters/sqlite.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/sqlite.py). When `store_backend` is set to `StoreBackend.SQLITE` and `db_path` points to a stable file, all hierarchical memories, vector embeddings, and full-text indexes are written to disk and reloaded on the next initialization.
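
The restart guarantee rests on SQLite itself: committed rows live in the database file, and any new connection to that file sees them. The sketch below demonstrates this with Python's built-in `sqlite3` module by writing, closing, and reopening the same file. The single `memories` table is a hypothetical schema for illustration; Headroom's actual tables, vector data, and FTS5 indexes are defined in its SQLite adapter and will differ.

```python
import sqlite3
import tempfile
from pathlib import Path

SESSION_ID = "my-long-running-session"


def open_store(db_path):
    """Open (or create) the database file; the schema is a hypothetical example."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories "
        "(session_id TEXT NOT NULL, content TEXT NOT NULL, importance REAL NOT NULL)"
    )
    return conn


def simulate_restart(db_path):
    # First "process": write a memory, commit, and close the connection.
    conn = open_store(db_path)
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO memories VALUES (?, ?, ?)",
            (SESSION_ID, "User prefers dark theme", 0.85),
        )
    conn.close()
    # Second "process": reopen the same file and read the memory back.
    conn = open_store(db_path)
    rows = conn.execute(
        "SELECT content, importance FROM memories WHERE session_id = ?", (SESSION_ID,)
    ).fetchall()
    conn.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    print(simulate_restart(Path(tmp) / "my_agent_memory.db"))
```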

### What is memory bubbling and how do I control it?

Memory bubbling is the automatic promotion of important memories from narrow scopes, such as a single turn or session, to broader scopes like agent or user. It is governed by `auto_bubble` and `bubble_threshold` in `MemoryConfig`. When `auto_bubble` is `True` and a memory's importance exceeds `bubble_threshold`, the `HierarchicalMemory` orchestrator in [`headroom/memory/core.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/core.py) promotes that memory so it survives beyond the current session.
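
A minimal sketch of the threshold check follows. It assumes the scope order turn, session, agent, user and promotes one level per call; both are assumptions, since the exact promotion path is not spelled out here. `ScopedMemory` and `bubble` are hypothetical names. The example also shows an inference from the defaults in this guide: a fact stored by the wrapper at importance `0.7` does not exceed a `bubble_threshold` of `0.75`, so it stays in session scope, while one added manually at `0.85` is promoted.

```python
from dataclasses import dataclass

# Assumed ordering of scopes from narrowest to broadest.
SCOPES = ["turn", "session", "agent", "user"]


@dataclass
class ScopedMemory:
    content: str
    importance: float
    scope: str = "turn"


def bubble(memory, auto_bubble=True, bubble_threshold=0.75):
    """Hypothetical sketch: promote one scope when importance exceeds the threshold."""
    if not auto_bubble or memory.importance <= bubble_threshold:
        return memory  # stays in its current scope
    level = SCOPES.index(memory.scope)
    if level + 1 < len(SCOPES):
        memory.scope = SCOPES[level + 1]
    return memory


extracted = bubble(ScopedMemory("User's favorite editor is Vim", 0.7, "session"))
manual = bubble(ScopedMemory("User prefers dark theme", 0.85, "session"))
print(extracted.scope, manual.scope)  # session agent
```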

### Which vector backend should I choose for a long-running agent?

For agents that must run indefinitely with bounded resource usage, select `VectorBackend.SQLITE_VEC`. The SQLite-Vec adapter in [`headroom/memory/adapters/sqlite_vector.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/sqlite_vector.py) maintains an on-disk index with a configurable page cache via `vector_cache_size_kb`. HNSW from [`headroom/memory/adapters/hnsw.py`](https://github.com/chopratejas/headroom/blob/main/headroom/memory/adapters/hnsw.py) offers high recall but resides in memory, making it better suited for short-lived or high-throughput scenarios where memory limits are explicitly managed.
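
Why the choice matters over a long session is easy to see with a toy in-memory index: every stored vector stays in RAM, so without a cap the footprint grows with every memory. The sketch below uses brute-force cosine search as a stand-in for HNSW (it does not build an HNSW graph), and its `max_entries` cap is a hypothetical analogue of `hnsw_max_entries`. Evicting the oldest entry is an assumption, because the eviction policy is not described here.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Brute-force stand-in for an in-memory index such as HNSW (not an HNSW graph)."""

    def __init__(self, max_entries=None):
        self.max_entries = max_entries  # hypothetical cap modeled on hnsw_max_entries
        self._vectors = OrderedDict()

    def add(self, memory_id, vector):
        self._vectors[memory_id] = vector
        self._vectors.move_to_end(memory_id)
        if self.max_entries is not None and len(self._vectors) > self.max_entries:
            self._vectors.popitem(last=False)  # assumed policy: drop the oldest entry

    def __len__(self):
        return len(self._vectors)

    def search(self, query, top_k=1):
        ranked = sorted(self._vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
        return [memory_id for memory_id, _ in ranked[:top_k]]


unbounded, capped = InMemoryIndex(), InMemoryIndex(max_entries=1000)
for i in range(5000):
    vector = [float(i % 7), float(i % 11), 1.0]
    unbounded.add(i, vector)
    capped.add(i, vector)
print(len(unbounded), len(capped))  # 5000 1000
```

A disk-backed index such as SQLite-Vec avoids this trade-off differently: it does not need to drop entries, because only the pages held in the `vector_cache_size_kb` cache occupy RAM.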