How to Configure the Hierarchical Memory System for Long-Running Agent Sessions in Headroom
To configure the hierarchical memory system in Headroom for long-running agent sessions, create a MemoryConfig instance with persistent storage, a vector index, and an embedder backend, then pass it to the with_memory wrapper along with a stable session_id and an optional agent_id.
Headroom implements its memory layer as a hierarchical, plug-in-based system defined in headroom/memory/config.py and orchestrated through headroom/memory/core.py. By tuning the MemoryConfig dataclass and scoping every read and write to a stable session_id, you can build long-running agent sessions that retain context across arbitrarily many interactions and process restarts.
Define the Memory Hierarchy in MemoryConfig
The MemoryConfig dataclass in headroom/memory/config.py controls persistence, vector indexing, full-text search, embedding, caching, and bubbling. The following configuration is tuned for a long-running agent that must survive restarts and bound its memory usage.
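```python
from pathlib import Path

from headroom.memory.config import (
    EmbedderBackend,
    MemoryConfig,
    StoreBackend,
    TextBackend,
    VectorBackend,
)

memory_cfg = MemoryConfig(
    # Persistence: write memories to a SQLite file that survives restarts.
    store_backend=StoreBackend.SQLITE,
    db_path=Path("./my_agent_memory.db"),
    # Vector index: AUTO prefers SQLite-Vec and falls back to HNSW.
    vector_backend=VectorBackend.AUTO,
    vector_dimension=384,  # must match the embedder (all-MiniLM-L6-v2 emits 384-dim vectors)
    vector_cache_size_kb=8192,  # page cache for the on-disk vector index
    # Full-text search and embeddings.
    text_backend=TextBackend.FTS5,
    embedder_backend=EmbedderBackend.LOCAL,
    embedder_model="sentence-transformers/all-MiniLM-L6-v2",
    # In-memory LRU cache for hot memories.
    cache_enabled=True,
    cache_max_size=2000,
    # Promote memories with importance above 0.75 to broader scopes.
    auto_bubble=True,
    bubble_threshold=0.75,
)
```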
Key fields for long-running sessions include:
- store_backend – Set to StoreBackend.SQLITE to write memories to disk at db_path, so state survives process restarts.
- vector_backend – VectorBackend.AUTO selects SQLite-Vec if the optional dependency is installed and otherwise falls back to HNSW. SQLite-Vec maintains a bounded on-disk index, while HNSW grows unbounded unless hnsw_max_entries is set.
- cache_enabled and cache_max_size – When cache_enabled is True, Headroom keeps recently used memories in an in-memory LRU cache of up to cache_max_size entries, avoiding disk I/O for hot retrievals.
- auto_bubble and bubble_threshold – These control promotion of memories up the hierarchy, from turn to session to agent scope. A threshold of 0.75 means only memories with importance above that value bubble to broader scopes.
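To make the cache_enabled and cache_max_size behavior concrete, here is a minimal LRU sketch built on collections.OrderedDict. MemoryLRUCache is a hypothetical class for illustration, not Headroom's cache implementation, and it assumes entries are keyed by a memory ID string.

```python
from collections import OrderedDict
from typing import Optional


class MemoryLRUCache:
    """Hypothetical LRU cache of memory contents keyed by memory ID."""

    def __init__(self, max_size: int = 2000) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, memory_id: str) -> Optional[str]:
        if memory_id not in self._items:
            return None  # cache miss: the caller would fall back to SQLite
        self._items.move_to_end(memory_id)  # mark as most recently used
        return self._items[memory_id]

    def put(self, memory_id: str, content: str) -> None:
        self._items[memory_id] = content
        self._items.move_to_end(memory_id)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._items)


cache = MemoryLRUCache(max_size=2)
cache.put("m1", "Favorite editor is Vim.")
cache.put("m2", "Prefers dark theme.")
cache.get("m1")                    # touching m1 makes m2 the eviction candidate
cache.put("m3", "Writes Python.")  # exceeds max_size, so m2 is evicted
print(cache.get("m2"))  # None
print(cache.get("m1"))  # Favorite editor is Vim.
```

Reading an entry refreshes it, so frequently retrieved memories survive eviction while rarely used ones are dropped first once the cache is full.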
Wrap an LLM Client with with_memory
Pass the config and stable identifiers to with_memory. In headroom/memory/wrapper.py, this function returns a MemoryWrapper that intercepts every chat.completions.create call.
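```python
from openai import OpenAI

from headroom import with_memory
from headroom.memory.config import EmbedderBackend

SESSION_ID = "my-long-running-session"  # keep this stable across restarts
AGENT_ID = "code-assistant"

client = OpenAI()
memory_client = with_memory(
    client,
    user_id="alice",
    db_path="my_agent_memory.db",  # should match memory_cfg.db_path
    top_k=8,  # number of memories injected per request
    session_id=SESSION_ID,
    agent_id=AGENT_ID,
    embedder_backend=EmbedderBackend.LOCAL,  # should match memory_cfg.embedder_backend
    config=memory_cfg,  # the MemoryConfig defined above
)
```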
Under the hood, with_memory instantiates a MemoryWrapper. The wrapper lazily builds a HierarchicalMemory via HierarchicalMemory.create(memory_cfg) in headroom/memory/core.py. The factory create_memory_system in headroom/memory/factory.py wires the SQLite store, vector index, text index, embedder, and optional cache into a single orchestrator.
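The lazy construction step can be sketched with a property that builds the orchestrator on first access. FakeMemoryConfig, FakeHierarchicalMemory, and LazyMemoryWrapper are hypothetical stand-ins: the real HierarchicalMemory.create may have a different signature (it may, for example, be asynchronous), and the real factory wires far more components.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMemoryConfig:
    db_path: str = "my_agent_memory.db"


class FakeHierarchicalMemory:
    """Stand-in orchestrator that counts how many times it has been built."""

    instances = 0

    def __init__(self, config: FakeMemoryConfig) -> None:
        FakeHierarchicalMemory.instances += 1
        self.config = config

    @classmethod
    def create(cls, config: FakeMemoryConfig) -> "FakeHierarchicalMemory":
        # A real factory would wire the store, vector index, text index, embedder, and cache.
        return cls(config)


class LazyMemoryWrapper:
    def __init__(self, config: FakeMemoryConfig) -> None:
        self._config = config
        self._memory: Optional[FakeHierarchicalMemory] = None

    @property
    def memory(self) -> FakeHierarchicalMemory:
        # Build on first access; later accesses reuse the same instance.
        if self._memory is None:
            self._memory = FakeHierarchicalMemory.create(self._config)
        return self._memory


wrapper = LazyMemoryWrapper(FakeMemoryConfig())
print(FakeHierarchicalMemory.instances)  # 0: nothing built yet
first = wrapper.memory
print(FakeHierarchicalMemory.instances, first is wrapper.memory)  # 1 True
```

In this pattern, creating the wrapper stays cheap and the expensive setup runs only when memory is first needed.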
How Memory Injection and Extraction Work
Every call to memory_client.chat.completions.create routes through the wrapper's _WrappedCompletions.create method. According to headroom/memory/wrapper.py, the wrapper performs three synchronous steps:
- Injects context – MemoryWrapper._inject_memories runs a vector search scoped to the current session_id and agent_id, then inserts the top-k relevant memories into the first user message.
- Extracts facts – A hidden memory-extraction instruction is appended to the system prompt, and the LLM response is parsed for new memories.
- Persists memories – MemoryWrapper._store_memories calls HierarchicalMemory.add to store newly extracted facts with a default importance of 0.7.
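The three steps can be simulated end to end without an API key. This is a simplified sketch, not Headroom's code: keyword overlap replaces the vector search, a plain list replaces HierarchicalMemory, the "MEMORY:" line format for extraction is a hypothetical convention, and fake_llm stands in for the model.

```python
import copy
import re
from typing import Callable, Dict, List

Message = Dict[str, str]
DEFAULT_IMPORTANCE = 0.7  # default importance the article cites for extracted facts
# Hypothetical extraction format; the real hidden instruction is not documented here.
EXTRACTION_INSTRUCTION = "After answering, list new user facts on lines starting with 'MEMORY:'."


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def inject_memories(store: List[Dict], messages: List[Message], top_k: int) -> List[Message]:
    """Step 1 (inject) plus the first half of step 2 (add the extraction instruction)."""
    messages = copy.deepcopy(messages)  # never mutate the caller's messages
    first_user = next(m for m in messages if m["role"] == "user")
    query = _tokens(first_user["content"])
    # Keyword overlap stands in for the real vector search.
    ranked = sorted(store, key=lambda mem: len(query & _tokens(mem["content"])), reverse=True)
    hits = [mem["content"] for mem in ranked[:top_k] if query & _tokens(mem["content"])]
    if hits:
        context = "Relevant memories:\n- " + "\n- ".join(hits)
        first_user["content"] = context + "\n\n" + first_user["content"]
    system = next((m for m in messages if m["role"] == "system"), None)
    if system is None:
        messages.insert(0, {"role": "system", "content": EXTRACTION_INSTRUCTION})
    else:
        system["content"] += "\n" + EXTRACTION_INSTRUCTION
    return messages


def extract_and_store(store: List[Dict], reply: str) -> str:
    """Second half of step 2 (parse) and step 3 (persist); returns the visible reply."""
    visible = []
    for line in reply.splitlines():
        if line.startswith("MEMORY:"):
            fact = line[len("MEMORY:"):].strip()
            store.append({"content": fact, "importance": DEFAULT_IMPORTANCE})
        else:
            visible.append(line)
    return "\n".join(visible)


def create(store: List[Dict], messages: List[Message],
           llm: Callable[[List[Message]], str], top_k: int = 8) -> str:
    return extract_and_store(store, llm(inject_memories(store, messages, top_k)))


def fake_llm(messages: List[Message]) -> str:
    """Stand-in model: answers questions from context, otherwise emits a fact."""
    user = next(m for m in messages if m["role"] == "user")["content"]
    if user.endswith("?"):
        return "You like Vim." if "Vim" in user else "I don't know."
    return "Noted.\nMEMORY: User's favorite editor is Vim."


store: List[Dict] = []
print(create(store, [{"role": "user", "content": "My favorite editor is Vim."}], fake_llm))
print(create(store, [{"role": "user", "content": "What editor do I like?"}], fake_llm))
```

Run as-is, the first call prints "Noted." and stores the fact, and the second prints "You like Vim." because the stored fact was injected ahead of the question.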
Because every read and write is scoped to the same session_id and agent_id, and the store persists what it writes, a fact stored in one call is retrievable in every later call that uses the same identifiers.
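A minimal sketch of scope-keyed storage shows why stable identifiers matter: a read only sees memories written under the same key. ScopedMemoryStore and the (user_id, session_id, agent_id) key layout are assumptions for illustration; Headroom's actual scoping, which also has to account for memories bubbled to broader scopes, is likely more involved.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical scope key: (user_id, session_id, agent_id).
Scope = Tuple[str, str, str]


class ScopedMemoryStore:
    """Illustrative store where every read and write carries a scope key."""

    def __init__(self) -> None:
        self._by_scope: DefaultDict[Scope, List[str]] = defaultdict(list)

    def add(self, scope: Scope, content: str) -> None:
        self._by_scope[scope].append(content)

    def search(self, scope: Scope) -> List[str]:
        # .get() avoids creating empty entries for scopes that were never written.
        return list(self._by_scope.get(scope, []))


store = ScopedMemoryStore()
assistant = ("alice", "my-long-running-session", "code-assistant")
reviewer = ("alice", "my-long-running-session", "review-bot")  # hypothetical second agent

store.add(assistant, "Favorite editor is Vim.")
print(store.search(assistant))  # ['Favorite editor is Vim.']
print(store.search(reviewer))   # []
```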
Example Interaction Across Calls
```python
# First request – the fact is extracted and stored automatically.
resp1 = memory_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My favorite editor is Vim."}],
)

# Second request – the stored fact is injected into context.
resp2 = memory_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What editor do I like?"}],
)
```
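Because injection runs synchronously before the LLM request, each call pays for one memory search (embedding the query and looking it up in the vector index) before the completion starts. Enabling cache_enabled keeps that overhead low for frequently retrieved memories.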
Tune the Hierarchical Memory System for Longevity
Long-running agents require predictable resource usage and clean separation of concerns. Tune these settings based on the adapter implementations in headroom/memory/adapters/sqlite_vector.py and headroom/memory/adapters/hnsw.py:
- Persist across restarts – Use store_backend=StoreBackend.SQLITE with a stable db_path. The SQLite store writes all memories, vector metadata, and FTS5 indexes to a single file that any subsequent process can reopen.
- Bound vector index growth – Choose vector_backend=VectorBackend.SQLITE_VEC with vector_cache_size_kb to keep the vector index on disk with a fixed page cache. The HNSW adapter offers high recall but resides in memory, so it grows unbounded unless hnsw_max_entries is configured.
- Reduce lookup latency – Enable cache_enabled=True and size cache_max_size to your working set, for example 2000. Frequently accessed memories stay in RAM instead of hitting SQLite.
- Promote only high-value facts – Raise bubble_threshold to 0.9 if you want only critical facts to bubble from session scope to agent scope. Lower thresholds increase recall but risk polluting long-term memory with transient details. Note that automatically extracted facts are stored with a default importance of 0.7, so with any threshold of 0.7 or higher (including the 0.75 used above) they are not promoted unless their importance is raised.
- Isolate multiple agents – Assign a unique agent_id to each logical bot. The HierarchicalMemory scoping logic ensures memories written for one agent_id are never retrieved for another.
- Lower embedding overhead – Select embedder_backend=EmbedderBackend.ONNX instead of LOCAL to avoid loading heavy PyTorch dependencies. The ONNX Runtime backend in headroom/memory/adapters/embedders.py is suited to constrained environments.
- Plug in custom embedders – Set embedder_backend="external" and register a provider via the headroom.memory_vector entry point to use proprietary embedding services.
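The bubbling rule described in this article (promote when importance exceeds bubble_threshold) can be expressed as a small function. bubble_target is hypothetical, and it assumes a memory moves up one level per promotion; the real orchestrator may promote differently.

```python
from typing import Optional

# Scope order as the article describes it: turn -> session -> agent.
SCOPES = ["turn", "session", "agent"]


def bubble_target(scope: str, importance: float,
                  auto_bubble: bool = True, threshold: float = 0.75) -> Optional[str]:
    """Return the next broader scope a memory would move to, or None if it stays put."""
    if not auto_bubble or importance <= threshold:
        return None  # bubbling disabled, or importance does not exceed the threshold
    next_index = SCOPES.index(scope) + 1
    return SCOPES[next_index] if next_index < len(SCOPES) else None


print(bubble_target("session", 0.7))   # None: default extraction importance
print(bubble_target("session", 0.85))  # agent: e.g. a manual add() at 0.85
print(bubble_target("session", 0.85, threshold=0.9))  # None with a stricter threshold
```

With the 0.75 threshold from the configuration above, a fact stored at the default extraction importance of 0.7 stays where it is, while one added manually at 0.85 moves up.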
Direct Memory Operations
For debugging or non-LLM events, access the underlying HierarchicalMemory through the wrapper's .memory attribute.
```python
wrapper = memory_client  # the object returned by with_memory

# Manually add a memory
wrapper.memory.add(content="User prefers dark theme", importance=0.85)

# Search stored memories
matches = wrapper.memory.search(query="dark theme", top_k=5)
for mem in matches:
    print(mem.content, mem.importance)

# Retrieve statistics (before clearing, so the counts are meaningful)
stats = wrapper.memory.stats()
print(stats)  # e.g., {"total": 123}

# Clear all memories for the current user scope
deleted = wrapper.memory.clear()
print(f"Removed {deleted} memories")
```
Summary
- MemoryConfig in headroom/memory/config.py is the single source of truth for persistence, indexing, caching, and bubbling behavior in the hierarchical memory system.
- with_memory in headroom/memory/wrapper.py binds a config and a stable session_id to an LLM client, creating a MemoryWrapper that manages memory injection and extraction automatically.
- SQLite persistence plus a bounded vector index such as SQLite-Vec lets long-running agents survive restarts without unbounded memory growth.
- Caching (cache_enabled, cache_max_size) and bubbling thresholds (auto_bubble, bubble_threshold) let you optimize retrieval speed and long-term memory quality.
Frequently Asked Questions
What file defines the memory configuration options in Headroom?
The MemoryConfig dataclass is defined in headroom/memory/config.py. It declares every configurable field, including store_backend, vector_backend, embedder_backend, cache_enabled, and bubble_threshold, as strongly typed enum or primitive values.
How does Headroom keep memories alive across process restarts?
Headroom persists memories through the SQLite store implemented in headroom/memory/adapters/sqlite.py. When store_backend is set to StoreBackend.SQLITE and db_path points to a stable file, all hierarchical memories, vector embeddings, and full-text indexes are written to disk and reloaded on the next initialization.
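The restart behavior can be illustrated with Python's standard-library sqlite3 module: write a memory, close the connection, then reopen the same file as a new process would. The memories table schema here is a hypothetical simplification, not the schema used in headroom/memory/adapters/sqlite.py.

```python
import os
import sqlite3
import tempfile


def open_store(db_path: str) -> sqlite3.Connection:
    # Hypothetical schema: one row per memory, tagged with its scope.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        " id INTEGER PRIMARY KEY, session_id TEXT, agent_id TEXT,"
        " content TEXT, importance REAL)"
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "my_agent_memory.db")

# "Process 1": write a memory and close the connection.
conn = open_store(db_path)
with conn:  # commits the transaction on success
    conn.execute(
        "INSERT INTO memories (session_id, agent_id, content, importance) VALUES (?, ?, ?, ?)",
        ("my-long-running-session", "code-assistant", "Favorite editor is Vim.", 0.7),
    )
conn.close()

# "Process 2": reopen the same file and read the memory back.
conn = open_store(db_path)
rows = conn.execute(
    "SELECT content, importance FROM memories WHERE session_id = ? AND agent_id = ?",
    ("my-long-running-session", "code-assistant"),
).fetchall()
conn.close()
print(rows)  # [('Favorite editor is Vim.', 0.7)]
```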
What is memory bubbling and how do I control it?
Memory bubbling is the automatic promotion of important memories from narrow scopes, such as a single turn or session, to broader scopes like agent or user. It is governed by auto_bubble and bubble_threshold in MemoryConfig. When auto_bubble is True and a memory's importance exceeds bubble_threshold, the HierarchicalMemory orchestrator in headroom/memory/core.py promotes that memory so it survives beyond the current session.
Which vector backend should I choose for a long-running agent?
For agents that must run indefinitely with bounded resource usage, select VectorBackend.SQLITE_VEC. The SQLite-Vec adapter in headroom/memory/adapters/sqlite_vector.py maintains an on-disk index with a configurable page cache via vector_cache_size_kb. HNSW from headroom/memory/adapters/hnsw.py offers high recall but resides in memory, making it better suited for short-lived or high-throughput scenarios where memory limits are explicitly managed.
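The AUTO fallback can be sketched with importlib.util.find_spec, which checks whether a package is importable without importing it. resolve_vector_backend and its string return values are hypothetical, and the sketch assumes the optional SQLite-Vec dependency is installed as the sqlite_vec Python package.

```python
import importlib.util


def resolve_vector_backend(requested: str = "auto") -> str:
    """Hypothetical resolution of an AUTO vector backend setting."""
    if requested != "auto":
        return requested  # an explicit choice is respected as-is
    # Prefer SQLite-Vec when its optional package is importable, else fall back to HNSW.
    if importlib.util.find_spec("sqlite_vec") is not None:
        return "sqlite_vec"
    return "hnsw"


print(resolve_vector_backend())  # depends on whether sqlite_vec is installed
```

Pinning the backend explicitly (the equivalent of VectorBackend.SQLITE_VEC) avoids silently landing on the in-memory HNSW index in an environment where the optional package is missing.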