
Architecture of Memory Consolidation with Decay and Compression in MCP Memory Service

February 28, 2026 doobidoo/mcp-memory-service ↗

The MCP Memory Service implements a biologically inspired consolidation pipeline orchestrated by the DreamInspiredConsolidator. Exponential decay scoring calculates memory relevance, and semantic compression distills memory clusters into compact knowledge representations.

The architecture of memory consolidation with decay and compression in MCP Memory Service follows a "dream-inspired" design that mimics biological memory processing. This open-source system, available in the doobidoo/mcp-memory-service repository, transforms raw memory streams into a structured, high-value knowledge graph through staged relevance evaluation and intelligent summarization.

The Dream-Inspired Consolidation Architecture

At the heart of the system lies the DreamInspiredConsolidator class defined in src/mcp_memory_service/consolidation/consolidator.py. This coordinator manages a multi-phase pipeline that processes memories through five biologically inspired stages: relevance decay calculation, semantic clustering, creative association discovery, semantic compression, and controlled forgetting. The architecture separates scoring (determining what matters) from compression (determining how to store it efficiently), so each component can evolve independently.

Stage 1: Exponential Decay Scoring

The first stage evaluates memory relevance using biologically-inspired exponential decay algorithms. The ExponentialDecayCalculator class in src/mcp_memory_service/consolidation/decay.py (lines 40-44) computes a dynamic relevance score that declines over time unless reinforced by access patterns, connections, or quality metrics.

Implementation Details

The decay calculator processes each Memory object through a multi-factor scoring function. It generates a RelevanceScore object that stores both the final score and intermediate calculation factors (metadata lines 56-66 in decay.py), enabling transparent debugging and audit trails.
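To illustrate why keeping the intermediate factors next to the final score helps with debugging, here is a minimal sketch of such a record. ScoreRecord and its explain() helper are hypothetical stand-ins, not the repository's RelevanceScore class; only the memory_hash, total_score, and decay_factor names come from the usage example in this article, and the metadata keys are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScoreRecord:
    """Hypothetical stand-in for RelevanceScore: a final score plus its factors."""
    memory_hash: str
    total_score: float
    decay_factor: float
    # Intermediate factors kept for auditing (key names are illustrative).
    metadata: Dict[str, float] = field(default_factory=dict)

    def explain(self) -> str:
        """Render the score and every recorded factor on one line."""
        parts = ", ".join(f"{k}={v:.2f}" for k, v in sorted(self.metadata.items()))
        return (f"{self.memory_hash[:8]}: total={self.total_score:.3f} "
                f"decay={self.decay_factor:.2f} ({parts})")


record = ScoreRecord(
    memory_hash="abc123def456",
    total_score=0.72,
    decay_factor=0.6,
    metadata={"access_boost": 1.2, "base_importance": 1.0},
)
print(record.explain())
```

With the factors stored alongside the total, a surprising score can be traced back to the specific boost or decay term that produced it.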

Relevance Calculation Factors

The scoring algorithm weighs six distinct signals:

Age – Computed as days since creation via _get_memory_age_days; the older a memory, the further its score has decayed.
Base importance – Derived from explicit importance_score metadata or tag-based heuristics in _get_base_importance.
Retention period – Type-specific defaults stored in self.retention_periods that define half-life constants.
Connection boost – A 10% relevance increase (connection_boost) per linked memory in the knowledge graph.
Access boost – Temporal reinforcement calculated by _calculate_access_boost: memories accessed within 1 day receive a 1.5× multiplier, those accessed within a week receive 1.2×, with further tiers beyond that.
Quality multiplier – High-fidelity memories decay slower according to quality_multiplier = 1.0 + quality_score × 0.5.
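The factors above can be combined in a small standalone sketch. This is not the repository's implementation: how the factors are multiplied together, the treatment of the retention period as a half-life, the choice to let the quality multiplier stretch that half-life, and the 1.0 fallback for accesses older than a week are all assumptions. Only the 10% connection boost, the 1.5× and 1.2× access tiers, and the quality_multiplier formula come from the list.

```python
from typing import Optional


def access_boost(days_since_access: Optional[float]) -> float:
    """Access tiers from the text; the 1.0 fallback is an assumption."""
    if days_since_access is None:
        return 1.0
    if days_since_access <= 1:
        return 1.5
    if days_since_access <= 7:
        return 1.2
    return 1.0


def relevance(age_days: float,
              base_importance: float,
              retention_days: float,
              n_connections: int = 0,
              days_since_access: Optional[float] = None,
              quality_score: float = 0.0) -> float:
    """Illustrative multi-factor relevance score (hypothetical combination)."""
    quality_multiplier = 1.0 + quality_score * 0.5
    # Assumption: higher quality stretches the half-life, so decay is slower.
    half_life = retention_days * quality_multiplier
    decay_factor = 0.5 ** (age_days / half_life)
    connection_boost = 1.0 + 0.10 * n_connections
    return (base_importance * decay_factor * connection_boost
            * access_boost(days_since_access))


# A 30-day-old memory with a 30-day half-life and no reinforcement
print(round(relevance(30, 1.0, 30), 3))
# The same memory with two connections and an access within the last day
print(round(relevance(30, 1.0, 30, n_connections=2, days_since_access=0.5), 3))
```

The second call shows reinforcement offsetting decay: connections and a recent access lift a half-decayed memory back toward its base importance.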


# Example: Directly use the decay calculator

import asyncio

from mcp_memory_service.consolidation.decay import ExponentialDecayCalculator
from mcp_memory_service.models.memory import Memory

# Minimal stand-in config with custom retention periods (in days)

config = type("Cfg", (), {"retention_periods": {"default": 30, "critical": 180}})
decay = ExponentialDecayCalculator(config)

memories = [
    Memory(content="Important design decision", tags=["critical"], quality_score=0.9,
           created_at=1700000000, updated_at=1700003600, metadata={}),
    Memory(content="Random note", tags=["note"], quality_score=0.3,
           created_at=1690000000, updated_at=1690000100, metadata={})
]

async def main():
    # process() is awaited, so it has to run inside an event loop
    scores = await decay.process(memories, connections={}, access_patterns={})
    for s in scores:
        print(f"{s.memory_hash[:8]} → total:{s.total_score:.3f} decay:{s.decay_factor:.2f}")

asyncio.run(main())

Stage 2: Semantic Compression

Once relevance scores identify high-value memory clusters, the system applies semantic compression to reduce storage overhead while preserving conceptual fidelity. The SemanticCompressionEngine in src/mcp_memory_service/consolidation/compression.py implements a lossy but intelligent summarization process that achieves 60-80% storage reduction.

The Compression Workflow

The compression engine processes MemoryCluster objects through a five-step pipeline (lines 63-71 in compression.py):

Key-concept extraction (_extract_key_concepts) – Identifies technical terms, acronyms, frequent non-stop words, and custom theme keywords that define the cluster's semantic core.
Thematic summary generation (_generate_thematic_summary) – Constructs a concise narrative respecting max_summary_length constraints while preserving domain terminology.
Temporal span calculation (_calculate_temporal_span) – Generates human-readable time windows indicating the cluster's temporal coverage.
Tag and metadata aggregation (_aggregate_tags / _aggregate_metadata) – Preserves provenance information and cluster characteristics in the compressed representation.
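Compression ratio calculation – Computed as len(summary) / sum(len(m.content) for m in memories) (lines 15-17) to verify storage efficiency. Lower is better: a ratio between 0.2 and 0.4 corresponds to the 60-80% storage reduction cited above.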
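The steps above can be approximated in a few lines of standalone Python. This is a rough sketch, not the SemanticCompressionEngine: the stop-word list, the regular expression, the summary template, and every function name here are hypothetical. Only the ratio formula and the max_summary_length cap are taken from the text.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption, not the engine's list).
STOP_WORDS = {"the", "a", "an", "for", "and", "of", "to", "use", "in", "on"}


def key_concepts(texts: List[str], top_n: int = 5) -> List[str]:
    """Pick the most frequent non-stop words across all texts."""
    words = re.findall(r"[a-z][a-z0-9_]+", " ".join(texts).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def summarize(texts: List[str], concepts: List[str], max_summary_length: int) -> str:
    """Build a templated summary and cap it at max_summary_length characters."""
    summary = f"Cluster of {len(texts)} memories about: {', '.join(concepts)}"
    return summary[:max_summary_length]


def compression_ratio(summary: str, texts: List[str]) -> float:
    """len(summary) divided by the total length of the source contents."""
    total = sum(len(t) for t in texts)
    return len(summary) / total if total else 1.0


texts = [
    "Design decision: use PostgreSQL for primary storage",
    "Design decision: implement connection pooling",
]
concepts = key_concepts(texts, top_n=2)
summary = summarize(texts, concepts, max_summary_length=500)
print(summary)
print(f"ratio: {compression_ratio(summary, texts):.2f}")
```

On a cluster this small the ratio stays high because the template overhead dominates; the savings grow as more memories share the same concepts.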

Compression Results

The engine returns a CompressionResult object containing the compressed Memory instance, the calculated compression ratio, extracted key concepts, and auxiliary metadata (lines 45-57 in compression.py). This structured output enables downstream systems to track information loss and retrieval accuracy.


# Example: Compress a small cluster manually

import asyncio

from mcp_memory_service.consolidation.compression import SemanticCompressionEngine
from mcp_memory_service.models.memory import Memory
from mcp_memory_service.consolidation.base import MemoryCluster

# Minimal stand-in config for the compression engine

config = type("Cfg", (), {"max_summary_length": 500, "preserve_originals": False})
compressor = SemanticCompressionEngine(config)

# Build a dummy cluster (normally produced by the clustering engine)

memories = [
    Memory(content="Design decision: use PostgreSQL for primary storage", tags=["database", "design"]),
    Memory(content="Design decision: implement connection pooling", tags=["database", "design"])
]
cluster = MemoryCluster(
    cluster_id="c1",
    memory_hashes=[m.content_hash for m in memories],
    theme_keywords=["design", "decision"],
    centroid_embedding=[0.1, 0.2, 0.3],
    coherence_score=0.85
)

async def main():
    # process() is awaited, so it has to run inside an event loop
    result = await compressor.process([cluster], memories)
    print("Compression ratio:", result[0].compression_ratio)
    print("Summary:", result[0].compressed_memory.content)

asyncio.run(main())

Orchestrating the Full Pipeline

The DreamInspiredConsolidator class in src/mcp_memory_service/consolidation/consolidator.py serves as the high-level coordinator that sequences the decay and compression stages. It implements a stateful pipeline that progresses through five distinct phases based on configuration flags and temporal horizons.

Consolidation Phases

The pipeline proceeds through the following phases, each controlled by specific configuration toggles:

| Phase | Configuration Flag | Core Component | Main Purpose |
|---|---|---|---|
| Relevance Scoring | `decay_enabled` | `ExponentialDecayCalculator` | Compute decay-adjusted scores for all memories |
| Semantic Clustering | `clustering_enabled` | `SemanticClusteringEngine` | Group similar memories by embedding proximity |
| Creative Associations | `associations_enabled` | `CreativeAssociationEngine` | Discover novel links between distant memories |
| Compression | `compression_enabled` | `SemanticCompressionEngine` | Summarize clusters into compact representations |
| Controlled Forgetting | `forgetting_enabled` | `ControlledForgettingEngine` | Archive or delete low-relevance items |

The consolidator selects which phases to execute based on the time horizon (daily, weekly, monthly) and the ENABLED_PHASES mapping defined in lines 34-40 of consolidator.py. After each phase, the system updates statistics, performs batch writes for performance, and records health metrics.
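Phase selection of this kind can be sketched as an intersection between what a horizon permits and what the configuration flags enable. The mapping contents below are hypothetical and are not copied from ENABLED_PHASES in consolidator.py; the select_phases helper and the short phase names are likewise assumptions made for illustration. Only the flag names come from the table.

```python
from typing import Dict, List

# Fixed execution order of the five phases described in the table.
PHASE_ORDER = ["decay", "clustering", "associations", "compression", "forgetting"]

# Hypothetical horizon-to-phases mapping (illustrative values only).
HYPOTHETICAL_ENABLED_PHASES = {
    "daily": {"decay"},
    "weekly": {"decay", "clustering", "associations", "compression"},
    "monthly": {"decay", "clustering", "associations", "compression", "forgetting"},
}


def select_phases(horizon: str, flags: Dict[str, bool]) -> List[str]:
    """Return the phases to run: allowed by the horizon AND enabled by its flag."""
    if horizon not in HYPOTHETICAL_ENABLED_PHASES:
        raise ValueError(f"unknown time horizon: {horizon!r}")
    allowed = HYPOTHETICAL_ENABLED_PHASES[horizon]
    return [
        phase for phase in PHASE_ORDER
        if phase in allowed and flags.get(f"{phase}_enabled", False)
    ]


flags = {
    "decay_enabled": True,
    "clustering_enabled": True,
    "associations_enabled": True,
    "compression_enabled": False,  # switched off by the operator
    "forgetting_enabled": True,
}
print(select_phases("weekly", flags))
print(select_phases("monthly", flags))
```

Keeping the order in a separate list means that disabling a phase never reorders the remaining ones.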


# Example: Run a full weekly consolidation (decay + compression)

import asyncio

from mcp_memory_service.consolidation.consolidator import DreamInspiredConsolidator
from mcp_memory_service.storage.hybrid import HybridStorage
from mcp_memory_service.config import load_config

async def main():
    config = load_config()                  # returns a ConsolidationConfig instance
    storage = await HybridStorage.create()  # async init of the storage backend
    consolidator = DreamInspiredConsolidator(storage, config)

    # consolidate() is awaited, so it has to run inside an event loop
    report = await consolidator.consolidate("weekly")
    print(report)  # shows memories processed, clusters created, compression ratio, etc.

asyncio.run(main())

Summary

The architecture of memory consolidation with decay and compression in MCP Memory Service combines biological inspiration with efficient algorithmic implementation:

Exponential Decay Scoring evaluates memory relevance through age-based decay modulated by access patterns, connection density, and quality metrics, implemented in src/mcp_memory_service/consolidation/decay.py.
Semantic Compression reduces storage overhead by 60-80% through intelligent clustering and summarization, implemented in src/mcp_memory_service/consolidation/compression.py.
Pipeline Orchestration sequences these stages through the DreamInspiredConsolidator, enabling configurable consolidation horizons from daily to monthly cycles.
Configurable Architecture allows selective enablement of the decay, clustering, association, compression, and forgetting phases based on operational requirements.

Frequently Asked Questions

How does exponential decay scoring work in MCP Memory Service?

Exponential decay scoring calculates a dynamic relevance score for each memory using the ExponentialDecayCalculator class in src/mcp_memory_service/consolidation/decay.py. The system computes memory age in days and applies a decay curve modulated by five further factors: base importance, retention periods, connection boosts (10% per link), access recency multipliers (1.5× for access within the last day), and quality multipliers (1.0 + quality_score × 0.5). The result is a RelevanceScore object containing both the final score and intermediate calculation metadata for transparency.

What is semantic compression and how does it reduce storage?

Semantic compression is the process of collapsing semantically similar memory clusters into single, condensed representations using the SemanticCompressionEngine in src/mcp_memory_service/consolidation/compression.py. The engine extracts key concepts, generates thematic summaries respecting max_summary_length constraints, calculates temporal spans, and aggregates metadata from source memories. This process typically achieves 60-80% storage reduction while preserving retrieval accuracy, returning a CompressionResult containing the compressed memory object and calculated compression ratio.

How does the DreamInspiredConsolidator orchestrate the pipeline?

The DreamInspiredConsolidator in src/mcp_memory_service/consolidation/consolidator.py serves as the high-level workflow coordinator that sequences the consolidation pipeline through five configurable phases: relevance scoring, semantic clustering, creative associations, compression, and controlled forgetting. It selects active phases based on the ENABLED_PHASES mapping and time horizon parameters (daily, weekly, monthly), executing each enabled stage with batch processing for performance. After each phase, the consolidator updates statistics, persists results to storage, and records health metrics to ensure system reliability.

What configuration options control decay and compression behavior?

The consolidation behavior is governed by ConsolidationConfig in src/mcp_memory_service/config.py, which provides boolean toggles for each pipeline phase including decay_enabled, clustering_enabled, compression_enabled, and forgetting_enabled. For decay calculation, developers can define custom retention_periods (e.g., 30 days for default, 180 days for critical memories) and adjust boost multipliers. For compression, the max_summary_length parameter controls summary verbosity, while preserve_originals determines whether source memories remain accessible after compression. These settings enable fine-tuned control over the trade-off between storage efficiency and memory fidelity.
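As a rough illustration of how these options fit together, the sketch below models them as a frozen dataclass. SketchConsolidationConfig is a hypothetical stand-in, not the ConsolidationConfig class from config.py. The field names follow the options named in this article, the retention and summary-length values reuse the article's examples, and the remaining defaults are assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class SketchConsolidationConfig:
    """Hypothetical config mirroring the options named in the text."""
    decay_enabled: bool = True
    clustering_enabled: bool = True
    associations_enabled: bool = True
    compression_enabled: bool = True
    forgetting_enabled: bool = True
    # Retention periods in days, keyed by memory type (example values).
    retention_periods: Dict[str, int] = field(
        default_factory=lambda: {"default": 30, "critical": 180}
    )
    max_summary_length: int = 500
    preserve_originals: bool = True  # assumed default


# Favor storage efficiency: shorter summaries, originals dropped after compression.
storage_first = replace(
    SketchConsolidationConfig(),
    max_summary_length=200,
    preserve_originals=False,
)
print(storage_first.max_summary_length, storage_first.preserve_originals)
```

Deriving variants with replace() keeps the baseline configuration untouched, which makes it easier to compare a storage-leaning profile against a fidelity-leaning one.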
