How Does the Quality Scoring System Evaluate Memory Relevance in MCP Memory Service?
The quality scoring system in MCP Memory Service evaluates memory relevance through a composite 0-1 score that blends heuristic content analysis, AI-generated quality signals, time decay, tag matching, and project-specific affinity filters.
The MCP Memory Service (available at doobidoo/mcp-memory-service) ranks stored memories—such as architectural decisions, bug fixes, and project notes—to determine which entries deserve injection into the current conversation context. The quality scoring system drives this ranking through a multi-factor pipeline implemented in claude-hooks/utilities/memory-scorer.js, combining local heuristics with backend AI analysis to surface high-value information while suppressing generic or outdated content.
Overview of the Composite Scoring Pipeline
The relevance engine processes each memory through eight distinct evaluation stages before producing a final normalized score. According to the source code, the pipeline executes these steps in sequence:
- Time decay – Applies exponential aging penalties to older memories
- Tag relevance – Scores overlap between memory tags and project metadata
- Content relevance – Counts keyword matches between memory text and project descriptions
- Content quality – Heuristically penalizes short, generic, or low-information content
- Backend quality – Incorporates AI-generated quality scores from ONNX/Groq models
- Type bonus – Adds weight for high-value memory types like decision or architecture
- Recency bonus – Grants additional boosts to memories from the last 7, 14, or 30 days
- Conversation relevance – Optionally aligns memories with current chat topics and intents
A final project-affinity filter acts as a hard gate, zeroing out scores for memories lacking project-specific references.
Core Scoring Components
Time Decay
The calculateTimeDecay() function (lines 11-42 in memory-scorer.js) applies exponential decay based on memory age. Memories created within the last 7 days keep scores near 1.0, while older entries approach 0 asymptotically. This favors recent project knowledge, although the final ranking still depends on the other weighted signals.
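To make the shape of the decay curve concrete, here is a minimal sketch of exponential aging. It is not the repository's code: the name sketchTimeDecay and the decayRate of 0.01 per day are assumptions picked so that a week-old memory stays close to 1.0.

```javascript
// Hypothetical sketch of exponential time decay (not the project's implementation).
// decayRate = 0.01 per day is an illustrative assumption.
function sketchTimeDecay(createdAtSeconds, nowSeconds, decayRate = 0.01) {
  // Clamp negative ages (timestamps in the future) to zero days.
  const ageDays = Math.max(0, (nowSeconds - createdAtSeconds) / 86400);
  return Math.exp(-decayRate * ageDays);
}

const now = 1730145600; // fixed "current time" so the example is reproducible
console.log(sketchTimeDecay(now, now));               // brand-new memory
console.log(sketchTimeDecay(now - 7 * 86400, now));   // one week old, still above 0.9
console.log(sketchTimeDecay(now - 365 * 86400, now)); // one year old, close to 0
```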
Tag Relevance
Implemented in calculateTagRelevance() (lines 53-95), this component rewards memories whose tags intersect with the project's name, primary language, frameworks, or tools. Exact matches on critical identifiers like the project name or language trigger additional bonus multipliers, ensuring domain-specific memories surface before tangential notes.
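One way to picture tag scoring is as an overlap ratio plus bonuses for the most important identifiers. The sketch below is an assumption-laden illustration: sketchTagRelevance, the 0.3 project-name bonus, and the 0.2 language bonus are all hypothetical and are not taken from memory-scorer.js.

```javascript
// Hypothetical sketch: share of tags that match project terms,
// plus illustrative bonuses for exact project-name and language matches.
function sketchTagRelevance(tags, project) {
  if (!Array.isArray(tags) || tags.length === 0) return 0;
  const lowerTags = tags.map((tag) => String(tag).toLowerCase());
  const name = String(project.name || '').toLowerCase();
  const language = String(project.language || '').toLowerCase();
  const terms = [name, language, ...(project.frameworks || []), ...(project.tools || [])]
    .map((term) => String(term).toLowerCase())
    .filter(Boolean);

  const matches = lowerTags.filter((tag) => terms.includes(tag)).length;
  let score = matches / lowerTags.length;

  // Bonus values are assumptions chosen for illustration only.
  if (name && lowerTags.includes(name)) score += 0.3;
  if (language && lowerTags.includes(language)) score += 0.2;
  return Math.min(1, score);
}

const project = { name: 'mcp-memory-service', language: 'JavaScript', tools: ['docker'] };
console.log(sketchTagRelevance(['mcp-memory-service', 'decision', 'sqlite-vec'], project));
console.log(sketchTagRelevance(['personal', 'note'], project)); // no overlap
```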
Content Relevance
The calculateContentRelevance() function (lines 66-103) performs keyword frequency analysis between memory content and project-specific terminology. It searches for the project name, language, frameworks, tools, and generic technical terms, applying logarithmic weighting to favor memories with repeated, meaningful keyword matches over single occurrences.
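Logarithmic weighting means each extra occurrence of a keyword adds less than the one before it. The following sketch illustrates that idea only; sketchContentRelevance and its normalization (two hits per keyword counts as full relevance) are assumptions, not the library's formula.

```javascript
// Hypothetical sketch: keyword counting with diminishing returns via log2(1 + n).
function sketchContentRelevance(content, keywords) {
  const text = String(content || '').toLowerCase();
  const needles = keywords.map((k) => String(k).toLowerCase()).filter(Boolean);
  if (needles.length === 0) return 0;

  let score = 0;
  for (const needle of needles) {
    const occurrences = text.split(needle).length - 1;
    // 1 hit contributes 1, 3 hits contribute 2: repeats help, but less each time.
    score += Math.log2(1 + occurrences);
  }

  // Assumed normalization: two hits for every keyword maps to 1.0.
  const fullScore = needles.length * Math.log2(3);
  return Math.min(1, score / fullScore);
}

console.log(sketchContentRelevance('docker build failed', ['docker']));
console.log(sketchContentRelevance('docker build, then docker push', ['docker']));
```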
Content Quality Heuristic
Perhaps the most influential local signal, calculateContentQuality() (lines 106-155) evaluates "meaningfulness" through multiple lenses:
- Pattern detection – Identifies and penalizes generic session-summary formats
- Length validation – Applies score reductions for very short content
- Action-oriented vocabulary – Rewards presence of implementation keywords like "implemented," "fixed," or "refactored"
- Lexical diversity – Measures vocabulary richness to identify substantive technical writing
This heuristic returns a 0-1 score that heavily influences the final ranking.
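The four lenses above can be sketched as a handful of additive adjustments around a neutral baseline. Everything specific here is an assumption: sketchContentQuality, the 0.5 starting point, the 20-character threshold, the summary pattern, and the size of each adjustment are illustrative and do not come from the source file.

```javascript
// Hypothetical sketch of a content-quality heuristic (illustrative values throughout).
function sketchContentQuality(content) {
  const text = String(content || '').trim();
  if (text.length === 0) return 0;
  let score = 0.5; // assumed neutral baseline

  // Pattern detection: penalize generic session-summary openings (assumed pattern).
  if (/^(session summary|topics discussed)\b/i.test(text)) score -= 0.3;

  // Length validation: penalize very short content (assumed threshold).
  if (text.length < 20) score -= 0.3;

  // Action-oriented vocabulary, using the keywords the article mentions.
  if (/\b(implemented|fixed|refactored)\b/i.test(text)) score += 0.2;

  // Lexical diversity: unique words divided by total words, for longer texts only.
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (words.length >= 5) score += 0.3 * (new Set(words).size / words.length);

  return Math.max(0, Math.min(1, score));
}

console.log(sketchContentQuality('ok'));
console.log(sketchContentQuality('Fixed the race condition in the SQLite writer by serializing commits.'));
```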
Backend Quality Integration
The calculateBackendQuality() function (lines 84-100) integrates AI-generated quality signals from the MCP backend. When available, this reads the metadata.quality_score field generated by ONNX models or Groq API analysis. Missing scores default to a neutral 0.5, allowing the system to function while encouraging backend enrichment for critical memories.
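The fallback behavior described above fits in a few lines. This is a sketch rather than the actual function body: sketchBackendQuality is a hypothetical name, and clamping the value to the 0-1 range is an added assumption.

```javascript
// Hypothetical sketch: read metadata.quality_score, falling back to a neutral 0.5.
function sketchBackendQuality(memory) {
  const raw = memory && memory.metadata ? memory.metadata.quality_score : undefined;
  if (typeof raw !== 'number' || Number.isNaN(raw)) return 0.5; // neutral default
  return Math.max(0, Math.min(1, raw)); // clamping is an assumption
}

console.log(sketchBackendQuality({ metadata: { quality_score: 0.86 } })); // 0.86
console.log(sketchBackendQuality({ metadata: {} }));                      // 0.5
```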
Type and Recency Bonuses
Strategic memory types receive fixed offsets via calculateTypeBonus() (lines 115-143). Architectural decisions and project decisions receive +0.3 boosts, while calculateRecencyBonus() (lines 136-172) adds incremental weight for memories created within sliding windows of 7, 14, or 30 days.
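As a rough illustration of how fixed offsets and sliding windows might look, consider the sketch below. Only the +0.3 type boost and the 7, 14, and 30 day windows come from the text; the function names and the per-window bonus values (0.15, 0.10, 0.05) are assumptions.

```javascript
// Hypothetical sketch of type and recency bonuses.
const SKETCH_TYPE_BONUSES = new Map([
  ['decision', 0.3],     // +0.3 as described in the article
  ['architecture', 0.3],
]);

function sketchTypeBonus(memoryType) {
  return SKETCH_TYPE_BONUSES.get(memoryType) || 0;
}

function sketchRecencyBonus(ageDays) {
  // Window sizes follow the article; the bonus amounts are illustrative.
  if (ageDays <= 7) return 0.15;
  if (ageDays <= 14) return 0.1;
  if (ageDays <= 30) return 0.05;
  return 0;
}

console.log(sketchTypeBonus('decision'), sketchRecencyBonus(3));
console.log(sketchTypeBonus('note'), sketchRecencyBonus(45));
```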
Conversation Relevance (Optional Phase 2)
When enabled, calculateConversationRelevance() (lines 190-258) performs dynamic analysis against the current chat session. It matches memory content against detected topics, entities, intent classifications, and code context from the active conversation, allowing real-time relevance adjustment based on immediate user needs.
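A simplified way to think about this stage is as the share of conversation signals that also appear in the memory. The sketch below covers only topics and entities and leaves out intent and code context; sketchConversationRelevance and the shape of the conversation object are assumptions made for illustration.

```javascript
// Hypothetical sketch: fraction of conversation topics/entities found in the memory text.
function sketchConversationRelevance(content, conversation) {
  const text = String(content || '').toLowerCase();
  const signals = [...(conversation.topics || []), ...(conversation.entities || [])]
    .map((signal) => String(signal).toLowerCase())
    .filter(Boolean);
  if (signals.length === 0) return 0;

  const hits = signals.filter((signal) => text.includes(signal)).length;
  return hits / signals.length;
}

const conversation = { topics: ['embedding', 'docker'], entities: ['sqlite-vec'] };
console.log(
  sketchConversationRelevance('Decided to switch to SQLite-vec for embedding storage.', conversation)
);
```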
Project-Affinity Filtering
Before finalization, the system executes a hard filter within calculateRelevanceScore() (lines 66-89). Memories lacking the current project name in either tags or content, combined with low tag relevance scores, receive either forced zero scores or heavy penalties. This prevents cross-project contamination in multi-tenant environments.
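The gate can be read as a post-processing step on an already computed score. In this sketch, a memory that never mentions the project is zeroed when its tag relevance is also low and halved otherwise. That split, the 0.3 threshold, the 0.5 penalty, and the name sketchApplyProjectAffinity are assumptions about how the two outcomes might be divided.

```javascript
// Hypothetical sketch of a project-affinity gate applied to a computed score.
function sketchApplyProjectAffinity(score, memory, projectName, tagRelevance) {
  const name = String(projectName || '').toLowerCase();
  const inTags = (memory.tags || []).some((tag) => String(tag).toLowerCase() === name);
  const inContent = String(memory.content || '').toLowerCase().includes(name);
  if (name && (inTags || inContent)) return score; // project is referenced: no change

  // Assumed split: low tag relevance forces zero, otherwise a heavy penalty applies.
  return tagRelevance < 0.3 ? 0 : score * 0.5;
}

const personalNote = { content: 'Random note about a personal project.', tags: ['personal'] };
console.log(sketchApplyProjectAffinity(0.6, personalNote, 'mcp-memory-service', 0)); // 0
```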
How Quality Influences the Final Score
The aggregation logic in calculateRelevanceScore() (lines 94-147) applies configurable weights to combine all components into a 0-1 normalized result. The content quality weight defaults to 0.15-0.20, while backend quality contributes an additional 0.20.
Critical quality gates include:
- Severe quality penalty: If calculateContentQuality() returns less than 0.2, the entire final score is halved (finalScore *= 0.5)
- Backend override: High backend quality scores (0.8+) can compensate for mediocre heuristic scores when the AI model identifies rich semantic content invisible to pattern matching
- Zero-sum filtering: Project-affinity violations force scores to exactly 0 regardless of other positive signals
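The weighted aggregation and the severe quality penalty can be sketched together. The 0.20 weights for content quality and backend quality and the halving below 0.2 follow the text; the remaining weights, the key names, and sketchFinalScore itself are assumptions, and the conversation and project-affinity stages are left out for brevity.

```javascript
// Hypothetical sketch of weighted aggregation with the severe quality penalty.
const sketchWeights = {
  timeDecay: 0.2,         // assumed
  tagRelevance: 0.25,     // assumed
  contentRelevance: 0.15, // assumed
  contentQuality: 0.2,    // within the default range stated in the article
  backendQuality: 0.2,    // default stated in the article
};

function sketchFinalScore(parts, bonuses = 0, weights = sketchWeights) {
  let score = bonuses;
  for (const [key, weight] of Object.entries(weights)) {
    score += weight * (parts[key] || 0);
  }
  // Severe quality penalty: halve the score when content quality is below 0.2.
  if (typeof parts.contentQuality === 'number' && parts.contentQuality < 0.2) {
    score *= 0.5;
  }
  return Math.max(0, Math.min(1, score)); // keep the result in the 0-1 range
}

const parts = { timeDecay: 0.9, tagRelevance: 0.6, contentRelevance: 0.4, contentQuality: 0.1, backendQuality: 0.5 };
console.log(sketchFinalScore(parts)); // halved because contentQuality < 0.2
```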
Practical Implementation
Below is a runnable example demonstrating how to score memories using the exported utilities from claude-hooks/utilities/memory-scorer.js:
```javascript
// example-usage.js
const {
  scoreMemoryRelevance,
  calculateRelevanceScore
} = require('./claude-hooks/utilities/memory-scorer');

// Define the active project context
const projectContext = {
  name: 'mcp-memory-service',
  language: 'JavaScript',
  frameworks: ['Node.js'],
  tools: ['npm', 'docker']
};

// Sample memories retrieved from the MCP API
const memories = [
  {
    content: 'Decided to switch to SQLite-vec for embedding storage.',
    tags: ['mcp-memory-service', 'decision', 'sqlite-vec'],
    memory_type: 'decision',
    created_at: 1730145600, // Unix timestamp (seconds)
    metadata: { quality_score: 0.86 }
  },
  {
    content: 'Fixed typo in the README that prevented Docker builds.',
    tags: ['bug-fix', 'documentation'],
    memory_type: 'bug-fix',
    created_at: 1729905600,
    metadata: {}
  },
  {
    content: 'Random note about a personal project unrelated to this repo.',
    tags: ['personal', 'note'],
    memory_type: 'note',
    created_at: 1727305600,
    metadata: {}
  }
];

// Score and rank memories
const ranked = scoreMemoryRelevance(memories, projectContext, {
  verbose: true, // Enables debugging output
  includeConversationContext: false // Disables chat-topic analysis
});

console.log('\nTop memories:');
ranked.slice(0, 2).forEach((mem, i) => {
  console.log(`${i + 1}. Score: ${mem.relevanceScore.toFixed(3)}`);
  console.log(`   Content: ${mem.content}`);
  console.log('   Breakdown:', mem.scoreBreakdown);
});
```
This example calls the scoring pipeline from doobidoo/mcp-memory-service. The decision memory should rank first because of its high backend quality score (0.86), its matching project tag, and the newest timestamp of the three, while the unrelated personal note should receive a negligible score.
Summary
- The quality scoring system combines eight distinct signals ranging from temporal decay to AI-generated quality metrics
- Content quality heuristics in calculateContentQuality() (lines 106-155) penalize generic content while rewarding technical specificity and action-oriented language
- Backend quality integration allows ONNX/Groq models to influence rankings through the metadata.quality_score field
- Project-affinity filtering prevents off-topic memories from contaminating context windows
- All scoring weights are configurable, with severe penalties triggered when content quality falls below 0.2
- The complete implementation resides in claude-hooks/utilities/memory-scorer.js
Frequently Asked Questions
What is the valid range for memory relevance scores?
The quality scoring system normalizes all final scores to a 0-1 range, where 1.0 represents maximum relevance and 0.0 indicates complete irrelevance or project mismatch. Intermediate values reflect weighted combinations of time decay, content quality, tag matching, and backend AI signals.
How does the content quality heuristic identify low-quality memories?
The calculateContentQuality() function detects low-quality entries by checking for generic session-summary patterns, measuring content length (penalizing very short memories), and analyzing lexical diversity. Memories scoring below 0.2 trigger an automatic 50% reduction in final relevance score, effectively burying vague or templated content regardless of other positive signals.
What happens if a memory lacks a backend quality score?
When the metadata.quality_score field is absent, calculateBackendQuality() defaults to 0.5, a neutral value that neither boosts nor penalizes the memory. This default ensures backward compatibility while allowing systems to gradually populate quality metadata through ONNX or Groq backend processing without invalidating existing memories.
Can developers customize the weighting of different scoring components?
Yes. The defaultWeights object in memory-scorer.js exposes configurable multipliers for every component including content quality (default 0.15-0.20), backend quality (default 0.20), tag relevance, and time decay. Developers can pass custom weight objects to calculateRelevanceScore() to prioritize specific signals—for example, increasing the weight of conversation relevance in chat-heavy applications or emphasizing backend quality in AI-enriched environments.