# How Does the Quality Scoring System Evaluate Memory Relevance in MCP Memory Service?

> Discover how MCP Memory Service quality scoring evaluates memory relevance using a composite score combining heuristic analysis, AI signals, time decay, tag matching, and project filters.

- Repository: [doobidoo/mcp-memory-service](https://github.com/doobidoo/mcp-memory-service)
- Tags: deep-dive
- Published: 2026-02-28

---

**The quality scoring system in MCP Memory Service evaluates memory relevance through a composite 0-1 score that blends heuristic content analysis, AI-generated quality signals, time decay, tag matching, and project-specific affinity filters.**

The **MCP Memory Service** (available at `doobidoo/mcp-memory-service`) ranks stored memories—such as architectural decisions, bug fixes, and project notes—to determine which entries deserve injection into the current conversation context. The **quality scoring system** drives this ranking through a multi-factor pipeline implemented in [`claude-hooks/utilities/memory-scorer.js`](https://github.com/doobidoo/mcp-memory-service/blob/main/claude-hooks/utilities/memory-scorer.js), combining local heuristics with backend AI analysis to surface high-value information while suppressing generic or outdated content.

## Overview of the Composite Scoring Pipeline

The relevance engine processes each memory through eight distinct evaluation stages before producing a final normalized score. According to the source code, the pipeline executes these steps in sequence:

1. **Time decay** – Applies exponential aging penalties to older memories
2. **Tag relevance** – Scores overlap between memory tags and project metadata
3. **Content relevance** – Counts keyword matches between memory text and project descriptions
4. **Content quality** – Heuristically penalizes short, generic, or low-information content
5. **Backend quality** – Incorporates AI-generated quality scores from ONNX/Groq models
6. **Type bonus** – Adds weight for high-value memory types like `decision` or `architecture`
7. **Recency bonus** – Grants additional boosts to memories from the last 7, 14, or 30 days
8. **Conversation relevance** – Optionally aligns memories with current chat topics and intents

A final **project-affinity filter** acts as a hard gate, zeroing out scores for memories lacking project-specific references.

## Core Scoring Components

### Time Decay

The `calculateTimeDecay()` function (lines 11-42 in [`memory-scorer.js`](https://github.com/doobidoo/mcp-memory-service/blob/main/claude-hooks/utilities/memory-scorer.js)) applies exponential decay based on memory age. Memories created within the past 7 days keep scores near 1.0, while older entries asymptotically approach 0. This gives recent project knowledge a consistent head start before the other quality signals are weighed in.
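The shape of such a curve can be illustrated with a minimal sketch. This is not the repository's code: `sketchTimeDecay` and its `halfLifeDays` parameter are hypothetical, and the 30-day half-life is an assumed value picked only so that week-old memories stay close to 1.0.

```javascript
// Hypothetical sketch of exponential time decay; not the repository's code.
// halfLifeDays is an assumed tuning parameter chosen for illustration.
const SECONDS_PER_DAY = 86400;

function sketchTimeDecay(createdAtSeconds, nowSeconds, halfLifeDays = 30) {
  // Clamp future timestamps to an age of zero.
  const ageDays = Math.max(0, (nowSeconds - createdAtSeconds) / SECONDS_PER_DAY);
  // 1.0 for a brand-new memory, halving every halfLifeDays, approaching 0.
  return Math.pow(0.5, ageDays / halfLifeDays);
}

const referenceNow = 1700000000; // fixed reference time for reproducible output
for (const days of [0, 7, 30, 90]) {
  const score = sketchTimeDecay(referenceNow - days * SECONDS_PER_DAY, referenceNow);
  console.log(`${days} days old: ${score.toFixed(3)}`);
}
```

A half-life parameter is used here because it is easy to reason about; the real function may express its decay rate differently.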

### Tag Relevance

Implemented in `calculateTagRelevance()` (lines 53-95), this component rewards memories whose tags intersect with the project's name, primary language, frameworks, or tools. Exact matches on critical identifiers like the project name or language trigger additional bonus multipliers, ensuring domain-specific memories surface before tangential notes.
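As a rough illustration of tag overlap with bonus multipliers, consider the sketch below. The function name `sketchTagRelevance`, the base formula (fraction of project terms found among the tags), and the 1.5x and 1.2x multipliers are all assumptions made for this example, not values taken from the repository.

```javascript
// Hypothetical sketch of tag/project overlap scoring; not the repository's code.
// The 1.5x and 1.2x multipliers are assumed values for illustration.
function sketchTagRelevance(memoryTags, project) {
  const tags = new Set((memoryTags || []).map((tag) => tag.toLowerCase()));
  const terms = [
    project.name,
    project.language,
    ...(project.frameworks || []),
    ...(project.tools || [])
  ].filter(Boolean).map((term) => term.toLowerCase());

  if (tags.size === 0 || terms.length === 0) return 0;

  // Base score: fraction of project terms that appear as tags.
  const matched = terms.filter((term) => tags.has(term)).length;
  let score = matched / terms.length;

  // Exact matches on the critical identifiers earn an extra multiplier.
  if (project.name && tags.has(project.name.toLowerCase())) score *= 1.5;
  if (project.language && tags.has(project.language.toLowerCase())) score *= 1.2;

  return Math.min(1, score);
}

const sampleProject = {
  name: 'mcp-memory-service',
  language: 'JavaScript',
  frameworks: ['Node.js'],
  tools: ['npm', 'docker']
};

console.log(sketchTagRelevance(['mcp-memory-service', 'decision'], sampleProject));
console.log(sketchTagRelevance(['personal', 'note'], sampleProject));
```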

### Content Relevance

The `calculateContentRelevance()` function (lines 66-103) performs keyword frequency analysis between memory content and project-specific terminology. It searches for the project name, language, frameworks, tools, and generic technical terms, applying logarithmic weighting to favor memories with repeated, meaningful keyword matches over single occurrences.
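Logarithmic weighting means each additional occurrence of a keyword adds less than the one before it. The sketch below shows one way to get that behavior; `sketchContentRelevance`, `countOccurrences`, and the `saturationCount` parameter are hypothetical names and values, not the repository's API.

```javascript
// Hypothetical sketch of log-weighted keyword matching; not the repository's code.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function countOccurrences(content, keyword) {
  const matches = content.match(new RegExp(escapeRegExp(keyword), 'gi'));
  return matches ? matches.length : 0;
}

// saturationCount (assumed) is the occurrence count at which a keyword
// reaches its maximum contribution of 1.
function sketchContentRelevance(content, keywords, saturationCount = 5) {
  if (!content || keywords.length === 0) return 0;
  let total = 0;
  for (const keyword of keywords) {
    const occurrences = countOccurrences(content, keyword);
    // log1p gives diminishing returns: repeats help, but less each time.
    total += Math.min(1, Math.log1p(occurrences) / Math.log1p(saturationCount));
  }
  // Average across keywords so the result stays in the 0-1 range.
  return total / keywords.length;
}

const keywordList = ['docker'];
console.log(sketchContentRelevance('docker build', keywordList).toFixed(3));
console.log(sketchContentRelevance('docker build inside docker', keywordList).toFixed(3));
```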

### Content Quality Heuristic

Perhaps the most influential local signal, `calculateContentQuality()` (lines 106-155) evaluates "meaningfulness" through multiple lenses:

- **Pattern detection** – Identifies and penalizes generic session-summary formats
- **Length validation** – Applies score reductions for very short content
- **Action-oriented vocabulary** – Rewards presence of implementation keywords like "implemented," "fixed," or "refactored"
- **Lexical diversity** – Measures vocabulary richness to identify substantive technical writing

This heuristic returns a 0-1 score that heavily influences the final ranking.
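The four lenses above can be combined in many ways. The following sketch shows one plausible arrangement; the patterns, thresholds, and point values are assumptions chosen for illustration, and `sketchContentQuality` is a hypothetical name rather than the repository's function.

```javascript
// Hypothetical sketch of a content-quality heuristic; not the repository's code.
// Patterns, thresholds, and point values are assumed for illustration.
const GENERIC_PATTERNS = [/^session summary/i, /^summary of (the )?session/i];
const ACTION_WORDS = ['implemented', 'fixed', 'refactored'];

function sketchContentQuality(content) {
  const text = (content || '').trim();
  if (text.length === 0) return 0;

  let score = 0.5; // neutral starting point

  // Pattern detection: penalize templated session summaries.
  if (GENERIC_PATTERNS.some((pattern) => pattern.test(text))) score -= 0.3;

  // Length validation: very short content carries little information.
  if (text.length < 50) score -= 0.2;

  const words = text.toLowerCase().match(/[a-z0-9'-]+/g) || [];

  // Action-oriented vocabulary: reward implementation keywords.
  if (ACTION_WORDS.some((word) => words.includes(word))) score += 0.2;

  // Lexical diversity: share of unique words, only meaningful for longer text.
  if (words.length >= 10) {
    score += 0.2 * (new Set(words).size / words.length);
  }

  return Math.min(1, Math.max(0, score));
}

console.log(sketchContentQuality('Session summary: worked on stuff'));
console.log(sketchContentQuality(
  'Implemented retry logic for the embedding worker and fixed the race condition in the SQLite writer.'
));
```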

### Backend Quality Integration

The `calculateBackendQuality()` function (lines 84-100) integrates AI-generated quality signals from the MCP backend. When available, this reads the `metadata.quality_score` field generated by ONNX models or Groq API analysis. Missing scores default to a neutral 0.5, allowing the system to function while encouraging backend enrichment for critical memories.
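The fallback behavior is simple enough to sketch in a few lines. The helper below is hypothetical (`sketchBackendQuality` is not the repository's function); only the `metadata.quality_score` field and the 0.5 default come from the description above, and the clamping to 0-1 is an added assumption.

```javascript
// Hypothetical sketch of reading a backend quality score with a neutral default.
function sketchBackendQuality(memory, neutralDefault = 0.5) {
  const raw = memory && memory.metadata ? memory.metadata.quality_score : undefined;
  // Missing or non-numeric scores fall back to the neutral value.
  if (typeof raw !== 'number' || Number.isNaN(raw)) return neutralDefault;
  // Assumed safeguard: keep out-of-range backend values inside 0-1.
  return Math.min(1, Math.max(0, raw));
}

console.log(sketchBackendQuality({ metadata: { quality_score: 0.86 } }));
console.log(sketchBackendQuality({ metadata: {} }));
```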

### Type and Recency Bonuses

Strategic memory types receive fixed offsets via `calculateTypeBonus()` (lines 115-143). Architectural decisions and project decisions receive +0.3 boosts, while `calculateRecencyBonus()` (lines 136-172) adds incremental weight for memories created within sliding windows of 7, 14, or 30 days.
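Both bonuses amount to small lookup tables. In the sketch below, the +0.3 for `decision` and `architecture` and the 7/14/30-day windows follow the description above, while the recency amounts (0.15, 0.10, 0.05) and the function names are assumptions made for illustration.

```javascript
// Hypothetical sketch of type and recency bonuses; not the repository's code.
const TYPE_BONUSES = new Map([
  ['decision', 0.3],
  ['architecture', 0.3]
]);

function sketchTypeBonus(memoryType) {
  return TYPE_BONUSES.get(memoryType) || 0;
}

// Recency amounts are assumed; only the 7/14/30-day windows come from the text.
const RECENCY_TIERS = [
  { maxAgeDays: 7, bonus: 0.15 },
  { maxAgeDays: 14, bonus: 0.1 },
  { maxAgeDays: 30, bonus: 0.05 }
];

function sketchRecencyBonus(ageDays) {
  // Tiers are ordered from newest to oldest, so the first match wins.
  const tier = RECENCY_TIERS.find((candidate) => ageDays <= candidate.maxAgeDays);
  return tier ? tier.bonus : 0;
}

console.log(sketchTypeBonus('decision'), sketchRecencyBonus(3));
console.log(sketchTypeBonus('note'), sketchRecencyBonus(45));
```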

### Conversation Relevance (Optional Phase 2)

When enabled, `calculateConversationRelevance()` (lines 190-258) performs dynamic analysis against the current chat session. It matches memory content against detected topics, entities, intent classifications, and code context from the active conversation, allowing real-time relevance adjustment based on immediate user needs.
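A heavily simplified view of this step is a check of how many conversation signals show up in a memory. The sketch below covers only topics and entities; the `conversation` object shape and the `sketchConversationRelevance` name are hypothetical, and intent and code-context matching are left out.

```javascript
// Hypothetical sketch of conversation matching; not the repository's code.
// The conversation shape { topics, entities } is assumed for illustration.
function sketchConversationRelevance(memoryContent, conversation) {
  const text = (memoryContent || '').toLowerCase();
  const signals = [
    ...(conversation.topics || []),
    ...(conversation.entities || [])
  ].map((signal) => signal.toLowerCase());

  if (!text || signals.length === 0) return 0;

  // Score: fraction of conversation signals mentioned in the memory.
  const hits = signals.filter((signal) => text.includes(signal)).length;
  return hits / signals.length;
}

const conversation = { topics: ['embedding', 'auth'], entities: ['sqlite-vec'] };
console.log(sketchConversationRelevance(
  'Decided to switch to SQLite-vec for embedding storage.',
  conversation
).toFixed(2));
```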

### Project-Affinity Filtering

Before finalization, the system executes a hard filter within `calculateRelevanceScore()` (lines 66-89). Memories that lack the current project name in both tags and content, and that also have low tag relevance scores, receive either forced zero scores or heavy penalties. This prevents cross-project contamination when a single memory store serves several projects.
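One way to picture the gate is as a multiplier applied to the score. The thresholds (0.1 and 0.3) and the 0.3 penalty below are assumed values, and `sketchProjectAffinityMultiplier` is a hypothetical helper, not the repository's function.

```javascript
// Hypothetical sketch of a project-affinity gate; not the repository's code.
// Returns a multiplier: 1 = untouched, 0.3 = heavy penalty, 0 = filtered out.
function sketchProjectAffinityMultiplier(memory, projectName, tagRelevance) {
  const name = projectName.toLowerCase();
  const inTags = (memory.tags || []).some((tag) => tag.toLowerCase() === name);
  const inContent = (memory.content || '').toLowerCase().includes(name);

  // Any explicit reference to the project passes the gate.
  if (inTags || inContent) return 1;

  // No project reference: outcome depends on tag relevance (thresholds assumed).
  if (tagRelevance < 0.1) return 0;
  if (tagRelevance < 0.3) return 0.3;
  return 1;
}

const offTopic = { content: 'Random note about a personal project.', tags: ['personal'] };
const onTopic = { content: 'Switched storage backend.', tags: ['mcp-memory-service'] };
console.log(sketchProjectAffinityMultiplier(offTopic, 'mcp-memory-service', 0));
console.log(sketchProjectAffinityMultiplier(onTopic, 'mcp-memory-service', 0.3));
```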

## How Quality Influences the Final Score

The aggregation logic in `calculateRelevanceScore()` (lines 94-147) applies configurable weights to combine all components into a 0-1 normalized result. The **content quality** weight defaults to 0.15-0.20, while **backend quality** contributes an additional 0.20.

Critical quality gates include:

- **Severe quality penalty**: If `calculateContentQuality()` returns less than 0.2, the entire final score is halved (`finalScore *= 0.5`)
- **Backend override**: High backend quality scores (0.8+) can compensate for mediocre heuristic scores when the AI model identifies rich semantic content invisible to pattern matching
- **Hard zeroing**: Project-affinity violations force scores to exactly 0 regardless of other positive signals
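The weighted sum and the gates can be sketched together as follows. Only the 0.20 weights for content quality and backend quality, the halving below 0.2, and the forced zero come from the description above; the remaining weights, the `components` object shape, and the `sketchFinalScore` name are assumptions.

```javascript
// Hypothetical sketch of score aggregation with quality gates; not the repository's code.
// contentQuality and backendQuality weights follow the text; the others are assumed.
const SKETCH_WEIGHTS = {
  timeDecay: 0.25,
  tagRelevance: 0.2,
  contentRelevance: 0.15,
  contentQuality: 0.2,
  backendQuality: 0.2
};

function sketchFinalScore(components, weights = SKETCH_WEIGHTS) {
  // Weighted sum of the 0-1 component scores.
  let score = 0;
  for (const [name, weight] of Object.entries(weights)) {
    score += weight * (components[name] || 0);
  }

  // Fixed offsets for memory type and recency.
  score += (components.typeBonus || 0) + (components.recencyBonus || 0);

  // Severe quality penalty: halve the score for very low content quality.
  if ((components.contentQuality || 0) < 0.2) score *= 0.5;

  // Project-affinity gate: a violation zeroes the score outright.
  if (components.passesProjectAffinity === false) score = 0;

  // Normalize to the 0-1 range.
  return Math.min(1, Math.max(0, score));
}

const solid = {
  timeDecay: 0.8, tagRelevance: 0.3, contentRelevance: 0.4,
  contentQuality: 0.7, backendQuality: 0.86, passesProjectAffinity: true
};
console.log(sketchFinalScore(solid).toFixed(3));
console.log(sketchFinalScore({ ...solid, contentQuality: 0.1 }).toFixed(3));
console.log(sketchFinalScore({ ...solid, passesProjectAffinity: false }));
```

In this sketch the penalty is applied after the bonuses, so a templated memory cannot recover its score through a type or recency boost; whether the real pipeline orders these steps the same way is not confirmed here.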

## Practical Implementation

The example below, meant to be run from a checkout of the repository, shows how to score memories using the utilities exported from [`claude-hooks/utilities/memory-scorer.js`](https://github.com/doobidoo/mcp-memory-service/blob/main/claude-hooks/utilities/memory-scorer.js):

```javascript
// example-usage.js
// Run from the repository root so the relative require path resolves.
const { scoreMemoryRelevance } = require('./claude-hooks/utilities/memory-scorer');

// Timestamps are computed relative to now so time decay and recency stay meaningful.
const nowSeconds = Math.floor(Date.now() / 1000);
const daysAgo = (days) => nowSeconds - days * 86400;

// Define the active project context
const projectContext = {
  name: 'mcp-memory-service',
  language: 'JavaScript',
  frameworks: ['Node.js'],
  tools: ['npm', 'docker']
};

// Sample memories retrieved from the MCP API
const memories = [
  {
    content: 'Decided to switch to SQLite-vec for embedding storage.',
    tags: ['mcp-memory-service', 'decision', 'sqlite-vec'],
    memory_type: 'decision',
    created_at: daysAgo(2), // Unix timestamp (seconds)
    metadata: { quality_score: 0.86 }
  },
  {
    content: 'Fixed typo in the README that prevented Docker builds.',
    tags: ['bug-fix', 'documentation'],
    memory_type: 'bug-fix',
    created_at: daysAgo(5),
    metadata: {}
  },
  {
    content: 'Random note about a personal project unrelated to this repo.',
    tags: ['personal', 'note'],
    memory_type: 'note',
    created_at: daysAgo(35),
    metadata: {}
  }
];

// Score and rank memories
const ranked = scoreMemoryRelevance(memories, projectContext, {
  verbose: true,                    // Enables debugging output
  includeConversationContext: false // Disables chat-topic analysis
});

console.log('\nTop memories:');
ranked.slice(0, 2).forEach((mem, i) => {
  console.log(`${i + 1}. Score: ${mem.relevanceScore.toFixed(3)}`);
  console.log(`   Content: ${mem.content}`);
  console.log('   Breakdown:', mem.scoreBreakdown);
});
```

This example calls the scoring pipeline from `doobidoo/mcp-memory-service` directly. Given the rules described above, the decision memory should rank first thanks to its high backend quality score (0.86), its matching project tag, and its `decision` type bonus, while the unrelated personal note, which never references the project, should score at or near zero.

## Summary

- The **quality scoring system** combines eight distinct signals ranging from temporal decay to AI-generated quality metrics
- **Content quality heuristics** in `calculateContentQuality()` (lines 106-155) penalize generic content while rewarding technical specificity and action-oriented language
- **Backend quality integration** allows ONNX/Groq models to influence rankings through the `metadata.quality_score` field
- **Project-affinity filtering** prevents off-topic memories from contaminating context windows
- All scoring weights are configurable, with severe penalties triggered when content quality falls below 0.2
- The complete implementation resides in [`claude-hooks/utilities/memory-scorer.js`](https://github.com/doobidoo/mcp-memory-service/blob/main/claude-hooks/utilities/memory-scorer.js)

## Frequently Asked Questions

### What is the valid range for memory relevance scores?

The **quality scoring system** normalizes all final scores to a **0-1 range**, where 1.0 represents maximum relevance and 0.0 indicates complete irrelevance or project mismatch. Intermediate values reflect weighted combinations of time decay, content quality, tag matching, and backend AI signals.

### How does the content quality heuristic identify low-quality memories?

The `calculateContentQuality()` function detects low-quality entries by checking for generic session-summary patterns, measuring content length (penalizing very short memories), and analyzing lexical diversity. Memories scoring below 0.2 trigger an automatic 50% reduction in final relevance score, effectively burying vague or templated content regardless of other positive signals.

### What happens if a memory lacks a backend quality score?

When the `metadata.quality_score` field is absent, `calculateBackendQuality()` defaults to **0.5**, a neutral value that neither boosts nor penalizes the memory. This default ensures backward compatibility while allowing systems to gradually populate quality metadata through ONNX or Groq backend processing without invalidating existing memories.

### Can developers customize the weighting of different scoring components?

Yes. The `defaultWeights` object in [`memory-scorer.js`](https://github.com/doobidoo/mcp-memory-service/blob/main/claude-hooks/utilities/memory-scorer.js) exposes configurable multipliers for every component, including content quality (default 0.15-0.20), backend quality (default 0.20), tag relevance, and time decay. Developers can pass custom weight objects to `calculateRelevanceScore()` to prioritize specific signals, for example by increasing the weight of conversation relevance in chat-heavy applications or emphasizing backend quality in AI-enriched environments.
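When overriding weights, it can help to renormalize them so the combined score stays on the same 0-1 scale. The helper below is a hypothetical sketch: `sketchMergeWeights` and the weight keys shown are assumptions, and it deliberately does not call `calculateRelevanceScore()`, whose exact signature should be checked in the source.

```javascript
// Hypothetical helper for merging weight overrides; not part of the repository.
// Weight keys below are assumed names used only for this illustration.
function sketchMergeWeights(defaults, overrides = {}) {
  const merged = { ...defaults, ...overrides };
  const total = Object.values(merged).reduce((sum, weight) => sum + weight, 0);
  if (total <= 0) {
    throw new Error('Weights must sum to a positive value');
  }
  // Rescale so the weights sum to 1 and scores remain comparable.
  return Object.fromEntries(
    Object.entries(merged).map(([name, weight]) => [name, weight / total])
  );
}

const assumedDefaults = {
  timeDecay: 0.25,
  tagRelevance: 0.2,
  contentRelevance: 0.15,
  contentQuality: 0.2,
  backendQuality: 0.2
};

// Emphasize backend quality for an AI-enriched environment.
const tunedWeights = sketchMergeWeights(assumedDefaults, { backendQuality: 0.4 });
console.log(tunedWeights);
```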