
How Hybrid Search in MCP Memory Service Combines BM25 and Vector Search for Better Recall

February 28, 2026 doobidoo/mcp-memory-service

Hybrid search in MCP Memory Service runs BM25 keyword matching and vector similarity searches in parallel, normalizes both score types to a 0-1 range, and fuses them using configurable weights to surface results that match either specific terms or semantic meaning.

Hybrid search addresses the limitations of pure semantic retrieval by incorporating classic full-text ranking. In the doobidoo/mcp-memory-service repository, this approach gives rare keywords, proper nouns, and exact phrases appropriate weight alongside embedding-based similarity, which improves recall for queries that embeddings alone handle poorly.

Architecture of Hybrid Search in MCP Memory Service

The hybrid retrieval system is implemented primarily in src/mcp_memory_service/storage/sqlite_vec.py, where the retrieve_hybrid method orchestrates a multi-stage pipeline that balances keyword precision with semantic flexibility.

Configuration and Environment Variables

Hybrid mode is enabled by default via the MCP_HYBRID_SEARCH_ENABLED=True setting defined in src/mcp_memory_service/config.py. The fusion weights are also configured here, with MCP_HYBRID_KEYWORD_WEIGHT defaulting to 0.3 and MCP_HYBRID_SEMANTIC_WEIGHT defaulting to 0.7. These values determine the influence of each signal in the final ranking and can be overridden per query.
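As a rough illustration of how settings like these are commonly read, the sketch below parses the three environment variables named above and falls back to the stated defaults. The names HybridConfig and load_hybrid_config are hypothetical and are not taken from config.py; the repository's actual parsing may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical container for the hybrid-search settings."""
    enabled: bool
    keyword_weight: float
    semantic_weight: float


def load_hybrid_config(env=None) -> HybridConfig:
    """Read the settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    # Treat common truthy strings as "enabled"; the default is on.
    raw_enabled = env.get("MCP_HYBRID_SEARCH_ENABLED", "true")
    enabled = raw_enabled.strip().lower() in ("1", "true", "yes")
    return HybridConfig(
        enabled=enabled,
        keyword_weight=float(env.get("MCP_HYBRID_KEYWORD_WEIGHT", "0.3")),
        semantic_weight=float(env.get("MCP_HYBRID_SEMANTIC_WEIGHT", "0.7")),
    )
```

Passing an explicit mapping, for example load_hybrid_config({"MCP_HYBRID_KEYWORD_WEIGHT": "0.6"}), makes the helper easy to exercise without touching the real environment.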

Parallel Retrieval Strategy

When retrieve_hybrid is invoked, it starts two concurrent asyncio tasks to reduce query latency:

BM25 FTS5 query: Executed via _search_bm25, which queries the SQLite FTS5 virtual table for keyword matches.
Vector similarity query: Executed via the standard retrieve method, which performs cosine similarity search against the vector index.

Both tasks request double the requested result count (n_results × 2) so that the subsequent merge phase has enough candidates to work with. Because the two searches run concurrently, total latency is bounded by the slower search rather than by the sum of both.
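The concurrent over-fetch pattern can be sketched with asyncio.gather. The two search coroutines below are stand-ins that return canned data; they are not the repository's _search_bm25 or retrieve methods, and the tuple shapes are assumptions made for illustration.

```python
import asyncio


async def search_bm25(query: str, limit: int) -> list[tuple[str, float]]:
    """Stand-in keyword search returning (content_hash, rank) pairs."""
    await asyncio.sleep(0.01)  # simulate database I/O
    return [("hash-a", -1.5), ("hash-b", -4.0)][:limit]


async def search_vector(query: str, limit: int) -> list[tuple[str, float]]:
    """Stand-in vector search returning (content_hash, distance) pairs."""
    await asyncio.sleep(0.02)  # simulate embedding + index lookup
    return [("hash-b", 0.3), ("hash-c", 0.8)][:limit]


async def gather_candidates(query: str, n_results: int):
    # Over-fetch from both backends so the merge step has enough candidates.
    limit = n_results * 2
    # gather() runs both coroutines concurrently and preserves argument order.
    keyword_hits, vector_hits = await asyncio.gather(
        search_bm25(query, limit),
        search_vector(query, limit),
    )
    return keyword_hits, vector_hits


if __name__ == "__main__":
    kw, vec = asyncio.run(gather_candidates("GraphQL", n_results=5))
    print(len(kw), len(vec))
```

With this structure the wall-clock time is roughly that of the slower coroutine, which is the reason for issuing both queries at once.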

Score Normalization Logic

Before fusion, scores from each backend must be converted to a comparable 0-1 scale.

BM25 normalization occurs in _normalize_bm25_score. SQLite FTS5 reports rank as a negative number, and the method treats values closer to zero as stronger matches, applying the transformation:

normalized_score = max(0, min(1, 1 + rank / 10))

This clamps the result to the 0-1 range: a rank of 0 maps to 1.0, a rank of -5 maps to 0.5, and any rank of -10 or lower maps to 0.0.
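The clamping behavior of the formula is easy to check with a few sample ranks. The function name normalize_bm25_rank is a stand-in for illustration and only reproduces the formula quoted above, not the surrounding repository code.

```python
def normalize_bm25_rank(rank: float) -> float:
    """Apply max(0, min(1, 1 + rank / 10)) to a raw rank value."""
    return max(0.0, min(1.0, 1.0 + rank / 10.0))


# Sample ranks: zero, inside the linear region, at the floor, and past it.
for rank in (0.0, -2.5, -10.0, -25.0):
    print(f"rank={rank:6.1f} -> score={normalize_bm25_rank(rank):.2f}")
```

Ranks between -10 and 0 map linearly onto the score, while anything outside that window is clamped, so very different raw ranks below -10 become indistinguishable after normalization.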

Vector normalization converts cosine distance to similarity. The vector search returns distances where 0 indicates vectors pointing in the same direction and 2 indicates vectors pointing in opposite directions. The code transforms this in the retrieve method using:

semantic_score = 1 - (distance / 2)

This yields a 0-1 similarity score suitable for weighted combination with BM25 results.
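To make the mapping concrete, the sketch below computes cosine distance for three simple vector pairs and applies the conversion above. Both helper names are illustrative; the distance is computed in plain Python here, whereas the service obtains it from the vector index.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 minus cosine similarity, ranging from 0 to 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def distance_to_similarity(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1]."""
    return 1.0 - distance / 2.0


pairs = {
    "same direction": ([1.0, 0.0], [2.0, 0.0]),
    "orthogonal": ([1.0, 0.0], [0.0, 1.0]),
    "opposite": ([1.0, 0.0], [-1.0, 0.0]),
}
for label, (a, b) in pairs.items():
    d = cosine_distance(a, b)
    print(f"{label}: distance={d:.2f}, similarity={distance_to_similarity(d):.2f}")
```

Note that orthogonal vectors land at a similarity of 0.5 under this mapping, so unrelated memories still contribute a nonzero semantic score.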

Weighted Score Fusion and Deduplication

The _fuse_scores method (lines 1612-1635 in sqlite_vec.py) combines the normalized scores using a weighted sum, which is a weighted average whenever the two weights add up to 1, as the defaults do:

final_score = (keyword_score * keyword_weight) + (semantic_score * semantic_weight)

Results are deduplicated by content_hash. If a memory appears in only one result set, the missing score defaults to 0.0, and the full record is lazy-loaded from SQLite via get_by_hash. The fused results are sorted descending by score and sliced to the requested n_results limit.

Each MemoryQueryResult object includes a debug_info dictionary containing the raw keyword_score, semantic_score, and backend identifier "hybrid-bm25-vector", enabling observability and performance tuning.
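The merge described above can be sketched as a small standalone function. This is an approximation under stated assumptions, not the repository's _fuse_scores: inputs are plain dicts keyed by content hash, results are plain dicts rather than MemoryQueryResult objects, and the "backend" key name inside debug_info is a guess.

```python
def fuse_scores(
    keyword_scores: dict[str, float],
    semantic_scores: dict[str, float],
    keyword_weight: float = 0.3,
    semantic_weight: float = 0.7,
    n_results: int = 5,
) -> list[dict]:
    """Merge two normalized score maps keyed by content hash."""
    fused = []
    # The union of keys deduplicates memories found by both backends.
    for content_hash in keyword_scores.keys() | semantic_scores.keys():
        kw = keyword_scores.get(content_hash, 0.0)    # missing score -> 0.0
        sem = semantic_scores.get(content_hash, 0.0)  # missing score -> 0.0
        fused.append({
            "content_hash": content_hash,
            "score": kw * keyword_weight + sem * semantic_weight,
            "debug_info": {
                "keyword_score": kw,
                "semantic_score": sem,
                "backend": "hybrid-bm25-vector",  # key name is an assumption
            },
        })
    # Highest fused score first, then trim to the requested count.
    fused.sort(key=lambda r: r["score"], reverse=True)
    return fused[:n_results]


results = fuse_scores({"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.5}, n_results=2)
print([r["content_hash"] for r in results])
```

In this example "b" appears in both inputs and ranks first, while "a" has a strong keyword score but no semantic score and falls outside the top two under the default weights.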

Why Hybrid Search Improves Recall

The complementary strengths of BM25 and vector search address distinct failure modes in single-modality retrieval:

Keyword boost: Rare terms or proper nouns that BM25 ranks highly will elevate the fused score even when vector embeddings produce mediocre similarity scores. This ensures exact matches are not buried by semantic noise.
Semantic safety net: When queries contain synonyms, paraphrases, or conceptual descriptions that lack exact keyword matches, the vector similarity component maintains a solid base score, preventing relevant memories from being excluded.
Configurable balance: Domain-specific adjustments are possible by tuning weights. For example, code search implementations might increase keyword_weight to prioritize function names and specific syntax.
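A small worked example shows how the weights shift the ranking. The two candidates and their scores are invented for illustration: one memory contains the exact query term but is only moderately similar in embedding space, and the other is a paraphrase with no keyword hit.

```python
# (keyword_score, semantic_score) for two hypothetical memories.
candidates = {
    "exact_match": (0.9, 0.4),
    "paraphrase": (0.0, 0.7),
}


def fused(kw: float, sem: float, kw_weight: float, sem_weight: float) -> float:
    """Weighted sum of the two normalized scores."""
    return kw * kw_weight + sem * sem_weight


def rank(kw_weight: float, sem_weight: float) -> list[str]:
    """Return candidate names ordered by fused score, best first."""
    return sorted(
        candidates,
        key=lambda name: fused(*candidates[name], kw_weight, sem_weight),
        reverse=True,
    )


for weights in ((0.0, 1.0), (0.3, 0.7), (0.6, 0.4)):
    print(weights, rank(*weights))
```

With semantic scoring alone the paraphrase ranks first. Under the default 0.3/0.7 split the exact match moves ahead (about 0.55 versus 0.49), and a 0.6/0.4 split widens that gap further.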

The integration tests in tests/storage/test_hybrid_search.py validate that this fusion approach yields higher scores for exact keyword hits while retaining semantic relevance for conceptual queries.

Implementing Hybrid Search in Your Code

The following examples demonstrate practical usage of the hybrid retrieval API.

Basic Hybrid Query

import asyncio

from mcp_memory_service.storage.sqlite_vec import SqliteVecMemoryStorage
from mcp_memory_service.models import Memory
from mcp_memory_service.utils import generate_content_hash


async def main():
    # Initialize storage
    storage = SqliteVecMemoryStorage(db_path="memories.db")
    await storage.initialize()

    # Store a sample memory; the hash is derived from the same content string
    content = "GraphQL API endpoint for fetching user profiles"
    mem = Memory(
        content=content,
        content_hash=generate_content_hash(content),
        tags=["api", "graphql"],
    )
    await storage.store(mem)

    # Execute hybrid search with default weights (0.3 keyword, 0.7 semantic)
    results = await storage.retrieve_hybrid("GraphQL", n_results=5)
    for r in results:
        print(f"{r.memory.content} | Score: {r.relevance_score:.3f} "
              f"(kw={r.debug_info['keyword_score']:.2f}, "
              f"sem={r.debug_info['semantic_score']:.2f})")


# The storage methods are coroutines, so they must run inside an event loop
asyncio.run(main())

Custom Weight Configuration

Override default weights to prioritize keyword matching for controlled vocabularies:

results = await storage.retrieve_hybrid(
    "GraphQL",
    n_results=5,
    keyword_weight=0.6,   # 60% keyword influence
    semantic_weight=0.4,  # 40% semantic influence
)

Unified API Access

The service's search_memories endpoint forwards the mode="hybrid" flag to retrieve_hybrid automatically:

await storage.search_memories(
    query="GraphQL", 
    mode="hybrid", 
    limit=5
)

Summary

Hybrid search in MCP Memory Service combines BM25 full-text search with dense vector similarity to maximize result recall.
The system retrieves double the requested results from both backends in parallel to ensure sufficient candidate overlap.
BM25 scores are normalized from negative ranks to 0-1 using max(0, min(1, 1 + rank/10)) in _normalize_bm25_score.
Vector distances are converted to similarity scores using 1 - (distance/2) before fusion.
Final ranking uses a weighted average (default 0.3 keyword, 0.7 semantic) computed in _fuse_scores.
Results are deduplicated by content hash and include detailed debug_info for observability.
Weights are configurable per query or via environment variables in src/mcp_memory_service/config.py.

Frequently Asked Questions

What is the default weight distribution between BM25 and vector search in MCP Memory Service?

The default configuration assigns a weight of 0.3 to keyword scores (BM25) and 0.7 to semantic scores (vector similarity). These values are defined in src/mcp_memory_service/config.py via MCP_HYBRID_KEYWORD_WEIGHT and MCP_HYBRID_SEMANTIC_WEIGHT, and can be overridden for individual queries or globally via environment variables.

How are BM25 scores normalized in MCP Memory Service?

BM25 scores are normalized in the _normalize_bm25_score method within src/mcp_memory_service/storage/sqlite_vec.py. SQLite FTS5 reports rank as a negative number, and the method treats values closer to zero as better matches: it applies the formula max(0, min(1, 1 + rank/10)), so a rank of 0 maps to 1.0 and any rank of -10 or lower maps to 0.0.

Can I adjust the hybrid search weights per query?

Yes, the retrieve_hybrid method accepts optional keyword_weight and semantic_weight parameters that override the global defaults. This allows fine-tuning for specific use cases, such as increasing keyword weight for exact code searches or boosting semantic weight for conceptual natural language queries.

Where is the hybrid search logic implemented in the codebase?

The core implementation resides in src/mcp_memory_service/storage/sqlite_vec.py, specifically within the retrieve_hybrid, _search_bm25, _normalize_bm25_score, and related fusion methods. Configuration defaults are managed in src/mcp_memory_service/config.py, and the test suite in tests/storage/test_hybrid_search.py validates the normalization and weighting logic.
