# How Hybrid Search in MCP Memory Service Combines BM25 and Vector Search for Better Recall

> Discover how hybrid search in MCP Memory Service unites BM25 and vector search to boost recall. Learn how parallel searches, score normalization, and weighted fusion deliver superior results for term and semantic matches.

- Repository: [doobidoo/mcp-memory-service](https://github.com/doobidoo/mcp-memory-service)
- Tags: deep-dive
- Published: 2026-02-28

---

**Hybrid search in MCP Memory Service runs BM25 keyword matching and vector similarity searches in parallel, normalizes both score types to a 0-1 range, and fuses them using configurable weights to surface results that match either specific terms or semantic meaning.**

Hybrid search addresses the limitations of pure semantic retrieval by incorporating classic full-text ranking. In the `doobidoo/mcp-memory-service` repository, this approach ensures that rare keywords, proper nouns, and exact phrases receive appropriate weight alongside embedding-based similarity, which improves recall for queries that depend on exact terms as well as those that depend on meaning.

## Architecture of Hybrid Search in MCP Memory Service

The hybrid retrieval system is implemented primarily in [`src/mcp_memory_service/storage/sqlite_vec.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/storage/sqlite_vec.py), where the `retrieve_hybrid` method orchestrates a multi-stage pipeline that balances keyword precision with semantic flexibility.

### Configuration and Environment Variables

Hybrid mode is enabled by default via the `MCP_HYBRID_SEARCH_ENABLED=True` setting defined in [`src/mcp_memory_service/config.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/config.py). The fusion weights are also configured here, with `MCP_HYBRID_KEYWORD_WEIGHT` defaulting to **0.3** and `MCP_HYBRID_SEMANTIC_WEIGHT` defaulting to **0.7**. These values determine the influence of each signal in the final ranking and can be overridden per query.
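As a rough illustration of how settings like these could be read, the sketch below parses the three environment variables named above using the stated defaults. The helper names `_env_bool`, `_env_float`, and `load_hybrid_config` are hypothetical and are not taken from `config.py`; the actual parsing in the repository may differ.

```python
import os


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def _env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back to the default on bad input."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


def load_hybrid_config() -> dict:
    """Collect hybrid-search settings using the defaults described above."""
    return {
        "enabled": _env_bool("MCP_HYBRID_SEARCH_ENABLED", True),
        "keyword_weight": _env_float("MCP_HYBRID_KEYWORD_WEIGHT", 0.3),
        "semantic_weight": _env_float("MCP_HYBRID_SEMANTIC_WEIGHT", 0.7),
    }
```

With none of the variables set, `load_hybrid_config()` returns the documented defaults; exporting `MCP_HYBRID_KEYWORD_WEIGHT=0.6` before startup would shift the global balance toward keyword matches.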

### Parallel Retrieval Strategy

When `retrieve_hybrid` is invoked, it spawns two concurrent **asyncio** tasks to maximize throughput:

1. **BM25 FTS5 query**: Executed via `_search_bm25`, which queries the SQLite FTS5 virtual table for keyword matches.
2. **Vector similarity query**: Executed via the standard `retrieve` method, which performs cosine similarity search against the vector index.

Both tasks request **double** the requested result count (`n_results × 2`) so that the merge phase has enough candidates from each backend, including memories that appear in both result sets. Because the searches run concurrently, total latency is bounded by the slower of the two rather than by their sum.
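The concurrency pattern described here can be sketched with `asyncio.gather`. This is a minimal illustration, not the repository's code: `search_bm25`, `search_vector`, and `fetch_candidates` are hypothetical stand-ins that return canned data, and the use of `gather` specifically is an assumption.

```python
import asyncio


async def search_bm25(query: str, limit: int) -> list[tuple[str, float]]:
    """Stand-in keyword search; returns (content_hash, rank) pairs."""
    await asyncio.sleep(0.01)  # simulate database I/O
    return [("hash-a", -1.5), ("hash-b", -4.0)][:limit]


async def search_vector(query: str, limit: int) -> list[tuple[str, float]]:
    """Stand-in vector search; returns (content_hash, distance) pairs."""
    await asyncio.sleep(0.02)  # simulate a slower similarity scan
    return [("hash-b", 0.4), ("hash-c", 0.9)][:limit]


async def fetch_candidates(query: str, n_results: int):
    """Run both searches concurrently, requesting 2x candidates from each."""
    limit = n_results * 2
    keyword_hits, vector_hits = await asyncio.gather(
        search_bm25(query, limit),
        search_vector(query, limit),
    )
    return keyword_hits, vector_hits


if __name__ == "__main__":
    kw, vec = asyncio.run(fetch_candidates("GraphQL", n_results=5))
    print(f"{len(kw)} keyword candidates, {len(vec)} vector candidates")
```

In this sketch the combined wait is roughly the 0.02 s of the slower stub, not the 0.03 s a sequential run would take.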

### Score Normalization Logic

Before fusion, scores from each backend must be converted to a comparable 0-1 scale.

**BM25 normalization** occurs in `_normalize_bm25_score`. SQLite FTS5 returns a negative rank, and the method treats values closer to zero as more relevant, applying the transformation:

```python
normalized_score = max(0, min(1, 1 + rank / 10))
```

This compresses the negative rank into a positive 0-1 range where 1.0 represents the best keyword match.
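To make the clamping behaviour concrete, here is the same formula wrapped in a standalone function and applied to a few sample ranks. The function name `normalize_bm25` and the sample values are illustrative only.

```python
def normalize_bm25(rank: float) -> float:
    """Clamp 1 + rank / 10 into the range [0, 1], as in the formula above."""
    return max(0.0, min(1.0, 1 + rank / 10))


# Sample ranks chosen for illustration, not taken from a real FTS5 query.
for rank in (0.0, -2.5, -10.0, -15.0):
    print(f"rank={rank:>6} -> score={normalize_bm25(rank):.2f}")
```

A rank of `-2.5` maps to `0.75`, while any rank at or below `-10` is clamped to `0.0`, so the formula only discriminates between ranks in the interval from -10 to 0.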

**Vector normalization** converts cosine distance to similarity. The vector search returns distances where 0 indicates identical vectors and 2 indicates opposite vectors. The code transforms this in the `retrieve` method using:

```python
semantic_score = 1 - (distance / 2)
```

This yields a 0-1 similarity score suitable for weighted combination with BM25 results.
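The endpoints of that mapping are easy to check by hand. The sketch below computes cosine distance for three toy vector pairs in pure Python and applies the conversion; `cosine_distance` and `distance_to_similarity` are illustrative names, not functions from the repository.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def distance_to_similarity(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a similarity in [0, 1]."""
    return 1 - (distance / 2)


same = cosine_distance([1.0, 0.0], [1.0, 0.0])         # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])   # unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])    # opposite direction

for label, d in (("same", same), ("orthogonal", orthogonal), ("opposite", opposite)):
    print(f"{label}: distance={d:.1f}, similarity={distance_to_similarity(d):.1f}")
```

Identical directions give a similarity of 1.0, orthogonal vectors 0.5, and opposite vectors 0.0.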

### Weighted Score Fusion and Deduplication

The `_fuse_scores` method (lines 1612-1635 in [`sqlite_vec.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/storage/sqlite_vec.py)) combines the normalized scores using a weighted average:

```python
final_score = (keyword_score * keyword_weight) + (semantic_score * semantic_weight)
```

Results are deduplicated by `content_hash`. If a memory appears in only one result set, the missing score defaults to **0.0**, and the full record is lazy-loaded from SQLite via `get_by_hash`. The fused results are sorted descending by score and sliced to the requested `n_results` limit.

Each `MemoryQueryResult` object includes a `debug_info` dictionary containing the raw `keyword_score`, `semantic_score`, and backend identifier `"hybrid-bm25-vector"`, enabling observability and performance tuning.
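The merge, default-to-zero, sort, and slice steps can be sketched over plain dictionaries keyed by content hash. This is a simplified model under stated assumptions: `fuse_scores` is a hypothetical function, results are plain dicts, not `MemoryQueryResult` objects, the `"backend"` key name is a guess, and the lazy `get_by_hash` lookup is omitted.

```python
def fuse_scores(
    keyword_scores: dict[str, float],
    semantic_scores: dict[str, float],
    keyword_weight: float = 0.3,
    semantic_weight: float = 0.7,
    n_results: int = 5,
) -> list[dict]:
    """Merge two normalized score maps by content hash and rank the union."""
    fused = []
    # The union of keys deduplicates memories found by both searches.
    for content_hash in keyword_scores.keys() | semantic_scores.keys():
        kw = keyword_scores.get(content_hash, 0.0)    # missing score -> 0.0
        sem = semantic_scores.get(content_hash, 0.0)  # missing score -> 0.0
        fused.append({
            "content_hash": content_hash,
            "score": kw * keyword_weight + sem * semantic_weight,
            "debug_info": {
                "keyword_score": kw,
                "semantic_score": sem,
                "backend": "hybrid-bm25-vector",  # key name is an assumption
            },
        })
    # Highest score first; the hash breaks ties so the order is deterministic.
    fused.sort(key=lambda r: (-r["score"], r["content_hash"]))
    return fused[:n_results]


if __name__ == "__main__":
    ranked = fuse_scores({"a": 0.9, "b": 0.5}, {"b": 0.8, "c": 0.6})
    for r in ranked:
        print(r["content_hash"], round(r["score"], 2))
```

In the usage example, memory `b` appears in both result sets and ranks first, while `a` (keyword only) and `c` (semantic only) are each scored with a zero for the missing signal.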

## Why Hybrid Search Improves Recall

The complementary strengths of BM25 and vector search address distinct failure modes in single-modality retrieval:

- **Keyword boost**: Rare terms or proper nouns that BM25 ranks highly will elevate the fused score even when vector embeddings produce mediocre similarity scores. This ensures exact matches are not buried by semantic noise.
- **Semantic safety net**: When queries contain synonyms, paraphrases, or conceptual descriptions that lack exact keyword matches, the vector similarity component maintains a solid base score, preventing relevant memories from being excluded.
- **Configurable balance**: Domain-specific adjustments are possible by tuning weights. For example, code search implementations might increase `keyword_weight` to prioritize function names and specific syntax.
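A small worked example shows how the weights decide between the two behaviours listed above. The scores below are invented for illustration: one memory matches the query term exactly but is only moderately similar in embedding space, and the other is a paraphrase with no keyword overlap.

```python
# (keyword_score, semantic_score) pairs; values are made up for illustration.
candidates = {
    "exact_match": (0.9, 0.4),
    "paraphrase": (0.0, 0.8),
}


def fused(kw: float, sem: float, kw_weight: float, sem_weight: float) -> float:
    """Weighted average used for the final ranking."""
    return kw * kw_weight + sem * sem_weight


def rank(kw_weight: float, sem_weight: float) -> list[str]:
    """Return candidate names ordered by fused score, best first."""
    return sorted(
        candidates,
        key=lambda name: fused(*candidates[name], kw_weight, sem_weight),
        reverse=True,
    )


print(rank(0.3, 0.7))  # default weights
print(rank(0.6, 0.4))  # keyword-heavy weights
```

With the default 0.3/0.7 split the paraphrase narrowly wins (about 0.56 versus 0.55), whereas a 0.6/0.4 split puts the exact match clearly ahead (about 0.70 versus 0.32).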

The integration tests in [`tests/storage/test_hybrid_search.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/tests/storage/test_hybrid_search.py) validate that this fusion approach yields higher scores for exact keyword hits while retaining semantic relevance for conceptual queries.

## Implementing Hybrid Search in Your Code

The following examples demonstrate practical usage of the hybrid retrieval API.

### Basic Hybrid Query

```python
import asyncio

from mcp_memory_service.storage.sqlite_vec import SqliteVecMemoryStorage
from mcp_memory_service.models import Memory
from mcp_memory_service.utils import generate_content_hash


async def main() -> None:
    # Initialize storage
    storage = SqliteVecMemoryStorage(db_path="memories.db")
    await storage.initialize()

    # Store a sample memory
    content = "GraphQL API endpoint for fetching user profiles"
    mem = Memory(
        content=content,
        content_hash=generate_content_hash(content),
        tags=["api", "graphql"],
    )
    await storage.store(mem)

    # Execute hybrid search with default weights (0.3 keyword, 0.7 semantic)
    results = await storage.retrieve_hybrid("GraphQL", n_results=5)
    for r in results:
        print(f"{r.memory.content} | Score: {r.relevance_score:.3f} "
              f"(kw={r.debug_info['keyword_score']:.2f}, "
              f"sem={r.debug_info['semantic_score']:.2f})")


# Top-level await is not valid in a plain script, so run inside an event loop
asyncio.run(main())
```

### Custom Weight Configuration

Override default weights to prioritize keyword matching for controlled vocabularies:

```python
results = await storage.retrieve_hybrid(
    "GraphQL",
    n_results=5,
    keyword_weight=0.6,   # 60% keyword influence
    semantic_weight=0.4,  # 40% semantic influence
)
```

### Unified API Access

When called with `mode="hybrid"`, the service's `search_memories` endpoint dispatches the query to `retrieve_hybrid` automatically:

```python
await storage.search_memories(
    query="GraphQL",
    mode="hybrid",
    limit=5,
)
```

## Summary

- **Hybrid search** in MCP Memory Service combines BM25 full-text search with dense vector similarity to maximize result recall.
- The system retrieves **double the requested results** from both backends in parallel to ensure sufficient candidate overlap.
- **BM25 scores** are normalized from negative ranks to 0-1 using `max(0, min(1, 1 + rank/10))` in `_normalize_bm25_score`.
- **Vector distances** are converted to similarity scores using `1 - (distance/2)` before fusion.
- Final ranking uses a **weighted average** (default 0.3 keyword, 0.7 semantic) computed in `_fuse_scores`.
- Results are **deduplicated by content hash** and include detailed `debug_info` for observability.
- Weights are configurable per query or via environment variables in [`src/mcp_memory_service/config.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/config.py).

## Frequently Asked Questions

### What is the default weight distribution between BM25 and vector search in MCP Memory Service?

The default configuration assigns a weight of **0.3** to keyword scores (BM25) and **0.7** to semantic scores (vector similarity). These values are defined in [`src/mcp_memory_service/config.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/config.py) via `MCP_HYBRID_KEYWORD_WEIGHT` and `MCP_HYBRID_SEMANTIC_WEIGHT`, and can be overridden for individual queries or globally via environment variables.

### How are BM25 scores normalized in MCP Memory Service?

BM25 scores are normalized in the `_normalize_bm25_score` method within [`src/mcp_memory_service/storage/sqlite_vec.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/storage/sqlite_vec.py). SQLite FTS5 returns negative ranks, and the method treats values closer to zero as better matches, applying the formula `max(0, min(1, 1 + rank/10))` to map them onto a standard 0-1 similarity scale.

### Can I adjust the hybrid search weights per query?

Yes, the `retrieve_hybrid` method accepts optional `keyword_weight` and `semantic_weight` parameters that override the global defaults. This allows fine-tuning for specific use cases, such as increasing keyword weight for exact code searches or boosting semantic weight for conceptual natural language queries.

### Where is the hybrid search logic implemented in the codebase?

The core implementation resides in [`src/mcp_memory_service/storage/sqlite_vec.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/storage/sqlite_vec.py), specifically within the `retrieve_hybrid`, `_search_bm25`, `_normalize_bm25_score`, and related fusion methods. Configuration defaults are managed in [`src/mcp_memory_service/config.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/config.py), and the test suite in [`tests/storage/test_hybrid_search.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/tests/storage/test_hybrid_search.py) validates the normalization and weighting logic.