How to Use Rerankers in Mem0 to Improve Memory Retrieval Accuracy
Mem0 supports pluggable rerankers—including Cohere, Sentence-Transformers, and LLM-based options—that reorder raw vector search results to surface the most relevant memories.
Mem0 is an open-source memory layer for AI applications that stores and retrieves contextual information using vector stores and optional graph structures. Vector similarity search retrieves candidate memories efficiently, but the initial results are not always ordered by true semantic relevance to the query. By configuring a reranker in Mem0, you apply a secondary scoring pass that can improve retrieval precision without modifying the underlying search implementation.
Architecture of Rerankers in Mem0
The reranking system follows a factory pattern with clear separation between configuration, instantiation, and execution.
Core Components
- MemoryConfig – Located in mem0/configs/base.py, this Pydantic model holds an optional reranker field that specifies which provider to use and its configuration.
- RerankerConfig – Defined in mem0/configs/rerankers/config.py, this generic container normalizes provider-specific settings into a standard format consumed by the factory.
- RerankerFactory – Implemented in mem0/utils/factory.py (lines 240–283), this utility maps provider names (e.g., "cohere", "sentence_transformer") to concrete class imports and instantiates them with the appropriate configuration.
- BaseReranker – The abstract interface in mem0/reranker/base.py guarantees that every implementation exposes a uniform rerank(query, documents, top_k) method.
- Concrete Implementations – Provider-specific classes such as CohereReranker (mem0/reranker/cohere_reranker.py), SentenceTransformerReranker (mem0/reranker/sentence_transformer_reranker.py), and LLMReranker (mem0/reranker/llm_reranker.py) handle the actual scoring logic.
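The relationship between these components can be sketched as a small registry-based factory. This is a simplified illustration, not Mem0's actual code: ToyOverlapReranker, SketchRerankerFactory, and the provider name "toy_overlap" are hypothetical, and only the rerank(query, documents, top_k) shape and the rerank_score field are taken from the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Type


class BaseReranker(ABC):
    """Uniform interface: every provider exposes rerank(query, documents, top_k)."""

    @abstractmethod
    def rerank(self, query: str, documents: List[Dict[str, Any]],
               top_k: Optional[int] = None) -> List[Dict[str, Any]]:
        ...


class ToyOverlapReranker(BaseReranker):
    """Hypothetical provider that scores documents by words shared with the query."""

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}

    def rerank(self, query, documents, top_k=None):
        query_words = set(query.lower().split())
        scored = []
        for doc in documents:
            doc_words = set(doc["memory"].lower().split())
            score = len(query_words & doc_words) / max(len(query_words), 1)
            # Copy the document and attach the new relevance score.
            scored.append({**doc, "rerank_score": score})
        scored.sort(key=lambda d: d["rerank_score"], reverse=True)
        k = top_k or self.config.get("top_k")
        return scored[:k] if k else scored


class SketchRerankerFactory:
    """Maps a provider string to a concrete class and instantiates it."""

    _registry: Dict[str, Type[BaseReranker]] = {"toy_overlap": ToyOverlapReranker}

    @classmethod
    def create(cls, provider: str,
               config: Optional[Dict[str, Any]] = None) -> BaseReranker:
        if provider not in cls._registry:
            raise ValueError(f"Unsupported reranker provider: {provider}")
        return cls._registry[provider](config)


reranker = SketchRerankerFactory.create("toy_overlap", {"top_k": 1})
docs = [{"memory": "favorite food is pasta"}, {"memory": "plays soccer on weekends"}]
top = reranker.rerank("soccer on weekends", docs)
print(top)
```

Because callers depend only on the abstract rerank() method, swapping providers in a design like this comes down to changing the provider string and its config.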
Retrieval Flow
When you call Memory.search(), the process unfolds as follows:
- Initial Retrieval – The system queries the configured vector store (and optional graph store) to fetch an initial candidate list.
- Conditional Reranking – In mem0/memory/main.py (lines 45–52), the code checks if rerank=True (the default) and if self.reranker exists. If both conditions pass, it invokes self.reranker.rerank(query, original_memories, limit).
- Result Enhancement – The reranker returns the same document list augmented with a rerank_score field, sorted by this new relevance metric. If the reranking call fails due to network issues or missing credentials, the system catches the exception and returns the original results with a default score of 0.0, ensuring the search never crashes.
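The flow above can be approximated in a few lines. This is a minimal sketch under stated assumptions rather than the code in mem0/memory/main.py: search_with_rerank, length_reranker, and failing_reranker are hypothetical names, the reranker is modeled as a plain callable, and the candidate list stands in for the vector store results.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

MemoryItem = Dict[str, Any]
# A reranker is modeled as a callable: (query, documents, top_k) -> documents
RerankFn = Callable[[str, List[MemoryItem], int], List[MemoryItem]]


def search_with_rerank(
    query: str,
    candidates: List[MemoryItem],
    limit: int,
    rerank: bool = True,
    reranker: Optional[RerankFn] = None,
) -> List[MemoryItem]:
    """Apply the conditional reranking step to already-retrieved candidates."""
    original = candidates[:limit]

    # Rerank only when it is requested AND a reranker is configured.
    if not rerank or reranker is None:
        return original

    try:
        return reranker(query, original, limit)
    except Exception as exc:
        # Fallback: keep the original order and attach a default score.
        logger.warning("Reranking failed, returning original results: %s", exc)
        return [{**item, "rerank_score": 0.0} for item in original]


def length_reranker(query, documents, top_k):
    """Hypothetical scorer for demonstration: shorter memories rank first."""
    scored = [{**d, "rerank_score": 1.0 / len(d["memory"])} for d in documents]
    return sorted(scored, key=lambda d: d["rerank_score"], reverse=True)[:top_k]


def failing_reranker(query, documents, top_k):
    """Simulates a provider outage."""
    raise ConnectionError("simulated network failure")


memories = [{"memory": "likes long hikes"}, {"memory": "likes tea"}]
ranked = search_with_rerank("hobbies", memories, limit=2, reranker=length_reranker)
fallback = search_with_rerank("hobbies", memories, limit=2, reranker=failing_reranker)
skipped = search_with_rerank("hobbies", memories, limit=2, rerank=False,
                             reranker=length_reranker)
```

In this sketch, ranked is reordered by score, fallback keeps the original order with a rerank_score of 0.0 on every item, and skipped is returned untouched because rerank=False.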
Supported Reranker Providers
Mem0 ships with built-in support for five distinct reranking strategies, each suited to different deployment environments and latency requirements.
| Provider | Required Dependencies | Key Configuration Parameters |
|---|---|---|
| cohere | cohere Python package + COHERE_API_KEY environment variable | model, top_k, return_documents, max_chunks_per_doc |
| sentence_transformer | sentence-transformers | model_name, top_k |
| huggingface | transformers + torch | model, top_k |
| zero_entropy | zeroentropy | model, top_k |
| llm_reranker | Configured LLM from mem0.llms | temperature, max_tokens, top_k |
Provider-specific configuration classes reside in mem0/configs/rerankers/<provider>.py, allowing you to fine-tune API timeouts, model versions, and batch sizes.
Configuring Rerankers
To enable reranking, pass a reranker dictionary to MemoryConfig during client initialization. The dictionary requires a provider string and a config object containing provider-specific settings.
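As a rough illustration of that dictionary shape, the helper below builds a provider/config pair and rejects parameter names that do not appear in the provider table. build_reranker_dict and ALLOWED_PARAMS are hypothetical and not part of Mem0. The table lists only key parameters, so the real config classes may accept more (the Cohere example in the next section also passes api_key); treat the allow-list as an assumption to adjust.

```python
from typing import Any, Dict

# Parameter names copied from the provider table; real config classes may accept more.
ALLOWED_PARAMS = {
    "cohere": {"model", "top_k", "return_documents", "max_chunks_per_doc"},
    "sentence_transformer": {"model_name", "top_k"},
    "huggingface": {"model", "top_k"},
    "zero_entropy": {"model", "top_k"},
    "llm_reranker": {"temperature", "max_tokens", "top_k"},
}


def build_reranker_dict(provider: str, **params: Any) -> Dict[str, Any]:
    """Return a {"provider": ..., "config": ...} dict, rejecting unknown keys."""
    if provider not in ALLOWED_PARAMS:
        raise ValueError(f"Unknown provider: {provider!r}")
    unexpected = set(params) - ALLOWED_PARAMS[provider]
    if unexpected:
        raise ValueError(
            f"Unexpected parameters for {provider}: {sorted(unexpected)}"
        )
    return {"provider": provider, "config": dict(params)}


reranker_dict = build_reranker_dict(
    "sentence_transformer",
    model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
    top_k=5,
)
print(reranker_dict)
```

A check like this catches typos such as passing model instead of model_name to the sentence_transformer provider before the configuration reaches the factory.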
Cohere Reranker Setup
The Cohere provider is ideal for production deployments requiring state-of-the-art neural reranking without local GPU resources.
from mem0 import Memory
from mem0.configs.base import MemoryConfig  # MemoryConfig lives in mem0/configs/base.py
from mem0.configs.rerankers.cohere import CohereRerankerConfig

# Configure the reranker
cohere_config = CohereRerankerConfig(
    model="rerank-english-v2.0",
    top_k=5,
    return_documents=True,
    max_chunks_per_doc=10,
    api_key="YOUR_COHERE_API_KEY",  # Optional: falls back to COHERE_API_KEY env var
)

# Initialize Memory with reranker enabled
config = MemoryConfig(
    reranker={
        "provider": "cohere",
        "config": cohere_config.model_dump(),
    }
)
mem = Memory(config)
Local Sentence-Transformer Reranker
For privacy-sensitive applications or offline environments, use the Sentence-Transformer provider to run cross-encoder models locally.
from mem0 import Memory
from mem0.configs.base import MemoryConfig  # MemoryConfig lives in mem0/configs/base.py
from mem0.configs.rerankers.sentence_transformer import SentenceTransformerRerankerConfig

# Configure a local cross-encoder model
st_config = SentenceTransformerRerankerConfig(
    model_name="cross-encoder/ms-marco-MiniLM-L-12-v2",
    top_k=5,
)

# Initialize Memory with the local reranker enabled
config = MemoryConfig(
    reranker={
        "provider": "sentence_transformer",
        "config": st_config.model_dump(),
    }
)
mem = Memory(config)
Performing Reranked Searches
Once configured, reranking operates transparently during search operations. The rerank parameter defaults to True, but you can disable it for latency-sensitive queries where approximate vector similarity is sufficient.
# Add memories to the store, scoped to the same user_id that the search uses
mem.add([{"role": "user", "content": "I love playing soccer on weekends"}], user_id="user123")
mem.add([{"role": "user", "content": "My favorite food is Italian pasta"}], user_id="user123")

# Search with reranking enabled (default behavior)
results = mem.search(
    query="What are the user's hobbies?",
    user_id="user123",
    rerank=True,  # Explicitly enable; omit to use default
    limit=10,
)

# Access reranked scores
for item in results["results"]:
    print(f"Memory: {item['memory']}")
    print(f"Rerank Score: {item.get('rerank_score', 'N/A')}")
    print("---")
The limit parameter applies to both the initial vector search and the reranking stage. The reranker receives the full candidate set up to limit, then returns the top-k most relevant items based on the provider's scoring model.
Error Handling and Resilience
Rerankers in Mem0 implement graceful degradation. If the external API times out, the API key is invalid, or the local model fails to load, the reranker catches the exception and returns the original unranked results with a rerank_score of 0.0 for each document. This design ensures that memory retrieval remains functional even when auxiliary reranking services are unavailable.
Summary
- Rerankers in Mem0 are configured via MemoryConfig using provider-specific config classes imported from mem0.configs.rerankers.
- The RerankerFactory in mem0/utils/factory.py instantiates the correct implementation based on the provider string.
- Reranking occurs automatically in Memory.search() (defined in mem0/memory/main.py) when rerank=True and a reranker is configured.
- Supported providers include Cohere, Sentence-Transformers, HuggingFace, Zero-Entropy, and LLM-based rerankers, each requiring different dependencies and API keys.
- The system includes built-in fallback logic that preserves search functionality if reranking services fail.
Frequently Asked Questions
How do I disable reranking for specific queries?
Pass rerank=False to the Memory.search() method. By default, Mem0 attempts to rerank results whenever a reranker is configured, but you can override this per-query to reduce latency for non-critical retrievals.
Can I use multiple rerankers simultaneously?
No, the current architecture in mem0/configs/base.py supports a single reranker configuration per Memory instance. To compare different providers, initialize separate Memory clients with distinct configurations and run A/B tests on your dataset.
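One lightweight way to start such a comparison is to measure how much the top results of two setups agree. The sketch below is hypothetical throughout: top_k_overlap and compare_searchers are illustrative helpers, and the two stub search functions stand in for wrappers that would call search() on separate Memory clients and collect item["memory"] from results["results"].

```python
from typing import Callable, Dict, List

# A search function maps a query to an ordered list of memory strings.
SearchFn = Callable[[str], List[str]]


def top_k_overlap(a: List[str], b: List[str], k: int) -> float:
    """Fraction of the top-k items shared by two rankings (order ignored)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(a[:k]) & set(b[:k])) / k


def compare_searchers(queries: List[str], search_a: SearchFn,
                      search_b: SearchFn, k: int = 3) -> Dict[str, float]:
    """Compute the top-k overlap between two searchers for each query."""
    return {q: top_k_overlap(search_a(q), search_b(q), k) for q in queries}


# Stub searchers with fixed output; replace with wrappers around real clients.
def search_with_provider_a(query: str) -> List[str]:
    return ["soccer", "pasta", "hiking"]


def search_with_provider_b(query: str) -> List[str]:
    return ["soccer", "hiking", "chess"]


report = compare_searchers(
    ["What are the user's hobbies?"],
    search_with_provider_a,
    search_with_provider_b,
    k=2,
)
print(report)
```

Overlap only shows where two providers disagree. Deciding which ranking is better still requires relevance judgments for your own queries.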
What happens if my Cohere API key expires during a search?
The CohereReranker implementation catches authentication and network errors during the rerank() call. It logs the failure and returns the original vector search results with a default score of 0.0, ensuring your application continues operating without interruption.
Do I need a GPU for local rerankers?
Not necessarily. The sentence_transformer and huggingface providers can run on CPU, though GPU acceleration significantly improves latency for large document sets. Configure the device parameter in your provider-specific config (where supported) to control hardware utilization.