How the conversation_id Parameter Bypasses Semantic Deduplication for Incremental Storage in MCP Memory Service

When you provide a conversation_id in a memory creation request, the MCP Memory Service converts this identifier into a boolean flag that disables semantic similarity checks, allowing sequentially related memories to be stored even when their content is topically similar.

The mcp-memory-service repository implements an intelligent deduplication system that prevents redundant storage of semantically similar content. However, this default behavior can interfere with legitimate use cases like conversation logging or incremental note-taking, where related entries naturally share semantic similarity. The conversation_id parameter provides a targeted mechanism to bypass these checks while maintaining exact-hash deduplication protection.

Understanding Semantic Deduplication in MCP Memory Service

By default, the storage backend performs aggressive semantic deduplication to prevent the accumulation of near-duplicate memories. This system analyzes the semantic similarity of incoming content against recently stored entries within a configurable time window.

The Default Deduplication Guard

In src/mcp_memory_service/storage/sqlite_vec.py, the storage engine implements a guard clause that checks for semantic duplicates before persisting new content:

if self.semantic_dedup_enabled and not skip_semantic_dedup:
    is_duplicate, existing_hash = await self._check_semantic_duplicate(
        memory.content,
        time_window_hours=self.semantic_dedup_time_window,
        similarity_threshold=self.semantic_dedup_threshold,
    )
    if is_duplicate:
        return False, f"Duplicate content detected (semantically similar to {existing_hash})"

When skip_semantic_dedup is False (the default), this block executes the similarity check and rejects content that exceeds the similarity threshold, even if the text is not identical.
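To make the threshold comparison concrete, here is a minimal sketch of a similarity check of this general shape. It is not the repository's _check_semantic_duplicate implementation: the cosine_similarity helper, the list of (hash, vector) pairs, and the 0.85 threshold are all assumptions chosen for illustration, and the time-window filtering is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_semantic_duplicate(new_vec, recent, threshold=0.85):
    """Return (is_duplicate, existing_hash) for the first entry at or above threshold.

    `recent` is a hypothetical list of (content_hash, vector) pairs standing in
    for entries that already fall inside the deduplication time window.
    """
    for content_hash, vec in recent:
        if cosine_similarity(new_vec, vec) >= threshold:
            return True, content_hash
    return False, None


# Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
recent = [("abc123", [1.0, 0.0, 0.0]), ("def456", [0.0, 1.0, 0.0])]
near = check_semantic_duplicate([0.9, 0.1, 0.0], recent)  # close to "abc123"
far = check_semantic_duplicate([0.0, 0.0, 1.0], recent)   # orthogonal to both
```

The return shape mirrors the (is_duplicate, existing_hash) tuple that the guard clause unpacks, which is why a rejected write can report the hash of the entry it collided with.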

How conversation_id Triggers the Bypass Mechanism

The bypass spans two layers. The service layer turns a non-empty conversation_id into a boolean flag and records the identifier in the memory's metadata for later retrieval; the storage layer then uses that flag to skip the semantic check.

Service Layer Conversion

In src/mcp_memory_service/services/memory_service.py (lines 88-92), the store_memory method processes the incoming request and converts the conversation_id into a deduplication control flag:


# Convert conversation_id to boolean skip flag
skip_dedup = bool(conversation_id)
if skip_dedup:
    final_metadata["conversation_id"] = conversation_id

# Pass to storage backend
success, message = await self.storage.store(
    memory, skip_semantic_dedup=skip_dedup
)

The bool(conversation_id) conversion ensures that any non-empty string evaluates to True, triggering the bypass. Simultaneously, the method injects the conversation identifier into the memory's metadata dictionary, enabling future queries to filter or group by this ID.
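The truthiness rule is easy to check in isolation. The sketch below uses a hypothetical helper, should_skip_dedup, that applies the same bool() conversion to a few candidate values. Note that a whitespace-only string is still truthy in Python, so under this conversion it would also trigger the bypass.

```python
def should_skip_dedup(conversation_id):
    """Hypothetical helper mirroring the bool(conversation_id) conversion."""
    return bool(conversation_id)


# Candidate inputs mapped to the flag value Python's truthiness rules produce.
cases = {
    None: False,          # parameter omitted
    "": False,            # empty string
    "conv-12345": True,   # ordinary identifier
    " ": True,            # whitespace-only string is non-empty, hence truthy
}

results = {value: should_skip_dedup(value) for value in cases}
```

If whitespace-only identifiers should not count, the caller would need to strip or validate the value before passing it in.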

Storage Backend Implementation

The SQLite-vec backend receives the skip_semantic_dedup parameter and uses it to conditionally execute the similarity check. As shown in the guard clause from src/mcp_memory_service/storage/sqlite_vec.py (lines 1192-1195), when skip_semantic_dedup is True, the condition not skip_semantic_dedup evaluates to False, causing the entire semantic duplicate check block to be skipped.

The system still performs exact-hash deduplication regardless of the conversation_id presence, preventing truly identical content from being stored twice.
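The interaction between the two checks can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: the ToyStore class, the SHA-256 content hash, the word-overlap stand-in for embedding similarity, the 0.5 threshold, the exact-match message, and the ordering of the two checks are not taken from the repository.

```python
import hashlib


class ToyStore:
    """Illustrative in-memory store; not the SQLite-vec backend."""

    def __init__(self):
        self.by_hash = {}  # content_hash -> content

    def _looks_similar(self, content):
        # Crude stand-in for an embedding comparison: Jaccard word overlap.
        new_words = set(content.lower().split())
        for existing_hash, existing in self.by_hash.items():
            old_words = set(existing.lower().split())
            union = new_words | old_words
            if union and len(new_words & old_words) / len(union) >= 0.5:
                return True, existing_hash
        return False, None

    def store(self, content, skip_semantic_dedup=False):
        content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # Exact-hash check always runs, regardless of the skip flag.
        if content_hash in self.by_hash:
            return False, "Duplicate content detected (exact match)"
        # Semantic check runs only when the flag is not set.
        if not skip_semantic_dedup:
            is_duplicate, existing_hash = self._looks_similar(content)
            if is_duplicate:
                return False, f"Duplicate content detected (semantically similar to {existing_hash})"
        self.by_hash[content_hash] = content
        return True, "stored"


store = ToyStore()
first = store.store("notes on the storage architecture")
similar_blocked = store.store("more notes on the storage architecture")
similar_allowed = store.store(
    "more notes on the storage architecture", skip_semantic_dedup=True
)
exact_blocked = store.store(
    "notes on the storage architecture", skip_semantic_dedup=True
)
```

In this sketch the similar follow-up is rejected without the flag and accepted with it, while the byte-identical resubmission is rejected even with the flag set.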

Practical Implementation Examples

API Request with conversation_id

When creating memories via the REST API, include the conversation_id field in the JSON payload:

POST /memories
{
  "content": "Claude Code is a powerful CLI tool for software engineering.",
  "tags": ["dev", "ai-tools"],
  "conversation_id": "conv-12345"
}

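This request skips the semantic similarity check, so the memory is stored even if it closely resembles recent entries. The flag applies per request: each follow-up entry in the same conversation thread must carry its own non-empty conversation_id to be stored the same way.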

Python Client Implementation

Using the service layer directly:

from mcp_memory_service.services.memory_service import MemoryService

# storage_backend is assumed to be an already-initialized storage backend;
# the awaits below must run inside an async function.
service = MemoryService(storage_backend)

# First entry in conversation
await service.store_memory(
    content="Initial thoughts on the architecture...",
    metadata={"topic": "design"},
    conversation_id="session-abc-789"
)

# Similar follow-up (normally rejected, but allowed here)
await service.store_memory(
    content="Additional thoughts on the architecture and implementation details...",
    metadata={"topic": "design"},
    conversation_id="session-abc-789"
)

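Both memories persist despite their semantic similarity because each request carries a non-empty conversation_id, which sets skip_semantic_dedup to True. The bypass does not depend on the two IDs matching; sharing the same value only matters later, when grouping entries by conversation.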

Verifying conversation_id Persistence

Confirm that the identifier is properly stored in metadata:


# Retrieve stored memory by hash
# (content_hash is assumed to be the hash of a memory stored in the previous example)
memory = await service.storage.get_by_hash(content_hash)

# Verify conversation linkage
assert memory.metadata.get("conversation_id") == "session-abc-789"

Summary

  • The conversation_id parameter in mcp-memory-service enables incremental storage of semantically similar memories by bypassing the default semantic deduplication system.
  • In src/mcp_memory_service/services/memory_service.py, the service converts the presence of a conversation_id into a boolean skip_dedup flag and passes it to the storage layer as skip_semantic_dedup.
  • The SQLite-vec backend in src/mcp_memory_service/storage/sqlite_vec.py uses this flag to conditionally skip the _check_semantic_duplicate routine while maintaining exact-hash deduplication protection.
  • The conversation identifier is persisted in the memory's metadata under the "conversation_id" key, enabling future retrieval and grouping operations.

Frequently Asked Questions

What happens if I provide an empty string as the conversation_id?

An empty string evaluates to False in Python's boolean context, so bool(conversation_id) returns False. Consequently, the semantic deduplication check proceeds normally, and the empty string is not added to the metadata. To trigger the bypass, you must provide a non-empty string identifier.

Does using conversation_id disable all forms of deduplication?

No. The conversation_id parameter only bypasses semantic deduplication, which checks for similarity using vector embeddings. The system still performs exact-hash deduplication to prevent truly identical content from being stored twice, regardless of the conversation_id value.

Can I retrieve all memories belonging to a specific conversation?

Yes. Since the conversation_id is stored in the memory's metadata under the "conversation_id" key, you can query the storage backend filtering by this metadata field. The service layer preserves this identifier specifically to enable conversation-based retrieval and grouping operations.
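The article does not show a specific query call for this, so the sketch below only illustrates the idea on plain dictionaries that have already been retrieved. The record layout and both helper functions are hypothetical, not part of the service's API.

```python
from collections import defaultdict

# Hypothetical records shaped like memories with a metadata dictionary.
memories = [
    {"content": "Initial thoughts on the architecture...",
     "metadata": {"conversation_id": "session-abc-789", "topic": "design"}},
    {"content": "Additional thoughts on the architecture...",
     "metadata": {"conversation_id": "session-abc-789", "topic": "design"}},
    {"content": "Unrelated note stored without a conversation.",
     "metadata": {"topic": "misc"}},
]


def filter_by_conversation(records, conversation_id):
    """Keep only records whose metadata carries the given conversation_id."""
    return [
        record for record in records
        if record.get("metadata", {}).get("conversation_id") == conversation_id
    ]


def group_by_conversation(records):
    """Map each conversation_id to the contents stored under it."""
    groups = defaultdict(list)
    for record in records:
        conversation_id = record.get("metadata", {}).get("conversation_id")
        if conversation_id:  # records without an ID are left ungrouped
            groups[conversation_id].append(record["content"])
    return dict(groups)


thread = filter_by_conversation(memories, "session-abc-789")
groups = group_by_conversation(memories)
```

For large stores, filtering inside the backend query would be preferable to filtering in application code as done here.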

Is there a performance impact when using conversation_id for many sequential writes?

Using conversation_id can reduce the work done per write for sequential similar entries, because the semantic similarity check (which compares vector embeddings) is skipped. The system still computes content hashes for exact deduplication, but that cost is small compared with the semantic comparison. The trade-off is that near-duplicate entries within a conversation are no longer filtered out, so storage may grow faster.
