X-Agent-ID Scoping Strategy Performance Implications in MCP Memory Service

February 28, 2026 doobidoo/mcp-memory-service

The X-Agent-ID header introduces negligible write-time overhead but forces linear scan behavior during retrieval because agent tags are stored as comma-separated text without database indexing.

The X-Agent-ID header in the doobidoo/mcp-memory-service repository provides automatic namespace isolation for multi-agent deployments. This scoping strategy appends an agent:<id> tag to every memory at ingestion time, creating logical partitions without separate database schemas. Understanding the performance characteristics of this approach is critical for deployments handling millions of memories across multiple agents.

Write-Time Performance Impact

Automatic Tag Injection and Deduplication

The header is processed in-process within the POST /api/memories endpoint, before the call to MemoryService.store_memory. When the endpoint detects an X-Agent-ID header, it appends the agent:<id> namespace tag (the prefix is defined as NAMESPACE_AGENT in src/mcp_memory_service/models/tag_taxonomy.py) to the request's tag list. The operation consists of a single string concatenation and an in check to prevent duplicates. Its cost depends only on the number of tags in the request, not on the size of the stored dataset, so it is effectively O(1) per write with minimal memory overhead.

The deduplication logic lives in src/mcp_memory_service/web/api/memories.py (lines 58-65). Before appending, the code verifies that the agent: prefix does not already exist in the tag array, ensuring idempotent behavior even if clients manually include the agent identifier. The unit test suite validates this behavior in tests/web/api/test_memories_api.py (lines 36-52).
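The injection step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in memories.py: the helper name inject_agent_tag and the AGENT_PREFIX constant are hypothetical stand-ins, and the duplicate check is assumed to be an exact-match in test on the tag list.

```python
# Hypothetical stand-in for NAMESPACE_AGENT; the real constant's value is assumed.
AGENT_PREFIX = "agent:"


def inject_agent_tag(tags, agent_id):
    """Return a copy of `tags` with agent:<id> appended unless already present."""
    result = list(tags or [])
    if not agent_id:
        # No X-Agent-ID header: leave the tag list untouched.
        return result
    agent_tag = f"{AGENT_PREFIX}{agent_id}"
    # The `in` check is linear in the number of tags on this one request,
    # which is independent of how many memories are already stored.
    if agent_tag not in result:
        result.append(agent_tag)
    return result


print(inject_agent_tag(["api", "rate-limit"], "researcher"))
```

Calling the helper twice with the same agent ID leaves the list unchanged the second time, which is the idempotent behavior the tests are described as covering.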

When storing a memory with automatic agent scoping:

import requests

# The X-Agent-ID header causes the service to add `agent:researcher`
response = requests.post(
    "http://localhost:8000/api/memories",
    json={"content": "Findings about rate limits", "tags": ["api", "rate-limit"]},
    headers={"X-Agent-ID": "researcher"},
)
print(response.json()["memory"]["tags"])
# → ["api", "rate-limit", "agent:researcher"]

Read Performance Characteristics

GLOB Pattern Matching and Full Table Scans

Tags are persisted as a comma-separated text field across all storage backends (SQLite-Vec, Hybrid, Cloudflare). In src/mcp_memory_service/storage/sqlite_vec.py (lines 28-46), the search_by_tag function constructs a GLOB pattern query to locate specific agent memories:

(',' || REPLACE(tags, ' ', '') || ',') GLOB '*,agent:researcher,*'

This pattern matching enables substring searches within the concatenated tag string but prevents the database engine from utilizing traditional B-Tree indexes on individual tag values. Because the tags column contains comma-delimited strings rather than normalized values, the WHERE clause cannot leverage indexed lookups.
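To see how the GLOB predicate behaves, the following sketch runs it against an in-memory SQLite database. The table layout and the helper search_by_agent are simplified assumptions for illustration; only the WHERE expression is taken from the query shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: the real memories table has more columns.
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO memories (content, tags) VALUES (?, ?)",
    [
        ("rate limit findings", "api, rate-limit, agent:researcher"),
        ("quarterly numbers", "finance,agent:analyst"),
        ("near miss", "agent:researcher2"),
    ],
)


def search_by_agent(conn, agent_id):
    """Return the content of memories tagged agent:<agent_id>."""
    # Wrapping the tag string in commas lets the pattern match whole tags only.
    # Caveat: GLOB metacharacters (*, ?, [) inside agent_id are not escaped here.
    pattern = f"*,agent:{agent_id},*"
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE (',' || REPLACE(tags, ' ', '') || ',') GLOB ? "
        "ORDER BY id",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


matches = search_by_agent(conn, "researcher")
print(matches)
```

The surrounding commas are what keep agent:researcher2 from matching a search for agent:researcher; the price is that the left-hand side is a computed expression rather than a bare column.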

Linear Scan Behavior

The query engine must perform a full table scan (or at minimum, a scan of all rows matching other indexed predicates like deleted_at IS NULL). Performance degrades linearly with the total row count in the memories table, regardless of the specific agent: value being queried. A query for agent:researcher scans the same number of rows as a query for agent:analyst or any other tag.
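One way to check the scan behavior locally is SQLite's EXPLAIN QUERY PLAN. The sketch below uses an assumed, simplified memories table and even adds a plain index on tags to show that it does not turn the GLOB predicate into an index seek; the helper plan_details is hypothetical. The exact wording of the plan text varies between SQLite versions, so the code only inspects whether each step is a SCAN or a SEARCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for illustration only.
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, tags TEXT)")
# A plain B-Tree index on the raw tags column.
conn.execute("CREATE INDEX idx_memories_tags ON memories (tags)")


def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


glob_plan = plan_details(
    conn,
    "SELECT content FROM memories "
    "WHERE (',' || REPLACE(tags, ' ', '') || ',') GLOB ?",
    ("*,agent:researcher,*",),
)
print(glob_plan)
```

A plan step beginning with SCAN means every row is visited; a SEARCH step would indicate an index seek. Because the pattern starts with a wildcard and the left-hand side is an expression, the planner has no key range to seek on.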

Application-Level Filtering Benefits

While the database scan remains exhaustive, the agent: namespace logically partitions result sets. When retrieving only the researcher’s memories:

import requests

# Explicitly filter on the agent tag
response = requests.post(
    "http://localhost:8000/api/search/by-tag",
    json={"tags": ["agent:researcher"]},
)
for mem in response.json()["memories"]:
    print(mem["content"])

The server still scans all rows, but only the records that match the agent tag survive the filter and are serialized into the response. This reduces network transfer and client-side deserialization overhead when an agent's memories represent a small fraction of the global dataset.

Storage Overhead and Scalability Limits

Byte-Level Overhead

Adding the agent: tag increases row size by only a few bytes per memory. Even at millions of rows, this overhead remains negligible compared to the storage requirements for content payloads and vector embeddings (typically 384-1536 dimensions of floating-point data).
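A quick back-of-the-envelope calculation makes the comparison concrete. The sketch assumes embeddings are stored as 32-bit floats (4 bytes per dimension), which the article does not state, and uses agent:researcher plus its separating comma as the example tag.

```python
# Bytes added to the tags column: the separator comma plus the tag itself.
agent_tag = ",agent:researcher"
tag_bytes = len(agent_tag.encode("utf-8"))

# Assumption: embeddings are float32, i.e. 4 bytes per dimension.
BYTES_PER_FLOAT32 = 4


def overhead_ratio(dimensions):
    """Tag bytes as a fraction of the embedding size for one memory."""
    return tag_bytes / (dimensions * BYTES_PER_FLOAT32)


for dims in (384, 1536):
    embedding_bytes = dims * BYTES_PER_FLOAT32
    print(f"{dims} dims: {embedding_bytes} B embedding, tag adds {overhead_ratio(dims):.2%}")
```

Under these assumptions the tag adds 17 bytes against 1,536 to 6,144 bytes of embedding data per memory, which is roughly 1% or less before the content payload is even counted.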

Indexing Strategies for Scale

For high-volume deployments, the current scan-based approach becomes a bottleneck as memory counts grow. A dedicated tag index, such as a many-to-many relationship table or one built on SQLite's FTS5 or JSON1 extensions, could convert the linear scan into an indexed lookup. This optimization would remove the primary performance constraint of the X-Agent-ID scoping strategy for large-scale installations.
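The many-to-many option can be sketched as follows. None of this exists in the repository as far as the article states: the memory_tags table, the idx_memory_tags_tag index, and the store and search_by_tag helpers are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT);
    -- Hypothetical normalized tag table: one row per (memory, tag) pair.
    CREATE TABLE memory_tags (
        memory_id INTEGER NOT NULL REFERENCES memories(id),
        tag TEXT NOT NULL
    );
    CREATE INDEX idx_memory_tags_tag ON memory_tags (tag, memory_id);
    """
)


def store(conn, content, tags):
    """Insert a memory and one memory_tags row per tag."""
    cur = conn.execute("INSERT INTO memories (content) VALUES (?)", (content,))
    conn.executemany(
        "INSERT INTO memory_tags (memory_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )


def search_by_tag(conn, tag):
    """Look up memories by exact tag through the indexed tag table."""
    rows = conn.execute(
        "SELECT m.content FROM memory_tags AS t "
        "JOIN memories AS m ON m.id = t.memory_id "
        "WHERE t.tag = ? ORDER BY m.id",
        (tag,),
    ).fetchall()
    return [row[0] for row in rows]


store(conn, "rate limit findings", ["api", "agent:researcher"])
store(conn, "quarterly numbers", ["finance", "agent:analyst"])

# Inspect how SQLite resolves the tag predicate on its own.
tag_plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT memory_id FROM memory_tags WHERE tag = ?",
        ("agent:researcher",),
    )
]
print(search_by_tag(conn, "agent:researcher"), tag_plan)
```

With an equality predicate on an indexed column, the plan reports a SEARCH using idx_memory_tags_tag instead of a SCAN, so lookup cost follows the number of matching rows rather than the size of the whole table. The trade-off is an extra table to keep in sync on every write and delete.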

Summary

Write latency remains essentially unchanged at O(1) complexity with minimal CPU and memory overhead.
Read latency scales linearly with total memory count due to comma-separated tag storage and GLOB pattern matching without index support.
Storage overhead is trivial, adding only a few bytes per row regardless of dataset size.
Scalability limits require attention for high-scale deployments; dedicated tag indexing is recommended once tag-filtered queries over a growing memories table become a measurable bottleneck.

Frequently Asked Questions

Does X-Agent-ID slow down memory ingestion?

No. The header processing adds a single string concatenation and duplicate check before calling MemoryService.store_memory. This O(1) operation has negligible impact on ingestion throughput. The deduplication behavior itself is covered by the tests in tests/web/api/test_memories_api.py (lines 36-52).

Why are agent-scoped queries slow on large datasets?

The tags column stores comma-separated values as plain text, forcing the database to execute GLOB pattern matching across all rows. Without a separate index table or FTS extension, SQLite cannot perform indexed lookups on individual tags, resulting in full table scans that scale linearly with the total number of memories stored.

Can I improve query performance without changing the schema?

Partially. While you cannot eliminate the underlying scan without schema modifications, you can reduce application-level latency by ensuring all retrieval requests include explicit tags=["agent:<id>"] filters. This minimizes the result set transferred to the client when the agent's data represents a small subset of the total repository.

What is the recommended approach for high-scale multi-agent deployments?

For production systems handling millions of memories, migrate to a dedicated tag indexing strategy. A many-to-many relationship table between memories and tags, or an index built on SQLite's JSON1 or FTS5 extensions, could enable sub-linear query performance and remove the scan bottleneck in the search_by_tag query in src/mcp_memory_service/storage/sqlite_vec.py.
