
MCP Memory Service Docker Deployment Modes: Slim, Glama, and Standalone Explained

February 28, 2026, doobidoo/mcp-memory-service

MCP Memory Service provides three distinct Docker deployment modes: Slim for minimal CPU-only footprints, Glama for cloud-native platforms requiring native build tools, and Standalone for independent server operation. Each mode is configured through its own Dockerfile and environment settings in the doobidoo/mcp-memory-service repository.

The containerization strategy for this memory-enabled MCP server varies significantly based on whether you prioritize image size, cloud platform compatibility, or autonomous operation. Each mode leverages specific base images, dependency sets, and runtime flags to accommodate constraints ranging from edge device limitations to Glama cloud hosting requirements.

Base Images and System Architecture

The deployment modes differ first in their base images and the system packages they install.

Slim mode uses python:3.10-slim (Debian-based) and installs only curl, which keeps both the attack surface and the image size small. Glama mode uses the same python:3.10-slim base but adds build-essential, gcc, and g++ (lines 22-28 in tools/docker/Dockerfile.glama) to compile native extensions required by the Glama runtime environment.

The primary Dockerfile (tools/docker/Dockerfile), typically used for Standalone deployments, moves to python:3.12-slim, installs curl and bash, and applies system package upgrades (lines 24-32). It offers a newer interpreter while staying lighter than Glama because it omits the compiler toolchain.
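
To make the mode-to-Dockerfile mapping concrete, the following sketch resolves one of this article's mode labels to the Dockerfile path cited for it. The function name dockerfile_for_mode, the mode labels used as arguments, and the image tags are illustrative assumptions; none of them are part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: resolve one of this article's mode labels to the
# Dockerfile path the article cites. Names here are illustrative only.
dockerfile_for_mode() {
  case "$1" in
    slim)       echo "tools/docker/Dockerfile.slim" ;;
    glama)      echo "tools/docker/Dockerfile.glama" ;;
    standalone) echo "tools/docker/Dockerfile" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the build command for each mode instead of running it.
for mode in slim glama standalone; do
  echo "docker build -f $(dockerfile_for_mode "$mode") -t mcp-memory-service:$mode ."
done
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without Docker and makes the chosen Dockerfile easy to review.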

Dependency Sets and ML Runtime Differences

The choice between ONNX and PyTorch represents the most significant functional divergence between modes.

Slim mode uses the CPU-only ONNX runtime exclusively. The Dockerfile explicitly sets MCP_MEMORY_USE_ONNX=1 (line 11 in tools/docker/Dockerfile.slim) and installs a fixed, minimal dependency list (lines 52-71) that excludes PyTorch entirely. The result is a deterministic, minimal footprint suitable for CI pipelines and edge devices.

Glama mode does not force ONNX usage. Instead, it sets PYTORCH_ENABLE_MPS_FALLBACK=1 (line 17 in tools/docker/Dockerfile.glama), which tells PyTorch to fall back to the CPU for operations that the Apple Silicon MPS backend does not support, and installs dependencies as an editable install (uv pip install --system -e . at line 44) so that the dependency set follows the project's pyproject.toml.

The primary Dockerfile offers flexibility through build arguments. When FORCE_CPU_PYTORCH=true or when installing the [sqlite] extra group (lines 59-62), it pulls CPU-only PyTorch while maintaining ONNX runtime availability. This allows Standalone deployments to toggle between pure-CPU stacks and full PyTorch installations based on hardware constraints.
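
A build that opts into the CPU-only PyTorch path might look like the sketch below. The --build-arg flag is standard docker build syntax and the argument name FORCE_CPU_PYTORCH comes from the text above. The wrapper name build_command, and the assumption that the argument is passed as the literal string true, are illustrative.

```shell
#!/bin/sh
# Hypothetical wrapper: compose the docker build command for the primary
# Dockerfile, optionally requesting CPU-only PyTorch via a build argument.
build_command() {
  force_cpu="${1:-false}"   # "true" requests the CPU-only PyTorch path
  cmd="docker build -f tools/docker/Dockerfile"
  if [ "$force_cpu" = "true" ]; then
    cmd="$cmd --build-arg FORCE_CPU_PYTORCH=true"
  fi
  echo "$cmd -t mcp-memory-service ."
}

# Dry run: print the command so it can be reviewed before executing it.
build_command true
```

Calling build_command with no argument prints the plain build command, which leaves the PyTorch variant to whatever the Dockerfile selects by default.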

Environment Variables and Configuration

Each mode configures distinct environment variables that drive runtime behavior:

Slim: Forces MCP_MEMORY_STORAGE_BACKEND=sqlite_vec and MCP_MEMORY_USE_ONNX=1, optimizing for local SQLite-backed storage without GPU dependencies.
Glama: Defines MCP_MEMORY_CHROMA_PATH=/app/chroma_db for persistent vector storage and does not force ONNX, so PyTorch-based embeddings remain available.
Standalone: Activates via MCP_STANDALONE_MODE=1 (set in tools/docker/docker-compose.standalone.yml, line 22), which triggers the startup orchestrator to keep the process alive without an attached MCP client.

Standalone behavior is implemented in src/mcp_memory_service/utils/startup_orchestrator.py (lines 153-166), where guard logic checks MCP_STANDALONE_MODE and prevents the container from shutting down when no client connection exists.
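
The guard itself is Python code in the repository and is not reproduced here. As a rough shell analogue of the decision it makes, and assuming the flag counts as enabled only when it equals the literal value 1, the check reduces to something like the following. The function name should_stay_alive is hypothetical.

```shell
#!/bin/sh
# Conceptual analogue only: the real guard is Python code in
# startup_orchestrator.py. This mirrors the decision, not the implementation.
should_stay_alive() {
  # Treat the flag as enabled only when it is exactly "1" (an assumption).
  [ "${MCP_STANDALONE_MODE:-0}" = "1" ]
}

if should_stay_alive; then
  echo "standalone: keep running without an MCP client"
else
  echo "client mode: stop when no MCP client is attached"
fi
```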

Runtime Modes and Compose Orchestration

The repository provides three compose files corresponding to different operational patterns:

MCP Protocol Mode (tools/docker/docker-compose.yml): Configures stdin_open: true and tty: true (lines 14-15) for bidirectional pipe communication with Claude Desktop or VS Code extensions. Sets MCP_MODE=mcp (line 27) for protocol-compliant operation.

HTTP/API Mode (tools/docker/docker-compose.http.yml): Exposes port ${HTTP_PORT:-8000} (line 12) and sets MCP_MODE=http with REST-specific variables (MCP_HTTP_PORT, MCP_API_KEY). Includes Dockerfile health checks (lines 61-66) for load balancer compatibility.

Standalone Mode (tools/docker/docker-compose.standalone.yml): Mounts the source tree (.:/app) for development and sets MCP_STANDALONE_MODE=1 (line 22), allowing the service to run as a headless server without MCP client attachment.
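
The ${HTTP_PORT:-8000} expression in the HTTP compose file relies on Compose variable interpolation, which uses the same default-value syntax as POSIX shell parameter expansion. The snippet below demonstrates that behavior in plain shell; the function name resolve_http_port is hypothetical.

```shell
#!/bin/sh
# Demonstrates ${VAR:-default}: use HTTP_PORT when it is set and non-empty,
# otherwise fall back to 8000. resolve_http_port is a hypothetical name.
resolve_http_port() {
  echo "${HTTP_PORT:-8000}"
}

unset HTTP_PORT
echo "default port: $(resolve_http_port)"    # prints "default port: 8000"

HTTP_PORT=9090
echo "override port: $(resolve_http_port)"   # prints "override port: 9090"
```

In practice this means exporting HTTP_PORT before running docker compose -f tools/docker/docker-compose.http.yml up -d should change the published port, while leaving it unset keeps 8000.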

All Dockerfiles use a unified entrypoint script (docker-entrypoint-unified.sh). The Glama Dockerfile additionally notes a default command for standalone compatibility (line 66 in tools/docker/Dockerfile.glama) and declares an empty CMD [] instruction, deferring runtime decisions to environment variables.
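
The contents of docker-entrypoint-unified.sh are not shown in this article, so the following is only a guess at the shape an environment-driven dispatcher could take when CMD is empty. The function name select_run_mode, the precedence of MCP_STANDALONE_MODE over MCP_MODE, and the default of mcp are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical dispatcher, NOT the repository's entrypoint script.
# Picks a run mode from environment variables named in this article.
select_run_mode() {
  if [ "${MCP_STANDALONE_MODE:-0}" = "1" ]; then
    echo "standalone"   # assumed to take precedence over MCP_MODE
  elif [ "${MCP_MODE:-mcp}" = "http" ]; then
    echo "http"
  else
    echo "mcp"          # assumed default: stdio MCP protocol
  fi
}

echo "selected run mode: $(select_run_mode)"
```

Keeping the decision in one function like this is one way an entrypoint can stay identical across images while each Dockerfile or compose file only changes environment variables.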

Practical Deployment Examples

Running Slim Mode (Minimal CPU-Only)

docker build -f tools/docker/Dockerfile.slim -t mcp-memory-service:slim .
docker run -p 8000:8000 \
  -e MCP_MEMORY_STORAGE_BACKEND=sqlite_vec \
  -e MCP_MEMORY_USE_ONNX=1 \
  -v "$(pwd)/data:/app/sqlite_db" \
  mcp-memory-service:slim

Deploying Glama Mode (Cloud-Optimized)

docker build -f tools/docker/Dockerfile.glama -t mcp-memory-service:glama .
docker run -p 8000:8000 \
  -e MCP_MEMORY_CHROMA_PATH=/app/chroma_db \
  -v "$(pwd)/chroma_db:/app/chroma_db" \
  mcp-memory-service:glama

Starting Standalone Mode (Independent Server)

docker compose -f tools/docker/docker-compose.standalone.yml up -d

Or via direct Docker command using the primary Dockerfile:

docker build -f tools/docker/Dockerfile -t mcp-memory-service .
docker run -p 8000:8000 \
  -e MCP_STANDALONE_MODE=1 \
  -v "$(pwd)/data:/app/data" \
  mcp-memory-service

Summary

Slim mode prioritizes minimal image size using python:3.10-slim, ONNX-only embeddings, and fixed dependencies via tools/docker/Dockerfile.slim.
Glama mode adds native compilation tools (gcc, g++, build-essential) and cloud-specific health checks in tools/docker/Dockerfile.glama for platform compatibility.
Standalone mode is enabled by the MCP_STANDALONE_MODE=1 environment variable in tools/docker/docker-compose.standalone.yml, allowing autonomous operation without MCP client connections.
The primary tools/docker/Dockerfile supports flexible PyTorch installation via FORCE_CPU_PYTORCH build arguments for customizable Standalone deployments.
All modes utilize the unified entrypoint script (docker-entrypoint-unified.sh) but differ in base image versions, dependency resolution strategies, and runtime environment variables.

Frequently Asked Questions

What is the smallest Docker image size available for MCP Memory Service?

The Slim mode produces the smallest footprint by using python:3.10-slim as the base image, stripping PyTorch entirely, and relying exclusively on ONNX runtime for embeddings. This configuration, defined in tools/docker/Dockerfile.slim, is optimal for CI/CD pipelines and resource-constrained edge devices where every megabyte matters.

Can I use PyTorch instead of ONNX in the Glama deployment mode?

Yes. Unlike Slim mode, which forces MCP_MEMORY_USE_ONNX=1, the Glama Dockerfile (tools/docker/Dockerfile.glama) does not enforce ONNX usage. It sets PYTORCH_ENABLE_MPS_FALLBACK=1, which lets PyTorch fall back to the CPU for operations the Apple Silicon MPS backend does not support, and installs dependencies as an editable install, allowing the full PyTorch stack to resolve from pyproject.toml without CPU-only restrictions.

How does Standalone mode keep the container running without an MCP client?

Standalone mode sets the environment variable MCP_STANDALONE_MODE=1, which the startup orchestrator detects in src/mcp_memory_service/utils/startup_orchestrator.py (lines 153-166). This flag instructs the service to bypass the standard MCP client connection requirement, keeping the main process alive for HTTP API access or background processing even when no client maintains a stdin pipe connection.

Which Dockerfile should I use for production HTTP API deployments?

Use the primary tools/docker/Dockerfile with tools/docker/docker-compose.http.yml for production HTTP deployments. This combination exposes port 8000, configures MCP_MODE=http, and supports optional CPU-only PyTorch via FORCE_CPU_PYTORCH build arguments. The HTTP compose file includes health checks (lines 61-66) essential for production load balancer integration.
