# MCP Memory Service Docker Deployment Modes: Slim, Glama, and Standalone Explained

> Understand MCP Memory Service Docker deployment modes: Slim, Glama, and Standalone. Learn which mode suits your needs for efficient cloud or server operation. Explore key differences and choose the best fit.

- Repository: [doobidoo/mcp-memory-service](https://github.com/doobidoo/mcp-memory-service)
- Tags: deep-dive
- Published: 2026-02-28

---

**MCP Memory Service provides three distinct Docker deployment modes: Slim for minimal CPU-only footprints, Glama for cloud-native platforms requiring native build tools, and Standalone for independent server operation. Each is configured through different Dockerfiles and environment settings in the `doobidoo/mcp-memory-service` repository.**

The containerization strategy for this memory-enabled MCP server varies significantly based on whether you prioritize image size, cloud platform compatibility, or autonomous operation. Each mode leverages specific base images, dependency sets, and runtime flags to accommodate constraints ranging from edge device limitations to Glama cloud hosting requirements.

## Base Images and System Architecture

The foundation of each deployment mode begins with divergent base image selections and system package installations.

**Slim** mode uses `python:3.10-slim` (Debian-based) and installs only `curl`, which keeps both the image size and the attack surface small. **Glama** mode uses the same `python:3.10-slim` base but adds `build-essential`, `gcc`, and `g++` (lines 22-28 in `tools/docker/Dockerfile.glama`) to compile native extensions required by the Glama runtime environment.

The **primary Dockerfile** (`tools/docker/Dockerfile`), typically used for Standalone deployments, moves to `python:3.12-slim`, installs `curl` and `bash`, and applies system package upgrades (lines 24-32). It offers a newer interpreter while remaining lighter than the Glama image.
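
The mapping between modes, Dockerfiles, and base images described above can be captured in a small lookup helper. This is a minimal sketch: the function names and the `slim`/`glama`/`standalone` keywords are illustrative and do not exist in the repository, while the paths and image tags are the ones cited in this section.

```bash
# Map a deployment mode to the Dockerfile path named in this article.
# Function names and mode keywords are illustrative, not part of the repository.
dockerfile_for_mode() {
  case "$1" in
    slim)       echo "tools/docker/Dockerfile.slim" ;;
    glama)      echo "tools/docker/Dockerfile.glama" ;;
    standalone) echo "tools/docker/Dockerfile" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Report the base image this article attributes to each mode.
base_image_for_mode() {
  case "$1" in
    slim|glama) echo "python:3.10-slim" ;;
    standalone) echo "python:3.12-slim" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print one row per mode: mode, Dockerfile, base image.
for mode in slim glama standalone; do
  printf '%-10s %-32s %s\n' "$mode" \
    "$(dockerfile_for_mode "$mode")" "$(base_image_for_mode "$mode")"
done
```

A build script could then call `docker build -f "$(dockerfile_for_mode slim)" .` instead of hard-coding paths.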

## Dependency Sets and ML Runtime Differences

The choice between ONNX and PyTorch represents the most significant functional divergence between modes.

**Slim** mode enforces CPU-only ONNX runtime exclusively. The Dockerfile explicitly sets `MCP_MEMORY_USE_ONNX=1` (line 11 in `tools/docker/Dockerfile.slim`) and installs a fixed, minimal dependency list (lines 52-71) excluding PyTorch entirely. This results in a deterministic, minimal footprint suitable for CI pipelines and edge devices.
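
One way to confirm that a built image really carries the forced ONNX setting is to inspect its baked-in environment. The sketch below assumes a hypothetical helper, `has_env`, that checks a list of `VAR=VALUE` lines for an exact match; it runs here against sample data so that it works without Docker installed.

```bash
# Succeed if stdin (one VAR=VALUE per line) contains exactly the given pair.
# has_env is an illustrative helper, not something shipped with the project.
has_env() {
  grep -Fxq -- "$1"
}

# Sample data standing in for the environment of a Slim image.
sample_env="PATH=/usr/local/bin
MCP_MEMORY_USE_ONNX=1
MCP_MEMORY_STORAGE_BACKEND=sqlite_vec"

if printf '%s\n' "$sample_env" | has_env "MCP_MEMORY_USE_ONNX=1"; then
  echo "ONNX is forced"
else
  echo "ONNX is not forced"
fi
```

Against a real image, the same check could be fed from `docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' mcp-memory-service:slim`, where the tag is the one used in the build examples later in this article.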

**Glama** mode does not force ONNX usage. Instead, it sets `PYTORCH_ENABLE_MPS_FALLBACK=1` (line 17 in `tools/docker/Dockerfile.glama`), the PyTorch flag that routes operations unsupported by the Apple Silicon MPS backend to the CPU, and installs the project in editable mode (`uv pip install --system -e .` at line 44) so that dependencies resolve from the project's [`pyproject.toml`](https://github.com/doobidoo/mcp-memory-service/blob/main/pyproject.toml).

The **primary Dockerfile** offers flexibility through build arguments. When `FORCE_CPU_PYTORCH=true` or when installing the `[sqlite]` extra group (lines 59-62), it pulls CPU-only PyTorch while maintaining ONNX runtime availability. This allows Standalone deployments to toggle between pure-CPU stacks and full PyTorch installations based on hardware constraints.
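
To illustrate how the build argument might be passed, the sketch below assembles the `docker build` command as a string and prints it rather than running it. `FORCE_CPU_PYTORCH` is the argument named above; the `build_cmd` helper and the image tag are illustrative.

```bash
# Assemble (but do not run) a docker build command for the primary Dockerfile.
# build_cmd and the image tag are illustrative; FORCE_CPU_PYTORCH is the
# build argument named in this article.
build_cmd() {
  build_line="docker build -f tools/docker/Dockerfile -t mcp-memory-service"
  if [ "${1:-false}" = "true" ]; then
    build_line="$build_line --build-arg FORCE_CPU_PYTORCH=true"
  fi
  echo "$build_line ."
}

build_cmd true    # CPU-only PyTorch variant
build_cmd false   # default dependency resolution
```

Printing the command first makes it easy to review in CI logs before executing it.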

## Environment Variables and Configuration

Each mode configures distinct environment variables that drive runtime behavior:

- **Slim**: Forces `MCP_MEMORY_STORAGE_BACKEND=sqlite_vec` and `MCP_MEMORY_USE_ONNX=1`, optimizing for local SQLite-backed storage without GPU dependencies.
- **Glama**: Defines `MCP_MEMORY_CHROMA_PATH=/app/chroma_db` for persistent vector storage and omits ONNX forcing, allowing framework flexibility.
- **Standalone**: Activates via `MCP_STANDALONE_MODE=1` (set in [`tools/docker/docker-compose.standalone.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.standalone.yml), line 22), which triggers the startup orchestrator to keep the process alive without an attached MCP client.
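
The per-mode variables listed above could also be kept in env files and passed with Docker's `--env-file` option instead of repeated `-e` flags. The following is a sketch under that assumption; `env_lines_for_mode` is a hypothetical helper, and the variables are the ones listed in this section.

```bash
# Emit the VAR=VALUE lines this article associates with each mode.
# env_lines_for_mode is an illustrative helper, not part of the repository.
env_lines_for_mode() {
  case "$1" in
    slim)       printf '%s\n' "MCP_MEMORY_STORAGE_BACKEND=sqlite_vec" "MCP_MEMORY_USE_ONNX=1" ;;
    glama)      printf '%s\n' "MCP_MEMORY_CHROMA_PATH=/app/chroma_db" ;;
    standalone) printf '%s\n' "MCP_STANDALONE_MODE=1" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Write the Slim settings to a temporary env file and show its contents.
env_file="$(mktemp)"
env_lines_for_mode slim > "$env_file"
cat "$env_file"
rm -f "$env_file"
```

A file produced this way could be supplied as `docker run --env-file slim.env mcp-memory-service:slim`.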

The Standalone mode specifically references [`src/mcp_memory_service/utils/startup_orchestrator.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/utils/startup_orchestrator.py) (lines 153-166), where the guard logic detects `MCP_STANDALONE_MODE` to prevent container shutdown when no client connection exists.
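
The shape of such a guard can be sketched in shell. This is only an analogy for illustration: the real logic lives in the Python file cited above, and the `is_standalone` function and the printed messages are hypothetical.

```bash
# Hypothetical guard: succeed only when MCP_STANDALONE_MODE is exactly "1".
# This mirrors the idea of the check, not the project's Python implementation.
is_standalone() {
  [ "${MCP_STANDALONE_MODE:-0}" = "1" ]
}

if is_standalone; then
  echo "standalone: keep running without an attached MCP client"
else
  echo "client mode: lifetime is tied to the MCP client connection"
fi
```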

## Runtime Modes and Compose Orchestration

The repository provides three compose files corresponding to different operational patterns:

**MCP Protocol Mode** ([`tools/docker/docker-compose.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.yml)): Configures `stdin_open: true` and `tty: true` (lines 14-15) for bidirectional pipe communication with Claude Desktop or VS Code extensions. Sets `MCP_MODE=mcp` (line 27) for protocol-compliant operation.

**HTTP/API Mode** ([`tools/docker/docker-compose.http.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.http.yml)): Exposes port `${HTTP_PORT:-8000}` (line 12) and sets `MCP_MODE=http` with REST-specific variables (`MCP_HTTP_PORT`, `MCP_API_KEY`). Includes Dockerfile health checks (lines 61-66) for load balancer compatibility.

**Standalone Mode** ([`tools/docker/docker-compose.standalone.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.standalone.yml)): Mounts the source tree (`.:/app`) for development and sets `MCP_STANDALONE_MODE=1` (line 22), allowing the service to run as a headless server without MCP client attachment.

All Dockerfiles use a unified entrypoint script ([`docker-entrypoint-unified.sh`](https://github.com/doobidoo/mcp-memory-service/blob/main/docker-entrypoint-unified.sh)). The Glama Dockerfile additionally ends with an empty `CMD []` instruction (line 66 in `tools/docker/Dockerfile.glama`), noted there as a default for standalone compatibility, which defers runtime decisions to environment variables.
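
The `${HTTP_PORT:-8000}` expression in the HTTP compose file uses the same default-value syntax as the shell: the variable's value when it is set and non-empty, otherwise `8000`. The sketch below demonstrates that rule in plain shell; `resolve_http_port` is an illustrative name.

```bash
# Mirror the ${HTTP_PORT:-8000} expression: use HTTP_PORT when it is set
# and non-empty, otherwise fall back to 8000.
resolve_http_port() {
  echo "${HTTP_PORT:-8000}"
}

unset HTTP_PORT
echo "unset -> $(resolve_http_port)"

HTTP_PORT=9000
echo "9000  -> $(resolve_http_port)"

HTTP_PORT=""
echo "empty -> $(resolve_http_port)"
```

Under this reading, `HTTP_PORT=9000 docker compose -f tools/docker/docker-compose.http.yml up -d` would select port 9000, while leaving the variable unset keeps the default.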

## Practical Deployment Examples

### Running Slim Mode (Minimal CPU-Only)

```bash
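# Build the Slim image from the repository root.
docker build -f tools/docker/Dockerfile.slim -t mcp-memory-service:slim .

# Run with the sqlite_vec backend and ONNX embeddings; persist the database on the host.
docker run -p 8000:8000 \
  -e MCP_MEMORY_STORAGE_BACKEND=sqlite_vec \
  -e MCP_MEMORY_USE_ONNX=1 \
  -v "$(pwd)/data:/app/sqlite_db" \
  mcp-memory-service:slim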
```

### Deploying Glama Mode (Cloud-Optimized)

```bash
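# Build the Glama image from the repository root.
docker build -f tools/docker/Dockerfile.glama -t mcp-memory-service:glama .

# Run with the Chroma path mounted from the host for persistence.
docker run -p 8000:8000 \
  -e MCP_MEMORY_CHROMA_PATH=/app/chroma_db \
  -v "$(pwd)/chroma_db:/app/chroma_db" \
  mcp-memory-service:glama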
```

### Starting Standalone Mode (Independent Server)

```bash
docker compose -f tools/docker/docker-compose.standalone.yml up -d
```

Or via direct Docker command using the primary Dockerfile:

```bash
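# Build the primary image from the repository root.
docker build -f tools/docker/Dockerfile -t mcp-memory-service .

# Run as a headless server; MCP_STANDALONE_MODE=1 keeps it alive without a client.
docker run -p 8000:8000 \
  -e MCP_STANDALONE_MODE=1 \
  -v "$(pwd)/data:/app/data" \
  mcp-memory-service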
```

## Summary

- **Slim** mode prioritizes minimal image size using `python:3.10-slim`, ONNX-only embeddings, and fixed dependencies via `tools/docker/Dockerfile.slim`.
- **Glama** mode adds native compilation tools (`gcc`, `g++`, `build-essential`) and cloud-specific health checks in `tools/docker/Dockerfile.glama` for platform compatibility.
- **Standalone** mode is activated by the `MCP_STANDALONE_MODE=1` environment variable set in [`tools/docker/docker-compose.standalone.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.standalone.yml), enabling autonomous operation without MCP client connections.
- The primary `tools/docker/Dockerfile` supports flexible PyTorch installation via `FORCE_CPU_PYTORCH` build arguments for customizable Standalone deployments.
- All modes utilize the unified entrypoint script ([`docker-entrypoint-unified.sh`](https://github.com/doobidoo/mcp-memory-service/blob/main/docker-entrypoint-unified.sh)) but differ in base image versions, dependency resolution strategies, and runtime environment variables.

## Frequently Asked Questions

### What is the smallest Docker image size available for MCP Memory Service?

The **Slim** mode produces the smallest footprint by using `python:3.10-slim` as the base image, stripping PyTorch entirely, and relying exclusively on ONNX runtime for embeddings. This configuration, defined in `tools/docker/Dockerfile.slim`, is optimal for CI/CD pipelines and resource-constrained edge devices where every megabyte matters.

### Can I use PyTorch instead of ONNX in the Glama deployment mode?

Yes. Unlike Slim mode, which forces `MCP_MEMORY_USE_ONNX=1`, the Glama Dockerfile (`tools/docker/Dockerfile.glama`) does not enforce ONNX usage. It sets `PYTORCH_ENABLE_MPS_FALLBACK=1`, which lets PyTorch fall back to the CPU for operations the Apple Silicon MPS backend does not support, and installs the project in editable mode, allowing the full PyTorch stack to resolve from [`pyproject.toml`](https://github.com/doobidoo/mcp-memory-service/blob/main/pyproject.toml) without CPU-only restrictions.

### How does Standalone mode keep the container running without an MCP client?

Standalone mode sets the environment variable `MCP_STANDALONE_MODE=1`, which the startup orchestrator detects in [`src/mcp_memory_service/utils/startup_orchestrator.py`](https://github.com/doobidoo/mcp-memory-service/blob/main/src/mcp_memory_service/utils/startup_orchestrator.py) (lines 153-166). This flag instructs the service to bypass the standard MCP client connection requirement, keeping the main process alive for HTTP API access or background processing even when no client maintains a `stdin` pipe connection.

### Which Dockerfile should I use for production HTTP API deployments?

Use the **primary** `tools/docker/Dockerfile` with [`tools/docker/docker-compose.http.yml`](https://github.com/doobidoo/mcp-memory-service/blob/main/tools/docker/docker-compose.http.yml) for production HTTP deployments. This combination exposes port 8000 by default, configures `MCP_MODE=http`, and supports optional CPU-only PyTorch via the `FORCE_CPU_PYTORCH` build argument. The Dockerfile health checks (lines 61-66) support production load balancer integration.
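
After starting an HTTP deployment, a deploy script may want to wait until the service answers before routing traffic to it. The sketch below polls a URL with `curl`. The `wait_for_http` helper and the `PROBE` override are illustrative, and the health URL is left as a parameter because this article does not name the endpoint.

```bash
# Poll a URL until it answers successfully or the attempts run out.
# wait_for_http and the PROBE override are illustrative helpers.
# Arguments: URL, number of attempts (default 10), delay in seconds (default 1).
wait_for_http() {
  probe_url="$1"
  max_attempts="${2:-10}"
  delay_seconds="${3:-1}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # PROBE can replace curl, for example in tests; it is word-split on purpose.
    if ${PROBE:-curl -fsS -o /dev/null} "$probe_url"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  echo "not ready after $max_attempts attempt(s)" >&2
  return 1
}

# Example call (set HEALTH_URL to the service's health endpoint first):
# wait_for_http "$HEALTH_URL" 30 2
```

Keeping the probe command replaceable lets the retry logic be exercised without a running container.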