how-to-guide

How to Deploy vLLM GPU Pods with the Pods CLI for pi-ai

February 16, 2026 badlogic/pi-mono ↗

Deploy vLLM GPU pods by registering remote hosts via pi pods setup, activating a pod with pi pods active, and launching models using pi start with automatic SSH provisioning and local state management in ~/.pi/pods.json.

The pi CLI from the badlogic/pi-mono repository provides a complete workflow for deploying and managing vLLM GPU pods. This guide explains how to use the pods sub-command to provision remote GPU instances, configure the local environment, and serve large language models with optimized inference settings.

Prerequisites and Environment Setup

Before registering GPU pods, install the global CLI and configure authentication tokens. The pi tool requires HF_TOKEN for downloading models from Hugging Face and PI_API_KEY for securing the vLLM API endpoints.


# Install the CLI globally

npm install -g @mariozechner/pi

# Set required environment variables

export HF_TOKEN=your_hf_token
export PI_API_KEY=any_secret_string

Registering a GPU Pod with pi pods setup

The pi pods setup command validates SSH connectivity, transfers provisioning scripts, and writes the pod definition to ~/.pi/pods.json. In packages/pods/src/commands/pods.ts, the setupPod function orchestrates this process by executing scpFile to transfer scripts/pod_setup.sh and sshExecStream to run remote installation commands.

The provisioning script handles system package installation, Python virtual environment creation, and vLLM installation based on your selected build type (release, nightly, or gpt-oss).


# Register a DataCrunch GPU pod with NFS mount

pi pods setup dc1 "ssh [email protected]" \
  --mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/hf-models /mnt/hf-models" \
  --vllm release

Managing Pod State and Configuration

Pod state persists locally in ~/.pi/pods.json, managed by packages/pods/src/config.ts. The getActivePod() function reads the active field to determine which pod receives model commands by default.

Switch the active pod using pi pods active <name>, which calls switchActivePod in packages/pods/src/commands/pods.ts to update the JSON configuration.


# List all registered pods

pi pods

# Set the active pod

pi pods active dc1

# Verify active status (marked with *)

pi pods

# Output: "* dc1 - 4x H100 - vLLM: release - ssh [email protected]"

Deploying and Running vLLM Models

With an active pod configured, use pi start to launch vLLM instances. The command builds launch flags based on memory allocation (--memory), context length (--context), and GPU configuration, then executes remotely via SSH.

Model management logic resides in packages/pods/src/commands/models.ts, which handles starting and stopping vLLM processes and exposes OpenAI-compatible HTTP endpoints.


# Start a model on the active pod

pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen \
  --memory 90% --context 64k

# Override the active pod for a single command

pi start openai/gpt-oss-20b --name gpt20 --pod dc1

# Check model logs

pi logs qwen

# Stop the model

pi stop qwen

Interactive Debugging and Agent Access

Access pods directly for debugging or run inference through the built-in agent interface. The pi shell command opens an interactive SSH session to the active pod or a specified pod name.


# Open shell on active pod

pi shell

# Open shell on specific pod

pi shell runpod

# Run a single query through the agent

pi agent qwen "Write a Python function that computes factorials."

Summary

Register pods using pi pods setup to validate SSH access, run scripts/pod_setup.sh, and store configurations in ~/.pi/pods.json via packages/pods/src/config.ts.
Activate pods with pi pods active <name> to set the default target for model commands, implemented in packages/pods/src/commands/pods.ts.
Deploy models using pi start with memory and context flags; the CLI builds vLLM command lines in packages/pods/src/commands/models.ts and executes them over SSH.
Override context for individual commands using the --pod flag, which bypasses the active pod selection in packages/pods/src/cli.ts.

Frequently Asked Questions

How does the pi CLI authenticate with remote GPU pods?

The CLI uses standard SSH key-based authentication configured on your local machine. When running pi pods setup, you provide an SSH connection string (e.g., "ssh [email protected]") which the CLI stores in ~/.pi/pods.json. Subsequent commands in packages/pods/src/ssh.ts use this string to execute remote commands via sshExecStream and transfer files via scpFile without requiring password prompts if your SSH keys are properly configured.

What vLLM build options are available when setting up a pod?

The pi pods setup command accepts a --vllm flag with three valid options: release (stable builds), nightly (development builds with latest features), and gpt-oss (optimized for GPT-OSS models). This selection is passed to the scripts/pod_setup.sh provisioning script, which installs the appropriate vLLM version into the remote host's Python virtual environment during the initial setup phase.

Can I run multiple models simultaneously on different pods?

Yes. While the CLI maintains a single active pod in ~/.pi/pods.json for convenience, you can override this default for any model command using the --pod <name> flag. As implemented in packages/pods/src/cli.ts, this flag bypasses the getActivePod() lookup and routes the command to the specified pod, allowing you to start models on different GPU instances simultaneously and manage them independently using pi list, pi logs, and pi stop with the appropriate --pod overrides.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:

curl -s "https://instagit.com/install.md"

Add to your MCP client configuration:

{
  "mcpServers": {
    "instagit": {
      "command": "npx",
      "args": ["-y", "instagit@latest"]
    }
  }
}

Ask your agent:

"Use Instagit MCP to understand how badlogic/pi-mono works."

Works with

Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →