How to Deploy vLLM GPU Pods with the Pods CLI for pi-ai
Deploy vLLM GPU pods by registering remote hosts via pi pods setup, activating a pod with pi pods active, and launching models using pi start with automatic SSH provisioning and local state management in ~/.pi/pods.json.
The pi CLI from the badlogic/pi-mono repository provides a complete workflow for deploying and managing vLLM GPU pods. This guide explains how to use the pods sub-command to provision remote GPU instances, configure the local environment, and serve large language models with optimized inference settings.
Prerequisites and Environment Setup
Before registering GPU pods, install the global CLI and configure authentication tokens. The pi tool requires HF_TOKEN for downloading models from Hugging Face and PI_API_KEY for securing the vLLM API endpoints.
# Install the CLI globally
npm install -g @mariozechner/pi
# Set required environment variables
export HF_TOKEN=your_hf_token
export PI_API_KEY=any_secret_string
Registering a GPU Pod with pi pods setup
The pi pods setup command validates SSH connectivity, transfers provisioning scripts, and writes the pod definition to ~/.pi/pods.json. In packages/pods/src/commands/pods.ts, the setupPod function orchestrates this process by executing scpFile to transfer scripts/pod_setup.sh and sshExecStream to run remote installation commands.
The provisioning script handles system package installation, Python virtual environment creation, and vLLM installation based on your selected build type (release, nightly, or gpt-oss).
# Register a DataCrunch GPU pod with NFS mount
pi pods setup dc1 "ssh [email protected]" \
--mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/hf-models /mnt/hf-models" \
--vllm release
Managing Pod State and Configuration
Pod state persists locally in ~/.pi/pods.json, managed by packages/pods/src/config.ts. The getActivePod() function reads the active field to determine which pod receives model commands by default.
Switch the active pod using pi pods active <name>, which calls switchActivePod in packages/pods/src/commands/pods.ts to update the JSON configuration.
# List all registered pods
pi pods
# Set the active pod
pi pods active dc1
# Verify active status (marked with *)
pi pods
# Output: "* dc1 - 4x H100 - vLLM: release - ssh [email protected]"
Deploying and Running vLLM Models
With an active pod configured, use pi start to launch vLLM instances. The command builds launch flags based on memory allocation (--memory), context length (--context), and GPU configuration, then executes remotely via SSH.
Model management logic resides in packages/pods/src/commands/models.ts, which handles starting and stopping vLLM processes and exposes OpenAI-compatible HTTP endpoints.
# Start a model on the active pod
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen \
--memory 90% --context 64k
# Override the active pod for a single command
pi start openai/gpt-oss-20b --name gpt20 --pod dc1
# Check model logs
pi logs qwen
# Stop the model
pi stop qwen
Interactive Debugging and Agent Access
Access pods directly for debugging or run inference through the built-in agent interface. The pi shell command opens an interactive SSH session to the active pod or a specified pod name.
# Open shell on active pod
pi shell
# Open shell on specific pod
pi shell runpod
# Run a single query through the agent
pi agent qwen "Write a Python function that computes factorials."
Summary
- Register pods using
pi pods setupto validate SSH access, runscripts/pod_setup.sh, and store configurations in~/.pi/pods.jsonviapackages/pods/src/config.ts. - Activate pods with
pi pods active <name>to set the default target for model commands, implemented inpackages/pods/src/commands/pods.ts. - Deploy models using
pi startwith memory and context flags; the CLI builds vLLM command lines inpackages/pods/src/commands/models.tsand executes them over SSH. - Override context for individual commands using the
--podflag, which bypasses the active pod selection inpackages/pods/src/cli.ts.
Frequently Asked Questions
How does the pi CLI authenticate with remote GPU pods?
The CLI uses standard SSH key-based authentication configured on your local machine. When running pi pods setup, you provide an SSH connection string (e.g., "ssh [email protected]") which the CLI stores in ~/.pi/pods.json. Subsequent commands in packages/pods/src/ssh.ts use this string to execute remote commands via sshExecStream and transfer files via scpFile without requiring password prompts if your SSH keys are properly configured.
What vLLM build options are available when setting up a pod?
The pi pods setup command accepts a --vllm flag with three valid options: release (stable builds), nightly (development builds with latest features), and gpt-oss (optimized for GPT-OSS models). This selection is passed to the scripts/pod_setup.sh provisioning script, which installs the appropriate vLLM version into the remote host's Python virtual environment during the initial setup phase.
Can I run multiple models simultaneously on different pods?
Yes. While the CLI maintains a single active pod in ~/.pi/pods.json for convenience, you can override this default for any model command using the --pod <name> flag. As implemented in packages/pods/src/cli.ts, this flag bypasses the getActivePod() lookup and routes the command to the specified pod, allowing you to start models on different GPU instances simultaneously and manage them independently using pi list, pi logs, and pi stop with the appropriate --pod overrides.
Have a question about this repo?
These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:
curl -s "https://instagit.com/install.md" Maintain an open-source project? Get it listed too →