# How to Deploy vLLM GPU Pods with the Pods CLI for pi-ai

> Deploy vLLM GPU pods with the pods CLI for pi-ai: register remote hosts, select an active pod, and launch models over SSH.

- Repository: [Mario Zechner/pi-mono](https://github.com/badlogic/pi-mono)
- Tags: how-to-guide
- Published: 2026-02-16

---

**To deploy vLLM GPU pods, register a remote host with `pi pods setup`, select it with `pi pods active`, and launch models with `pi start`. The CLI provisions the host over SSH and keeps pod state locally in `~/.pi/pods.json`.**

The `pi` CLI from the [badlogic/pi-mono](https://github.com/badlogic/pi-mono) repository provides a complete workflow for deploying and managing vLLM GPU pods. This guide explains how to use the `pods` sub-command to provision remote GPU instances, configure the local environment, and serve large language models with optimized inference settings.

## Prerequisites and Environment Setup

Before registering GPU pods, install the CLI globally and export two tokens. The `pi` tool requires `HF_TOKEN` to download models from Hugging Face and `PI_API_KEY` to secure the vLLM API endpoints.

```bash
# Install the CLI globally
npm install -g @mariozechner/pi

# Set required environment variables
export HF_TOKEN=your_hf_token
export PI_API_KEY=any_secret_string
```
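
Because `pi` depends on both variables, a small pre-flight check can catch a missing export before any remote work starts. The sketch below is not part of the `pi` CLI; `check_env` is a hypothetical helper name. It relies on `printenv`, so it only sees variables that have actually been exported.

```bash
# Hypothetical helper (not part of pi): verify that each named environment
# variable is exported and non-empty. Returns 0 if all are, 1 otherwise.
# Usage: check_env NAME [NAME...]
check_env() {
  missing=0
  for name in "$@"; do
    # printenv prints nothing for a variable that is unset or not exported
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env HF_TOKEN PI_API_KEY && echo "environment ready"` prints the confirmation only when both variables are exported and non-empty.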

## Registering a GPU Pod with pi pods setup

The `pi pods setup` command validates SSH connectivity, transfers provisioning scripts, and writes the pod definition to `~/.pi/pods.json`. In [`packages/pods/src/commands/pods.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/commands/pods.ts), the `setupPod` function orchestrates this process by executing `scpFile` to transfer [`scripts/pod_setup.sh`](https://github.com/badlogic/pi-mono/blob/main/scripts/pod_setup.sh) and `sshExecStream` to run remote installation commands.

The provisioning script handles system package installation, Python virtual environment creation, and vLLM installation based on your selected build type (`release`, `nightly`, or `gpt-oss`).

```bash
# Register a DataCrunch GPU pod with NFS mount
pi pods setup dc1 "ssh root@12.34.56.78" \
  --mount "sudo mount -t nfs -o nconnect=16 nfs.fin-02.datacrunch.io:/hf-models /mnt/hf-models" \
  --vllm release
```
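
Since setup starts by validating SSH connectivity, confirming that key-based login works beforehand can save a failed run. This is a minimal sketch, assuming OpenSSH is installed locally; `ssh_key_auth_ok` is a hypothetical helper and not part of `pi`. `BatchMode=yes` makes `ssh` fail instead of prompting for a password, and `ConnectTimeout` bounds the wait.

```bash
# Hypothetical helper (not part of pi): succeed only if non-interactive,
# key-based SSH login to the given target works.
# Usage: ssh_key_auth_ok user@host
ssh_key_auth_ok() {
  # BatchMode=yes disables password prompts; ConnectTimeout bounds the attempt.
  # The remote command `true` does nothing and exits 0 after a successful login.
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

Running `ssh_key_auth_ok root@12.34.56.78 && echo "SSH ready"` before `pi pods setup` confirms the login path. A host whose key is not yet in `known_hosts` also fails in batch mode, so connect once interactively first.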

## Managing Pod State and Configuration

Pod state persists locally in `~/.pi/pods.json`, managed by [`packages/pods/src/config.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/config.ts). The `getActivePod()` function reads the `active` field to determine which pod receives model commands by default.

Switch the active pod using `pi pods active <name>`, which calls `switchActivePod` in [`packages/pods/src/commands/pods.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/commands/pods.ts) to update the JSON configuration.

```bash
# List all registered pods
pi pods

# Set the active pod
pi pods active dc1

# Verify active status (marked with *)
pi pods
# Output: "* dc1 - 4x H100 - vLLM: release - ssh root@12.34.56.78"
```
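
To see which pod is active without going through the CLI, you can inspect the state file directly. The sketch below assumes `active` is a top-level string field holding the pod name. That matches the description above but is not verified against the actual file schema, and `active_pod` is a hypothetical helper. It uses a plain `sed` text match, not a JSON parser, so a tool such as `jq` would be more robust for anything beyond a quick look.

```bash
# Hypothetical helper (not part of pi): print the value of the "active"
# field from a pods.json-style file. Assumes the value is a quoted string.
# Usage: active_pod [FILE]   (defaults to ~/.pi/pods.json)
active_pod() {
  file=${1:-"$HOME/.pi/pods.json"}
  # Capture the quoted string that follows "active": and print the first match.
  sed -n 's/.*"active"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1
}
```

Calling `active_pod` with no argument reads `~/.pi/pods.json`; passing a path makes it easy to try against a copy of the file.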

## Deploying and Running vLLM Models

With an active pod configured, use `pi start` to launch vLLM instances. The command derives vLLM launch flags from the requested memory allocation (`--memory`), context length (`--context`), and GPU configuration, then runs the resulting command on the pod over SSH.

Model management logic resides in [`packages/pods/src/commands/models.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/commands/models.ts), which handles starting and stopping vLLM processes and exposes OpenAI-compatible HTTP endpoints.

```bash
# Start a model on the active pod
pi start Qwen/Qwen2.5-Coder-32B-Instruct --name qwen \
  --memory 90% --context 64k

# Override the active pod for a single command
pi start openai/gpt-oss-20b --name gpt20 --pod dc1

# Check model logs
pi logs qwen

# Stop the model
pi stop qwen
```
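
A freshly started model can take a while to load before its endpoint responds. The sketch below polls the OpenAI-compatible `/v1/models` route until it answers. It assumes `curl` is installed, that the server expects `PI_API_KEY` as a bearer token, and that you know the base URL of the endpoint. `wait_for_endpoint` is a hypothetical helper, and the host and port in the usage line are placeholders.

```bash
# Hypothetical helper (not part of pi): poll an OpenAI-compatible server
# until GET /v1/models succeeds, or give up after a number of attempts.
# Usage: wait_for_endpoint BASE_URL [TRIES] [DELAY_SECONDS]
wait_for_endpoint() {
  base_url=$1
  tries=${2:-30}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; --max-time bounds each attempt.
    # Assumption: the endpoint accepts PI_API_KEY as a bearer token.
    if curl -fsS --max-time 5 \
         -H "Authorization: Bearer ${PI_API_KEY:-}" \
         "$base_url/v1/models" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_for_endpoint "http://<pod-host>:<port>" && echo "model is ready"` blocks until the server responds or the attempts run out; substitute the address your pod actually exposes.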

## Interactive Debugging and Agent Access

Access pods directly for debugging or run inference through the built-in agent interface. The `pi shell` command opens an interactive SSH session to the active pod or a specified pod name.

```bash
# Open a shell on the active pod
pi shell

# Open a shell on a specific pod by name (here, a pod registered as "runpod")
pi shell runpod

# Run a single query through the agent
pi agent qwen "Write a Python function that computes factorials."
```

## Summary

- **Register pods** using `pi pods setup` to validate SSH access, run [`scripts/pod_setup.sh`](https://github.com/badlogic/pi-mono/blob/main/scripts/pod_setup.sh), and store configurations in `~/.pi/pods.json` via [`packages/pods/src/config.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/config.ts).
- **Activate pods** with `pi pods active <name>` to set the default target for model commands, implemented in [`packages/pods/src/commands/pods.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/commands/pods.ts).
- **Deploy models** using `pi start` with memory and context flags; the CLI builds vLLM command lines in [`packages/pods/src/commands/models.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/commands/models.ts) and executes them over SSH.
- **Override the target pod** for individual commands using the `--pod` flag, which bypasses the active pod selection in [`packages/pods/src/cli.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/cli.ts).

## Frequently Asked Questions

### How does the pi CLI authenticate with remote GPU pods?

The CLI uses standard SSH key-based authentication configured on your local machine. When running `pi pods setup`, you provide an SSH connection string (e.g., `"ssh root@12.34.56.78"`), which the CLI stores in `~/.pi/pods.json`. Subsequent commands reuse this string through the helpers in [`packages/pods/src/ssh.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/ssh.ts): `sshExecStream` executes remote commands and `scpFile` transfers files. If your SSH keys are properly configured, none of this requires a password prompt.

### What vLLM build options are available when setting up a pod?

The `pi pods setup` command accepts a `--vllm` flag with three valid options: `release` (stable builds), `nightly` (development builds with latest features), and `gpt-oss` (optimized for GPT-OSS models). This selection is passed to the [`scripts/pod_setup.sh`](https://github.com/badlogic/pi-mono/blob/main/scripts/pod_setup.sh) provisioning script, which installs the appropriate vLLM version into the remote host's Python virtual environment during the initial setup phase.

### Can I run multiple models simultaneously on different pods?

Yes. While the CLI maintains a single `active` pod in `~/.pi/pods.json` for convenience, you can override this default for any model command using the `--pod <name>` flag. As implemented in [`packages/pods/src/cli.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/pods/src/cli.ts), this flag bypasses the `getActivePod()` lookup and routes the command to the specified pod. You can therefore start models on different GPU instances at the same time and manage each one independently by passing the matching `--pod` override to `pi list`, `pi logs`, and `pi stop`.
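
When several pods are in use, a short loop can check each one in turn. This is a minimal sketch, assuming `pi list` accepts `--pod` as described above. `list_models_on` and the `PI_BIN` variable are hypothetical; the variable exists only so the loop can be dry-run with `echo` in place of `pi`.

```bash
# Hypothetical helper (not part of pi): run `pi list --pod <name>` for each
# pod name given as an argument, with a header line per pod.
# PI_BIN is an illustrative override; it defaults to the real `pi` binary.
# Usage: list_models_on POD [POD...]
list_models_on() {
  for pod in "$@"; do
    echo "== $pod =="
    "${PI_BIN:-pi}" list --pod "$pod"
  done
}
```

For example, `list_models_on dc1 runpod` reports the models on both pods, and setting `PI_BIN=echo` first prints the commands without running them.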