How to Configure Custom Models for Ollama or LiteLLM in pi-ai

You configure custom Ollama or LiteLLM models in pi-ai by creating a ~/.pi/agent/models.json file that defines provider types, base URLs, and model IDs, which the CustomProvidersStore automatically loads without requiring a restart.

The badlogic/pi-mono repository provides pi-ai with a flexible model discovery system that treats any OpenAI-compatible endpoint (a local Ollama server, a vLLM instance, or a LiteLLM proxy) as a first-class provider. This configuration-driven approach lets you expose custom models in the UI by editing a single JSON file.

Configuration File Location and Structure

All custom provider definitions reside in ~/.pi/agent/models.json. This file is watched by the CustomProvidersStore located at packages/web-ui/src/storage/stores/custom-providers-store.ts, which means changes are applied immediately without restarting the application.

The top-level structure requires a providers object containing one or more provider configurations:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "placeholder",
      "models": []
    }
  }
}

Each provider entry requires four fields to satisfy the schema validated by the discovery layer:

  • baseUrl – The root endpoint of your LLM server (must include /v1 for OpenAI-compatible APIs)
  • api – The API format, typically "openai-completions" for Ollama, vLLM, and LiteLLM
  • apiKey – Required by the schema but ignored by Ollama; any string works
  • models – An array of model objects or simple ID strings
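
Before saving the file, it can help to check an entry against the four fields listed above. The following is a minimal sketch, not pi-ai's own validation code: the ModelEntry and ProviderConfig types and the checkProvider helper are hypothetical names introduced here for illustration.

```typescript
// Hypothetical shapes inferred from the four fields above; not pi-ai's own types.
type ModelEntry = string | { id: string };

interface ProviderConfig {
  baseUrl: string;
  api: string;
  apiKey: string;
  models: ModelEntry[];
}

// Returns human-readable problems; an empty array means the entry looks complete.
function checkProvider(name: string, raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return [`${name}: provider entry must be an object`];
  }
  const entry = raw as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of ["baseUrl", "api", "apiKey"]) {
    const value = entry[field];
    if (typeof value !== "string" || value === "") {
      problems.push(`${name}: "${field}" must be a non-empty string`);
    }
  }
  const models = entry["models"];
  if (!Array.isArray(models)) {
    problems.push(`${name}: "models" must be an array`);
    return problems;
  }
  // Each model may be a bare ID string or an object carrying an id.
  models.forEach((model: unknown, index: number) => {
    const id = typeof model === "string" ? model : (model as { id?: unknown } | null)?.id;
    if (typeof id !== "string" || id === "") {
      problems.push(`${name}: models[${index}] needs a string id`);
    }
  });
  return problems;
}

const sample: ProviderConfig = {
  baseUrl: "http://localhost:11434/v1",
  api: "openai-completions",
  apiKey: "ollama",
  models: [{ id: "llama3.1:8b" }, "qwen2.5-coder:7b"],
};
console.log(checkProvider("ollama", sample));
```

Returning a list of problems rather than throwing on the first one lets you report every missing field in a single pass.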

Minimal Ollama Configuration

For a standard Ollama installation running locally, you only need to specify the model IDs you want to expose. The discoverOllamaModels function in packages/web-ui/src/utils/model-discovery.ts automatically queries the local server (Ollama lists installed models at http://localhost:11434/api/tags) and checks that each model supports tool calling.

Create ~/.pi/agent/models.json with the following content:

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}

Once saved, open the /model view in the pi-ai interface. The ModelSelector component in packages/web-ui/src/dialogs/ModelSelector.ts merges these entries with built-in providers, making llama3.1:8b and qwen2.5-coder:7b selectable options.

Full Model Configuration Options

To override default discovery behavior or customize how a model appears in the UI, specify the complete model object structure. The discovery code in model-discovery.ts uses these values to construct the Model objects consumed by the agent.

{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B (local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          }
        }
      ]
    }
  }
}

Key fields explained:

  • id – The exact model identifier passed to the server endpoint (e.g., llama3.1:8b)
  • name – The display label shown in the model selector; defaults to the ID if omitted
  • reasoning – Boolean flag enabling the thinking stream for models that support structured reasoning
  • input – Array of supported input types: ["text"] or ["text", "image"] for multimodal models
  • contextWindow – Maximum token context the model can process; discovery defaults to 128,000 for Ollama
  • maxTokens – Upper limit for generated tokens in a single response
  • cost – Pricing per million tokens; use 0 for all fields when running local models
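
To make the defaults concrete, here is a rough sketch of how a short entry could be expanded into the full structure. Only the name fallback, the 128,000 context default, and the zero cost come from the descriptions above; the remaining defaults (reasoning off, text-only input, a 32,000 maxTokens cap) are assumptions, and normalizeModel and estimateCost are hypothetical helpers, not functions from model-discovery.ts.

```typescript
interface ModelCost {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface FullModel {
  id: string;
  name: string;
  reasoning: boolean;
  input: ("text" | "image")[];
  contextWindow: number;
  maxTokens: number;
  cost: ModelCost;
}

type ModelEntry = string | (Partial<FullModel> & { id: string });

// Expand a bare ID or a partial object into a complete model description.
function normalizeModel(entry: ModelEntry): FullModel {
  const partial: Partial<FullModel> & { id: string } =
    typeof entry === "string" ? { id: entry } : entry;
  return {
    id: partial.id,
    name: partial.name ?? partial.id, // display label falls back to the ID
    reasoning: partial.reasoning ?? false, // assumed default
    input: partial.input ?? ["text"], // assumed default
    contextWindow: partial.contextWindow ?? 128000,
    maxTokens: partial.maxTokens ?? 32000, // assumed default
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, ...partial.cost },
  };
}

// Cost fields are prices per million tokens, so divide the weighted sum by 1,000,000.
function estimateCost(model: FullModel, inputTokens: number, outputTokens: number): number {
  return (inputTokens * model.cost.input + outputTokens * model.cost.output) / 1_000_000;
}

console.log(normalizeModel("llama3.1:8b").name);
```

With every cost field at 0, estimateCost returns 0 for any token count, which is the intended result for local models.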

Configuring LiteLLM and vLLM Endpoints

For LiteLLM proxies, vLLM servers, or other OpenAI-compatible local inference engines, use the same configuration pattern with adjusted provider keys and base URLs. The discoverVLLMModels function specifically maps max_model_len to the contextWindow and applies token caps based on the server's reported limits.

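To connect a vLLM instance, list each model under the ID the server reports at /v1/models. The ID below is only a placeholder: Gemini Flash Lite is a hosted Google model rather than a checkpoint that vLLM can load, so substitute the name of the model your server actually serves: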

{
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "vllm",
      "models": [
        { "id": "gemini-2.0-flash-lite" }
      ]
    }
  }
}

A LiteLLM proxy exposes an OpenAI-compatible API, on port 4000 by default, so you would configure:

{
  "providers": {
    "litellm": {
      "baseUrl": "http://localhost:4000/v1",
      "api": "openai-completions",
      "apiKey": "sk-your-lite-llm-key",
      "models": [
        { "id": "gpt-4o-mini" },
        { "id": "claude-3-haiku" }
      ]
    }
  }
}

The discovery layer handles all of these the same way: it queries the server's /v1/models endpoint (or, for Ollama, the Ollama-specific endpoints) to validate availability and capabilities.
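
You can run a similar availability check yourself. The sketch below assumes the standard OpenAI list-models response shape, an object with a data array of entries that each carry an id (vLLM also adds max_model_len). The helper names modelsUrl, missingModelIds, and checkEndpoint are hypothetical and not part of pi-ai; checkEndpoint relies on the global fetch available in Node 18 and later.

```typescript
interface ModelsListResponse {
  data: { id: string; max_model_len?: number }[];
}

// Build the list-models URL from a baseUrl that already ends in /v1.
function modelsUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/models`;
}

// Return the configured IDs that the server does not report.
function missingModelIds(configured: string[], response: ModelsListResponse): string[] {
  const served = response.data.map((model) => model.id);
  return configured.filter((id) => served.indexOf(id) === -1);
}

// Fetch the model list and report which configured IDs are absent.
async function checkEndpoint(baseUrl: string, apiKey: string, configured: string[]): Promise<string[]> {
  const res = await fetch(modelsUrl(baseUrl), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`Model list request failed with HTTP ${res.status}`);
  }
  return missingModelIds(configured, (await res.json()) as ModelsListResponse);
}
```

For example, checkEndpoint("http://localhost:4000/v1", "sk-your-lite-llm-key", ["gpt-4o-mini", "claude-3-haiku"]) resolves to the subset of those two IDs that the proxy does not list.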

How Model Discovery Works

The discovery pipeline centers on packages/web-ui/src/utils/model-discovery.ts, which exports provider-specific discovery functions:

  • discoverOllamaModels(baseUrl) – Uses the ollama/browser SDK to fetch available tags, filters for models supporting the tools capability, and applies a 10× multiplier to calculate the contextWindow based on reported parameters
  • discoverVLLMModels(baseUrl) – Queries the vLLM OpenAI-compatible model endpoint and maps max_model_len to context window constraints
  • discoverLlamaCppModels and discoverLMStudioModels – Handle GGUF-based local servers

When CustomProvidersStore.reload() detects a file change in ~/.pi/agent/models.json, it triggers discoverModels() for each configured provider. The results merge with built-in models in the ModelSelector UI component at packages/web-ui/src/dialogs/ModelSelector.ts.
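
The merge step can be pictured as follows. This is an illustrative sketch only: the ListedModel shape and mergeModels function are hypothetical, and the rule that a discovered model replaces a built-in one with the same provider and ID is an assumption rather than confirmed ModelSelector behavior.

```typescript
interface ListedModel {
  provider: string;
  id: string;
  name: string;
}

// Combine built-in and discovered models, keyed by "provider/id".
// Later entries overwrite earlier ones but keep the earlier list position.
function mergeModels(builtIn: ListedModel[], discovered: ListedModel[]): ListedModel[] {
  const byKey = new Map<string, ListedModel>();
  for (const model of builtIn.concat(discovered)) {
    byKey.set(`${model.provider}/${model.id}`, model);
  }
  return Array.from(byKey.values());
}

const merged = mergeModels(
  [{ provider: "openai", id: "gpt-4o", name: "GPT-4o" }],
  [{ provider: "ollama", id: "llama3.1:8b", name: "Llama 3.1 8B (local)" }],
);
console.log(merged.map((model) => `${model.provider}/${model.id}`));
```

Keying on provider plus ID keeps two providers that serve the same model name from colliding in the list.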

Step-by-Step Setup and Verification

1. Create the configuration directory and file:

mkdir -p ~/.pi/agent
cat > ~/.pi/agent/models.json <<'EOF'
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
EOF

2. Verify discovery programmatically:

If you need to test the discovery logic outside the UI, import the discovery function directly from the source:

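import { discoverOllamaModels } from "./packages/web-ui/src/utils/model-discovery.js";

// Query the local Ollama server and print the IDs that discovery returns.
(async () => {
  // This call takes the server root, without the /v1 suffix used in models.json.
  const models = await discoverOllamaModels("http://localhost:11434");
  console.log("Discovered models:", models.map((m) => m.id));
})().catch((err) => console.error("Discovery failed:", err));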

3. Force reload via the internal API:

While the file watcher handles most updates, you can manually trigger a reload in a Node environment:

import { CustomProvidersStore } from "./packages/web-ui/src/storage/stores/custom-providers-store.js";

await new CustomProvidersStore().reload();

4. Confirm in the UI:

Navigate to the /model tab. The selector should display your custom models alongside built-in options like GPT-4o and Claude. If a model is missing, check that the Ollama server is running and that the model ID matches exactly what ollama list returns.

Summary

  • Configuration location: Store all custom provider settings in ~/.pi/agent/models.json
  • Provider flexibility: Use "ollama", "vllm", "llama.cpp", "lmstudio", or custom keys for LiteLLM proxies
  • Required fields: Every provider needs baseUrl, api (typically "openai-completions"), apiKey (placeholder for local servers), and a models array
  • Automatic reload: The CustomProvidersStore watches the file and updates the ModelSelector UI without requiring an application restart
  • Deep configuration: Override contextWindow, maxTokens, reasoning flags, and cost structures to customize how local models behave compared to cloud APIs

Frequently Asked Questions

Does pi-ai support LiteLLM as a provider?

Yes, LiteLLM exposes an OpenAI-compatible API that works with the "openai-completions" API type. Configure it like any other provider by setting the baseUrl to your LiteLLM proxy address (e.g., http://localhost:4000/v1) and providing the actual API key issued by your LiteLLM instance. The discovery layer will query the LiteLLM /v1/models endpoint to populate the selector.

Why does Ollama require an API key field if it doesn't use authentication?

The models.json schema requires an apiKey field for all providers to maintain type consistency across cloud and local services. For Ollama, you can enter any placeholder string such as "ollama" or "none". The discoverOllamaModels function does not send this value to the local server, as Ollama does not implement API key validation.

How do I configure multimodal inputs for local vision models?

Set the input array to ["text", "image"] in your model configuration. When the ModelSelector in packages/web-ui/src/dialogs/ModelSelector.ts builds the model list, it checks this property to determine whether to enable image upload buttons in the chat interface. Ensure your local server (Ollama with LLaVA, or vLLM with vision support) actually serves the model with image processing capabilities.
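
The capability check described here amounts to a one-line test on the input array. The following sketch uses a hypothetical acceptsImages helper and assumes that a model without an input field is treated as text-only.

```typescript
type InputKind = "text" | "image";

// True when the model configuration declares image input.
// Assumption: a missing input array means text-only.
function acceptsImages(model: { input?: InputKind[] }): boolean {
  return (model.input ?? ["text"]).some((kind) => kind === "image");
}

console.log(acceptsImages({ input: ["text", "image"] }));
```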

What happens if the local server is offline when pi-ai starts?

The discoverOllamaModels and related functions catch connection errors gracefully. If the server is unreachable, the discovery layer returns an empty array for that provider, and its models do not appear in the selector. A server coming online later does not modify models.json, so the file watcher never fires; you must either touch the models.json file or call CustomProvidersStore.reload() to trigger a fresh discovery cycle.
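
If you would rather not reload by hand, a small polling wrapper is one option. This is a sketch under assumptions, not pi-ai behavior: discoverOrEmpty mirrors the "empty array on failure" rule described above, while retryUntilFound and its attempts and delayMs parameters are hypothetical names introduced for illustration.

```typescript
type Discover<T> = () => Promise<T[]>;

// Run a discovery function, turning any failure into an empty result.
async function discoverOrEmpty<T>(discover: Discover<T>): Promise<T[]> {
  try {
    return await discover();
  } catch {
    return [];
  }
}

// Retry discovery until it returns at least one model or the attempts run out.
async function retryUntilFound<T>(discover: Discover<T>, attempts: number, delayMs: number): Promise<T[]> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const found = await discoverOrEmpty(discover);
    if (found.length > 0) {
      return found;
    }
    // Wait between attempts, but not after the final one.
    if (attempt < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return [];
}
```

You could pass a closure such as () => discoverOllamaModels("http://localhost:11434") as the discover argument, so that models appear once the server has started.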
