Auto Model Selection and Fallback System Architecture in Summarize

The auto model selection and fallback system in steipete/summarize is a deterministic pipeline that ranks candidate LLMs by transport type (CLI → native → OpenRouter), filters by token limits and API keys, and returns an ordered array of AutoModelAttempt objects for sequential execution.

When you configure model: "auto" in your Summarize configuration, the runtime chooses among the configured providers without manual intervention. The logic lives primarily in src/model-auto.ts and runs a five-stage selection process that accounts for cost, availability, and capability requirements.

Core Architecture of the Auto Model Selection System

Data Structures and Type Definitions

The system relies on three primary TypeScript interfaces defined across src/model-auto.ts and src/config.ts:

  • AutoSelectionInput (lines 16-30 in src/model-auto.ts): Encapsulates the decision context including input kind (text, website, video), token counts, environment variables, user configuration, and the LiteLLM pricing catalog.

  • AutoModelAttempt (lines 32-51 in src/model-auto.ts): Represents a concrete execution candidate with properties for transport (cli, native, openrouter), resolved model IDs, required environment variable names, cost estimates, and human-readable debug strings.

  • AutoRule (lines 23-48 in src/config.ts): Defines user-configurable selection logic with when conditions (input kind matching), flat candidates arrays, or token-based bands with min/max thresholds.
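
To make these shapes concrete, here is a minimal TypeScript sketch of the three structures. The field names and the placeholder model ID are assumptions inferred from the descriptions above, not copied from src/model-auto.ts or src/config.ts, so treat them as illustrative only.

```typescript
// Hypothetical shapes inferred from the prose; the real definitions may differ.
type InputKind = "text" | "website" | "video";
type Transport = "cli" | "native" | "openrouter";

interface TokenBand {
  token?: { min?: number; max?: number }; // inclusive token range (assumed)
  candidates: string[];
}

interface AutoRule {
  when?: InputKind[];     // input kinds this rule applies to (assumed form)
  candidates?: string[];  // flat candidate list, or...
  bands?: TokenBand[];    // ...token-based bands
}

interface AutoSelectionInput {
  kind: InputKind;
  promptTokens: number | null;
  requiresVideoUnderstanding: boolean;
  env: Record<string, string | undefined>;
  rules?: AutoRule[];
}

interface AutoModelAttempt {
  transport: Transport;
  userModelId: string;  // resolved model ID
  requiredEnv: string;  // name of the environment variable that must be set
  debug: string;        // human-readable explanation of why it was picked
}

// Placeholder values, not real model names.
const exampleRule: AutoRule = {
  when: ["website"],
  bands: [
    { token: { max: 50_000 }, candidates: ["vendor-a/model-small"] },
    { token: { min: 50_001 }, candidates: ["vendor-b/model-large"] },
  ],
};

const exampleAttempt: AutoModelAttempt = {
  transport: "native",
  userModelId: "vendor-a/model-small",
  requiredEnv: "OPENAI_API_KEY",
  debug: "matched website rule, first band",
};
```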

The Five-Stage Selection Pipeline

The buildAutoModelAttempts function (lines 508-684 in src/model-auto.ts) implements the pipeline as a pure function of its input; the only retained state is the memoized OpenRouter index described under Stage 3:

  1. Rule resolution – Determine base candidates from built-in or user-defined rules.
  2. CLI injection – Prepend CLI-only providers (Claude, Gemini, Codex, Agent) based on fallback configuration.
  3. OpenRouter expansion – Add OpenRouter transport alternatives for native providers when API keys are present.
  4. Filtering and validation – Remove candidates exceeding token limits, lacking credentials, or unsuitable for video understanding.
  5. Deduplication and ordering – Ensure stable, deterministic output ordered by transport precedence.
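
The five stages can be pictured as a chain of pure functions. The sketch below uses toy stand-ins for each stage and represents candidates as plain strings; the stage bodies, the openrouter/ prefix, and every ID except cli/claude/sonnet are hypothetical and only show how the stages compose.

```typescript
// Illustrative only: each stage maps a candidate list to a new candidate list.
type Stage = (candidates: string[]) => string[];

// Toy stand-ins for the five stages (not the real implementations).
const resolveRules: Stage = () => ["vendor-a/model-small", "vendor-b/model-large"];
const prependCli: Stage = (c) => ["cli/claude/sonnet", ...c];
const expandOpenRouter: Stage = (c) => {
  const natives = c.filter((id) => !id.startsWith("cli/"));
  return [...c, ...natives.map((id) => `openrouter/${id}`)];
};
// Pretend "model-large" fails validation (for example, a token limit).
const filterInvalid: Stage = (c) => c.filter((id) => !id.includes("model-large"));
const dedupe: Stage = (c) => c.filter((id, i) => c.indexOf(id) === i);

// Run the stages left to right, starting from an empty list.
function runStages(stages: Stage[]): string[] {
  return stages.reduce<string[]>((acc, stage) => stage(acc), []);
}

const ordered = runStages([resolveRules, prependCli, expandOpenRouter, filterInvalid, dedupe]);
// ordered: ["cli/claude/sonnet", "vendor-a/model-small", "openrouter/vendor-a/model-small"]
```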

Step-by-Step Candidate Resolution Process

Stage 1: Rule-Based Candidate Selection

The resolveRuleCandidates function (lines 373-418) evaluates the model.rules array from user configuration or falls back to DEFAULT_RULES (lines 47-98).

For each rule, the system checks the when property against the current input kind. If the rule specifies flat candidates, those strings are returned immediately. Otherwise, the function iterates through token-based bands (arrays with token.min and token.max properties), returning the first band where promptTokens falls within range. If no bands match, the last rule's candidates serve as the ultimate fallback.
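
A minimal sketch of this matching logic follows. The Rule and Band shapes, the inclusive range check, and the treatment of a rule without when as matching every kind are assumptions; the model IDs are placeholders.

```typescript
type InputKind = "text" | "website" | "video";

interface Band {
  token?: { min?: number; max?: number };
  candidates: string[];
}

interface Rule {
  when?: InputKind[];
  candidates?: string[];
  bands?: Band[];
}

function resolveRuleCandidatesSketch(rules: Rule[], kind: InputKind, promptTokens: number): string[] {
  for (const rule of rules) {
    // Assumption: a rule without `when` applies to every input kind.
    if (rule.when && !rule.when.includes(kind)) continue;
    // Flat candidates win immediately.
    if (rule.candidates) return rule.candidates;
    // Otherwise return the first band whose range contains promptTokens.
    for (const band of rule.bands ?? []) {
      const min = band.token?.min ?? 0;
      const max = band.token?.max ?? Number.POSITIVE_INFINITY;
      if (promptTokens >= min && promptTokens <= max) return band.candidates;
    }
  }
  // Ultimate fallback: the last rule's candidates (or its last band, assumed).
  const last = rules[rules.length - 1];
  if (!last) return [];
  if (last.candidates) return last.candidates;
  const bands = last.bands ?? [];
  const lastBand = bands[bands.length - 1];
  return lastBand ? lastBand.candidates : [];
}

const sampleRules: Rule[] = [
  { when: ["video"], candidates: ["vendor-b/video-model"] },
  {
    when: ["website", "text"],
    bands: [
      { token: { max: 50_000 }, candidates: ["vendor-a/model-small"] },
      { token: { min: 50_001, max: 200_000 }, candidates: ["vendor-b/model-large"] },
    ],
  },
  { candidates: ["vendor-a/fallback-model"] },
];

const picked = resolveRuleCandidatesSketch(sampleRules, "website", 12_000);
// picked: ["vendor-a/model-small"]
```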

Stage 2: CLI Fallback Candidate Injection

The prependCliCandidates function (lines 20-73) handles the optional insertion of CLI-only providers before native API candidates.

First, resolveCliAutoFallbackConfig (lines 300-335) determines activation status by checking:

  • autoFallback.enabled is true
  • Either implicit auto selection is active or allowAutoCliFallback is true
  • Either onlyWhenNoApiKeys is false or hasAnyApiKeysConfigured (lines 45-58) returns false
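
The three conditions combine into a single boolean check, sketched below. The interface shapes and the list of key names are assumptions (the article does not say exactly which variables hasAnyApiKeysConfigured inspects), and isCliFallbackActive is a hypothetical stand-in for the activation part of resolveCliAutoFallbackConfig.

```typescript
// Hypothetical shapes; property names follow the article, not the real source.
interface CliAutoFallback {
  enabled: boolean;
  onlyWhenNoApiKeys: boolean;
}

interface SelectionContext {
  isImplicitAutoSelection: boolean;
  allowAutoCliFallback: boolean;
  env: Record<string, string | undefined>;
}

// Assumed key list; the real function may check more or fewer variables.
const NATIVE_KEY_NAMES = ["OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"];

function hasAnyApiKeys(env: Record<string, string | undefined>): boolean {
  return NATIVE_KEY_NAMES.some((name) => (env[name] ?? "").trim() !== "");
}

function isCliFallbackActive(cfg: CliAutoFallback, ctx: SelectionContext): boolean {
  // Condition 1: the feature is switched on.
  if (!cfg.enabled) return false;
  // Condition 2: implicit auto selection, or CLI fallback explicitly allowed.
  if (!ctx.isImplicitAutoSelection && !ctx.allowAutoCliFallback) return false;
  // Condition 3: if restricted to "no API keys", none may be configured.
  if (cfg.onlyWhenNoApiKeys && hasAnyApiKeys(ctx.env)) return false;
  return true;
}

const active = isCliFallbackActive(
  { enabled: true, onlyWhenNoApiKeys: true },
  { isImplicitAutoSelection: true, allowAutoCliFallback: false, env: { OPENAI_API_KEY: "sk-test" } },
);
// active is false: a native key is present and onlyWhenNoApiKeys is set.
```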

When enabled, the system builds an ordered provider list from autoFallback.order or the DEFAULT_AUTO_CLI_ORDER constant (["claude","gemini","codex","agent"]). If prioritizeCliProvider references a previous successful CLI execution (stored in src/run/cli-fallback-state.ts), that provider moves to the front.

Each CLI candidate takes the form cli/<provider>/<model>, with default models defined in DEFAULT_CLI_MODELS (e.g., claude: "sonnet", codex: "gpt-5.2").
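
The ordering and ID construction can be sketched as follows. Only the claude and codex default models come from the text above; the gemini model name, the options object, and the choice to skip unavailable providers are assumptions made for illustration.

```typescript
type CliProvider = "claude" | "gemini" | "codex" | "agent";

const DEFAULT_ORDER: CliProvider[] = ["claude", "gemini", "codex", "agent"];
// Only these two defaults are named in the article.
const KNOWN_DEFAULT_MODELS: Partial<Record<CliProvider, string>> = {
  claude: "sonnet",
  codex: "gpt-5.2",
};

interface CliCandidateOptions {
  order?: CliProvider[];
  available: Record<CliProvider, boolean>;
  lastSuccessful?: CliProvider;
  models?: Partial<Record<CliProvider, string>>;
}

function buildCliCandidates(options: CliCandidateOptions): string[] {
  const order = [...(options.order ?? DEFAULT_ORDER)];
  // Promote the last successful provider to the front, if it is in the order.
  const last = options.lastSuccessful;
  if (last && order.includes(last)) {
    order.splice(order.indexOf(last), 1);
    order.unshift(last);
  }
  const models = { ...KNOWN_DEFAULT_MODELS, ...options.models };
  const ids: string[] = [];
  for (const provider of order) {
    const model = models[provider];
    // Assumption: skip providers that are unavailable or have no known model.
    if (!options.available[provider] || !model) continue;
    ids.push(`cli/${provider}/${model}`);
  }
  return ids;
}

const cliIds = buildCliCandidates({
  available: { claude: true, gemini: true, codex: true, agent: false },
  lastSuccessful: "gemini",
  models: { gemini: "example-gemini-model" }, // placeholder model name
});
// cliIds: ["cli/gemini/example-gemini-model", "cli/claude/sonnet", "cli/codex/gpt-5.2"]
```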

Stage 3: OpenRouter Fallback Expansion

When native providers are selected and OPENROUTER_API_KEY is present, the system attempts to add OpenRouter transport alternatives via resolveOpenRouterModelIdForNative (lines 111-140).

The function maintains a process-wide cache (cachedOpenRouterIndex) built from piAi.getModels("openrouter") on first invocation. This index maps three lookup keys for each OpenRouter model: the canonical provider-model ID, the slug, and a punctuation-insensitive normalized slug.

Resolution follows a strict precedence: exact canonical match, then unique slug match, then normalized slug match. When a unique OpenRouter equivalent is found, the system creates a second AutoModelAttempt with transport: "openrouter" and forceOpenRouter: true, preserving the original native attempt for primary execution.
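
The three-key index and its lookup precedence might look like the sketch below. It takes a plain array of model IDs instead of calling piAi.getModels, uses a closure in place of the process-wide cache, and assumes IDs of the form provider/slug; the normalization rule and all model IDs are placeholders.

```typescript
// Punctuation-insensitive form of a slug (assumed rule: keep letters and digits).
const normalizeSlug = (slug: string): string => slug.toLowerCase().replace(/[^a-z0-9]/g, "");

function addTo(map: Map<string, string[]>, key: string, id: string): void {
  const list = map.get(key);
  if (list) list.push(id);
  else map.set(key, [id]);
}

// Returns an ID only when the list holds exactly one match.
function uniqueMatch(ids?: string[]): string | null {
  if (!ids || ids.length !== 1) return null;
  return ids[0] ?? null;
}

// Builds the index once; the returned closure reuses it on every lookup.
function createOpenRouterResolver(modelIds: string[]): (nativeId: string) => string | null {
  const byId = new Set<string>();
  const bySlug = new Map<string, string[]>();
  const byNormalizedSlug = new Map<string, string[]>();
  for (const id of modelIds) {
    const lower = id.toLowerCase();
    const slug = lower.slice(lower.indexOf("/") + 1);
    byId.add(lower);
    addTo(bySlug, slug, lower);
    addTo(byNormalizedSlug, normalizeSlug(slug), lower);
  }
  return (nativeId: string): string | null => {
    const lower = nativeId.toLowerCase();
    // 1. Exact canonical match.
    if (byId.has(lower)) return lower;
    const slug = lower.slice(lower.indexOf("/") + 1);
    // 2. Unique slug match, then 3. unique normalized slug match.
    return uniqueMatch(bySlug.get(slug)) ?? uniqueMatch(byNormalizedSlug.get(normalizeSlug(slug)));
  };
}

const resolveOpenRouterId = createOpenRouterResolver([
  "acme/fast-model",
  "vendor-a/model-1.5-pro",
  "vendor-b/shared-model",
  "vendor-c/shared-model",
]);
```

In this sketch an ambiguous slug (shared-model appears under two providers) resolves to null, so no OpenRouter attempt would be added for it.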

Stage 4: Filtering and Validation

During the assembly loop in buildAutoModelAttempts (lines 558-684), each candidate undergoes rigorous validation:

  • Video capability check: If requiresVideoUnderstanding is true, native candidates without video support are skipped (lines 587-594).
  • Token limit enforcement: Native attempts are rejected when promptTokens exceed the provider's maximum input tokens as defined in the LiteLLM catalog.
  • Credential verification: The requiredEnvForCandidate function determines the required environment variable name (e.g., OPENAI_API_KEY for native OpenAI, CLI_CLAUDE for CLI Claude, OPENROUTER_API_KEY for OpenRouter). The envHasKey check ensures the variable exists and is non-empty.
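
The three checks can be sketched as a single filter. The Candidate shape, the supportsVideo and maxInputTokens fields, and the model IDs are assumptions; in the real code the token limit comes from the LiteLLM catalog rather than from the candidate itself.

```typescript
type Transport = "cli" | "native" | "openrouter";
type Env = Record<string, string | undefined>;

interface Candidate {
  transport: Transport;
  modelId: string;
  requiredEnv: string;           // e.g. OPENAI_API_KEY, CLI_CLAUDE, OPENROUTER_API_KEY
  supportsVideo: boolean;        // hypothetical capability flag
  maxInputTokens: number | null; // null when the limit is unknown
}

interface FilterContext {
  promptTokens: number | null;
  requiresVideoUnderstanding: boolean;
  env: Env;
}

// The variable must exist and be non-empty.
function envHasKey(env: Env, name: string): boolean {
  const value = env[name];
  return typeof value === "string" && value.trim() !== "";
}

function filterCandidates(candidates: Candidate[], ctx: FilterContext): Candidate[] {
  return candidates.filter((c) => {
    const isNative = c.transport === "native";
    // Video check: skip native candidates without video support.
    if (ctx.requiresVideoUnderstanding && isNative && !c.supportsVideo) return false;
    // Token limit: skip native candidates whose known limit is exceeded.
    if (isNative && c.maxInputTokens !== null && ctx.promptTokens !== null && ctx.promptTokens > c.maxInputTokens) {
      return false;
    }
    // Credentials: the required variable must be set.
    return envHasKey(ctx.env, c.requiredEnv);
  });
}

const kept = filterCandidates(
  [
    { transport: "cli", modelId: "cli/claude/sonnet", requiredEnv: "CLI_CLAUDE", supportsVideo: false, maxInputTokens: null },
    { transport: "native", modelId: "vendor-a/model-small", requiredEnv: "OPENAI_API_KEY", supportsVideo: false, maxInputTokens: 8_000 },
    { transport: "native", modelId: "vendor-b/model-large", requiredEnv: "GEMINI_API_KEY", supportsVideo: true, maxInputTokens: 1_000_000 },
  ],
  {
    promptTokens: 12_000,
    requiresVideoUnderstanding: false,
    env: { OPENAI_API_KEY: "sk-test", GEMINI_API_KEY: "g-test", CLI_CLAUDE: "" },
  },
).map((c) => c.modelId);
// kept: ["vendor-b/model-large"] (CLI_CLAUDE is empty; model-small exceeds its token limit)
```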

Stage 5: Deduplication and Ordering

The final stage builds a composite deduplication key from transport, forceOpenRouter, userModelId, and the providers array. Because transport is part of the key, a native attempt and its OpenRouter equivalent remain distinct entries, while candidates that reach the list twice through different resolution paths collapse into one.

The output array maintains strict precedence: CLI candidates first (fastest to fail, no API costs), then native providers (direct API access), then OpenRouter fallbacks (broader compatibility). The intent is to try the cheapest, most direct routes first and keep OpenRouter as the widest safety net.
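
A minimal sketch of the composite key and the transport ordering is shown below. The Attempt shape and the explicit sort are assumptions; the real code may rely on insertion order alone rather than sorting.

```typescript
type Transport = "cli" | "native" | "openrouter";

interface Attempt {
  transport: Transport;
  forceOpenRouter: boolean;
  userModelId: string;
  providers: string[] | null;
}

// Composite key built from the four fields named in the article.
function dedupeKey(a: Attempt): string {
  return [a.transport, String(a.forceOpenRouter), a.userModelId, (a.providers ?? []).join(",")].join("|");
}

const TRANSPORT_RANK: Record<Transport, number> = { cli: 0, native: 1, openrouter: 2 };

function dedupeAndOrder(attempts: Attempt[]): Attempt[] {
  const seen = new Set<string>();
  const unique = attempts.filter((a) => {
    const key = dedupeKey(a);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  // Sort by transport rank; the original index breaks ties so output is stable.
  return unique
    .map((attempt, index) => ({ attempt, index }))
    .sort((x, y) => TRANSPORT_RANK[x.attempt.transport] - TRANSPORT_RANK[y.attempt.transport] || x.index - y.index)
    .map((entry) => entry.attempt);
}

const finalOrder = dedupeAndOrder([
  { transport: "native", forceOpenRouter: false, userModelId: "vendor-a/model-small", providers: null },
  { transport: "openrouter", forceOpenRouter: true, userModelId: "vendor-a/model-small", providers: ["vendor-a"] },
  { transport: "cli", forceOpenRouter: false, userModelId: "cli/claude/sonnet", providers: null },
  { transport: "native", forceOpenRouter: false, userModelId: "vendor-a/model-small", providers: null }, // true duplicate
]).map((a) => `${a.transport}:${a.userModelId}`);
// finalOrder: ["cli:cli/claude/sonnet", "native:vendor-a/model-small", "openrouter:vendor-a/model-small"]
```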

Configuration and Environment Integration

The auto model selection system exposes several configuration touch-points in src/config.ts:

  • model.rules[]: User-defined array of AutoRule objects that override the built-in DEFAULT_RULES (lines 47-98 in src/model-auto.ts).
  • cli.autoFallback: Structured configuration object with enabled, onlyWhenNoApiKeys, order, and prioritizeCliProvider properties.
  • Environment variables: The system checks for OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, and CLI-specific variables like CLI_CLAUDE or CLI_GEMINI.

The hasAnyApiKeysConfigured function (lines 45-58) checks the environment for native API keys. Its result feeds the CLI fallback decision when onlyWhenNoApiKeys is enabled.

Code Example: Building Auto Model Attempts

The following TypeScript example demonstrates how to invoke the selector programmatically:

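import { buildAutoModelAttempts } from "./model-auto.js";

// Decision context for a single run.
const input = {
  kind: "website",                    // matched against each rule's when condition
  promptTokens: 12_000,               // compared with token bands and provider input limits
  desiredOutputTokens: 2_000,
  requiresVideoUnderstanding: false,  // true would skip native candidates without video support
  env: {
    OPENAI_API_KEY: "sk-...",
    GEMINI_API_KEY: "...",
  },
  config: {
    model: { mode: "auto" },
    cli: {
      autoFallback: {
        enabled: true,
        onlyWhenNoApiKeys: false,     // allow CLI candidates even though API keys are set
      },
    },
  },
  catalog: null,                      // no LiteLLM pricing catalog supplied in this example
  openrouterProvidersFromEnv: ["openai"],
  cliAvailability: {
    claude: true,
    codex: true,
    gemini: true,
    agent: false,
  },
  isImplicitAutoSelection: true,
  allowAutoCliFallback: true,
  lastSuccessfulCliProvider: "gemini", // previous successful CLI provider
};

// Returns the ordered list of attempts without executing any of them.
const attempts = buildAutoModelAttempts(input);
console.log(attempts);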

The resulting array contains AutoModelAttempt objects ordered by transport priority. Each attempt includes the resolved model ID, required environment variable, cost estimate, and debug metadata for observability.
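
Because the attempts are meant to be executed in order, a consumer would typically walk the array until one succeeds. The loop below is a hypothetical sketch of such a consumer, not code from the repository; runWithFallback, the Attempt shape, and the fake runner are all illustrative names.

```typescript
interface Attempt {
  transport: string;
  userModelId: string;
}

type RunAttempt = (attempt: Attempt) => Promise<string>;

// Try each attempt in order and return the first successful result.
async function runWithFallback(
  attempts: Attempt[],
  run: RunAttempt,
): Promise<{ attempt: Attempt; output: string }> {
  const errors: string[] = [];
  for (const attempt of attempts) {
    try {
      return { attempt, output: await run(attempt) };
    } catch (err) {
      // Record the failure and move on to the next candidate.
      errors.push(`${attempt.userModelId}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All attempts failed:\n${errors.join("\n")}`);
}

// Fake runner for demonstration: CLI attempts fail, everything else succeeds.
const fakeRun: RunAttempt = async (attempt) => {
  if (attempt.transport === "cli") throw new Error("CLI binary not found");
  return `summary from ${attempt.userModelId}`;
};

const demoAttempts: Attempt[] = [
  { transport: "cli", userModelId: "cli/claude/sonnet" },
  { transport: "native", userModelId: "vendor-a/model-small" },
];
```

Calling runWithFallback(demoAttempts, fakeRun) resolves with the native attempt, since the CLI attempt fails first and the loop moves on.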

Summary

The auto model selection and fallback system in steipete/summarize implements a deterministic, configuration-driven pipeline for intelligent LLM routing:

  • Rule-based selection evaluates input kind and token counts against user-defined or built-in AutoRule configurations to generate initial candidates.
  • CLI augmentation optionally prepends command-line providers (Claude, Gemini, Codex, Agent) when auto-fallback is enabled and API availability conditions are met.
  • OpenRouter expansion creates transport-layer fallbacks for native providers by resolving canonical model IDs against the OpenRouter catalog.
  • Validation filtering removes candidates exceeding token limits, lacking required credentials, or incompatible with video-understanding requirements.
  • Deduplication and ordering produces a stable array prioritizing CLI attempts first, native APIs second, and OpenRouter fallbacks last.

The core logic resides in src/model-auto.ts and integrates with the configuration definitions in src/config.ts and the CLI fallback state in src/run/cli-fallback-state.ts, providing a transparent, testable mechanism for graceful degradation across multiple LLM providers.

Frequently Asked Questions

How does the system decide when to use CLI fallback providers?

The resolveCliAutoFallbackConfig function in src/model-auto.ts (lines 300-335) activates CLI fallback only when three conditions align: autoFallback.enabled is true in the configuration, the current execution context permits auto-selection or explicit CLI fallback allowance, and either onlyWhenNoApiKeys is disabled or no native API keys are detected in the environment via hasAnyApiKeysConfigured (lines 45-58). This ensures CLI providers serve as cost-effective alternatives only when appropriate.

What determines the order of providers in the fallback chain?

The system establishes provider precedence through multiple mechanisms. First, resolveRuleCandidates applies user-defined or built-in AutoRule configurations that specify explicit candidate arrays or token-based bands. Next, prependCliCandidates injects CLI providers according to the autoFallback.order array (defaulting to ["claude","gemini","codex","agent"]), optionally promoting the lastSuccessfulCliProvider to the front based on state from src/run/cli-fallback-state.ts. Finally, buildAutoModelAttempts assembles the definitive sequence: CLI attempts first, native API attempts second, and OpenRouter fallbacks last.

How does the system handle video understanding requirements?

When requiresVideoUnderstanding is set to true (typically triggered by media.videoMode: "understand" in the configuration), the buildAutoModelAttempts function (lines 587-594 in src/model-auto.ts) filters the candidate list to exclude native providers that lack video comprehension capabilities. This validation occurs before transport-specific attempts are created, ensuring that only multimodal-capable models remain in the fallback chain for video processing tasks.

Can I customize which models are selected for specific token ranges?

Yes, through the model.rules configuration array, whose AutoRule type is defined in src/config.ts (lines 23-48). Each AutoRule object can specify when conditions to match specific input kinds (text, website, video), and either a flat candidates array or a bands array containing token ranges with min and max properties. The resolveRuleCandidates function (lines 373-418 in src/model-auto.ts) evaluates these rules sequentially, returning the first matching candidate set or falling back to the built-in DEFAULT_RULES (lines 47-98) if no user rules match.
