# Auto Model Selection and Fallback System Architecture in Summarize

> Explore the auto model selection and fallback system architecture in steipete/summarize. Discover how it ranks LLMs by transport type, filters, and prepares for sequential execution.

- Repository: [Peter Steinberger/summarize](https://github.com/steipete/summarize)
- Tags: architecture
- Published: 2026-02-19

---

**The auto model selection and fallback system in steipete/summarize is a deterministic pipeline that ranks candidate LLMs by transport type (CLI → native → OpenRouter), filters by token limits and API keys, and returns an ordered array of `AutoModelAttempt` objects for sequential execution.**

When you configure `model: "auto"` in your Summarize configuration, the runtime has to pick a provider and model without manual intervention. The selection logic lives primarily in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts) and runs as a five-stage process that accounts for cost, availability, and capability requirements.

## Core Architecture of the Auto Model Selection System

### Data Structures and Type Definitions

The system relies on three primary TypeScript interfaces defined across [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts) and [`src/config.ts`](https://github.com/steipete/summarize/blob/main/src/config.ts):

- **`AutoSelectionInput`** (lines 16-30 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)): Encapsulates the decision context including input kind (text, website, video), token counts, environment variables, user configuration, and the LiteLLM pricing catalog.

- **`AutoModelAttempt`** (lines 32-51 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)): Represents a concrete execution candidate with properties for `transport` (cli, native, openrouter), resolved model IDs, required environment variable names, cost estimates, and human-readable debug strings.

- **`AutoRule`** (lines 23-48 in [`src/config.ts`](https://github.com/steipete/summarize/blob/main/src/config.ts)): Defines user-configurable selection logic with `when` conditions (input kind matching), flat `candidates` arrays, or token-based `bands` with `min`/`max` thresholds.
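To make these shapes concrete, here is a simplified sketch of how they might be modeled. It is inferred from the descriptions above rather than copied from the repository: the `Sketch`-suffixed names, the optionality of each field, the array form of `when`, and the placeholder model ID are all assumptions, and the real interfaces carry more fields.

```typescript
// Hypothetical, trimmed-down shapes for illustration only. The actual
// definitions in src/model-auto.ts and src/config.ts may differ.
type InputKindSketch = "text" | "website" | "video";
type TransportSketch = "cli" | "native" | "openrouter";

interface TokenBandSketch {
  token?: { min?: number; max?: number }; // bounds assumed optional
  candidates: string[];
}

interface AutoRuleSketch {
  when?: InputKindSketch[]; // assumed: input kinds the rule applies to
  candidates?: string[]; // flat candidate list...
  bands?: TokenBandSketch[]; // ...or token-based bands
}

interface AutoModelAttemptSketch {
  transport: TransportSketch;
  userModelId: string; // e.g. "cli/claude/sonnet"
  forceOpenRouter: boolean;
  requiredEnv: string; // name of the environment variable to check
  debug: string; // human-readable note on why this attempt exists
}

// A CLI attempt as it might appear in the returned array.
const sampleAttempt: AutoModelAttemptSketch = {
  transport: "cli",
  userModelId: "cli/claude/sonnet",
  forceOpenRouter: false,
  requiredEnv: "CLI_CLAUDE",
  debug: "cli fallback: claude",
};

// A rule that routes video input to a fixed list (placeholder model ID).
const sampleRule: AutoRuleSketch = {
  when: ["video"],
  candidates: ["example-provider/example-video-model"],
};
```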

### The Five-Stage Selection Pipeline

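The `buildAutoModelAttempts` function (lines 508-684 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)) implements the pipeline as a function of its input: it builds and returns the attempt list and leaves execution to the caller. The only state it touches is the process-wide OpenRouter index cache described in Stage 3.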

1. **Rule resolution** – Determine base candidates from built-in or user-defined rules.
2. **CLI injection** – Prepend CLI-only providers (Claude, Gemini, Codex, Agent) based on fallback configuration.
3. **OpenRouter expansion** – Add OpenRouter transport alternatives for native providers when API keys are present.
4. **Filtering and validation** – Remove candidates exceeding token limits, lacking credentials, or unsuitable for video understanding.
5. **Deduplication and ordering** – Ensure stable, deterministic output ordered by transport precedence.

## Step-by-Step Candidate Resolution Process

### Stage 1: Rule-Based Candidate Selection

The `resolveRuleCandidates` function (lines 373-418) evaluates the `model.rules` array from user configuration or falls back to `DEFAULT_RULES` (lines 47-98).

For each rule, the system checks the `when` property against the current input `kind`. If the rule specifies flat `candidates`, those strings are returned immediately. Otherwise, the function iterates through token-based `bands` (arrays with `token.min` and `token.max` properties), returning the first band where `promptTokens` falls within range. If no bands match, the last rule's candidates serve as the ultimate fallback.
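The matching logic described above can be sketched roughly as follows. This is not the repository's `resolveRuleCandidates`; the type names, the array form of `when`, the inclusive bounds, and the exact shape of the final fallback are assumptions made for illustration.

```typescript
type RuleInputKind = "text" | "website" | "video";

interface RuleBand {
  token?: { min?: number; max?: number };
  candidates: string[];
}

interface RuleSketch {
  when?: RuleInputKind[]; // assumed: absent means "matches any kind"
  candidates?: string[];
  bands?: RuleBand[];
}

function resolveRuleCandidatesSketch(
  rules: RuleSketch[],
  kind: RuleInputKind,
  promptTokens: number,
): string[] {
  for (const rule of rules) {
    // Skip rules whose `when` does not cover the current input kind.
    if (rule.when && !rule.when.includes(kind)) continue;
    // Flat candidates win immediately.
    if (rule.candidates) return rule.candidates;
    // Otherwise return the first band whose range contains promptTokens.
    for (const band of rule.bands ?? []) {
      const min = band.token?.min ?? 0;
      const max = band.token?.max ?? Number.POSITIVE_INFINITY;
      if (promptTokens >= min && promptTokens <= max) return band.candidates;
    }
  }
  // Nothing matched: fall back to the last rule (assumed: its flat
  // candidates, or else its final band).
  const last = rules[rules.length - 1];
  if (!last) return [];
  if (last.candidates) return last.candidates;
  const bands = last.bands ?? [];
  return bands[bands.length - 1]?.candidates ?? [];
}
```

Returning on the first match keeps the result dependent only on rule order, which is what makes the output predictable for a given configuration.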

### Stage 2: CLI Fallback Candidate Injection

The `prependCliCandidates` function (lines 20-73) handles the optional insertion of CLI-only providers before native API candidates.

First, `resolveCliAutoFallbackConfig` (lines 300-335) determines activation status by checking:
- `autoFallback.enabled` is true
- Either implicit auto selection is active or `allowAutoCliFallback` is true
- Either `onlyWhenNoApiKeys` is false or `hasAnyApiKeysConfigured` (lines 45-58) returns false
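
The three checks combine into a single boolean gate. The sketch below restates them directly; the `CliFallbackGate` type and the function name are hypothetical, and the real `resolveCliAutoFallbackConfig` returns a richer configuration object rather than a bare boolean.

```typescript
// Hypothetical flattened view of the inputs to the activation decision.
interface CliFallbackGate {
  enabled: boolean; // autoFallback.enabled
  onlyWhenNoApiKeys: boolean; // autoFallback.onlyWhenNoApiKeys
  isImplicitAutoSelection: boolean;
  allowAutoCliFallback: boolean;
  hasAnyApiKeys: boolean; // result of hasAnyApiKeysConfigured
}

function isCliFallbackActiveSketch(gate: CliFallbackGate): boolean {
  return (
    gate.enabled &&
    (gate.isImplicitAutoSelection || gate.allowAutoCliFallback) &&
    (!gate.onlyWhenNoApiKeys || !gate.hasAnyApiKeys)
  );
}
```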

When enabled, the system builds an ordered provider list from `autoFallback.order` or the `DEFAULT_AUTO_CLI_ORDER` constant (`["claude","gemini","codex","agent"]`). If `prioritizeCliProvider` references a previous successful CLI execution (stored in [`src/run/cli-fallback-state.ts`](https://github.com/steipete/summarize/blob/main/src/run/cli-fallback-state.ts)), that provider moves to the front.

Each CLI candidate takes the form `cli/<provider>/<model>`, with default models defined in `DEFAULT_CLI_MODELS` (e.g., `claude: "sonnet"`, `codex: "gpt-5.2"`).
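
A minimal sketch of the ordering and ID construction follows. It assumes that unavailable CLI tools are filtered out at this point and uses a placeholder model name for providers whose defaults are not listed above; both are illustrative guesses, and the function and option names are hypothetical.

```typescript
type CliProviderSketch = "claude" | "gemini" | "codex" | "agent";

const DEFAULT_ORDER_SKETCH: CliProviderSketch[] = ["claude", "gemini", "codex", "agent"];

// Only the claude and codex defaults are named in the text; the other
// two entries are placeholders, not the project's real defaults.
const DEFAULT_MODELS_SKETCH: Record<CliProviderSketch, string> = {
  claude: "sonnet",
  codex: "gpt-5.2",
  gemini: "placeholder-model",
  agent: "placeholder-model",
};

function buildCliCandidatesSketch(opts: {
  order?: CliProviderSketch[];
  available: Partial<Record<CliProviderSketch, boolean>>;
  prioritize?: CliProviderSketch;
}): string[] {
  let order = [...(opts.order ?? DEFAULT_ORDER_SKETCH)];
  const first = opts.prioritize;
  // Move the previously successful provider to the front, if it is listed.
  if (first && order.includes(first)) {
    order = [first, ...order.filter((p) => p !== first)];
  }
  return order
    .filter((p) => opts.available[p] === true) // assumed availability check
    .map((p) => `cli/${p}/${DEFAULT_MODELS_SKETCH[p]}`);
}

const cliCandidates = buildCliCandidatesSketch({
  available: { claude: true, codex: true },
  prioritize: "codex",
});
console.log(cliCandidates); // [ 'cli/codex/gpt-5.2', 'cli/claude/sonnet' ]
```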

### Stage 3: OpenRouter Fallback Expansion

When native providers are selected and `OPENROUTER_API_KEY` is present, the system attempts to add OpenRouter transport alternatives via `resolveOpenRouterModelIdForNative` (lines 111-140).

The function maintains a process-wide cache (`cachedOpenRouterIndex`) built from `piAi.getModels("openrouter")` on first invocation. This index maps three lookup keys for each OpenRouter model: the canonical provider-model ID, the slug, and a punctuation-insensitive normalized slug.

Resolution follows a strict precedence: exact canonical match, then unique slug match, then normalized slug match. When a unique OpenRouter equivalent is found, the system creates a second `AutoModelAttempt` with `transport: "openrouter"` and `forceOpenRouter: true`, preserving the original native attempt for primary execution.
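
The three-key lookup can be illustrated with a small standalone index. This sketch takes a plain list of IDs instead of calling `piAi.getModels("openrouter")`, and its helper names, its lowercasing, and its treatment of an ambiguous slug (fall through to the normalized lookup) are assumptions rather than the repository's behavior. The model IDs are made up.

```typescript
interface OpenRouterIndexSketch {
  byCanonical: Map<string, string>;
  bySlug: Map<string, string[]>;
  byNormalizedSlug: Map<string, string[]>;
}

// Strip everything except letters and digits: "model-x.1" -> "modelx1".
const normalizeSlugSketch = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");
// Everything after the first "/" is treated as the slug.
const slugOfSketch = (id: string): string => id.slice(id.indexOf("/") + 1);

function pushToSketch(map: Map<string, string[]>, key: string, value: string): void {
  const list = map.get(key);
  if (list) list.push(value);
  else map.set(key, [value]);
}

function buildIndexSketch(ids: string[]): OpenRouterIndexSketch {
  const index: OpenRouterIndexSketch = {
    byCanonical: new Map(),
    bySlug: new Map(),
    byNormalizedSlug: new Map(),
  };
  for (const id of ids) {
    const key = id.toLowerCase();
    index.byCanonical.set(key, id);
    pushToSketch(index.bySlug, slugOfSketch(key), id);
    pushToSketch(index.byNormalizedSlug, normalizeSlugSketch(slugOfSketch(key)), id);
  }
  return index;
}

function resolveForNativeSketch(index: OpenRouterIndexSketch, nativeId: string): string | null {
  const key = nativeId.toLowerCase();
  // 1. Exact canonical match.
  const exact = index.byCanonical.get(key);
  if (exact) return exact;
  // 2. Slug match, accepted only when unambiguous.
  const slugHits = index.bySlug.get(slugOfSketch(key)) ?? [];
  if (slugHits.length === 1) return slugHits[0] ?? null;
  // 3. Punctuation-insensitive slug match, again only when unambiguous.
  const normHits = index.byNormalizedSlug.get(normalizeSlugSketch(slugOfSketch(key))) ?? [];
  return normHits.length === 1 ? (normHits[0] ?? null) : null;
}
```

Requiring uniqueness at the slug steps avoids silently routing a request to a different vendor's model that happens to share a name.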

### Stage 4: Filtering and Validation

During the assembly loop in `buildAutoModelAttempts` (lines 558-684), each candidate undergoes rigorous validation:

- **Video capability check**: If `requiresVideoUnderstanding` is true, native candidates without video support are skipped (lines 587-594).
- **Token limit enforcement**: Native attempts are rejected when `promptTokens` exceed the provider's maximum input tokens as defined in the LiteLLM catalog.
- **Credential verification**: The `requiredEnvForCandidate` function determines the required environment variable name (e.g., `OPENAI_API_KEY` for native OpenAI, `CLI_CLAUDE` for CLI Claude, `OPENROUTER_API_KEY` for OpenRouter). The `envHasKey` check ensures the variable exists and is non-empty.
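
Taken together, the three checks amount to a predicate applied to each candidate. The sketch below is a simplified stand-in: the `FilterCandidate` and `FilterContext` shapes are hypothetical, capability and token limits are passed in directly instead of being looked up in the LiteLLM catalog, and the whitespace trimming in the key check is an assumption.

```typescript
interface FilterCandidate {
  transport: "cli" | "native" | "openrouter";
  modelId: string;
  requiredEnv: string; // e.g. "OPENAI_API_KEY"
  supportsVideo?: boolean; // hypothetical capability flag
  maxInputTokens?: number; // hypothetical: would come from the catalog
}

interface FilterContext {
  promptTokens: number;
  requiresVideoUnderstanding: boolean;
  env: Record<string, string | undefined>;
}

// True when the variable exists and is not empty.
function envHasKeySketch(env: Record<string, string | undefined>, key: string): boolean {
  const value = env[key];
  return typeof value === "string" && value.trim().length > 0;
}

function filterCandidatesSketch(candidates: FilterCandidate[], ctx: FilterContext): FilterCandidate[] {
  return candidates.filter((c) => {
    const isNative = c.transport === "native";
    // Video capability check (native candidates only).
    if (ctx.requiresVideoUnderstanding && isNative && !c.supportsVideo) return false;
    // Token limit enforcement; an unknown limit is not treated as a failure.
    if (isNative && c.maxInputTokens !== undefined && ctx.promptTokens > c.maxInputTokens) return false;
    // Credential verification.
    return envHasKeySketch(ctx.env, c.requiredEnv);
  });
}
```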

### Stage 5: Deduplication and Ordering

The final stage builds a composite deduplication key from `transport`, `forceOpenRouter`, `userModelId`, and the `providers` array. Because the transport is part of the key, a native attempt and its OpenRouter equivalent stay distinct, while the same candidate reached through two different resolution paths collapses into a single entry.

The output array maintains strict precedence: CLI candidates first (fastest to fail, no API costs), then native providers (direct API access), then OpenRouter fallbacks (broader compatibility). The ordering is intended to try the cheapest, quickest-to-fail options first and keep the broadest fallback for last.
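
A composite-key deduplication of this kind might look like the sketch below. The key format (fields joined with `|`) and the helper names are assumptions; the sketch keeps the first occurrence of each key and preserves input order rather than re-sorting.

```typescript
interface AttemptKeyFields {
  transport: "cli" | "native" | "openrouter";
  forceOpenRouter: boolean;
  userModelId: string;
  providers: string[] | null;
}

// Build a string key from the four fields named in the text.
function dedupeKeySketch(a: AttemptKeyFields): string {
  return [a.transport, String(a.forceOpenRouter), a.userModelId, (a.providers ?? []).join(",")].join("|");
}

// Keep the first attempt for each key; later duplicates are dropped.
function dedupeAttemptsSketch<T extends AttemptKeyFields>(attempts: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const attempt of attempts) {
    const key = dedupeKeySketch(attempt);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(attempt);
  }
  return unique;
}
```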

## Configuration and Environment Integration

The auto model selection system exposes several configuration touch-points in [`src/config.ts`](https://github.com/steipete/summarize/blob/main/src/config.ts):

- **`model.rules[]`**: User-defined array of `AutoRule` objects that override the built-in `DEFAULT_RULES` (lines 47-98 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)).
- **`cli.autoFallback`**: Structured configuration object with `enabled`, `onlyWhenNoApiKeys`, `order`, and `prioritizeCliProvider` properties.
- **Environment variables**: The system checks for `OPENAI_API_KEY`, `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENROUTER_API_KEY`, and CLI-specific variables like `CLI_CLAUDE` or `CLI_GEMINI`.

The `hasAnyApiKeysConfigured` function (lines 45-58) performs a comprehensive scan of the environment to determine whether any native API keys are present, which influences the CLI fallback logic when `onlyWhenNoApiKeys` is enabled.
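
For orientation, a configuration that uses these touch-points could look like the following object. It is written as a TypeScript literal purely for illustration: the nesting under `model` mirrors the `model: { mode: "auto" }` shape used elsewhere in this article, while the array form of `when`, the token thresholds, and every model ID are placeholders rather than values taken from the project.

```typescript
// Hypothetical configuration; thresholds and model IDs are placeholders.
const sampleConfig = {
  model: {
    mode: "auto",
    rules: [
      // Video input always goes to a fixed candidate list.
      { when: ["video"], candidates: ["example-provider/video-model"] },
      // Everything else is routed by prompt size.
      {
        bands: [
          { token: { max: 50_000 }, candidates: ["example-provider/small-model"] },
          { token: { min: 50_001 }, candidates: ["example-provider/large-context-model"] },
        ],
      },
    ],
  },
  cli: {
    autoFallback: {
      enabled: true,
      onlyWhenNoApiKeys: true, // consider CLI tools only if no API key is set
      order: ["gemini", "claude"], // replaces the default CLI order
    },
  },
};
```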

## Code Example: Building Auto Model Attempts

The following TypeScript example demonstrates how to invoke the selector programmatically:

```typescript
import { buildAutoModelAttempts } from "./model-auto.js";

const input = {
  // Decision context: what is being summarized and how large the prompt is.
  kind: "website",
  promptTokens: 12_000,
  desiredOutputTokens: 2_000,
  requiresVideoUnderstanding: false,
  // Credentials visible to the selector. No OPENROUTER_API_KEY is set here.
  env: {
    OPENAI_API_KEY: "sk-...",
    GEMINI_API_KEY: "...",
  },
  config: {
    model: { mode: "auto" },
    cli: {
      // Allow CLI providers even though API keys are configured.
      autoFallback: {
        enabled: true,
        onlyWhenNoApiKeys: false,
      },
    },
  },
  catalog: null, // no LiteLLM catalog supplied
  openrouterProvidersFromEnv: ["openai"],
  cliAvailability: {
    claude: true,
    codex: true,
    gemini: true,
    agent: false,
  },
  isImplicitAutoSelection: true,
  allowAutoCliFallback: true,
  // Promotes this CLI provider to the front of the CLI order.
  lastSuccessfulCliProvider: "gemini",
};

const attempts = buildAutoModelAttempts(input);
console.log(attempts);
```

The resulting array contains `AutoModelAttempt` objects ordered by transport priority. Each attempt includes the resolved model ID, required environment variable, cost estimate, and debug metadata for observability.
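
Because the array is meant for sequential execution, a caller typically walks it in order and stops at the first success. The loop below is a generic sketch of that pattern, not the runner used in the repository: `runWithFallbackSketch`, the `AttemptLike` type, and the `run` callback are hypothetical names, and real code would likely distinguish retryable from fatal errors.

```typescript
interface AttemptLike {
  userModelId: string;
  transport: string;
}

// Try each attempt in order; return the first successful result together
// with the attempt that produced it. Throws if every attempt fails.
async function runWithFallbackSketch<T>(
  attempts: AttemptLike[],
  run: (attempt: AttemptLike) => Promise<T>,
): Promise<{ result: T; used: AttemptLike }> {
  const failures: string[] = [];
  for (const attempt of attempts) {
    try {
      const result = await run(attempt);
      return { result, used: attempt };
    } catch (err) {
      // Record the failure and move on to the next candidate.
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${attempt.userModelId}: ${message}`);
    }
  }
  throw new Error(`All ${attempts.length} attempts failed:\n${failures.join("\n")}`);
}
```

Collecting the per-attempt error messages keeps the final failure readable when the whole chain is exhausted.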

## Summary

The auto model selection and fallback system in steipete/summarize implements a deterministic, configuration-driven pipeline for intelligent LLM routing:

- **Rule-based selection** evaluates input kind and token counts against user-defined or built-in `AutoRule` configurations to generate initial candidates.
- **CLI augmentation** optionally prepends command-line providers (Claude, Gemini, Codex, Agent) when auto-fallback is enabled and API availability conditions are met.
- **OpenRouter expansion** creates transport-layer fallbacks for native providers by resolving canonical model IDs against the OpenRouter catalog.
- **Validation filtering** removes candidates exceeding token limits, lacking required credentials, or incompatible with video-understanding requirements.
- **Deduplication and ordering** produces a stable array prioritizing CLI attempts first, native APIs second, and OpenRouter fallbacks last.

The core logic resides in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts) and integrates with configuration definitions in [`src/config.ts`](https://github.com/steipete/summarize/blob/main/src/config.ts), providing a transparent, testable mechanism for graceful degradation across multiple LLM providers.

## Frequently Asked Questions

### How does the system decide when to use CLI fallback providers?

The `resolveCliAutoFallbackConfig` function in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts) (lines 300-335) activates CLI fallback only when three conditions hold: `autoFallback.enabled` is true in the configuration, the current run is either an implicit auto selection or has `allowAutoCliFallback` set, and either `onlyWhenNoApiKeys` is disabled or `hasAnyApiKeysConfigured` (lines 45-58) finds no native API keys in the environment. With `onlyWhenNoApiKeys` enabled, CLI providers are therefore tried only when no API key is available.

### What determines the order of providers in the fallback chain?

The system establishes provider precedence through multiple mechanisms. First, `resolveRuleCandidates` applies user-defined or built-in `AutoRule` configurations that specify explicit candidate arrays or token-based bands. Next, `prependCliCandidates` injects CLI providers according to the `autoFallback.order` array (defaulting to `["claude","gemini","codex","agent"]`), optionally promoting the `lastSuccessfulCliProvider` to the front based on state from [`src/run/cli-fallback-state.ts`](https://github.com/steipete/summarize/blob/main/src/run/cli-fallback-state.ts). Finally, `buildAutoModelAttempts` assembles the definitive sequence: CLI attempts first, native API attempts second, and OpenRouter fallbacks last.

### How does the system handle video understanding requirements?

When `requiresVideoUnderstanding` is set to true (typically triggered by `media.videoMode: "understand"` in the configuration), the `buildAutoModelAttempts` function (lines 587-594 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)) filters the candidate list to exclude native providers that lack video comprehension capabilities. This validation occurs before transport-specific attempts are created, ensuring that only multimodal-capable models remain in the fallback chain for video processing tasks.

### Can I customize which models are selected for specific token ranges?

Yes, through the `model.rules` configuration array, whose `AutoRule` type is defined in [`src/config.ts`](https://github.com/steipete/summarize/blob/main/src/config.ts) (lines 23-48). Each `AutoRule` object can specify `when` conditions to match specific input kinds (text, website, video), and either a flat `candidates` array or a `bands` array containing token ranges with `min` and `max` properties. The `resolveRuleCandidates` function (lines 373-418 in [`src/model-auto.ts`](https://github.com/steipete/summarize/blob/main/src/model-auto.ts)) evaluates these rules sequentially and returns the first matching candidate set; when no user rules are configured, it uses the built-in `DEFAULT_RULES` (lines 47-98) instead.