Auto Model Selection and Fallback System Architecture in Summarize
The auto model selection and fallback system in steipete/summarize is a deterministic pipeline that ranks candidate LLMs by transport type (CLI → native → OpenRouter), filters by token limits and API keys, and returns an ordered array of AutoModelAttempt objects for sequential execution.
When you configure model: "auto" in your Summarize configuration, the runtime has to choose among many candidate models and providers without manual intervention. This logic lives primarily in src/model-auto.ts and runs a five-stage selection process that balances cost, availability, and capability requirements.
Core Architecture of the Auto Model Selection System
Data Structures and Type Definitions
The system relies on three primary TypeScript interfaces defined across src/model-auto.ts and src/config.ts:
- AutoSelectionInput (lines 16-30 in src/model-auto.ts): Encapsulates the decision context, including input kind (text, website, video), token counts, environment variables, user configuration, and the LiteLLM pricing catalog.
- AutoModelAttempt (lines 32-51 in src/model-auto.ts): Represents a concrete execution candidate with properties for transport (cli, native, openrouter), resolved model IDs, required environment variable names, cost estimates, and human-readable debug strings.
- AutoRule (lines 23-48 in src/config.ts): Defines user-configurable selection logic with when conditions (input kind matching), flat candidates arrays, or token-based bands with min/max thresholds.
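To make the three shapes concrete, here is a hypothetical reconstruction inferred only from the properties listed above. Field names not mentioned in the prose (such as requiredEnv, estimatedCostUsd, and debug) and the placeholder model IDs are assumptions; the real definitions in src/model-auto.ts and src/config.ts may differ.

```typescript
// Hypothetical sketch of the core types; not copied from the repository.
type InputKind = "text" | "website" | "video";
type Transport = "cli" | "native" | "openrouter";

// Decision context handed to the selector (assumed shape).
interface AutoSelectionInput {
  kind: InputKind;
  promptTokens: number;
  desiredOutputTokens: number;
  requiresVideoUnderstanding: boolean;
  env: Record<string, string | undefined>;
  config: unknown;  // user configuration, opaque in this sketch
  catalog: unknown; // LiteLLM pricing catalog, or null when not loaded
}

// One concrete execution candidate (assumed field names).
interface AutoModelAttempt {
  transport: Transport;
  userModelId: string;           // e.g. "cli/claude/sonnet"
  forceOpenRouter: boolean;
  requiredEnv: string | null;    // env var that must be set
  estimatedCostUsd: number | null;
  debug: string;                 // human-readable explanation
}

// A rule carries either flat candidates or token-range bands (assumed shape).
interface AutoBand {
  token?: { min?: number; max?: number };
  candidates: string[];
}
interface AutoRule {
  when?: InputKind[];
  candidates?: string[];
  bands?: AutoBand[];
}

// Placeholder model IDs: small website prompts go to one model, larger ones to another.
const exampleRule: AutoRule = {
  when: ["website"],
  bands: [
    { token: { max: 50_000 }, candidates: ["openai/small-model"] },
    { candidates: ["google/long-context-model"] },
  ],
};
```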
The Five-Stage Selection Pipeline
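The buildAutoModelAttempts function (lines 508-684 in src/model-auto.ts) implements this as a deterministic pipeline whose output depends only on its input; apart from populating the process-wide OpenRouter lookup cache described in Stage 3, it has no side effects: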
- Rule resolution – Determine base candidates from built-in or user-defined rules.
- CLI injection – Prepend CLI-only providers (Claude, Gemini, Codex, Agent) based on fallback configuration.
- OpenRouter expansion – Add OpenRouter transport alternatives for native providers when API keys are present.
- Filtering and validation – Remove candidates exceeding token limits, lacking credentials, or unsuitable for video understanding.
- Deduplication and ordering – Ensure stable, deterministic output ordered by transport precedence.
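As a rough illustration of how the five stages compose, the sketch below models each stage as a pure function over a list of candidate strings. The stage bodies are hypothetical placeholders, not the repository's logic; the point is the stage order and that the same input always yields the same output.

```typescript
// Hypothetical skeleton: each stage maps a candidate list to a new candidate list.
type Stage = (candidates: string[]) => string[];

// Placeholder stages standing in for the real logic in src/model-auto.ts.
const resolveRules: Stage = () => ["openai/model-a", "google/model-b"];
const injectCli: Stage = (c) => ["cli/claude/sonnet", ...c];
const expandOpenRouter: Stage = (c) => [
  ...c,
  ...c.filter((id) => !id.startsWith("cli/")).map((id) => `openrouter/${id}`),
];
// Pretend GEMINI_API_KEY is missing, so the native Google candidate is dropped.
const filterInvalid: Stage = (c) => c.filter((id) => id !== "google/model-b");
const dedupe: Stage = (c) => Array.from(new Set(c));

// Apply stages left to right; no stage mutates its input array.
function runPipeline(stages: Stage[]): string[] {
  return stages.reduce<string[]>((acc, stage) => stage(acc), []);
}

const stages = [resolveRules, injectCli, expandOpenRouter, filterInvalid, dedupe];
const order = runPipeline(stages);
```

In this toy run the native Google model is filtered out while its OpenRouter alternative survives, which mirrors what the credential check in Stage 4 allows when only OPENROUTER_API_KEY is set.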
Step-by-Step Candidate Resolution Process
Stage 1: Rule-Based Candidate Selection
The resolveRuleCandidates function (lines 373-418) evaluates the model.rules array from user configuration or falls back to DEFAULT_RULES (lines 47-98).
For each rule, the system checks the when property against the current input kind. If the rule specifies a flat candidates array, those strings are returned immediately. Otherwise, the function walks the rule's token-based bands (each band carries a token object with min and max bounds) and returns the candidates of the first band whose range contains promptTokens. If no band matches, the last rule's candidates serve as the ultimate fallback.
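The matching logic described above can be sketched as follows. This is an illustration under assumptions: the types are simplified, both band bounds are treated as inclusive, the final fallback follows this article's description rather than verified source, and resolveCandidates and the model IDs are hypothetical.

```typescript
// Hypothetical sketch of rule resolution; not the repository's implementation.
type Kind = "text" | "website" | "video";

interface Band {
  token?: { min?: number; max?: number };
  candidates: string[];
}
interface Rule {
  when?: Kind[];         // omitted = applies to every kind
  candidates?: string[]; // flat list, returned as-is
  bands?: Band[];        // token-range bands, checked in order
}

// Assumption: inclusive bounds; a missing bound is open-ended.
function inBand(band: Band, tokens: number): boolean {
  const min = band.token?.min ?? 0;
  const max = band.token?.max ?? Number.POSITIVE_INFINITY;
  return tokens >= min && tokens <= max;
}

function resolveCandidates(rules: Rule[], kind: Kind, promptTokens: number): string[] {
  for (const rule of rules) {
    if (rule.when && !rule.when.includes(kind)) continue; // rule does not apply
    if (rule.candidates) return rule.candidates;          // flat candidates win immediately
    const band = rule.bands?.find((b) => inBand(b, promptTokens));
    if (band) return band.candidates;                      // first band in range
  }
  // Fallback: the last rule's flat candidates, or its last band if it only has bands.
  const last = rules[rules.length - 1];
  if (!last) return [];
  if (last.candidates) return last.candidates;
  const bands = last.bands ?? [];
  return bands[bands.length - 1]?.candidates ?? [];
}

const rules: Rule[] = [
  { when: ["video"], candidates: ["google/video-model"] },
  {
    bands: [
      { token: { max: 50_000 }, candidates: ["openai/small-model"] },
      { token: { min: 50_001, max: 200_000 }, candidates: ["google/long-context-model"] },
    ],
  },
];
```

For example, resolveCandidates(rules, "text", 10_000) selects the small-model band, while a 500,000-token prompt matches no band and falls back to the last band of the last rule.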
Stage 2: CLI Fallback Candidate Injection
The prependCliCandidates function (lines 20-73) handles the optional insertion of CLI-only providers before native API candidates.
First, resolveCliAutoFallbackConfig (lines 300-335) determines whether CLI fallback is active by checking that all of the following hold:
- autoFallback.enabled is true
- Either implicit auto selection is active or allowAutoCliFallback is true
- Either onlyWhenNoApiKeys is false or hasAnyApiKeysConfigured (lines 45-58) returns false
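The three conditions reduce to a single boolean predicate. In this sketch, cliFallbackActive and hasAnyNativeApiKey are hypothetical names, and the assumption that only the native OpenAI, Gemini, and Anthropic keys count as "configured" is taken from the environment variables named in this article, not from the source.

```typescript
// Hypothetical sketch of the CLI auto-fallback activation check.
interface AutoFallbackConfig {
  enabled?: boolean;
  onlyWhenNoApiKeys?: boolean;
}

// Assumption: these native provider keys are the ones that count as configured.
const NATIVE_KEYS = ["OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"];

function hasAnyNativeApiKey(env: Record<string, string | undefined>): boolean {
  return NATIVE_KEYS.some((name) => (env[name] ?? "").length > 0);
}

function cliFallbackActive(
  cfg: AutoFallbackConfig,
  env: Record<string, string | undefined>,
  isImplicitAutoSelection: boolean,
  allowAutoCliFallback: boolean,
): boolean {
  if (cfg.enabled !== true) return false;                              // 1. opted in
  if (!isImplicitAutoSelection && !allowAutoCliFallback) return false; // 2. context permits it
  if (cfg.onlyWhenNoApiKeys && hasAnyNativeApiKey(env)) return false;  // 3. key gate
  return true;
}
```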
When enabled, the system builds an ordered provider list from autoFallback.order or the DEFAULT_AUTO_CLI_ORDER constant (["claude","gemini","codex","agent"]). If prioritizeCliProvider references a previous successful CLI execution (stored in src/run/cli-fallback-state.ts), that provider moves to the front.
Each CLI candidate takes the form cli/<provider>/<model>, with default models defined in DEFAULT_CLI_MODELS (e.g., claude: "sonnet", codex: "gpt-5.2").
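A sketch of this ordering step: start from autoFallback.order or the default order, move the prioritized provider to the front, and format each entry as cli/<provider>/<model>. Only the claude and codex default models come from the text above; the gemini and agent models are placeholders, cliCandidates is a hypothetical name, and filtering on an availability map is an assumption suggested by the cliAvailability field in the example later in this article.

```typescript
// Hypothetical sketch of CLI candidate ordering; not the repository's code.
type CliProvider = "claude" | "gemini" | "codex" | "agent";

const DEFAULT_AUTO_CLI_ORDER: CliProvider[] = ["claude", "gemini", "codex", "agent"];

// claude and codex defaults come from the article; gemini and agent are placeholders.
const DEFAULT_CLI_MODELS: Record<CliProvider, string> = {
  claude: "sonnet",
  codex: "gpt-5.2",
  gemini: "gemini-placeholder",
  agent: "agent-placeholder",
};

function cliCandidates(
  order: CliProvider[] | undefined,
  prioritize: CliProvider | undefined,
  available: Partial<Record<CliProvider, boolean>>,
): string[] {
  const base = order ?? DEFAULT_AUTO_CLI_ORDER;
  // Move the previously successful provider to the front if it is in the list.
  const ordered = prioritize && base.includes(prioritize)
    ? [prioritize, ...base.filter((p) => p !== prioritize)]
    : base;
  return ordered
    .filter((p) => available[p] === true) // assumption: skip CLIs that are not installed
    .map((p) => `cli/${p}/${DEFAULT_CLI_MODELS[p]}`);
}
```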
Stage 3: OpenRouter Fallback Expansion
When native providers are selected and OPENROUTER_API_KEY is present, the system attempts to add OpenRouter transport alternatives via resolveOpenRouterModelIdForNative (lines 111-140).
The function maintains a process-wide cache (cachedOpenRouterIndex) built from piAi.getModels("openrouter") on first invocation. This index maps three lookup keys for each OpenRouter model: the canonical provider-model ID, the slug, and a punctuation-insensitive normalized slug.
Resolution follows a strict precedence: exact canonical match, then unique slug match, then normalized slug match. When a unique OpenRouter equivalent is found, the system creates a second AutoModelAttempt with transport: "openrouter" and forceOpenRouter: true, preserving the original native attempt for primary execution.
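The three-key index and its precedence rules can be sketched as follows. The model list is passed in directly instead of coming from piAi.getModels("openrouter"); the OpenRouterModel shape, the helper names, and the normalization rule (lowercase, strip everything that is not a letter or digit) are assumptions about what "punctuation-insensitive" means.

```typescript
// Hypothetical sketch of the OpenRouter lookup index; not the repository's code.
interface OpenRouterModel {
  id: string; // "provider/slug"
}
interface OpenRouterIndex {
  byCanonical: Map<string, string>;
  bySlug: Map<string, string[]>;
  byNormalized: Map<string, string[]>;
}

const normalize = (s: string): string => s.toLowerCase().replace(/[^a-z0-9]/g, "");
const slugOf = (id: string): string => id.slice(id.indexOf("/") + 1);

function addTo(map: Map<string, string[]>, key: string, value: string): void {
  const list = map.get(key) ?? [];
  list.push(value);
  map.set(key, list);
}

function buildIndex(models: OpenRouterModel[]): OpenRouterIndex {
  const index: OpenRouterIndex = { byCanonical: new Map(), bySlug: new Map(), byNormalized: new Map() };
  for (const m of models) {
    index.byCanonical.set(m.id, m.id);
    addTo(index.bySlug, slugOf(m.id), m.id);
    addTo(index.byNormalized, normalize(slugOf(m.id)), m.id);
  }
  return index;
}

// Process-wide memoization: the index is built once, on first use.
let cachedIndex: OpenRouterIndex | null = null;
function getIndex(load: () => OpenRouterModel[]): OpenRouterIndex {
  if (cachedIndex === null) cachedIndex = buildIndex(load());
  return cachedIndex;
}

// Precedence: exact canonical ID, then a unique slug, then a unique normalized slug.
function resolveOpenRouterId(index: OpenRouterIndex, nativeId: string): string | null {
  const exact = index.byCanonical.get(nativeId);
  if (exact) return exact;
  const bySlug = index.bySlug.get(slugOf(nativeId)) ?? [];
  if (bySlug.length === 1) return bySlug[0] ?? null;
  const byNorm = index.byNormalized.get(normalize(slugOf(nativeId))) ?? [];
  return byNorm.length === 1 ? byNorm[0] ?? null : null; // ambiguous or missing: no fallback
}
```

Returning null for ambiguous slugs reflects the "unique match" requirement above: when two OpenRouter models share a slug, guessing would risk routing to the wrong vendor.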
Stage 4: Filtering and Validation
During the assembly loop in buildAutoModelAttempts (lines 558-684), each candidate passes three checks:
- Video capability check: If requiresVideoUnderstanding is true, native candidates without video support are skipped (lines 587-594).
- Token limit enforcement: Native attempts are rejected when promptTokens exceeds the provider's maximum input tokens as defined in the LiteLLM catalog.
- Credential verification: The requiredEnvForCandidate function determines the required environment variable name (e.g., OPENAI_API_KEY for native OpenAI, CLI_CLAUDE for CLI Claude, OPENROUTER_API_KEY for OpenRouter). The envHasKey check ensures the variable exists and is non-empty.
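A sketch of the per-candidate checks. The Candidate shape, the maxInputTokens and supportsVideo fields (standing in for LiteLLM catalog lookups), and the hypothetical requiredEnvFor mapping are assumptions modeled on the examples above, not the repository's requiredEnvForCandidate.

```typescript
// Hypothetical sketch of candidate validation; not the repository's code.
interface Candidate {
  transport: "cli" | "native" | "openrouter";
  provider: string;         // e.g. "openai", "google", "claude"
  modelId: string;
  maxInputTokens?: number;  // stand-in for the LiteLLM catalog limit
  supportsVideo?: boolean;  // stand-in for video capability metadata
}

// Assumed mapping, modeled on the examples in the text.
function requiredEnvFor(c: Candidate): string {
  if (c.transport === "openrouter") return "OPENROUTER_API_KEY";
  if (c.transport === "cli") return `CLI_${c.provider.toUpperCase()}`;
  const native: Record<string, string> = {
    openai: "OPENAI_API_KEY",
    google: "GEMINI_API_KEY",
    anthropic: "ANTHROPIC_API_KEY",
  };
  return native[c.provider] ?? `${c.provider.toUpperCase()}_API_KEY`;
}

// The variable must exist and be non-empty.
const envHasKey = (env: Record<string, string | undefined>, name: string): boolean =>
  (env[name] ?? "").length > 0;

function keepCandidate(
  c: Candidate,
  promptTokens: number,
  requiresVideo: boolean,
  env: Record<string, string | undefined>,
): boolean {
  // Video check: native models without known video support are skipped.
  if (requiresVideo && c.transport === "native" && c.supportsVideo !== true) return false;
  // Token check: native models whose known input limit is too small are rejected.
  if (c.transport === "native" && c.maxInputTokens !== undefined && promptTokens > c.maxInputTokens) {
    return false;
  }
  // Credential check.
  return envHasKey(env, requiredEnvFor(c));
}
```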
Stage 5: Deduplication and Ordering
The final stage builds a composite deduplication key from transport, forceOpenRouter, userModelId, and the providers array. Because transport and forceOpenRouter are part of the key, a native attempt and its OpenRouter equivalent for the same model stay separate entries, while an exact repeat of the same transport and model combination is emitted only once.
The output array follows a fixed precedence: CLI candidates first (they fail fast and incur no per-token API costs), then native providers (direct API access), then OpenRouter fallbacks (broader compatibility). This ordering tries low-cost, direct options first and keeps the broadest-compatibility route as the last resort.
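Deduplication and ordering can be sketched as follows. The key fields follow the prose; joining them with "|" and enforcing the precedence with a stable sort on a transport rank are assumptions about the mechanics, and finalize is a hypothetical name.

```typescript
// Hypothetical sketch of deduplication and ordering; not the repository's code.
interface Attempt {
  transport: "cli" | "native" | "openrouter";
  forceOpenRouter: boolean;
  userModelId: string;
  providers: string[];
}

const RANK: Record<Attempt["transport"], number> = { cli: 0, native: 1, openrouter: 2 };

function dedupeKey(a: Attempt): string {
  return [a.transport, String(a.forceOpenRouter), a.userModelId, a.providers.join(",")].join("|");
}

function finalize(attempts: Attempt[]): Attempt[] {
  const seen = new Set<string>();
  const unique = attempts.filter((a) => {
    const key = dedupeKey(a);
    if (seen.has(key)) return false; // exact repeat: drop it
    seen.add(key);
    return true;
  });
  // Array.prototype.sort is stable (ES2019+), so equal ranks keep their relative order.
  return unique.sort((x, y) => RANK[x.transport] - RANK[y.transport]);
}
```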
Configuration and Environment Integration
The auto model selection system exposes several configuration touch-points in src/config.ts:
- model.rules[]: User-defined array of AutoRule objects that override the built-in DEFAULT_RULES (lines 47-98 in src/model-auto.ts).
- cli.autoFallback: Structured configuration object with enabled, onlyWhenNoApiKeys, order, and prioritizeCliProvider properties.
- Environment variables: The system checks for OPENAI_API_KEY, GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, and CLI-specific variables like CLI_CLAUDE or CLI_GEMINI.
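For orientation, a configuration covering these touch-points might look like the object below. The nesting mirrors the example later in this article (model.mode, cli.autoFallback); the rule contents and model IDs are illustrative placeholders, prioritizeCliProvider is omitted because its value type is not described here, and the real schema in src/config.ts may differ.

```typescript
// Hypothetical configuration object; model IDs are placeholders.
const summarizeConfig = {
  model: {
    mode: "auto",
    rules: [
      { when: ["video"], candidates: ["google/video-capable-model"] },
      {
        when: ["text", "website"],
        bands: [
          { token: { max: 32_000 }, candidates: ["openai/fast-model"] },
          { token: { min: 32_001 }, candidates: ["google/long-context-model"] },
        ],
      },
    ],
  },
  cli: {
    autoFallback: {
      enabled: true,
      onlyWhenNoApiKeys: true,              // use CLIs only when no native keys are set
      order: ["codex", "claude", "gemini"], // overrides DEFAULT_AUTO_CLI_ORDER
    },
  },
};
```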
The hasAnyApiKeysConfigured function (lines 45-58) checks the environment for any native API key; when onlyWhenNoApiKeys is enabled, finding one suppresses the CLI fallback.
Code Example: Building Auto Model Attempts
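The following TypeScript example shows how the selector can be invoked programmatically. The input fields follow this article's description of AutoSelectionInput; check the interface in src/model-auto.ts for the exact shape: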
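```typescript
import { buildAutoModelAttempts } from "./model-auto.js";

// Annotating with the function's parameter type keeps literals such as
// "website" from widening to string.
const input: Parameters<typeof buildAutoModelAttempts>[0] = {
  kind: "website",                  // text | website | video
  promptTokens: 12_000,
  desiredOutputTokens: 2_000,
  requiresVideoUnderstanding: false,
  env: {
    OPENAI_API_KEY: "sk-...",       // placeholder values
    GEMINI_API_KEY: "...",
  },
  config: {
    model: { mode: "auto" },
    cli: {
      autoFallback: {
        enabled: true,
        onlyWhenNoApiKeys: false,   // allow CLI candidates even though keys exist
      },
    },
  },
  catalog: null,                    // no LiteLLM catalog loaded
  openrouterProvidersFromEnv: ["openai"],
  cliAvailability: {
    claude: true,
    codex: true,
    gemini: true,
    agent: false,                   // the agent CLI is not installed
  },
  isImplicitAutoSelection: true,
  allowAutoCliFallback: true,
  lastSuccessfulCliProvider: "gemini", // promoted to the front of the CLI order
};

const attempts = buildAutoModelAttempts(input);
console.log(attempts);
```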
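The resulting array contains AutoModelAttempt objects ordered by transport priority. Each attempt includes the resolved model ID, the required environment variable name, a cost estimate, and debug metadata for observability. Because this example passes catalog: null, catalog-derived data such as cost estimates may be unavailable.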
Summary
The auto model selection and fallback system in steipete/summarize implements a deterministic, configuration-driven pipeline for intelligent LLM routing:
- Rule-based selection evaluates input kind and token counts against user-defined or built-in AutoRule configurations to generate initial candidates.
- CLI augmentation optionally prepends command-line providers (Claude, Gemini, Codex, Agent) when auto-fallback is enabled and API availability conditions are met.
- OpenRouter expansion creates transport-layer fallbacks for native providers by resolving canonical model IDs against the OpenRouter catalog.
- Validation filtering removes candidates exceeding token limits, lacking required credentials, or incompatible with video-understanding requirements.
- Deduplication and ordering produces a stable array prioritizing CLI attempts first, native APIs second, and OpenRouter fallbacks last.
All logic resides in src/model-auto.ts and integrates with configuration definitions in src/config.ts, providing a transparent, testable mechanism for graceful degradation across multiple LLM providers.
Frequently Asked Questions
How does the system decide when to use CLI fallback providers?
The resolveCliAutoFallbackConfig function in src/model-auto.ts (lines 300-335) activates CLI fallback only when three conditions hold: autoFallback.enabled is true in the configuration; either implicit auto-selection is active or allowAutoCliFallback is true; and either onlyWhenNoApiKeys is disabled or hasAnyApiKeysConfigured (lines 45-58) finds no native API keys in the environment. CLI providers are therefore added only when the configuration opts in and, if requested, only when no API keys are available.
What determines the order of providers in the fallback chain?
The system establishes provider precedence through multiple mechanisms. First, resolveRuleCandidates applies user-defined or built-in AutoRule configurations that specify explicit candidate arrays or token-based bands. Next, prependCliCandidates injects CLI providers according to the autoFallback.order array (defaulting to ["claude","gemini","codex","agent"]), optionally promoting the lastSuccessfulCliProvider to the front based on state from src/run/cli-fallback-state.ts. Finally, buildAutoModelAttempts assembles the definitive sequence: CLI attempts first, native API attempts second, and OpenRouter fallbacks last.
How does the system handle video understanding requirements?
When requiresVideoUnderstanding is set to true (typically triggered by media.videoMode: "understand" in the configuration), the buildAutoModelAttempts function (lines 587-594 in src/model-auto.ts) filters the candidate list to exclude native providers that lack video comprehension capabilities. This validation occurs before transport-specific attempts are created, ensuring that only multimodal-capable models remain in the fallback chain for video processing tasks.
Can I customize which models are selected for specific token ranges?
Yes, through the model.rules array in your configuration, whose entries follow the AutoRule type defined in src/config.ts (lines 23-48). Each AutoRule object can specify when conditions to match specific input kinds (text, website, video), and either a flat candidates array or a bands array containing token ranges with min and max properties. The resolveRuleCandidates function (lines 373-418 in src/model-auto.ts) evaluates these rules sequentially, returning the first matching candidate set or falling back to the built-in DEFAULT_RULES (lines 47-98) if no user rules match.