How Model Limits Are Determined in Summarize Using the LiteLLM Catalog
Summarize determines per-model token limits by loading a LiteLLM catalog JSON file and resolving the requested model ID against its known context windows and pricing data.
The steipete/summarize repository implements a robust token limit resolution system that queries the community-maintained LiteLLM model catalog. This approach ensures the CLI always respects current provider constraints without hardcoding values that change frequently.
Loading the LiteLLM Catalog
The entry point for all limit lookups is the loadLiteLlmCatalog function defined in src/pricing/litellm.ts (lines 24-38). This async loader manages both network fetching and local caching to minimize external dependencies.
Cache resolution follows a strict priority:
- First, it checks $HOME/.summarize/cache (or the directory specified by TOKENTALLY_CACHE_DIR)
- If a fresh cache file exists, it returns immediately with source: "cache"
- Otherwise, it performs a conditional HTTP request using If-None-Match headers to fetch litellm-model_prices_and_context_window.json from the LiteLLM GitHub repository
- Fresh network responses are written to disk before returning with source: "network"
If both cache and network fail, the function returns source: "none" and a null catalog, allowing the application to proceed with safe defaults.
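The priority order above can be sketched as a small cache-first loader. This is a minimal illustration, not the code in src/pricing/litellm.ts: the names loadCatalogSketch, CacheStore, and FetchLike, the 24-hour freshness window, and the handling of a 304 response are all assumptions made for the example.

```typescript
// Hypothetical sketch of a cache-first catalog loader. All names and the
// freshness window are illustrative assumptions, not the real implementation.
type Catalog = Record<string, unknown>;
type CatalogSource = "cache" | "network" | "none";

interface CacheEntry {
  catalog: Catalog;
  etag: string | null;
  fetchedAtMs: number;
}

interface CacheStore {
  read(): CacheEntry | null;
  write(entry: CacheEntry): void;
}

interface ResponseLike {
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<ResponseLike>;

const MAX_AGE_MS = 24 * 60 * 60 * 1000; // assumed freshness window

async function loadCatalogSketch(opts: {
  url: string;
  cache: CacheStore;
  fetchImpl: FetchLike;
  nowMs: number;
}): Promise<{ catalog: Catalog | null; source: CatalogSource }> {
  const cached = opts.cache.read();

  // 1. A fresh cache entry short-circuits the network entirely.
  if (cached && opts.nowMs - cached.fetchedAtMs < MAX_AGE_MS) {
    return { catalog: cached.catalog, source: "cache" };
  }

  try {
    // 2. Conditional request: send the stored ETag so the server may reply 304.
    const headers: Record<string, string> = {};
    if (cached && cached.etag) headers["If-None-Match"] = cached.etag;
    const res = await opts.fetchImpl(opts.url, { headers });

    if (res.status === 304 && cached) {
      // Upstream unchanged: renew the timestamp and reuse the cached copy.
      opts.cache.write({ ...cached, fetchedAtMs: opts.nowMs });
      return { catalog: cached.catalog, source: "cache" };
    }
    if (res.ok) {
      // 3. Persist the fresh catalog before returning it.
      const catalog = (await res.json()) as Catalog;
      opts.cache.write({
        catalog,
        etag: res.headers.get("etag"),
        fetchedAtMs: opts.nowMs,
      });
      return { catalog, source: "network" };
    }
  } catch {
    // Network errors fall through to the "none" result below.
  }

  // 4. Nothing usable: the caller proceeds with its own defaults.
  return { catalog: null, source: "none" };
}
```

Injecting the cache store, the fetch function, and the clock keeps the sketch testable without touching the disk or the network.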
Resolving Token Limits
Once loaded, the catalog object feeds three thin wrapper functions in src/pricing/litellm.ts (lines 42-61):
- resolveLiteLlmMaxInputTokensForModelId → Returns the max_input_tokens value (falls back to max_tokens for older entries)
- resolveLiteLlmMaxOutputTokensForModelId → Returns the max_output_tokens value
- resolveLiteLlmPricingForModelId → Returns per-token pricing data used for cost estimation elsewhere in the codebase
Each resolver forwards to corresponding tokentally helpers that perform the actual dictionary lookup. If a model entry lacks the required fields, the functions return null, signaling that no hard limit is enforced by the catalog.
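To make the lookup and fallback behavior concrete, here is a minimal sketch of what such a resolver might do. The function names and the CatalogEntry shape are assumptions for illustration; the actual lookup lives in the tokentally helpers and may differ in detail.

```typescript
// Hypothetical resolver sketch. Field names follow the LiteLLM catalog keys
// mentioned in the article; everything else is an illustrative assumption.
interface CatalogEntry {
  max_input_tokens?: unknown;
  max_output_tokens?: unknown;
  max_tokens?: unknown;
}

type Catalog = Record<string, CatalogEntry>;

// Accept only finite, positive numbers; anything else counts as "no limit known".
function positiveNumberOrNull(value: unknown): number | null {
  return typeof value === "number" && isFinite(value) && value > 0 ? value : null;
}

function resolveMaxInputSketch(catalog: Catalog, modelId: string): number | null {
  const entry = catalog[modelId];
  if (!entry) return null;
  // Prefer max_input_tokens, then fall back to max_tokens for older entries.
  return positiveNumberOrNull(entry.max_input_tokens) ?? positiveNumberOrNull(entry.max_tokens);
}

function resolveMaxOutputSketch(catalog: Catalog, modelId: string): number | null {
  const entry = catalog[modelId];
  if (!entry) return null;
  return positiveNumberOrNull(entry.max_output_tokens);
}
```

Returning null instead of throwing leaves the fallback decision with the caller, which matches the behavior described above.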
Model ID Format and Prefix Stripping
The LiteLLM catalog stores gateway-style keys such as "gpt-5.2" or "claude-opus-4-5". However, Summarize accepts model IDs with provider prefixes like "openai/gpt-5.2" or "anthropic/claude-opus-4-5".
The resolver functions automatically strip the provider prefix before lookup, allowing seamless use of both formats. This behavior is extensively tested in tests/pricing.litellm.test.ts, which verifies that prefixed and non-prefixed IDs resolve to identical limit values.
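A minimal sketch of the prefix handling follows. It assumes the prefix is everything up to the first slash and that an exact match is tried before the stripped key; both choices, along with the names stripProviderPrefix and findEntry and the token values in the usage lines, are illustrative rather than taken from the repository.

```typescript
// Hypothetical sketch of provider-prefix stripping before a catalog lookup.
interface CatalogEntry {
  max_input_tokens?: number;
  max_tokens?: number;
}

type Catalog = Record<string, CatalogEntry>;

// "openai/gpt-5.2" -> "gpt-5.2"; IDs without a slash are returned unchanged.
function stripProviderPrefix(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(slash + 1);
}

// Try the ID as given first, then the stripped form (assumed lookup order).
function findEntry(catalog: Catalog, modelId: string): CatalogEntry | null {
  const candidates = [modelId, stripProviderPrefix(modelId)];
  for (const key of candidates) {
    if (Object.prototype.hasOwnProperty.call(catalog, key)) {
      return catalog[key];
    }
  }
  return null;
}

// Usage with a made-up catalog entry (the limit value is a placeholder).
const demoCatalog: Catalog = { "gpt-5.2": { max_input_tokens: 1000 } };
const prefixed = findEntry(demoCatalog, "openai/gpt-5.2");
const bare = findEntry(demoCatalog, "gpt-5.2");
const sameEntry = prefixed === bare; // both IDs resolve to the same entry
```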
Runtime Enforcement
During execution, the limits retrieved from the catalog enforce hard caps before sending data to LLM providers. In src/run/run-metrics.ts (lines 63-81), the runner calls both resolveLiteLlmMaxInputTokensForModelId and resolveLiteLlmMaxOutputTokensForModelId to:
- Validate that the input content fits within the model's context window
- Set the maximum completion tokens parameter in the API request
- Trigger input truncation or chunking strategies when content exceeds the resolved limits
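The three steps above could look roughly like the following. This is a sketch under stated assumptions, not the logic in src/run/run-metrics.ts: the four-characters-per-token estimate is a crude stand-in for a real tokenizer, and estimateTokens, clampMaxOutputTokens, and splitIntoChunks are hypothetical helper names.

```typescript
// Hypothetical enforcement sketch. A null limit means "no catalog limit known".
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Cap the requested completion size at the catalog's output limit, if any.
function clampMaxOutputTokens(requested: number, catalogMax: number | null): number {
  return catalogMax === null ? requested : Math.min(requested, catalogMax);
}

// Return the text unchanged when it fits; otherwise split it into pieces
// whose estimated token count stays within the input limit.
function splitIntoChunks(text: string, maxInputTokens: number | null): string[] {
  if (maxInputTokens === null) return [text];
  if (maxInputTokens <= 0) {
    throw new RangeError("maxInputTokens must be positive");
  }
  if (estimateTokens(text) <= maxInputTokens) return [text];

  const maxChars = maxInputTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}
```

Splitting on raw character offsets can cut words or sentences in half; a production chunker would more likely break on paragraph or sentence boundaries.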
Practical Implementation Examples
Loading Limits for a Specific Model
import {
  loadLiteLlmCatalog,
  resolveLiteLlmMaxInputTokensForModelId,
  resolveLiteLlmMaxOutputTokensForModelId,
} from '@steipete/summarize-core/src/pricing/litellm.js';

async function getModelLimits(modelId: string) {
  const { catalog, source } = await loadLiteLlmCatalog({
    env: process.env, // respects HOME/TOKENTALLY_CACHE_DIR
    fetchImpl: fetch, // any fetch implementation works
  });

  // catalog is null when both the cache and the network are unavailable
  if (!catalog) {
    throw new Error('Unable to obtain LiteLLM catalog');
  }

  const maxInput = resolveLiteLlmMaxInputTokensForModelId(catalog, modelId);
  const maxOutput = resolveLiteLlmMaxOutputTokensForModelId(catalog, modelId);

  console.log(`Model ${modelId} limits (source=${source}):`);
  console.log(`  Max input tokens : ${maxInput ?? 'unknown'}`);
  console.log(`  Max output tokens: ${maxOutput ?? 'unknown'}`);
}

// Example usage; surface failures instead of leaving the promise unhandled
getModelLimits('openai/gpt-5.2').catch(console.error);
Enforcing Limits During Summarization
import { runSummary } from '@steipete/summarize-core';
import {
  loadLiteLlmCatalog,
  resolveLiteLlmMaxInputTokensForModelId,
} from '@steipete/summarize-core/src/pricing/litellm.js';

const FALLBACK_MAX_INPUT_TOKENS = 8192;

async function summarizeWithSafety(url: string, modelId: string) {
  const { catalog } = await loadLiteLlmCatalog({
    env: process.env,
    fetchImpl: fetch,
  });

  // Only query the resolver when a catalog was actually loaded;
  // a missing catalog or a missing model both fall back to the default.
  const resolved = catalog
    ? resolveLiteLlmMaxInputTokensForModelId(catalog, modelId)
    : null;
  const maxInput = resolved ?? FALLBACK_MAX_INPUT_TOKENS;

  // The runner automatically respects maxInputTokens when preparing the request
  await runSummary({ url, modelId, maxInputTokens: maxInput });
}
Summary
- LiteLLM Catalog: Summarize consumes the community-maintained litellm-model_prices_and_context_window.json file to determine current model constraints
- Smart Caching: The system prioritizes $HOME/.summarize/cache and uses conditional HTTP requests to minimize bandwidth
- Flexible IDs: Provider prefixes (e.g., "openai/") are automatically stripped before catalog lookup
- Three Resolvers: Dedicated functions handle max input tokens, max output tokens, and pricing lookups
- Null Safety: When models are missing from the catalog, resolvers return null rather than throwing, allowing fallback defaults
Frequently Asked Questions
What happens if a model is not present in the LiteLLM catalog?
If the requested model ID cannot be found in the catalog, the resolver functions return null for both input and output limits. According to the implementation in src/pricing/litellm.ts, this signals to the caller that no hard limit is known, prompting the CLI to fall back to safe defaults (typically 8192 tokens) or defer to the provider's native error handling.
Where does Summarize store the LiteLLM catalog locally?
By default, the catalog is cached in $HOME/.summarize/cache/litellm-model_prices_and_context_window.json. You can override this location by setting the TOKENTALLY_CACHE_DIR environment variable before running the CLI, which redirects all cache operations to your specified directory.
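The path resolution described here can be sketched in a few lines. The helper name resolveCatalogCachePath is hypothetical, and the behavior when neither variable is set (returning null) is an assumption; the real loader may handle that case differently.

```typescript
// Hypothetical sketch of how the cache file path might be derived from the
// environment. Uses plain string joins to stay free of runtime imports.
type Env = Record<string, string | undefined>;

const CATALOG_FILE = "litellm-model_prices_and_context_window.json";

function resolveCatalogCachePath(env: Env): string | null {
  const override = (env.TOKENTALLY_CACHE_DIR ?? "").trim();
  const home = (env.HOME ?? "").trim();

  // TOKENTALLY_CACHE_DIR wins; otherwise fall back to $HOME/.summarize/cache.
  let dir: string | null = null;
  if (override !== "") {
    dir = override;
  } else if (home !== "") {
    dir = `${home}/.summarize/cache`;
  }
  if (dir === null) return null;

  // Drop trailing slashes so the join never produces "//".
  return `${dir.replace(/\/+$/, "")}/${CATALOG_FILE}`;
}
```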
How does the system handle provider-specific model ID prefixes?
The resolver functions in src/pricing/litellm.ts automatically strip provider prefixes such as "openai/" or "anthropic/" before performing the catalog lookup. This allows users to specify fully-qualified model IDs like "openai/gpt-5.2" while the system searches for the gateway-style key "gpt-5.2" in the LiteLLM catalog.
Can I use a custom fetch implementation when loading the catalog?
Yes. The loadLiteLlmCatalog function accepts a fetchImpl parameter in its options object, allowing you to inject any Fetch API-compatible implementation. This is essential for testing environments or scenarios requiring custom proxies, authentication headers, or offline mock data.
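For tests, a stub along the following lines could stand in for the network. It is a sketch only: createStubFetch and the StubResponse shape are hypothetical, the stub implements just the small subset of the Fetch API that a JSON loader plausibly needs, and passing it as fetchImpl may require a type cast depending on how strictly the real option is typed.

```typescript
// Hypothetical stub fetch for offline tests. It records requested URLs and
// always answers with the canned catalog supplied by the test.
interface StubResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}

type StubFetch = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<StubResponse>;

function createStubFetch(cannedCatalog: Record<string, unknown>): {
  fetchImpl: StubFetch;
  calls: string[];
} {
  const calls: string[] = [];
  const fetchImpl: StubFetch = async (url) => {
    calls.push(url); // lets the test assert how often the network was hit
    return {
      ok: true,
      status: 200,
      headers: { get: () => null }, // no ETag in this simple stub
      json: async () => cannedCatalog,
    };
  };
  return { fetchImpl, calls };
}
```

A test would then pass stub.fetchImpl in the options object and inspect stub.calls afterwards to confirm whether the loader went to the network or served from cache.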