How Model Limits Are Determined in Summarize Using the LiteLLM Catalog

Summarize determines per-model token limits by loading a LiteLLM catalog JSON file and resolving the requested model ID against its known context windows and pricing data.

The steipete/summarize repository resolves token limits from the community-maintained LiteLLM model catalog. Because the catalog is fetched and cached rather than compiled into the CLI, limits can track provider changes without hardcoding values that go stale.

Loading the LiteLLM Catalog

The entry point for all limit lookups is the loadLiteLlmCatalog function defined in src/pricing/litellm.ts (lines 24-38). This async loader handles both network fetching and local caching, so the catalog does not have to be downloaded on every run.

Cache resolution follows a strict priority:

  • First, it checks $HOME/.summarize/cache (or the directory specified by TOKENTALLY_CACHE_DIR)
  • If a fresh cache file exists, it returns immediately with source: "cache"
  • Otherwise, it performs a conditional HTTP request (with an If-None-Match header) to fetch LiteLLM's model_prices_and_context_window.json from the LiteLLM GitHub repository
  • Fresh network responses are written to disk before returning with source: "network"

If both cache and network fail, the function returns source: "none" and a null catalog, allowing the application to proceed with safe defaults.
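The cache-then-network order above can be sketched in TypeScript. This is a minimal, hypothetical reimplementation, not Summarize's or tokentally's code: loadCatalogSketch, the CacheStore interface, the TTL parameter, the raw GitHub URL, reporting a 304 revalidation as "cache", and falling back to a stale cache on network failure are all assumptions. Storage and fetch are injected so the flow can run without a disk or network.

```typescript
// Minimal structural subset of the Fetch API; the global fetch is assignable to it.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

type Catalog = Record<string, Record<string, unknown>>;

// Hypothetical cache record: the catalog plus its ETag and fetch time.
interface CacheEntry {
  catalog: Catalog;
  etag: string | null;
  fetchedAt: number; // epoch milliseconds
}

// Hypothetical storage abstraction standing in for the on-disk cache file.
interface CacheStore {
  read(): CacheEntry | null;
  write(entry: CacheEntry): void;
}

type LoadResult = { catalog: Catalog | null; source: "cache" | "network" | "none" };

// Assumed URL of the upstream catalog file.
const CATALOG_URL =
  "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json";

async function loadCatalogSketch(opts: {
  store: CacheStore;
  fetchImpl: FetchLike;
  ttlMs: number;
  now: number;
}): Promise<LoadResult> {
  const cached = opts.store.read();

  // 1. A cache entry younger than the TTL is returned without touching the network.
  if (cached && opts.now - cached.fetchedAt < opts.ttlMs) {
    return { catalog: cached.catalog, source: "cache" };
  }

  try {
    // 2. Conditional request: send the cached ETag so the server can answer 304.
    const headers: Record<string, string> = {};
    if (cached?.etag) headers["If-None-Match"] = cached.etag;
    const res = await opts.fetchImpl(CATALOG_URL, { headers });

    if (res.status === 304 && cached) {
      // Unchanged upstream: refresh the timestamp and keep the cached data.
      opts.store.write({ ...cached, fetchedAt: opts.now });
      return { catalog: cached.catalog, source: "cache" };
    }
    if (res.ok) {
      // 3. New data: persist it before returning.
      const catalog = (await res.json()) as Catalog;
      opts.store.write({ catalog, etag: res.headers.get("etag"), fetchedAt: opts.now });
      return { catalog, source: "network" };
    }
  } catch {
    // Network failure: fall through to the fallbacks below.
  }

  // 4. Assumption: a stale cache beats nothing; otherwise report "none".
  return cached
    ? { catalog: cached.catalog, source: "cache" }
    : { catalog: null, source: "none" };
}
```

Injecting `now` instead of calling Date.now() keeps the TTL check deterministic in tests.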

Resolving Token Limits

Once loaded, the catalog object feeds three thin wrapper functions in src/pricing/litellm.ts (lines 42-61):

  • resolveLiteLlmMaxInputTokensForModelId → Returns the max_input_tokens value (falls back to max_tokens for older entries)
  • resolveLiteLlmMaxOutputTokensForModelId → Returns the max_output_tokens value
  • resolveLiteLlmPricingForModelId → Returns per-token pricing data used for cost estimation elsewhere in the codebase

Each resolver forwards to a corresponding tokentally helper that performs the actual lookup. If the model is missing or its entry lacks the relevant field, the function returns null, signaling that the catalog provides no limit (or price) for that model.
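To make the null-returning behavior concrete, here is a sketch of what such lookups might look like against a parsed catalog. The field names (max_input_tokens, max_output_tokens, max_tokens, input_cost_per_token, output_cost_per_token) are LiteLLM's; the function names and the estimateCostUsd helper are hypothetical and do not mirror tokentally's actual helpers. Keys are assumed to be bare catalog keys; provider prefixes are not handled here.

```typescript
// Field names follow LiteLLM's model_prices_and_context_window.json.
interface LiteLlmEntry {
  max_input_tokens?: number;
  max_output_tokens?: number;
  max_tokens?: number; // legacy field on older entries
  input_cost_per_token?: number;
  output_cost_per_token?: number;
}
type LiteLlmCatalog = Record<string, LiteLlmEntry>;

// Hypothetical helper: accept only finite, positive limits.
function positiveOrNull(value: number | undefined): number | null {
  return typeof value === "number" && Number.isFinite(value) && value > 0 ? value : null;
}

function maxInputTokens(catalog: LiteLlmCatalog, key: string): number | null {
  const entry = catalog[key];
  if (!entry) return null; // unknown model: no limit known
  // Prefer the explicit input limit, falling back to legacy max_tokens.
  return positiveOrNull(entry.max_input_tokens) ?? positiveOrNull(entry.max_tokens);
}

function maxOutputTokens(catalog: LiteLlmCatalog, key: string): number | null {
  return positiveOrNull(catalog[key]?.max_output_tokens);
}

interface Pricing {
  inputPerToken: number;
  outputPerToken: number;
}

function pricing(catalog: LiteLlmCatalog, key: string): Pricing | null {
  const entry = catalog[key];
  if (!entry) return null;
  const { input_cost_per_token: inCost, output_cost_per_token: outCost } = entry;
  if (inCost === undefined || outCost === undefined) return null;
  return { inputPerToken: inCost, outputPerToken: outCost };
}

// Cost estimate: tokens times the per-token price, summed over input and output.
function estimateCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return inputTokens * p.inputPerToken + outputTokens * p.outputPerToken;
}
```

For example, with prices of 0.000002 and 0.00001 USD per token, 1,000 input and 500 output tokens cost 0.002 + 0.005 = 0.007 USD.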

Model ID Format and Prefix Stripping

The LiteLLM catalog stores gateway-style keys such as "gpt-5.2" or "claude-opus-4-5". However, Summarize accepts model IDs with provider prefixes like "openai/gpt-5.2" or "anthropic/claude-opus-4-5".

The resolver functions automatically strip the provider prefix before lookup, so both formats work. This behavior is covered by tests/pricing.litellm.test.ts, which verifies that prefixed and unprefixed IDs resolve to identical limit values.
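One way to implement this lookup is sketched below, assuming a simple rule: try the ID exactly as given, then retry with everything up to and including the first slash removed. Trying the exact key first matters because some LiteLLM keys themselves carry a provider-style prefix (such as openrouter/ entries). stripProviderPrefix and findEntryKey are hypothetical names; Summarize's actual stripping rule may differ.

```typescript
type CatalogLike = Record<string, { max_input_tokens?: number }>;

// Hypothetical: "openai/gpt-5.2" -> "gpt-5.2"; IDs without a slash are returned unchanged.
function stripProviderPrefix(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(slash + 1);
}

// Own-property check, so names like "toString" never match Object.prototype.
function hasKey(catalog: CatalogLike, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(catalog, key);
}

// Try the ID as given first, then the prefix-stripped form; null if neither exists.
function findEntryKey(catalog: CatalogLike, modelId: string): string | null {
  if (hasKey(catalog, modelId)) return modelId;
  const bare = stripProviderPrefix(modelId);
  return hasKey(catalog, bare) ? bare : null;
}
```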

Runtime Enforcement

During a run, the limits resolved from the catalog act as hard caps on what is sent to the LLM provider. In src/run/run-metrics.ts (lines 63-81), the runner calls both resolveLiteLlmMaxInputTokensForModelId and resolveLiteLlmMaxOutputTokensForModelId to:

  1. Validate that the input content fits within the model's context window
  2. Set the maximum completion tokens parameter in the API request
  3. Trigger input truncation or chunking strategies when content exceeds the resolved limits
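The three steps can be illustrated with a small planning function. This is a hypothetical sketch, not the logic in src/run/run-metrics.ts: planRequest and its fields are invented for illustration, token counts are assumed to be computed elsewhere, and it only computes how many equal-sized chunks would be needed rather than modeling any particular truncation or chunking strategy.

```typescript
// Hypothetical shape of the decisions made before a request is sent.
interface RequestPlan {
  fits: boolean; // true when the whole input fits in one request (or no limit is known)
  chunkCount: number; // number of input chunks needed to stay under the cap
  maxCompletionTokens: number; // value for the completion-token parameter
}

function planRequest(opts: {
  inputTokens: number;
  maxInputTokens: number | null; // from the input-limit resolver
  maxOutputTokens: number | null; // from the output-limit resolver
  desiredOutputTokens: number; // what the caller would like to allow
}): RequestPlan {
  const { inputTokens, maxInputTokens, maxOutputTokens, desiredOutputTokens } = opts;

  // Step 2: clamp the completion budget to the catalog's output cap, if any.
  const maxCompletionTokens =
    maxOutputTokens === null ? desiredOutputTokens : Math.min(desiredOutputTokens, maxOutputTokens);

  // Unknown input limit: send a single request and let the provider enforce its own cap.
  if (maxInputTokens === null) {
    return { fits: true, chunkCount: 1, maxCompletionTokens };
  }

  // Step 1: validate against the context window. Step 3: chunk when it does not fit.
  const fits = inputTokens <= maxInputTokens;
  const chunkCount = fits ? 1 : Math.ceil(inputTokens / maxInputTokens);
  return { fits, chunkCount, maxCompletionTokens };
}
```

A null limit is treated as "no known cap", mirroring the resolvers' null-on-missing behavior.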

Practical Implementation Examples

Loading Limits for a Specific Model

```typescript
import {
  loadLiteLlmCatalog,
  resolveLiteLlmMaxInputTokensForModelId,
  resolveLiteLlmMaxOutputTokensForModelId,
} from '@steipete/summarize-core/src/pricing/litellm.js';

async function getModelLimits(modelId: string) {
  const { catalog, source } = await loadLiteLlmCatalog({
    env: process.env, // respects HOME/TOKENTALLY_CACHE_DIR
    fetchImpl: fetch, // any Fetch API-compatible implementation works
  });

  if (!catalog) {
    throw new Error('Unable to obtain LiteLLM catalog');
  }

  // Both resolvers return null when the catalog has no value for this model
  const maxInput = resolveLiteLlmMaxInputTokensForModelId(catalog, modelId);
  const maxOutput = resolveLiteLlmMaxOutputTokensForModelId(catalog, modelId);

  console.log(`Model ${modelId} limits (source=${source}):`);
  console.log(`  Max input tokens : ${maxInput ?? 'unknown'}`);
  console.log(`  Max output tokens: ${maxOutput ?? 'unknown'}`);
}

// Example usage: report failures instead of leaving the promise unhandled
getModelLimits('openai/gpt-5.2').catch((err) => console.error(err));
```

Enforcing Limits During Summarization

```typescript
import { runSummary } from '@steipete/summarize-core';
import {
  loadLiteLlmCatalog,
  resolveLiteLlmMaxInputTokensForModelId,
} from '@steipete/summarize-core/src/pricing/litellm.js';

const DEFAULT_MAX_INPUT_TOKENS = 8192; // fallback when the catalog has no limit

async function summarizeWithSafety(url: string, modelId: string) {
  const { catalog } = await loadLiteLlmCatalog({
    env: process.env,
    fetchImpl: fetch,
  });

  // Query the catalog only when one was loaded; otherwise use the fallback
  const resolved = catalog
    ? resolveLiteLlmMaxInputTokensForModelId(catalog, modelId)
    : null;
  const maxInput = resolved ?? DEFAULT_MAX_INPUT_TOKENS;

  // The runner respects maxInputTokens when preparing the request
  await runSummary({ url, modelId, maxInputTokens: maxInput });
}
```

Summary

  • LiteLLM Catalog: Summarize consumes LiteLLM's community-maintained model_prices_and_context_window.json file to determine current model constraints
  • Smart Caching: The system prioritizes $HOME/.summarize/cache and uses conditional HTTP requests to minimize bandwidth
  • Flexible IDs: Provider prefixes (e.g., "openai/") are automatically stripped before catalog lookup
  • Three Resolvers: Dedicated functions handle max input tokens, max output tokens, and pricing lookups
  • Null Safety: When models are missing from the catalog, resolvers return null rather than throwing, allowing fallback defaults

Frequently Asked Questions

What happens if a model is not present in the LiteLLM catalog?

If the requested model ID cannot be found in the catalog, the resolver functions return null for both input and output limits. According to the implementation in src/pricing/litellm.ts, this tells the caller that no limit is known, so the caller can fall back to a default (the enforcement example above uses 8192 tokens) or defer to the provider's own error handling.

Where does Summarize store the LiteLLM catalog locally?

By default, the catalog is cached in $HOME/.summarize/cache/litellm-model_prices_and_context_window.json. You can override this location by setting the TOKENTALLY_CACHE_DIR environment variable before running the CLI, which redirects all cache operations to your specified directory.
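The lookup order can be expressed as a small pure function. This sketch assumes the cached file sits directly inside TOKENTALLY_CACHE_DIR when that variable is set, and it joins POSIX-style paths with "/"; resolveCatalogCachePath is a hypothetical name, not an exported function.

```typescript
const CATALOG_FILE = "litellm-model_prices_and_context_window.json";

function trimTrailingSlashes(p: string): string {
  return p.replace(/\/+$/, "");
}

// Hypothetical resolver: the override wins, then $HOME/.summarize/cache, else null.
function resolveCatalogCachePath(env: Record<string, string | undefined>): string | null {
  const override = env["TOKENTALLY_CACHE_DIR"]?.trim();
  const home = env["HOME"];
  let dir: string;
  if (override) {
    dir = trimTrailingSlashes(override);
  } else if (home) {
    dir = `${trimTrailingSlashes(home)}/.summarize/cache`;
  } else {
    return null; // no usable location; a caller could skip caching
  }
  return `${dir}/${CATALOG_FILE}`;
}
```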

How does the system handle provider-specific model ID prefixes?

The resolver functions in src/pricing/litellm.ts automatically strip provider prefixes such as "openai/" or "anthropic/" before performing the catalog lookup. This allows users to specify fully-qualified model IDs like "openai/gpt-5.2" while the system searches for the gateway-style key "gpt-5.2" in the LiteLLM catalog.

Can I use a custom fetch implementation when loading the catalog?

Yes. The loadLiteLlmCatalog function accepts a fetchImpl parameter in its options object, allowing you to inject any Fetch API-compatible implementation. This is useful in test environments and in setups that need custom proxies, authentication headers, or offline mock data.
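For offline tests, a stub that follows the Fetch API can serve a fixed catalog. The sketch below is hypothetical (makeOfflineFetch is not part of Summarize) and assumes Node 18 or later, where Response, Headers, Request, and URL are globals. It answers 304 when the If-None-Match header matches its ETag, so conditional-request paths can be exercised as well.

```typescript
// Hypothetical offline stand-in for fetch: serves a fixed catalog and honors If-None-Match.
function makeOfflineFetch(catalog: unknown, etag: string) {
  const requests: string[] = []; // URLs seen, so tests can count fetches

  async function offlineFetch(input: string | URL | Request, init?: RequestInit): Promise<Response> {
    const url = typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    requests.push(url);

    // Conditional request with a matching ETag: report "not modified" with no body.
    const ifNoneMatch = new Headers(init?.headers).get("if-none-match");
    if (ifNoneMatch === etag) {
      return new Response(null, { status: 304, headers: { etag } });
    }
    return new Response(JSON.stringify(catalog), {
      status: 200,
      headers: { "content-type": "application/json", etag },
    });
  }

  return { offlineFetch, requests };
}
```

The returned offlineFetch could then be passed as the fetchImpl option, and the requests array lets a test check how many fetches were made.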
