How Error Handling and Retry Logic Work for LLM Calls in the Summarize CLI

The Summarize CLI automatically retries LLM calls that time out or return an empty response. isRetryableTimeoutError decides whether an error qualifies for a retry, and computeRetryDelayMs computes a jittered delay that grows with each attempt and is capped at 2 seconds.

The steipete/summarize repository provides a TypeScript-based CLI and library for generating text summaries using Large Language Models (LLMs). Its error handling and retry logic show how an application can recover from transient failures such as network timeouts or temporary service unavailability without retrying errors that will never succeed.

Core Retry Architecture in generate-text.ts

The Main Retry Loop

The heart of the implementation lives in src/llm/generate-text.ts. The generateTextWithModelId function orchestrates retries through a while (attempt <= maxRetries) loop, so a call makes at most maxRetries + 1 attempts. Each attempt wraps the LLM request in its own AbortController to enforce a per-attempt timeout.
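
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: withRetries, RetryOptions, and the injectable sleep parameter are hypothetical names introduced here, and the real function takes model and prompt arguments that are omitted.

```typescript
// Hypothetical sketch of a retry loop with a per-attempt AbortController.
// None of these names come from the repository.
type AttemptFn<T> = (signal: AbortSignal) => Promise<T>;

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  timeoutMs: number; // per-attempt deadline
  isRetryable: (error: unknown) => boolean;
  delayMs: (attempt: number) => number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetries<T>(run: AttemptFn<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let attempt = 0;
  while (attempt <= options.maxRetries) {
    // A fresh controller per attempt, so one timeout cannot cancel the next try.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), options.timeoutMs);
    let failure: unknown;
    try {
      return await run(controller.signal);
    } catch (error) {
      failure = error;
    } finally {
      clearTimeout(timer);
    }
    // Give up when retries are exhausted or the error is not transient.
    if (attempt >= options.maxRetries || !options.isRetryable(failure)) {
      throw failure;
    }
    await sleep(options.delayMs(attempt));
    attempt += 1;
  }
  throw new Error("unreachable: the loop always returns or throws");
}
```

With maxRetries set to 1, this sketch makes at most two attempts: the original call plus one retry.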

Detecting Retryable Errors

Not all errors warrant a retry. The helper function isRetryableTimeoutError (lines 76-87) specifically checks for transient conditions by scanning error messages for "timed out" or "empty summary" strings. This selective approach prevents wasting resources on permanent failures like authentication errors.
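
A message-based check of this kind might look like the sketch below. The function name looksRetryable is hypothetical, and the case-insensitive match is an assumption; the actual helper may inspect other properties of the error.

```typescript
// Hypothetical sketch: classify an error as transient by its message text.
// Case-insensitive matching is an assumption, not confirmed by the source.
function looksRetryable(error: unknown): boolean {
  const message =
    error instanceof Error ? error.message : typeof error === "string" ? error : "";
  return /timed out|empty summary/i.test(message);
}
```

An error whose message mentions neither phrase, such as an invalid API key, is treated as permanent and is not retried.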

Backoff with Jitter

When a retryable error occurs, computeRetryDelayMs (lines 89-93) calculates the wait time using the formula 500ms * (attempt + 1) + jitter, capped at 2 seconds. As written, the base delay grows linearly with the attempt number rather than exponentially. The random jitter spreads out retries from concurrent clients, which reduces thundering herd problems, while the 2-second cap keeps recovery from temporary blips quick.
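
The formula can be worked through with a short sketch. The name retryDelayMsSketch, the injectable random source, and the 250 ms jitter ceiling are assumptions made for illustration; the source states only the 500 ms step and the 2-second cap.

```typescript
// Hypothetical sketch of the delay formula: 500ms * (attempt + 1) + jitter, capped at 2s.
// The jitter ceiling (maxJitterMs) is an assumed value, not taken from the repository.
function retryDelayMsSketch(
  attempt: number,
  random: () => number = Math.random,
  maxJitterMs = 250,
): number {
  const base = 500 * (attempt + 1);
  const jitter = Math.floor(random() * maxJitterMs);
  return Math.min(2000, base + jitter);
}
```

Ignoring jitter, attempts 0, 1, and 2 wait 500 ms, 1000 ms, and 1500 ms. From attempt 3 onward the cap applies and the delay stays at 2000 ms.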

Error Normalization and Provider-Specific Handling

Standardizing Error Messages

Before retry logic evaluates an error, generateTextWithModelId normalizes raw provider errors into consistent formats. AbortController timeouts transform into clear "LLM request timed out" messages, while provider-specific quirks get unified for reliable isRetryableTimeoutError detection.
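
One way to picture this normalization step is the sketch below. The function name normalizeLlmErrorSketch is hypothetical, and checking the error's name for "AbortError" or "TimeoutError" is an assumption about how an aborted request is recognized; only the "LLM request timed out" wording comes from the description above.

```typescript
// Hypothetical sketch: turn an abort raised by a timeout into a stable message
// that a message-based retry check can recognize.
function normalizeLlmErrorSketch(error: unknown): Error {
  const name =
    typeof error === "object" && error !== null && "name" in error
      ? String((error as { name: unknown }).name)
      : "";
  if (name === "AbortError" || name === "TimeoutError") {
    return new Error("LLM request timed out");
  }
  // Anything else passes through, wrapped in an Error if it is not one already.
  return error instanceof Error ? error : new Error(String(error));
}
```

Normalizing first means the retry check depends on one predictable string instead of each runtime's or provider's wording for an aborted request.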

Anthropic-Specific Error Handling

The codebase includes specialized normalization in src/llm/providers/anthropic.ts via normalizeAnthropicModelAccessError. This function intercepts Anthropic-specific rate limits and authentication failures, ensuring they surface correctly before the generic retry logic processes them.

CLI Integration and Observability

Configuring Retries via Command Line

Users control retry behavior through the CLI interface defined in src/run/help.ts. The --retries <count> flag (defaulting to 1) maps directly to the maxRetries parameter in generateTextWithModelId, while --timeout sets the per-attempt AbortController deadline (default 30s).

Verbose Retry Logging

When running with --verbose, the createRetryLogger function in src/run/logging.ts (lines 31-65) emits detailed diagnostics. Users see formatted messages like LLM timeout for xai/...; retry 2/3 in 720ms., providing transparency into the backoff calculations and attempt counts.
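
A logger producing messages in that format could be sketched as follows. The names RetryNotice, formatRetryMessage, and makeVerboseRetryLogger are hypothetical, and the model ID in the usage note is a placeholder; the actual createRetryLogger signature is not shown in this article.

```typescript
// Hypothetical sketch of a verbose-only retry logger.
interface RetryNotice {
  modelId: string;
  attempt: number; // 1-based retry number
  maxRetries: number;
  delayMs: number;
}

function formatRetryMessage(notice: RetryNotice): string {
  return `LLM timeout for ${notice.modelId}; retry ${notice.attempt}/${notice.maxRetries} in ${notice.delayMs}ms.`;
}

function makeVerboseRetryLogger(
  verbose: boolean,
  write: (line: string) => void,
): (notice: RetryNotice) => void {
  // Stay silent unless verbose output was requested.
  return (notice) => {
    if (verbose) write(formatRetryMessage(notice));
  };
}
```

For example, a notice with the placeholder model ID "xai/example-model", attempt 2 of 3, and a 720 ms delay formats as "LLM timeout for xai/example-model; retry 2/3 in 720ms."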

Streaming Implementation

The streamTextWithModelId function shares the same retry foundation as its non-streaming counterpart. It applies isRetryableTimeoutError and computeRetryDelayMs to the initial connection establishment. Once streaming begins, individual chunk errors surface through the lastError() callback rather than triggering retries, so a failure mid-stream does not restart output that has already been emitted.
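
The split between a retried connection and a non-retried stream can be sketched as below. Everything here is an assumption made for illustration: connectWithRetries, openStreamSketch, and StreamHandle are hypothetical names, the backoff delay is omitted for brevity, and ending the stream quietly after recording the error is one possible reading of how lastError() behaves.

```typescript
// Hypothetical sketch: retry only while connecting; record mid-stream errors instead.
interface StreamHandle {
  textStream: AsyncIterable<string>;
  lastError: () => unknown;
}

async function connectWithRetries(
  connect: () => Promise<AsyncIterable<string>>,
  maxRetries: number,
  isRetryable: (error: unknown) => boolean,
): Promise<AsyncIterable<string>> {
  let attempt = 0;
  while (true) {
    try {
      return await connect();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      attempt += 1; // a real implementation would wait for the backoff delay here
    }
  }
}

async function openStreamSketch(
  connect: () => Promise<AsyncIterable<string>>,
  maxRetries: number,
  isRetryable: (error: unknown) => boolean,
): Promise<StreamHandle> {
  const source = await connectWithRetries(connect, maxRetries, isRetryable);
  let captured: unknown = null;
  async function* guarded(): AsyncGenerator<string> {
    try {
      for await (const chunk of source) yield chunk;
    } catch (error) {
      captured = error; // surfaced via lastError(), never retried
    }
  }
  return { textStream: guarded(), lastError: () => captured };
}
```

A caller would consume textStream to the end and then check lastError() to learn whether the output was cut short.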

Summary

  • The retry logic centers on generateTextWithModelId in src/llm/generate-text.ts, using a while loop with AbortController timeouts.
  • isRetryableTimeoutError filters for transient "timed out" or "empty summary" conditions, preventing retries on permanent failures.
  • computeRetryDelayMs implements a backoff of 500ms * (attempt + 1) plus jitter, capped at 2 seconds, to avoid thundering herd scenarios.
  • Provider-specific errors get normalized through normalizeAnthropicModelAccessError and similar functions before retry evaluation.
  • CLI users configure behavior via --retries and --timeout, with --verbose enabling detailed logging through createRetryLogger.

Frequently Asked Questions

What types of errors trigger a retry in the Summarize CLI?

Only transient errors containing "timed out" or "empty summary" in their messages trigger retries, as determined by isRetryableTimeoutError. Permanent failures like authentication errors or invalid model names bypass the retry logic and fail immediately.

How does the retry backoff calculation work?

The computeRetryDelayMs function calculates the delay as 500ms * (attempt + 1) plus random jitter, capped at 2 seconds. The delay grows linearly with each attempt, which avoids overwhelming the LLM provider while still allowing quick recovery from temporary outages.

Can I disable retries or adjust the timeout per request?

Yes. When using the CLI, pass --retries 0 to disable retries or --retries N to allow up to N retries after the first attempt. Use --timeout 60s to change the per-attempt timeout from the default 30 seconds. Programmatically, pass the retries and timeoutMs parameters to generateTextWithModelId.

Does the streaming API use the same retry logic?

The streamTextWithModelId function applies the same retry and backoff logic to the initial connection request. Once the stream starts, individual chunk errors are surfaced through lastError() without triggering additional retries, ensuring continuous output once the connection is established.
