# How Error Handling and Retry Logic Work for LLM Calls in the Summarize CLI

> Learn how the Summarize CLI handles LLM call errors with automatic retries, backoff, and jittered delays of up to 2 seconds.

- Repository: [Peter Steinberger/summarize](https://github.com/steipete/summarize)
- Tags: internals
- Published: 2026-02-19

---

**The Summarize CLI automatically retries LLM calls that time out or return an empty response. It uses `isRetryableTimeoutError` to detect retryable failures and `computeRetryDelayMs` to compute jittered backoff delays capped at 2 seconds.**

The `steipete/summarize` repository provides a TypeScript-based CLI and library for generating text summaries using Large Language Models (LLMs). Understanding how error handling and retry logic work for LLM calls is crucial for building resilient AI applications that gracefully handle transient failures like network timeouts or temporary service unavailability.

## Core Retry Architecture in generate-text.ts

### The Main Retry Loop

The heart of the implementation lives in [`src/llm/generate-text.ts`](https://github.com/steipete/summarize/blob/main/src/llm/generate-text.ts). The `generateTextWithModelId` function orchestrates retries through a `while (attempt <= maxRetries)` loop that wraps each LLM request in an `AbortController` to enforce per-attempt timeouts.
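The shape of such a loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: `withRetriesSketch`, `RetryOptionsSketch`, `callModel`, and `sleep` are hypothetical names, and the retry predicate and delay function are passed in rather than imported.

```typescript
// Hypothetical sketch of a retry loop with a per-attempt AbortController timeout.
// None of these names come from the repository.
type RetryOptionsSketch = {
  maxRetries: number; // retries allowed after the first attempt
  timeoutMs: number; // deadline for each individual attempt
  isRetryable: (error: unknown) => boolean;
  delayMs: (attempt: number) => number;
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetriesSketch<T>(
  callModel: (signal: AbortSignal) => Promise<T>,
  { maxRetries, timeoutMs, isRetryable, delayMs }: RetryOptionsSketch,
): Promise<T> {
  let attempt = 0;
  while (attempt <= maxRetries) {
    // A fresh controller per attempt, so one timeout never affects the next try.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let failure: unknown;
    try {
      return await callModel(controller.signal);
    } catch (error) {
      failure = error;
    } finally {
      clearTimeout(timer);
    }
    // Stop on permanent errors or once the retry budget is spent.
    if (!isRetryable(failure) || attempt >= maxRetries) throw failure;
    await sleep(delayMs(attempt));
    attempt += 1;
  }
  // Only reached if maxRetries is negative.
  throw new Error("No attempts were made");
}
```

With `maxRetries` set to 1, this sketch makes at most two attempts: the initial call plus one retry.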

### Detecting Retryable Errors

Not all errors warrant a retry. The helper function `isRetryableTimeoutError` (lines 76-87) specifically checks for transient conditions by scanning error messages for "timed out" or "empty summary" strings. This selective approach prevents wasting resources on permanent failures like authentication errors.
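A message-based check of this kind might look like the sketch below. It is an illustrative reimplementation, not the code at those line numbers: the function name carries a `Sketch` suffix to mark it as hypothetical, and the case-insensitive comparison is an assumption.

```typescript
// Illustrative reimplementation; the repository's version may differ in detail.
function isRetryableTimeoutErrorSketch(error: unknown): boolean {
  // Accept Error objects and plain strings; anything else has no message to scan.
  const message =
    error instanceof Error ? error.message : typeof error === "string" ? error : "";
  // Assumption: matching is case-insensitive.
  const normalized = message.toLowerCase();
  return normalized.includes("timed out") || normalized.includes("empty summary");
}
```

Under this sketch, `new Error("LLM request timed out")` is retryable, while an authentication failure whose message contains neither phrase is not.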

### Backoff with Jitter

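When a retryable error occurs, `computeRetryDelayMs` (lines 89-93) calculates the wait time using the formula `500ms * (attempt + 1) + jitter`, capped at 2 seconds. Because the base grows by a fixed 500 ms per attempt, the schedule is linear rather than strictly exponential. The random jitter spreads out simultaneous retries, which reduces thundering-herd load on the provider, and the 2-second cap keeps recovery from temporary blips quick.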
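The formula can be sketched as a small pure function. The jitter range is not stated in the text above, so the 200 ms bound here is an assumption, as are the `Sketch` name and the injectable `random` parameter (added only to make the arithmetic reproducible).

```typescript
// Sketch of the described formula: 500ms * (attempt + 1) + jitter, capped at 2000ms.
function computeRetryDelayMsSketch(
  attempt: number, // zero-based index of the attempt that just failed
  random: () => number = Math.random,
  maxJitterMs = 200, // assumed bound; the source does not give the jitter range
): number {
  const base = 500 * (attempt + 1);
  const jitter = Math.floor(random() * maxJitterMs);
  return Math.min(2000, base + jitter);
}

// With a fixed random value of 0.5 (jitter = 100ms):
// attempt 0 -> 600, attempt 1 -> 1100, attempt 2 -> 1600, attempt 3 -> 2000 (capped)
```

Because of the cap, every attempt from index 3 onward waits the same 2 seconds in this sketch, regardless of jitter.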

## Error Normalization and Provider-Specific Handling

### Standardizing Error Messages

Before the retry logic evaluates an error, `generateTextWithModelId` normalizes raw provider errors into a consistent format. `AbortController` timeouts are converted into a clear "LLM request timed out" message, and provider-specific quirks are unified so that `isRetryableTimeoutError` can detect them reliably.
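As a rough illustration of this normalization step, the sketch below maps an abort error to a timeout message that a string-based retry check can recognize. The function name is hypothetical, and the "after Nms" suffix is an assumed wording; only the "LLM request timed out" phrase comes from the description above.

```typescript
// Hypothetical normalizer: turns abort errors into a recognizable timeout message.
function normalizeLlmErrorSketch(error: unknown, timeoutMs: number): Error {
  // Aborted requests conventionally reject with an error named "AbortError".
  if (error instanceof Error && error.name === "AbortError") {
    return new Error(`LLM request timed out after ${timeoutMs}ms`); // suffix is illustrative
  }
  // Pass real Error objects through unchanged; wrap anything else.
  return error instanceof Error ? error : new Error(String(error));
}
```

Normalizing first means the retry check only has to look for one phrase, instead of knowing how each provider or runtime reports an aborted request.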

### Anthropic-Specific Error Handling

The codebase includes specialized normalization in [`src/llm/providers/anthropic.ts`](https://github.com/steipete/summarize/blob/main/src/llm/providers/anthropic.ts) via `normalizeAnthropicModelAccessError`. This function intercepts Anthropic-specific rate limits and authentication failures, ensuring they surface correctly before the generic retry logic processes them.

## CLI Integration and Observability

### Configuring Retries via Command Line

Users control retry behavior through the CLI interface defined in [`src/run/help.ts`](https://github.com/steipete/summarize/blob/main/src/run/help.ts). The `--retries <count>` flag (defaulting to 1) maps directly to the `maxRetries` parameter in `generateTextWithModelId`, while `--timeout` sets the per-attempt AbortController deadline (default 30s).

### Verbose Retry Logging

When running with `--verbose`, the `createRetryLogger` function in [`src/run/logging.ts`](https://github.com/steipete/summarize/blob/main/src/run/logging.ts) (lines 31-65) emits detailed diagnostics. Users see formatted messages like `LLM timeout for xai/...; retry 2/3 in 720ms.`, providing transparency into the backoff calculations and attempt counts.
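A formatter producing messages of that shape could be sketched as below. The type, function name, and field names are hypothetical, and `example/model` is a placeholder model id rather than a real one.

```typescript
// Hypothetical formatter that mirrors the message shape quoted above.
type RetryNoticeSketch = {
  modelId: string;
  attempt: number; // which retry is about to run (1-based)
  maxRetries: number;
  delayMs: number;
};

function formatRetryNoticeSketch({ modelId, attempt, maxRetries, delayMs }: RetryNoticeSketch): string {
  return `LLM timeout for ${modelId}; retry ${attempt}/${maxRetries} in ${delayMs}ms.`;
}

// Placeholder values for illustration only.
const notice = formatRetryNoticeSketch({
  modelId: "example/model",
  attempt: 2,
  maxRetries: 3,
  delayMs: 720,
});
// notice === "LLM timeout for example/model; retry 2/3 in 720ms."
```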

## Streaming Implementation

The `streamTextWithModelId` function shares the same retry foundation as its non-streaming counterpart. It applies `isRetryableTimeoutError` and `computeRetryDelayMs` to the initial connection establishment. Once streaming begins, individual chunk errors surface through the `lastError()` callback rather than triggering retries, so a stream that has already started emitting text is never restarted.
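The split between "retry the connection" and "report mid-stream errors" can be sketched as follows. Everything here is an assumption made for illustration: the function, type, and option names are hypothetical, and ending the stream quietly on a chunk error is one possible behavior, not a confirmed detail of the repository.

```typescript
// Hypothetical sketch: retries cover only the connection, not chunks already streaming.
type StreamHandleSketch = { textStream: AsyncIterable<string>; lastError: () => unknown };

type StreamRetryOptionsSketch = {
  maxRetries: number;
  isRetryable: (error: unknown) => boolean;
  delayMs: (attempt: number) => number;
};

async function openStreamWithRetriesSketch(
  connect: () => Promise<AsyncIterable<string>>,
  { maxRetries, isRetryable, delayMs }: StreamRetryOptionsSketch,
): Promise<StreamHandleSketch> {
  async function connectWithRetries(): Promise<AsyncIterable<string>> {
    let attempt = 0;
    while (true) {
      try {
        return await connect();
      } catch (error) {
        if (!isRetryable(error) || attempt >= maxRetries) throw error;
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs(attempt)));
        attempt += 1;
      }
    }
  }

  const source = await connectWithRetries();
  let recorded: unknown = null;

  // After the connection succeeds, chunk errors are recorded instead of retried.
  async function* guarded(): AsyncGenerator<string> {
    try {
      for await (const chunk of source) yield chunk;
    } catch (error) {
      recorded = error;
    }
  }

  return { textStream: guarded(), lastError: () => recorded };
}
```

A caller would consume `textStream` to the end and then check `lastError()` to learn whether the output was cut short.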

## Summary

- The retry logic centers on `generateTextWithModelId` in [`src/llm/generate-text.ts`](https://github.com/steipete/summarize/blob/main/src/llm/generate-text.ts), using a `while` loop with `AbortController` timeouts.
- `isRetryableTimeoutError` filters for transient "timed out" or "empty summary" conditions, preventing retries on permanent failures.
- `computeRetryDelayMs` implements capped backoff (max 2s) with jitter to reduce thundering-herd scenarios.
- Provider-specific errors get normalized through `normalizeAnthropicModelAccessError` and similar functions before retry evaluation.
- CLI users configure behavior via `--retries` and `--timeout`, with `--verbose` enabling detailed logging through `createRetryLogger`.

## Frequently Asked Questions

### What types of errors trigger a retry in the Summarize CLI?

Only transient errors containing "timed out" or "empty summary" in their messages trigger retries, as determined by `isRetryableTimeoutError`. Permanent failures like authentication errors or invalid model names bypass the retry logic and fail immediately.

### How does the retry backoff calculation work?

The `computeRetryDelayMs` function calculates the delay as `500ms * (attempt + 1)` plus random jitter, capped at 2 seconds. The base therefore grows linearly with each attempt, which avoids overwhelming the LLM provider while still recovering quickly from temporary outages.

### Can I disable retries or adjust the timeout per request?

Yes. When using the CLI, pass `--retries 0` to disable retries or `--retries N` to allow up to N retries after the initial attempt. Use `--timeout 60s` to change the per-attempt timeout from the default 30 seconds. Programmatically, pass the `retries` and `timeoutMs` parameters to `generateTextWithModelId`.

### Does the streaming API use the same retry logic?

The `streamTextWithModelId` function applies the same retry and backoff logic to the initial connection request. Once the stream starts, individual chunk errors are surfaced through `lastError()` without triggering additional retries, so a stream that has already begun producing output is never restarted.