# How Exponential Backoff Works in Symphony Retries: Orchestrator Deep Dive

> Learn how Symphony retries agent tasks with exponential backoff. Discover the 1-second initial delay, doubling wait times up to 5 minutes, and gain insights into orchestrator logic.

- Repository: [OpenAI/symphony](https://github.com/openai/symphony)
- Tags: deep-dive
- Published: 2026-05-08

---

**Symphony retries failed agent tasks using a two-phase exponential backoff system that applies a 1-second fixed delay for initial continuation failures, then doubles the wait time from a 10-second base up to a configurable 5-minute ceiling.**

The `openai/symphony` orchestration layer handles transient agent failures through a deterministic retry mechanism implemented in Elixir. Understanding how the orchestrator calculates retry delays enables you to tune agent resilience and prevent cascading system overloads during recovery storms.

## The Two-Phase Retry Strategy

In [`elixir/lib/symphony_elixir/orchestrator.ex`](https://github.com/openai/symphony/blob/main/elixir/lib/symphony_elixir/orchestrator.ex), the `retry_delay/2` function distinguishes between **continuation retries** and **failure backoffs** using task metadata. This bifurcation ensures rapid recovery from transient blips while protecting against persistent failure loops.

### Immediate Continuation Recovery (1-Second Fixed)

When a running task requires its first retry (attempt number 1) and the retry is classified as a continuation, the system applies a short fixed delay. The module attribute `@continuation_retry_delay_ms`, set to `1_000` (1,000 milliseconds), schedules these recoveries without any exponential calculation.

This rapid 1-second retry assumes the agent encountered a momentary interruption rather than a hard fault, allowing workflows to resume almost instantly.

### Exponential Failure Backoff (10-Second Base)

Every retry that does not qualify for the continuation delay uses exponential backoff, starting from the module attribute `@failure_retry_base_ms`, set to `10_000` milliseconds (10 seconds). The orchestrator computes the power of two with a left bit shift:

```elixir
# `<<<` is the left-shift operator from the Bitwise module (requires `import Bitwise`).
delay_ms = min(
  @failure_retry_base_ms * (1 <<< max_delay_power),
  Config.settings!().agent.max_retry_backoff_ms
)
```

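The expression `1 <<< max_delay_power` computes `2^n`, where `n` is derived from the retry attempt number (the attempt-8 example later in this article shifts by 7, that is, `attempt - 1`). Before the cap applies, this generates the sequence **10 seconds, 20 seconds, 40 seconds, 80 seconds**, and so on. Because each failed attempt doubles the wait, a persistently failing task retries less and less often instead of adding load to a system that is already struggling.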
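To make the two phases concrete, here is a minimal, self-contained sketch of the delay selection. It is not the code in `orchestrator.ex`: the module name `RetryDelaySketch`, the explicit `max_backoff_ms` argument (standing in for `Config.settings!().agent.max_retry_backoff_ms`), and the `attempt - 1` exponent are assumptions made for illustration.

```elixir
defmodule RetryDelaySketch do
  # Hypothetical module for illustration; not part of openai/symphony.
  import Bitwise

  @continuation_retry_delay_ms 1_000
  @failure_retry_base_ms 10_000

  # Phase 1: the first attempt of a continuation gets the short fixed delay.
  def retry_delay(1, %{delay_type: :continuation}, _max_backoff_ms),
    do: @continuation_retry_delay_ms

  # Phase 2: everything else backs off exponentially, clamped to the ceiling.
  # Assumption: the shift amount is `attempt - 1`.
  def retry_delay(attempt, _metadata, max_backoff_ms)
      when is_integer(attempt) and attempt > 0 do
    min(@failure_retry_base_ms * (1 <<< (attempt - 1)), max_backoff_ms)
  end
end

IO.inspect(RetryDelaySketch.retry_delay(1, %{delay_type: :continuation}, 300_000))
# 1000
IO.inspect(RetryDelaySketch.retry_delay(3, %{delay_type: :failure}, 300_000))
# 40000
IO.inspect(RetryDelaySketch.retry_delay(8, %{delay_type: :failure}, 300_000))
# 300000
```

Passing the ceiling as an argument keeps the sketch runnable without a configuration module; the clause order does the dispatch, so only attempt 1 with `delay_type: :continuation` reaches the fixed delay.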

## Configuration and Hard Limits

Exponential growth is bounded by the `max_retry_backoff_ms` parameter defined in [`elixir/lib/symphony_elixir/config/schema.ex`](https://github.com/openai/symphony/blob/main/elixir/lib/symphony_elixir/config/schema.ex) at line 133. The default value is **300,000 milliseconds** (5 minutes), which serves as a ceiling to prevent unbounded wait times.

When the calculated delay exceeds this cap, Symphony clamps the value to the maximum:

```elixir
# Attempt 8 hitting the ceiling
min(10_000 * (1 <<< 7), 300_000)
# 10,000 * 128 = 1,280,000 ms → capped to 300,000 ms (5 minutes)
```

You can adjust this ceiling via your application configuration to align with specific SLA requirements or infrastructure constraints.
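To see what changing the ceiling does in practice, the sketch below lists the failure-backoff schedule under two caps. `BackoffSchedule` and its `cap_ms` argument are hypothetical names, the `attempt - 1` exponent is an assumption that matches the attempt-8 example above, and the 60,000 ms cap is only an illustrative value.

```elixir
defmodule BackoffSchedule do
  # Hypothetical helper for illustration; not part of openai/symphony.
  import Bitwise

  @failure_retry_base_ms 10_000

  # Failure-backoff delays for attempts 1..attempts under a given cap.
  def delays(cap_ms, attempts \\ 8) do
    for attempt <- 1..attempts do
      min(@failure_retry_base_ms * (1 <<< (attempt - 1)), cap_ms)
    end
  end

  # First attempt whose uncapped delay reaches or exceeds the cap.
  def first_capped_attempt(cap_ms) do
    Enum.find(1..64, fn attempt ->
      @failure_retry_base_ms * (1 <<< (attempt - 1)) >= cap_ms
    end)
  end
end

IO.inspect(BackoffSchedule.delays(300_000))
# [10000, 20000, 40000, 80000, 160000, 300000, 300000, 300000]
IO.inspect(BackoffSchedule.delays(60_000))
# [10000, 20000, 40000, 60000, 60000, 60000, 60000, 60000]
IO.inspect(BackoffSchedule.first_capped_attempt(300_000))
# 6
```

Under these assumptions the default ceiling takes effect from the sixth attempt onward, because 10,000 × 32 = 320,000 ms already exceeds 300,000 ms.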

## Scheduling Implementation Details

Once calculated, delays are scheduled asynchronously using `Process.send_after/3`. The orchestrator workflow follows three discrete steps:

1. Records the next attempt number as `next_attempt`
2. Invokes `retry_delay/2` with the attempt count and metadata to select either the 1-second continuation delay or the exponential backoff value
3. Sets a timer that triggers the retry after the computed `delay_ms`

This non-blocking approach ensures the orchestrator process remains responsive to other agent tasks while waiting for the backoff interval to expire.
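The three steps can be sketched with a small `GenServer`. This is an illustration of the `Process.send_after/3` pattern rather than the orchestrator's actual code: the module `RetrySchedulerSketch`, its message shapes, and the `notify` process are all hypothetical, and the delay is passed in directly instead of being computed by `retry_delay/2`.

```elixir
defmodule RetrySchedulerSketch do
  # Hypothetical GenServer for illustration; not the Symphony orchestrator.
  use GenServer

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  # Ask the server to schedule a retry of `task_id` after `delay_ms`.
  def schedule_retry(server, task_id, attempt, delay_ms) do
    GenServer.cast(server, {:schedule_retry, task_id, attempt, delay_ms})
  end

  @impl true
  def init(notify_pid), do: {:ok, %{notify: notify_pid, timers: %{}}}

  @impl true
  def handle_cast({:schedule_retry, task_id, attempt, delay_ms}, state) do
    # Step 1: record the next attempt number.
    next_attempt = attempt + 1
    # Steps 2 and 3: with the delay already chosen, set a timer. The call
    # returns immediately, so the process keeps handling other messages.
    ref = Process.send_after(self(), {:retry, task_id, next_attempt}, delay_ms)
    {:noreply, %{state | timers: Map.put(state.timers, task_id, ref)}}
  end

  @impl true
  def handle_info({:retry, task_id, attempt}, state) do
    # A real orchestrator would re-dispatch the task here; the sketch only reports it.
    send(state.notify, {:retrying, task_id, attempt})
    {:noreply, %{state | timers: Map.delete(state.timers, task_id)}}
  end
end

{:ok, server} = RetrySchedulerSketch.start_link(self())
RetrySchedulerSketch.schedule_retry(server, "task-1", 1, 50)

receive do
  {:retrying, task_id, attempt} -> IO.puts("retrying #{task_id}, attempt #{attempt}")
after
  1_000 -> IO.puts("no retry received")
end
# prints "retrying task-1, attempt 2"
```

Keeping the timer reference in state is a design choice of the sketch: it would allow a pending retry to be cancelled with `Process.cancel_timer/1` if the task were resolved some other way.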

## Practical Delay Calculation Examples

Here is how specific retry scenarios map to actual delays according to the source implementation:

```elixir
# Exponent is attempt - 1, matching the attempt-8 clamp example above.

# First retry (continuation) - immediate recovery
attempt = 1
metadata = %{delay_type: :continuation}
# Returns: 1,000 ms (1 second)

# Second retry (failure backoff)
attempt = 2
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 1), 300_000) = 20,000 ms (20 seconds)

# Fifth retry (exponential doubling)
attempt = 5
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 4), 300_000) = 160,000 ms (160 seconds)

# Eighth retry (cap enforced)
attempt = 8
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 7), 300_000) → 300,000 ms (5 minutes)
```

## Summary

- **Continuation retries** use a fixed **1-second** delay defined by `@continuation_retry_delay_ms` for the first attempt on running tasks
- **Failure retries** apply exponential backoff starting at **10 seconds** and doubling via the `retry_delay/2` function in [`orchestrator.ex`](https://github.com/openai/symphony/blob/main/elixir/lib/symphony_elixir/orchestrator.ex)
- The calculation `10s × 2^n` is capped at **300,000 ms** (5 minutes) via the `max_retry_backoff_ms` configuration setting
- All delays are scheduled asynchronously using `Process.send_after/3` to maintain system responsiveness

## Frequently Asked Questions

### What is the shortest possible delay between Symphony retries?

The minimum delay is **1,000 milliseconds** (1 second), applied exclusively to continuation retries when `delay_type` is set to `:continuation` and the attempt number is 1. Every other retry waits at least 10 seconds, unless `max_retry_backoff_ms` is configured below that value.

### How does Symphony prevent infinite exponential growth in retry intervals?

The system enforces a hard ceiling via the `max_retry_backoff_ms` configuration parameter, which defaults to **300,000 milliseconds** (5 minutes) as defined in the config schema. Once the calculated `10s × 2^n` value exceeds this limit, Symphony clamps the delay to the maximum configured value.

### Why does the orchestrator use bit-shifting (`1 <<< n`) instead of standard multiplication?

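Elixir's left bit-shift operator is a compact, integer-only way to write a power of two: `1 <<< n` equals 2^n. It keeps the arithmetic in integers, unlike `:math.pow/2`, which returns a float, so the result can be passed straight to `min/2` and to a timer that expects an integer number of milliseconds. Any speed difference is negligible next to delays measured in seconds, so the shift in `retry_delay/2` is best read as an idiom rather than a meaningful optimization.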
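A short comparison of the ways to compute a power of two in Elixir shows what the shift produces. The value `n = 7` is just an example, chosen to match the attempt-8 case earlier in the article.

```elixir
import Bitwise

n = 7

IO.inspect(1 <<< n)
# 128 (integer)
IO.inspect(Integer.pow(2, n))
# 128 (integer; Integer.pow/2 requires Elixir 1.12 or later)
IO.inspect(:math.pow(2, n))
# 128.0 (float)
IO.inspect(10_000 * (1 <<< n))
# 1280000
```

Both `1 <<< n` and `Integer.pow(2, n)` stay in integer arithmetic, whereas `:math.pow/2` yields a float that would need rounding before being used as a millisecond delay.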

### Can I customize the base retry delay for Symphony agent failures?

The base delay of 10 seconds is hardcoded as `@failure_retry_base_ms` in the orchestrator module. While you cannot override this constant without modifying the source code, you can adjust the `max_retry_backoff_ms` configuration to control the upper bound of the exponential curve, effectively limiting the maximum wait time between recovery attempts.