How Exponential Backoff Works in Symphony Retries: Orchestrator Deep Dive
Symphony retries failed agent tasks using a two-phase exponential backoff system that applies a 1-second fixed delay for initial continuation failures, then doubles the wait time from a 10-second base up to a configurable 5-minute ceiling.
The openai/symphony orchestration layer handles transient agent failures through a deterministic retry mechanism implemented in Elixir. Understanding how the orchestrator calculates retry delays enables you to tune agent resilience and prevent cascading system overloads during recovery storms.
The Two-Phase Retry Strategy
In elixir/lib/symphony_elixir/orchestrator.ex, the retry_delay/2 function distinguishes between continuation retries and failure backoffs using task metadata. This bifurcation ensures rapid recovery from transient blips while protecting against persistent failure loops.
Immediate Continuation Recovery (1-Second Fixed)
When a running task requires its first retry (attempt number 1) and the failure is classified as a continuation, the system applies a short fixed delay. The module attribute @continuation_retry_delay_ms, set to 1_000 (1,000 milliseconds), schedules these immediate recoveries without any exponential calculation.
This rapid 1-second retry assumes the agent encountered a momentary interruption rather than a hard fault, allowing workflows to resume almost instantly.
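The phase selection described above can be sketched with two function clauses. This is a minimal illustration, not the code from orchestrator.ex: the module name RetryPhaseSketch and the phase/2 function are hypothetical, and only the 1_000 ms constant and the :continuation delay type come from the article.

```elixir
defmodule RetryPhaseSketch do
  # Value taken from the article; the real attribute lives in orchestrator.ex.
  @continuation_retry_delay_ms 1_000

  # Phase 1: only attempt 1 of a continuation gets the fixed short delay.
  def phase(1, %{delay_type: :continuation}) do
    {:fixed, @continuation_retry_delay_ms}
  end

  # Phase 2: every other combination falls through to exponential backoff.
  def phase(attempt, metadata)
      when is_integer(attempt) and attempt >= 1 and is_map(metadata) do
    :exponential
  end
end

RetryPhaseSketch.phase(1, %{delay_type: :continuation}) # => {:fixed, 1000}
RetryPhaseSketch.phase(2, %{delay_type: :continuation}) # => :exponential
RetryPhaseSketch.phase(1, %{delay_type: :failure})      # => :exponential
```

Pattern matching on the literal 1 and the :continuation key keeps the special case in a single clause, so anything that does not match exactly takes the backoff path.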
Exponential Failure Backoff (10-Second Base)
All subsequent retries trigger exponential backoff starting from @failure_retry_base_ms, set to 10_000 milliseconds (10 seconds). The orchestrator calculates the delay with a left bit shift and clamps the result to the configured maximum:
delay_ms = min(
@failure_retry_base_ms * (1 <<< max_delay_power),
Config.settings!().agent.max_retry_backoff_ms
)
The expression 1 <<< max_delay_power computes 2^n, where n is an exponent derived from the retry attempt number, generating the sequence 10 seconds, 20 seconds, 40 seconds, 80 seconds, and so on. Each retry waits twice as long as the previous one, which spreads retry traffic across increasingly wider windows and reduces the risk of thundering herds.
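To make the doubling concrete, here is a small sketch that reproduces the uncapped sequence. The module and function names are hypothetical, and the exponent is passed in directly because how orchestrator.ex derives max_delay_power from the attempt number is not shown here.

```elixir
defmodule BackoffSequenceSketch do
  import Bitwise

  # Base delay from the article (10 seconds).
  @failure_retry_base_ms 10_000

  # Uncapped delay for a given exponent: base * 2^power.
  def uncapped_delay_ms(power) when is_integer(power) and power >= 0 do
    @failure_retry_base_ms * (1 <<< power)
  end
end

Enum.map(0..3, &BackoffSequenceSketch.uncapped_delay_ms/1)
# => [10000, 20000, 40000, 80000]
```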
Configuration and Hard Limits
Exponential growth is bounded by the max_retry_backoff_ms parameter defined in elixir/lib/symphony_elixir/config/schema.ex at line 133. The default value is 300,000 milliseconds (5 minutes), which serves as a ceiling to prevent unbounded wait times.
When the calculated delay exceeds this cap, Symphony clamps the value to the maximum:
# An exponent of 7 hitting the ceiling
min(10_000 * (1 <<< 7), 300_000)
# 10,000 * 128 = 1,280,000 ms → capped to 300,000 ms (5 minutes)
You can adjust this ceiling via your application configuration to align with specific SLA requirements or infrastructure constraints.
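The effect of changing the ceiling can be explored with a sketch like the following. The cap_ms argument is a hypothetical stand-in for the configured max_retry_backoff_ms value; the real code reads it from Config.settings!(), and the module name is illustrative only.

```elixir
defmodule BackoffCapSketch do
  import Bitwise

  # Base delay from the article (10 seconds).
  @failure_retry_base_ms 10_000

  # cap_ms is a hypothetical parameter standing in for max_retry_backoff_ms.
  def capped_delay_ms(power, cap_ms)
      when is_integer(power) and power >= 0 and is_integer(cap_ms) and cap_ms > 0 do
    min(@failure_retry_base_ms * (1 <<< power), cap_ms)
  end
end

# Default 5-minute ceiling: exponent 4 still fits, exponent 5 is clamped.
BackoffCapSketch.capped_delay_ms(4, 300_000) # => 160000
BackoffCapSketch.capped_delay_ms(5, 300_000) # => 300000

# A tighter 1-minute ceiling clamps from exponent 3 onward.
BackoffCapSketch.capped_delay_ms(2, 60_000)  # => 40000
BackoffCapSketch.capped_delay_ms(3, 60_000)  # => 60000
```

With the default cap, only five distinct delays (10, 20, 40, 80, and 160 seconds) occur before every further retry waits the full 5 minutes.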
Scheduling Implementation Details
Once calculated, delays are scheduled asynchronously using Process.send_after/3. The orchestrator workflow follows three discrete steps:
- Records the next attempt number as next_attempt
- Invokes retry_delay/2 with the attempt count and metadata to select either the 1-second continuation delay or the exponential backoff value
- Sets a timer that triggers the retry after the computed delay_ms
This non-blocking approach ensures the orchestrator process remains responsive to other agent tasks while waiting for the backoff interval to expire.
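The timer-based scheduling can be illustrated with a small GenServer. This is a sketch under stated assumptions rather than Symphony's orchestrator: the module name, the message shapes ({:retry, ...} and {:retrying, ...}), and the notify_pid state are all hypothetical. Only the use of Process.send_after/3 comes from the article.

```elixir
defmodule RetrySchedulerSketch do
  use GenServer

  # notify_pid is a hypothetical observer that gets told when a retry fires.
  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  def schedule_retry(server, task_id, attempt, delay_ms) do
    GenServer.cast(server, {:schedule_retry, task_id, attempt, delay_ms})
  end

  @impl true
  def init(notify_pid), do: {:ok, notify_pid}

  @impl true
  def handle_cast({:schedule_retry, task_id, attempt, delay_ms}, notify_pid) do
    # Returns immediately; the message lands in this process's mailbox
    # after delay_ms, so other messages are handled in the meantime.
    Process.send_after(self(), {:retry, task_id, attempt}, delay_ms)
    {:noreply, notify_pid}
  end

  @impl true
  def handle_info({:retry, task_id, attempt}, notify_pid) do
    # A real orchestrator would relaunch the agent task here.
    send(notify_pid, {:retrying, task_id, attempt})
    {:noreply, notify_pid}
  end
end

{:ok, server} = RetrySchedulerSketch.start_link(self())
RetrySchedulerSketch.schedule_retry(server, "task-1", 2, 50)
```

Because the timer is delivered as an ordinary message, the process never sleeps, and a pending retry could be cancelled with Process.cancel_timer/1 if the reference returned by Process.send_after/3 were kept in state.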
Practical Delay Calculation Examples
Here is how specific retry scenarios map to actual delays according to the source implementation:
# First retry (continuation) - immediate recovery
attempt = 1
metadata = %{delay_type: :continuation}
# Returns: 1,000 ms (1 second)
# Second retry (first failure backoff)
attempt = 2
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 0), 300_000) = 10,000 ms (10 seconds)
# Fifth retry (exponential doubling)
attempt = 5
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 3), 300_000) = 80,000 ms (80 seconds)
# Eighth retry (cap enforced)
attempt = 8
metadata = %{delay_type: :failure}
# Calculates: min(10_000 * (1 <<< 6), 300_000) = min(640,000, 300,000) = 300,000 ms (5 minutes)
Summary
- Continuation retries use a fixed 1-second delay defined by @continuation_retry_delay_ms for the first attempt on running tasks
- Failure retries apply exponential backoff starting at 10 seconds and doubling via the retry_delay/2 function in orchestrator.ex
- The calculation 10s × 2^n is capped at 300,000 ms (5 minutes) via the max_retry_backoff_ms configuration setting
- All delays are scheduled asynchronously using Process.send_after/3 to maintain system responsiveness
Frequently Asked Questions
What is the shortest possible delay between Symphony retries?
The minimum delay is 1,000 milliseconds (1 second), applied exclusively to continuation retries when delay_type is set to :continuation and the attempt number is 1. All subsequent failures wait at least 10 seconds.
How does Symphony prevent infinite exponential growth in retry intervals?
The system enforces a hard ceiling via the max_retry_backoff_ms configuration parameter, which defaults to 300,000 milliseconds (5 minutes) as defined in the config schema. Once the calculated 10s × 2^n value exceeds this limit, Symphony clamps the delay to the maximum configured value.
Why does the orchestrator use bit-shifting (1 <<< n) instead of exponentiation?
Elixir's left bit-shift operator is a compact, idiomatic way to calculate integer powers of two (2^n). It yields an exact integer, whereas :math.pow/2 returns a float that would need converting before it could serve as a millisecond delay. Any CPU savings in the retry_delay/2 function are negligible next to delays measured in seconds, even during retry storms involving hundreds of concurrent agent tasks.
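For comparison, here is a short sketch of three ways to compute a power of two in Elixir. The module name is hypothetical, and Integer.pow/2 assumes Elixir 1.12 or later.

```elixir
defmodule PowerOfTwoSketch do
  import Bitwise

  # Left shift: exact integer result.
  def by_shift(n) when is_integer(n) and n >= 0, do: 1 <<< n

  # Integer.pow/2: also an exact integer (Elixir 1.12+).
  def by_integer_pow(n) when is_integer(n) and n >= 0, do: Integer.pow(2, n)

  # :math.pow/2: always returns a float.
  def by_float_pow(n) when is_integer(n) and n >= 0, do: :math.pow(2, n)
end

PowerOfTwoSketch.by_shift(7)       # => 128
PowerOfTwoSketch.by_integer_pow(7) # => 128
PowerOfTwoSketch.by_float_pow(7)   # => 128.0
```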
Can I customize the base retry delay for Symphony agent failures?
The base delay of 10 seconds is hardcoded as @failure_retry_base_ms in the orchestrator module. While you cannot override this constant without modifying the source code, you can adjust the max_retry_backoff_ms configuration to control the upper bound of the exponential curve, effectively limiting the maximum wait time between recovery attempts.