How Traefik's Retry Mechanism Works: Complete Configuration Guide
Traefik's retry middleware automatically re-executes failed HTTP requests up to a configurable number of attempts when specific status codes or network errors occur, using exponential backoff and request body mirroring to ensure safe, idempotent retries.
The retry mechanism in the traefik/traefik repository provides resilience against transient backend failures by intercepting requests and conditionally repeating them. This article explores the internal architecture of the middleware and documents every configuration parameter available in the dynamic configuration.
Architecture of the Traefik Retry Middleware
The core implementation resides in [pkg/middlewares/retry/retry.go](https://github.com/traefik/traefik/blob/master/pkg/middlewares/retry/retry.go). The middleware operates as a chainable HTTP handler that wraps your backend service and runs a retry loop with back-off logic.
Middleware Construction and Initialization
The New function in [pkg/middlewares/retry/retry.go](https://github.com/traefik/traefik/blob/master/pkg/middlewares/retry/retry.go) validates the dynamic.Retry configuration and constructs a retry struct that holds runtime parameters including attempt limits, status code ranges, and back-off settings【L101-L138】. This initialization phase ensures that parameters like attempts are greater than zero and that duration strings parse correctly before the middleware enters the request chain.
If the configuration specifies only one attempt, the middleware immediately delegates to the next handler without overhead【L41-L45】. For multi-attempt configurations, the middleware prepares a reusable request body buffer.
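The validation step described above can be sketched with the standard library alone. This is a minimal illustration, not Traefik's code: the `retryConfig` and `validated` types and the `validate` function are hypothetical stand-ins, and representing durations as strings is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryConfig is a hypothetical stand-in for the fields described above;
// it is not Traefik's dynamic.Retry type.
type retryConfig struct {
	Attempts        int
	InitialInterval string // e.g. "150ms"; empty means no back-off
	Timeout         string // e.g. "30s"; empty means no global timeout
}

// validated holds the parsed runtime parameters.
type validated struct {
	attempts        int
	initialInterval time.Duration
	timeout         time.Duration
}

// validate rejects non-positive attempt counts and unparsable durations.
func validate(cfg retryConfig) (validated, error) {
	if cfg.Attempts <= 0 {
		return validated{}, errors.New("attempts must be greater than zero")
	}
	v := validated{attempts: cfg.Attempts}
	var err error
	if cfg.InitialInterval != "" {
		if v.initialInterval, err = time.ParseDuration(cfg.InitialInterval); err != nil {
			return validated{}, fmt.Errorf("initialInterval: %w", err)
		}
	}
	if cfg.Timeout != "" {
		if v.timeout, err = time.ParseDuration(cfg.Timeout); err != nil {
			return validated{}, fmt.Errorf("timeout: %w", err)
		}
	}
	return v, nil
}

func main() {
	v, err := validate(retryConfig{Attempts: 5, InitialInterval: "150ms", Timeout: "30s"})
	fmt.Println(v.attempts, v.initialInterval, v.timeout, err)

	_, err = validate(retryConfig{Attempts: 0})
	fmt.Println(err)
}
```

Failing fast at construction time means a malformed duration or a zero attempt count surfaces when the configuration is loaded rather than on the first failing request.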
Request Handling and Body Mirroring
During the ServeHTTP method execution, the middleware creates a reusable copy of the request body using mirror.NewReusableRequest so that the identical payload can be transmitted on each retry attempt【L49-L63】. This prevents body exhaustion issues during retries while respecting the maxRequestBodyBytes limit.
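To illustrate why the body must be buffered, here is a standard-library sketch of the same idea. It does not use mirror.NewReusableRequest; `bufferBody`, `cloneWithBody`, and `replay` are hypothetical helpers, and the backend URL is a placeholder.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

var errBodyTooLarge = errors.New("request body exceeds buffer limit")

// bufferBody reads the request body into memory, refusing bodies larger than
// maxBytes (a negative maxBytes means no limit). Hypothetical helper.
func bufferBody(r *http.Request, maxBytes int64) ([]byte, error) {
	if r.Body == nil {
		return nil, nil
	}
	defer r.Body.Close()
	if maxBytes < 0 {
		return io.ReadAll(r.Body)
	}
	// Read one byte past the limit so an oversized body can be detected.
	data, err := io.ReadAll(io.LimitReader(r.Body, maxBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxBytes {
		return nil, errBodyTooLarge
	}
	return data, nil
}

// cloneWithBody returns a copy of r whose body replays the buffered bytes.
func cloneWithBody(r *http.Request, data []byte) *http.Request {
	c := r.Clone(r.Context())
	c.Body = io.NopCloser(bytes.NewReader(data))
	c.ContentLength = int64(len(data))
	return c
}

// replay buffers the payload once and reads it back once per attempt.
func replay(payload string, maxBytes int64, attempts int) ([]string, error) {
	req, err := http.NewRequest(http.MethodPut, "http://backend.local/items/1", strings.NewReader(payload))
	if err != nil {
		return nil, err
	}
	data, err := bufferBody(req, maxBytes)
	if err != nil {
		return nil, err
	}
	sent := make([]string, 0, attempts)
	for i := 0; i < attempts; i++ {
		b, err := io.ReadAll(cloneWithBody(req, data).Body)
		if err != nil {
			return nil, err
		}
		sent = append(sent, string(b))
	}
	return sent, nil
}

func main() {
	sent, err := replay(`{"n":1}`, 1<<20, 3)
	fmt.Println(sent, err)

	_, err = replay("0123456789", 4, 2)
	fmt.Println(err)
}
```

A request body is a one-shot stream, so without the in-memory copy the second attempt would send an empty payload. The size limit bounds how much memory a single request can pin while retries are pending.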
The middleware then builds a retry operation function that:
- Creates OpenTelemetry spans for each attempt when tracing is enabled【L78-L92】
- Validates idempotency by checking for HTTP methods (GET, HEAD, OPTIONS, etc.) unless retryNonIdempotentMethod is explicitly enabled【L101-L104】
- Wraps the downstream response with a custom responseWriter that tracks status codes, headers, and retry eligibility【L110-L124】
- Injects a ShouldRetry callback into the request context when disableRetryOnNetworkError is false, allowing network I/O layers to signal retry necessity【L168-L176】
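The idempotency check in the list above can be sketched as follows. The method set here follows RFC 9110's definition of idempotent methods; whether Traefik uses exactly this set is an assumption, and `isIdempotent` and `mayRetry` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
)

// isIdempotent reports whether RFC 9110 defines the method as idempotent.
func isIdempotent(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions,
		http.MethodTrace, http.MethodPut, http.MethodDelete:
		return true
	}
	return false
}

// mayRetry allows a retry for idempotent methods, or for any method when
// the retryNonIdempotent override is switched on.
func mayRetry(method string, retryNonIdempotent bool) bool {
	return retryNonIdempotent || isIdempotent(method)
}

func main() {
	for _, m := range []string{"GET", "PUT", "POST", "PATCH"} {
		fmt.Printf("%-5s default=%v override=%v\n", m, mayRetry(m, false), mayRetry(m, true))
	}
}
```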
Retry Decision Logic
When the backend writes response headers, the custom responseWriter.WriteHeader method executes the decision logic【L138-L152】. The middleware checks whether the HTTP status code falls within the configured status ranges and whether additional attempts remain. If both conditions hold, it sets shouldRetry to true; otherwise, it flushes the collected headers to the real response and marks the transmission as complete.
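A minimal sketch of that decision, assuming status entries are either a single code ("500") or an inclusive range ("502-504"). The `statusRange` type and both functions are hypothetical and only model the two conditions named above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statusRange is an inclusive range of HTTP status codes.
type statusRange struct{ lo, hi int }

// parseStatusRanges converts entries such as "500" or "502-504" into ranges.
func parseStatusRanges(specs []string) ([]statusRange, error) {
	ranges := make([]statusRange, 0, len(specs))
	for _, spec := range specs {
		loStr, hiStr, isRange := strings.Cut(spec, "-")
		if !isRange {
			hiStr = loStr // a single code is a range of width one
		}
		lo, err := strconv.Atoi(strings.TrimSpace(loStr))
		if err != nil {
			return nil, fmt.Errorf("invalid status %q: %w", spec, err)
		}
		hi, err := strconv.Atoi(strings.TrimSpace(hiStr))
		if err != nil {
			return nil, fmt.Errorf("invalid status %q: %w", spec, err)
		}
		if lo > hi {
			return nil, fmt.Errorf("invalid status %q: range is reversed", spec)
		}
		ranges = append(ranges, statusRange{lo, hi})
	}
	return ranges, nil
}

// shouldRetry is true only when attempts remain and the code is in a range.
func shouldRetry(code int, ranges []statusRange, attempt, maxAttempts int) bool {
	if attempt >= maxAttempts {
		return false
	}
	for _, r := range ranges {
		if code >= r.lo && code <= r.hi {
			return true
		}
	}
	return false
}

func main() {
	ranges, err := parseStatusRanges([]string{"500", "502-504"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(shouldRetry(503, ranges, 1, 3)) // in range, attempts remain
	fmt.Println(shouldRetry(501, ranges, 1, 3)) // not in any range
	fmt.Println(shouldRetry(503, ranges, 3, 3)) // last attempt already used
}
```

Making the decision when headers are written matters because nothing has reached the client yet at that point, so a retryable response can still be discarded.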
Exponential Back-Off Implementation
Traefik utilizes the cenkalti/backoff library for retry timing. If initialInterval is configured and multiple attempts are requested, the middleware creates an exponential back-off instance with a multiplier guaranteeing the interval never exceeds twice the initial value【L54-L66】.
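One way to obtain that bound is to choose the multiplier as 2^(1/(attempts-1)), so that compounding it across all retries reaches at most twice the initial interval. The sketch below assumes that formula and ignores the random jitter that cenkalti/backoff applies by default; `nominalIntervals` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// nominalIntervals returns the un-jittered wait before each retry, assuming
// a multiplier of 2^(1/(attempts-1)). With n attempts there are n-1 waits.
func nominalIntervals(initial time.Duration, attempts int) []time.Duration {
	if attempts < 2 || initial <= 0 {
		return nil // nothing to wait for, or back-off disabled
	}
	multiplier := math.Pow(2, 1/float64(attempts-1))
	waits := make([]time.Duration, 0, attempts-1)
	current := float64(initial)
	for i := 0; i < attempts-1; i++ {
		waits = append(waits, time.Duration(current))
		current *= multiplier
	}
	return waits
}

func main() {
	for i, w := range nominalIntervals(150*time.Millisecond, 5) {
		fmt.Printf("wait before retry %d: %v\n", i+1, w.Round(time.Millisecond))
	}
}
```

With attempts set to 5 and an initial interval of 150ms, the multiplier is roughly 1.19, so the four waits grow from 150ms to about 252ms and stay below 300ms.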
The backoff.RetryNotify function repeatedly invokes the operation until it succeeds, the maximum number of attempts is exhausted, or the timeout elapses. Each retry triggers the Listener.Retried callback, enabling metrics collection and structured logging【L36-L44】.
Global Timeout Enforcement
Both the back-off mechanism and the response writer enforce the global timeout parameter. When elapsed time exceeds this threshold, the retry loop terminates immediately and returns the last received response or error to the client【L138-L145】【L27-L33】.
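The interaction between attempt limits and a global timeout can be sketched with a plain loop. This is a simplified standard-library model, not the cenkalti/backoff integration: `retryWithTimeout` and `errTransient` are hypothetical, the wait is fixed rather than exponential, and the choice to stop before a wait that would cross the deadline is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTransient = errors.New("transient backend failure")

// retryWithTimeout calls op until it succeeds, attempts are exhausted, or the
// next wait would cross the global timeout (zero means no limit). notify is
// invoked before each wait, playing the role of a retry listener.
func retryWithTimeout(attempts int, wait, timeout time.Duration,
	op func(attempt int) error, notify func(attempt int, err error)) error {
	start := time.Now()
	var err error
	for attempt := 1; attempt <= attempts; attempt++ {
		if err = op(attempt); err == nil {
			return nil
		}
		if attempt == attempts {
			break // no attempts left
		}
		if timeout > 0 && time.Since(start)+wait >= timeout {
			break // the next attempt would start past the global deadline
		}
		notify(attempt, err)
		time.Sleep(wait)
	}
	return err // the last error seen
}

func main() {
	err := retryWithTimeout(4, 10*time.Millisecond, time.Second,
		func(attempt int) error {
			if attempt < 3 {
				return errTransient
			}
			return nil
		},
		func(attempt int, err error) {
			fmt.Printf("attempt %d failed: %v\n", attempt, err)
		})
	fmt.Println("result:", err)
}
```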
Configurable Retry Parameters
The following parameters control retry behavior in Traefik's dynamic configuration:
| Parameter | Type | Description | Default |
|---|---|---|---|
| attempts | int | Total number of tries (initial + retries). Must be > 0. | Required |
| initialInterval | duration | First wait time for exponential back-off. Zero disables back-off. | 0 |
| timeout | duration | Maximum cumulative time allowed for all retry attempts. | 0 (unlimited) |
| maxRequestBodyBytes | int64 | Maximum request body size that can be buffered for reuse. -1 means unlimited. | -1 |
| status | []string | HTTP status codes or ranges triggering retry (e.g., ["500","502-504"]). | [] (disabled) |
| disableRetryOnNetworkError | bool | When true, network-level errors (connection reset, etc.) skip retry logic. Requires status to be set. | false |
| retryNonIdempotentMethod | bool | Allow retries for non-idempotent methods (POST, PATCH, etc.). Use with caution. | false |
These fields are officially documented in [docs/content/reference/routing-configuration/http/middlewares/retry.md](https://github.com/traefik/traefik/blob/master/docs/content/reference/routing-configuration/http/middlewares/retry.md)【L92-L103】.
Configuration Examples
Dynamic File Configuration (YAML)
Define retry behavior in a dynamic configuration file loaded by the file provider. Middlewares belong to the dynamic configuration, not to the static traefik.yml:
```yaml
http:
  middlewares:
    api-retry:
      retry:
        attempts: 5
        initialInterval: 150ms
        timeout: 30s
        maxRequestBodyBytes: 1048576 # 1 MiB
        status:
          - "500-599"
        disableRetryOnNetworkError: false
        retryNonIdempotentMethod: false
```
Docker Compose Labels
Configure retries via container labels in Docker environments:
```yaml
labels:
  - "traefik.http.middlewares.api-retry.retry.attempts=5"
  - "traefik.http.middlewares.api-retry.retry.initialinterval=150ms"
  - "traefik.http.middlewares.api-retry.retry.timeout=30s"
  - "traefik.http.middlewares.api-retry.retry.maxrequestbodybytes=1048576"
  - "traefik.http.middlewares.api-retry.retry.status=500-599"
  - "traefik.http.middlewares.api-retry.retry.disableretryonnetworkerror=false"
```
Programmatic Go Implementation
Integrate the retry middleware directly in Go applications by importing Traefik's packages:
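```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/traefik/traefik/v3/pkg/config/dynamic"
	"github.com/traefik/traefik/v3/pkg/middlewares/retry"
)

func main() {
	// Field names and value types follow the parameters documented above; check
	// them against the dynamic.Retry definition in the Traefik version you import.
	cfg := dynamic.Retry{
		Attempts:                   4,
		InitialInterval:            "200ms",
		Timeout:                    "20s",
		MaxRequestBodyBytes:        2 << 20, // 2 MiB
		Status:                     []string{"502", "503-504"},
		DisableRetryOnNetworkError: true,
		RetryNonIdempotentMethod:   false,
	}

	// next stands in for the backend handler that the middleware wraps.
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Backend logic here
	})

	retryMw, err := retry.New(context.Background(), next, cfg, retry.Listeners{}, "api-retry")
	if err != nil {
		log.Fatal(err)
	}

	// Serve with the retry middleware in front of the backend handler.
	log.Fatal(http.ListenAndServe(":8080", retryMw))
}
```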
The retry.New constructor is implemented at【L101-L138】in the source repository.
Summary
- Core Location: The retry middleware implementation lives in pkg/middlewares/retry/retry.go, utilizing the cenkalti/backoff library for timing control.
- Safety Mechanisms: By default, only idempotent HTTP methods are retried unless retryNonIdempotentMethod is enabled, and request bodies are mirrored to prevent stream exhaustion.
- Configurable Triggers: Retries activate based on HTTP status codes (status ranges), network errors (unless disableRetryOnNetworkError is true), or both.
- Timing Control: Use initialInterval for exponential back-off and timeout to set an upper bound on total retry duration.
- Resource Limits: The maxRequestBodyBytes parameter prevents memory exhaustion when buffering large payloads for reuse.
Frequently Asked Questions
Which HTTP methods does Traefik retry by default?
Traefik only retries idempotent methods (GET, HEAD, OPTIONS, PUT, DELETE) by default to prevent duplicate mutations. You must explicitly set retryNonIdempotentMethod: true to enable retries for POST, PATCH, or other non-idempotent verbs, though this risks duplicate processing on the backend.
How does the exponential back-off calculate intervals?
When initialInterval is set, Traefik creates an exponential back-off where each subsequent wait time increases based on the cenkalti/backoff algorithm, capping at twice the initial interval value【L54-L66】. If initialInterval is zero or unspecified, retries execute immediately without delay between attempts.
Can I retry requests that fail due to network errors?
Yes, unless you set disableRetryOnNetworkError: true. By default, the middleware injects a ShouldRetry callback into the request context that captures network-level failures (connection resets, timeouts) and allows retry logic to handle them【L168-L176】. Note that disabling network error retries requires the status parameter to be configured with specific HTTP codes.
What happens if the request body exceeds maxRequestBodyBytes?
When the request body fits within the configured maxRequestBodyBytes limit (or the limit is -1, meaning unlimited), Traefik buffers the body using mirror.NewReusableRequest so it can be resent on subsequent attempts【L49-L63】. If the body is too large to buffer under an enforced limit, the retry mechanism may be unable to reuse the payload on subsequent attempts.