How AgentMemory Handles LLM Failures with Circuit Breaker and Fallback

AgentMemory isolates external LLM calls behind a ResilientProvider wrapper that implements circuit-breaker logic to prevent cascading failures, while a FallbackChainProvider automatically retries alternative providers when the primary fails.

AgentMemory uses a layered resilience strategy so that memory operations stay stable when external LLM providers suffer outages or latency spikes. It wraps every concrete provider in circuit-breaker protection and chains multiple providers for automatic failover. This article walks through the implementation in the rohitg00/agentmemory repository and shows how the two mechanisms combine to give self-healing behavior.

Circuit Breaker Protection with ResilientProvider

Every LLM provider in AgentMemory is wrapped by a ResilientProvider class defined in src/providers/resilient.ts. This wrapper instantiates a CircuitBreaker object from src/providers/circuit-breaker.ts that monitors failure rates and temporarily blocks requests to failing services.

The circuit breaker implements a standard state machine with three states:

  • Closed: Normal operation where requests pass through to the underlying provider
  • Open: All requests are immediately rejected with the error "circuit_breaker_open"
  • Half-Open: After a recovery timeout, a single trial request is permitted to test if the service has recovered

According to the source code, the breaker starts in the closed state with isAllowed === true. Each failed request increments a failure counter. When failures reach the configurable threshold (default 3) within a sliding failure window (default 60 seconds), the breaker transitions to the open state (lines 13-21, 32-44).

After a recovery timeout (default 30 seconds), the breaker enters the half-open state. A successful trial request triggers recordSuccess(), which closes the circuit and resets the counters. A failed trial immediately re-opens the circuit.
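
The state machine described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in src/providers/circuit-breaker.ts: the names SketchCircuitBreaker, BreakerOptions, allowRequest, getState, and callThroughBreaker are hypothetical, and only the defaults (3 failures, 60-second window, 30-second recovery timeout), recordSuccess(), and the "circuit_breaker_open" message come from the article. The clock is injectable so the transitions can be exercised without waiting.

```typescript
// Hypothetical sketch of a three-state circuit breaker; not the repository's implementation.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number;  // failures needed to open the circuit
  failureWindowMs?: number;   // only failures inside this window count
  recoveryTimeoutMs?: number; // wait before a trial request is allowed
  now?: () => number;         // injectable clock (assumption, for testing)
}

class SketchCircuitBreaker {
  private state: BreakerState = "closed";
  private failureTimes: number[] = [];
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly failureWindowMs: number;
  private readonly recoveryTimeoutMs: number;
  private readonly now: () => number;

  constructor(options: BreakerOptions = {}) {
    this.failureThreshold = options.failureThreshold ?? 3;
    this.failureWindowMs = options.failureWindowMs ?? 60_000;
    this.recoveryTimeoutMs = options.recoveryTimeoutMs ?? 30_000;
    this.now = options.now ?? (() => Date.now());
  }

  // Returns true when a request may be sent; moves open -> half-open after the timeout.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.recoveryTimeoutMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  // A success closes the circuit and clears the failure history.
  recordSuccess(): void {
    this.state = "closed";
    this.failureTimes = [];
  }

  // A failed trial re-opens immediately; otherwise count failures inside the window.
  recordFailure(): void {
    const t = this.now();
    if (this.state === "half-open") {
      this.trip(t);
      return;
    }
    this.failureTimes = this.failureTimes.filter((ts) => t - ts < this.failureWindowMs);
    this.failureTimes.push(t);
    if (this.failureTimes.length >= this.failureThreshold) {
      this.trip(t);
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  private trip(at: number): void {
    this.state = "open";
    this.openedAt = at;
    this.failureTimes = [];
  }
}

// Guards one async call with the breaker, rejecting fast while the circuit is open.
async function callThroughBreaker<T>(
  breaker: SketchCircuitBreaker,
  request: () => Promise<T>,
): Promise<T> {
  if (!breaker.allowRequest()) {
    throw new Error("circuit_breaker_open");
  }
  try {
    const result = await request();
    breaker.recordSuccess();
    return result;
  } catch (err) {
    breaker.recordFailure();
    throw err;
  }
}
```

One simplification to be aware of: this sketch lets any caller through while half-open, whereas the article describes a single trial request. Limiting concurrent trials would need an extra in-flight flag.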

For monitoring and telemetry, the current breaker state, including timestamps and failure counts, is exposed via the circuitState property on the ResilientProvider instance (lines 34-36).

import { ResilientProvider } from "./providers/resilient.js";
import { OpenRouterProvider } from "./providers/openrouter.js";

// apiKey is assumed to be loaded elsewhere (for example, from your secrets store)
const resilient = new ResilientProvider(new OpenRouterProvider(apiKey));

// Inspect circuit state for health metrics
console.log(resilient.circuitState);
// { state: "closed", failures: 0, lastFailure: null, nextAttempt: null }

Fallback Chain for Provider Failover

When a provider fails or its circuit breaker is open, AgentMemory delegates the request to a FallbackChainProvider defined in src/providers/fallback-chain.ts. This chain executes providers sequentially until one succeeds.

The tryAll method iterates through the configured provider list, returning the first successful result. If every provider in the chain fails, it throws the last captured error (lines 18-30).

The chain construction reflects the provider priority order. For example, a chain configured with Anthropic, Gemini, and OpenRouter displays as fallback(anthropic → gemini → openrouter) during execution, as verified in the test expectations in test/fallback-chain.test.ts (lines 73-76).
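
The sequential failover and the chain's display name can be sketched as follows. This is an illustration only, not the contents of src/providers/fallback-chain.ts: SketchProvider and SketchFallbackChain are hypothetical names, and the interface is reduced to the one method the article shows (summarize). The tryAll loop keeps the last error and rethrows it when every provider fails, matching the behavior described above.

```typescript
// Hypothetical provider shape, reduced to the single operation used in this article.
interface SketchProvider {
  name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Hypothetical sketch of a fallback chain; not the repository's implementation.
class SketchFallbackChain implements SketchProvider {
  private readonly providers: SketchProvider[];

  constructor(providers: SketchProvider[]) {
    if (providers.length === 0) {
      throw new Error("fallback chain needs at least one provider");
    }
    this.providers = providers;
  }

  // Renders the priority order, e.g. fallback(anthropic → gemini → openrouter).
  get name(): string {
    return `fallback(${this.providers.map((p) => p.name).join(" → ")})`;
  }

  summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.tryAll((p) => p.summarize(systemPrompt, userPrompt));
  }

  // Tries each provider in order; returns the first success or rethrows the last error.
  private async tryAll<T>(call: (p: SketchProvider) => Promise<T>): Promise<T> {
    let lastError: unknown = new Error("no providers configured");
    for (const provider of this.providers) {
      try {
        return await call(provider);
      } catch (err) {
        lastError = err; // remember the failure and move on to the next provider
      }
    }
    throw lastError;
  }
}
```

Because each provider in the real chain is itself wrapped in a circuit breaker, a provider whose circuit is open fails fast and the loop moves to the next one without waiting on a network timeout.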

Graceful Degradation with NoopProvider

If no LLM API keys are configured in the environment, detectProvider() returns a NoopProvider instance from src/providers/noop.ts. This provider implements the same interface as production providers but returns empty strings for all operations.

This keeps AgentMemory running when no LLM provider is configured: LLM-dependent calls return empty results instead of throwing, so the system degrades gracefully rather than crashing.
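
The null-object pattern behind this can be sketched briefly. The names below (SketchLlmProvider, SketchNoopProvider, pickProvider) are hypothetical and stand in for the real interface and for detectProvider(), whose actual selection logic and environment variable names are not shown in this article.

```typescript
// Hypothetical provider shape, reduced to the single operation used in this article.
interface SketchLlmProvider {
  name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Null-object provider: same interface as a real provider, but always returns "".
class SketchNoopProvider implements SketchLlmProvider {
  name = "noop";

  async summarize(_systemPrompt: string, _userPrompt: string): Promise<string> {
    return "";
  }
}

// Stand-in for provider detection: fall back to the no-op when nothing is configured.
function pickProvider(configured: SketchLlmProvider[]): SketchLlmProvider {
  const [first] = configured;
  return first ?? new SketchNoopProvider();
}
```

Callers never need a null check on the provider itself; they only need to treat an empty string as "no LLM output available".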

Implementation Example

To utilize these resilience features, import the factory functions from src/providers/index.ts:

// Creating a resilient provider with automatic fallback chain
import { createFallbackProvider } from "./providers/index.js";

// config, systemPrompt, and userPrompt are assumed to be defined by the caller.
// Builds chain from env-configured providers (ANTHROPIC, GEMINI, OPENROUTER)
const provider = createFallbackProvider(config);

try {
  const summary = await provider.summarize(systemPrompt, userPrompt);
  console.log("LLM summary:", summary);
} catch (e) {
  // Under strict TypeScript the caught value is `unknown`, so narrow it before reading .message
  const message = e instanceof Error ? e.message : String(e);
  if (message === "circuit_breaker_open") {
    console.warn("Circuit open - using cached data or degrading gracefully");
  } else {
    console.error("All providers failed:", e);
  }
}

Summary

  • AgentMemory implements a two-layer resilience strategy: circuit breakers prevent hammering flaky services, while fallback chains automatically retry alternative LLM providers.
  • The ResilientProvider wrapper in src/providers/resilient.ts monitors failure rates and opens circuits after 3 failures within 60 seconds, blocking requests for 30 seconds before attempting recovery.
  • The FallbackChainProvider in src/providers/fallback-chain.ts sequentially executes providers via the tryAll method until one succeeds.
  • NoopProvider ensures graceful degradation when no LLM keys are configured, returning empty strings rather than throwing exceptions.
  • Together, these mechanisms provide the self-healing behavior documented in the AgentMemory README, isolating LLM failures from core memory operations.

Frequently Asked Questions

What triggers the circuit breaker to open in AgentMemory?

The circuit breaker opens when a provider records 3 failures within a 60-second sliding window. Once open, all subsequent requests immediately fail with the error "circuit_breaker_open" without hitting the external API, preventing resource exhaustion and cascading failures.

How does the fallback chain decide which provider to use next?

The FallbackChainProvider iterates through providers in the order defined by the fallback configuration (environment variables or defaults). It calls the tryAll method, which attempts each provider sequentially and returns the first successful result. If all fail, it propagates the last error captured.

Can I monitor the circuit breaker state in real-time?

Yes. The ResilientProvider exposes a circuitState property that returns the current state (closed, open, or half-open), the failure count, the timestamp of the last failure, and the next scheduled retry attempt. This enables integration with monitoring systems and health dashboards.

What happens if all LLM providers are unavailable?

It depends on why they are unavailable. If no API keys are configured, AgentMemory uses the NoopProvider, which returns empty strings for all LLM operations, so the memory system stays operational while LLM-dependent features return empty results. If providers are configured but every one in the fallback chain fails, the chain throws the last captured error, and the caller is expected to catch it and degrade accordingly.
