# How AgentMemory Handles LLM Failures with Circuit Breaker and Fallback

> Discover how AgentMemory contains LLM failures using circuit breakers and fallback strategies. Keep your applications resilient by isolating LLM calls and automatically retrying alternative providers.

- Repository: [Rohit Ghumare/agentmemory](https://github.com/rohitg00/agentmemory)
- Tags: how-to-guide
- Published: 2026-05-10

---

**AgentMemory isolates external LLM calls behind a `ResilientProvider` wrapper that implements circuit-breaker logic to prevent cascading failures, while a `FallbackChainProvider` automatically retries alternative providers when the primary fails.**

AgentMemory implements a multi-layered resilience strategy so that memory operations stay stable even when external LLM providers suffer outages or latency spikes. The architecture wraps every concrete provider with circuit-breaker protection and chains multiple providers for automatic failover. This article walks through the implementation in the `rohitg00/agentmemory` repository: the circuit breaker, the fallback chain, and the no-op provider that together give AgentMemory its self-healing behavior.

## Circuit Breaker Protection with ResilientProvider

Every LLM provider in AgentMemory is wrapped by a `ResilientProvider` class defined in [`src/providers/resilient.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/resilient.ts). This wrapper instantiates a `CircuitBreaker` object from [`src/providers/circuit-breaker.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/circuit-breaker.ts) that monitors failure rates and temporarily blocks requests to failing services.

The circuit breaker implements a standard state machine with three states:

- **Closed**: Normal operation where requests pass through to the underlying provider
- **Open**: All requests are immediately rejected with the error `"circuit_breaker_open"`
- **Half-Open**: After a recovery timeout, a single trial request is permitted to test if the service has recovered

According to the source code, the breaker starts in the closed state with `isAllowed === true`. Each failed request increments a failure counter. When failures reach the configurable threshold (default **3**) within a sliding **failure window** (default **60 seconds**), the breaker transitions to the open state ([lines 13-21, 32-44](https://github.com/rohitg00/agentmemory/blob/main/src/providers/circuit-breaker.ts#L13-L44)).

After a **recovery timeout** (default **30 seconds**), the breaker enters the half-open state. A successful trial request triggers `recordSuccess()`, which closes the circuit and resets the counters. A failed trial immediately re-opens the circuit.
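The state machine described above fits in a few dozen lines of TypeScript. This is an illustrative reimplementation, not the code in `circuit-breaker.ts`: the class name `CircuitBreakerSketch`, the `allowRequest()` and `recordFailure()` methods, and the injectable `now` clock are assumptions. Only the three states, `recordSuccess()`, and the defaults (3 failures, 60-second window, 30-second recovery) come from the description above.

```typescript
// Illustrative circuit breaker. Names and structure are assumptions;
// only the states and default values follow the article.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureThreshold?: number;  // failures that open the circuit (default 3)
  failureWindowMs?: number;   // sliding window for counting failures (default 60 s)
  recoveryTimeoutMs?: number; // time spent open before a trial (default 30 s)
  now?: () => number;         // injectable clock, handy for tests
}

class CircuitBreakerSketch {
  private state: BreakerState = "closed";
  private failureTimes: number[] = [];
  private openedAt = 0;
  private readonly threshold: number;
  private readonly windowMs: number;
  private readonly recoveryMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.threshold = opts.failureThreshold ?? 3;
    this.windowMs = opts.failureWindowMs ?? 60_000;
    this.recoveryMs = opts.recoveryTimeoutMs ?? 30_000;
    this.now = opts.now ?? (() => Date.now());
  }

  get currentState(): BreakerState {
    return this.state;
  }

  // True if a request may go through. Once the recovery timeout has elapsed,
  // an open circuit moves to half-open and admits exactly one trial request.
  allowRequest(): boolean {
    if (this.state === "closed") return true;
    if (this.state === "open" && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = "half-open";
      return true;
    }
    return false; // still cooling down, or a trial is already in flight
  }

  // Any success (including a half-open trial) closes the circuit and resets counters.
  recordSuccess(): void {
    this.state = "closed";
    this.failureTimes = [];
  }

  recordFailure(): void {
    const t = this.now();
    if (this.state === "half-open") {
      this.trip(t); // a failed trial re-opens the circuit immediately
      return;
    }
    // Drop failures that have aged out of the window, then count this one.
    this.failureTimes = this.failureTimes.filter((f) => t - f < this.windowMs);
    this.failureTimes.push(t);
    if (this.failureTimes.length >= this.threshold) this.trip(t);
  }

  private trip(t: number): void {
    this.state = "open";
    this.openedAt = t;
    this.failureTimes = [];
  }
}
```

Storing failure timestamps instead of a bare counter is what makes the window sliding: three failures spread over several minutes age out one by one and never open the circuit.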

For monitoring and telemetry, the `ResilientProvider` instance exposes the current breaker state, including timestamps and failure counts, through its `circuitState` property ([lines 34-36](https://github.com/rohitg00/agentmemory/blob/main/src/providers/resilient.ts#L34-L36)).

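```typescript
import { ResilientProvider } from "./providers/resilient.js";
import { OpenRouterProvider } from "./providers/openrouter.js";

// Assumes the key is read from OPENROUTER_API_KEY; adjust to your configuration.
const apiKey = process.env.OPENROUTER_API_KEY ?? "";
const resilient = new ResilientProvider(new OpenRouterProvider(apiKey));

// Inspect circuit state for health metrics
console.log(resilient.circuitState);
// Illustrative output; exact field names may differ:
// { state: "closed", failures: 0, lastFailure: null, nextAttempt: null }
```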
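The wrapper itself only needs to consult the breaker before each call and report the outcome afterwards. The sketch below assumes a hypothetical `LLMProvider` interface with a single `summarize()` method (the method name is taken from the implementation example later in this article) and a minimal `Breaker` interface; `ResilientProviderSketch` is an illustrative name, not the repository's class.

```typescript
// Hypothetical provider interface; AgentMemory's real interface may differ.
interface LLMProvider {
  readonly name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Any object with these methods can serve as the breaker.
interface Breaker {
  allowRequest(): boolean;
  recordSuccess(): void;
  recordFailure(): void;
}

class ResilientProviderSketch implements LLMProvider {
  private readonly inner: LLMProvider;
  private readonly breaker: Breaker;

  constructor(inner: LLMProvider, breaker: Breaker) {
    this.inner = inner;
    this.breaker = breaker;
  }

  get name(): string {
    return this.inner.name;
  }

  async summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    // Fail fast, without touching the network, while the circuit is open.
    if (!this.breaker.allowRequest()) {
      throw new Error("circuit_breaker_open");
    }
    try {
      const result = await this.inner.summarize(systemPrompt, userPrompt);
      this.breaker.recordSuccess();
      return result;
    } catch (err) {
      this.breaker.recordFailure();
      throw err; // let the caller (or a fallback chain) see the original error
    }
  }
}
```

Rethrowing the original error, rather than swallowing it, keeps the wrapper transparent: an outer fallback chain can still distinguish a real provider error from a `"circuit_breaker_open"` rejection.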

## Fallback Chain for Provider Failover

The second layer is the `FallbackChainProvider`, defined in [`src/providers/fallback-chain.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/fallback-chain.ts). It wraps several providers and executes them sequentially: when one fails or its circuit breaker is open, the chain moves on to the next until one succeeds.

The `tryAll` method iterates through the configured provider list, returning the first successful result. If every provider in the chain fails, it throws the last captured error ([lines 18-30](https://github.com/rohitg00/agentmemory/blob/main/src/providers/fallback-chain.ts#L18-L30)).

The order in which providers are passed to the chain sets their priority. For example, a chain configured with Anthropic, Gemini, and OpenRouter is labeled `fallback(anthropic → gemini → openrouter)`, as the test expectations in [`test/fallback-chain.test.ts`](https://github.com/rohitg00/agentmemory/blob/main/test/fallback-chain.test.ts) confirm ([lines 73-76](https://github.com/rohitg00/agentmemory/blob/main/test/fallback-chain.test.ts#L73-L76)).
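A minimal sketch of the same pattern, again assuming a hypothetical `LLMProvider` interface with one `summarize()` method. The `tryAll` loop returns the first success and rethrows the last error, as described above, and the `name` getter reproduces the `fallback(a → b → c)` label format from the tests. The class name `FallbackChainSketch` and the generic signature of `tryAll` are assumptions.

```typescript
// Hypothetical provider interface; AgentMemory's real interface may differ.
interface LLMProvider {
  readonly name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

class FallbackChainSketch implements LLMProvider {
  private readonly providers: LLMProvider[];

  constructor(providers: LLMProvider[]) {
    if (providers.length === 0) {
      throw new Error("fallback chain needs at least one provider");
    }
    this.providers = providers;
  }

  // Same label format as the repository's test expectations.
  get name(): string {
    return `fallback(${this.providers.map((p) => p.name).join(" → ")})`;
  }

  summarize(systemPrompt: string, userPrompt: string): Promise<string> {
    return this.tryAll((p) => p.summarize(systemPrompt, userPrompt));
  }

  // Try each provider in priority order; return the first success,
  // otherwise rethrow the last error seen.
  private async tryAll<T>(call: (p: LLMProvider) => Promise<T>): Promise<T> {
    let lastError: unknown;
    for (const provider of this.providers) {
      try {
        return await call(provider);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}
```

If each entry is itself a circuit-breaker-wrapped provider, an open circuit surfaces as one more fast failure that the loop skips past, so a known-bad provider costs almost nothing.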

## Graceful Degradation with NoopProvider

If no LLM API keys are configured in the environment, `detectProvider()` returns a `NoopProvider` instance from [`src/providers/noop.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/noop.ts). This provider implements the same interface as production providers but returns empty strings for all operations.

Because the `NoopProvider` never calls an external service, AgentMemory can run without any LLM credentials at all, degrading gracefully (LLM-dependent features return empty results) rather than crashing.
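Provider detection with a no-op fallback can be sketched as follows. The environment-variable names, `KeyedProviderSketch`, and `detectProviderSketch()` are hypothetical placeholders; consult the repository for the real detection logic. Only the behavior (no keys yields a provider whose operations return empty strings) comes from the description above.

```typescript
// Hypothetical provider interface; AgentMemory's real interface may differ.
interface LLMProvider {
  readonly name: string;
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Same interface as a real provider, but never calls out and never fails.
class NoopProviderSketch implements LLMProvider {
  readonly name = "noop";
  async summarize(_systemPrompt: string, _userPrompt: string): Promise<string> {
    return "";
  }
}

// Hypothetical stand-in for a real provider, used only to illustrate selection.
class KeyedProviderSketch implements LLMProvider {
  readonly name: string;
  constructor(name: string) {
    this.name = name;
  }
  async summarize(_systemPrompt: string, userPrompt: string): Promise<string> {
    return `${this.name}:${userPrompt}`;
  }
}

// Hypothetical env-var names, checked in priority order.
const KEY_VARS: Array<[string, string]> = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["GEMINI_API_KEY", "gemini"],
  ["OPENROUTER_API_KEY", "openrouter"],
];

function detectProviderSketch(env: Record<string, string | undefined>): LLMProvider {
  for (const [envVar, name] of KEY_VARS) {
    if (env[envVar]) return new KeyedProviderSketch(name);
  }
  return new NoopProviderSketch(); // no keys: degrade instead of crashing
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the selection logic easy to test.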

## Implementation Example

To use these resilience features, import the factory functions from [`src/providers/index.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/index.ts):

```typescript
// Creating a resilient provider with automatic fallback chain
import { createFallbackProvider } from "./providers/index.js";

// `config`, `systemPrompt`, and `userPrompt` are supplied by the caller.
// Builds chain from env-configured providers (ANTHROPIC, GEMINI, OPENROUTER)
const provider = createFallbackProvider(config);

try {
  const summary = await provider.summarize(systemPrompt, userPrompt);
  console.log("LLM summary:", summary);
} catch (e) {
  // Catch variables are `unknown` under strict TypeScript, so narrow first.
  const message = e instanceof Error ? e.message : String(e);
  if (message === "circuit_breaker_open") {
    console.warn("Circuit open: using cached data or degrading gracefully");
  } else {
    console.error("All providers failed:", e);
  }
}
```
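The catch branch above leaves "cached data" to the caller. One hypothetical way to fill it in is a small wrapper that remembers the last good summary per prompt pair and serves it when the chain throws. `summarizeOrDegrade()` and the in-memory `Map` cache are illustrative assumptions, not AgentMemory APIs; the only repository detail relied on is the `"circuit_breaker_open"` error message.

```typescript
// Hypothetical minimal interface: anything with summarize() works,
// including a fallback chain.
interface Summarizer {
  summarize(systemPrompt: string, userPrompt: string): Promise<string>;
}

interface SummaryResult {
  text: string;
  degraded: boolean; // true when the text came from the cache (or is empty)
}

// Try the provider; on success remember the result, on failure serve the
// last good summary for the same prompts, or "" if there is none.
async function summarizeOrDegrade(
  provider: Summarizer,
  cache: Map<string, string>,
  systemPrompt: string,
  userPrompt: string,
): Promise<SummaryResult> {
  const key = `${systemPrompt}\u0000${userPrompt}`;
  try {
    const text = await provider.summarize(systemPrompt, userPrompt);
    cache.set(key, text);
    return { text, degraded: false };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    if (message === "circuit_breaker_open") {
      console.warn("Circuit open: serving cached summary");
    } else {
      console.warn(`All providers failed (${message}): serving cached summary`);
    }
    return { text: cache.get(key) ?? "", degraded: true };
  }
}
```

Returning a `degraded` flag instead of throwing lets callers keep the memory pipeline running while still recording that the result did not come from a live LLM call.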

## Summary

- **AgentMemory** implements a two-layer resilience strategy: circuit breakers prevent hammering flaky services, while fallback chains automatically retry alternative LLM providers.
- The **`ResilientProvider`** wrapper in [`src/providers/resilient.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/resilient.ts) guards each provider with a `CircuitBreaker` that, with the default settings, opens after 3 failures within 60 seconds and blocks requests for 30 seconds before allowing a trial request.
- The **`FallbackChainProvider`** in [`src/providers/fallback-chain.ts`](https://github.com/rohitg00/agentmemory/blob/main/src/providers/fallback-chain.ts) sequentially executes providers via the `tryAll` method until one succeeds.
- **`NoopProvider`** ensures graceful degradation when no LLM keys are configured, returning empty strings rather than throwing exceptions.
- Together, these mechanisms provide the **self-healing** behavior documented in the AgentMemory README, isolating LLM failures from core memory operations.

## Frequently Asked Questions

### What triggers the circuit breaker to open in AgentMemory?

With the default settings, the circuit breaker opens when a provider records 3 failures within a 60-second sliding window. Once open, all subsequent requests fail immediately with the error `"circuit_breaker_open"` without reaching the external API, which prevents resource exhaustion and cascading failures. After the 30-second recovery timeout, a single trial request decides whether the circuit closes again.

### How does the fallback chain decide which provider to use next?

The `FallbackChainProvider` iterates through providers in the order defined by the fallback configuration (environment variables or defaults). It calls the `tryAll` method, which attempts each provider sequentially and returns the first successful result. If all fail, it propagates the last error captured.

### Can I monitor the circuit breaker state in real-time?

Yes. The `ResilientProvider` exposes a `circuitState` property that returns the current state (closed, open, or half-open), failure count, timestamps of last failures, and the next scheduled retry attempt. This enables integration with monitoring systems and health dashboards.
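For a health endpoint, the per-provider breaker states can be rolled up into a single status. The snapshot shape below is assumed from the illustrative `circuitState` output earlier in this article, and `overallHealth()` is a hypothetical helper; verify the real property's fields before wiring this into a dashboard.

```typescript
// Assumed snapshot shape; check the real circuitState fields before relying on it.
type CircuitStatus = "closed" | "open" | "half-open";

interface CircuitSnapshot {
  state: CircuitStatus;
  failures: number;
}

type Health = "healthy" | "degraded" | "down";

// Roll per-provider breaker states up into one status:
// all closed -> healthy, all open -> down, anything else -> degraded.
function overallHealth(snapshots: Record<string, CircuitSnapshot>): Health {
  const states = Object.values(snapshots).map((s) => s.state);
  // No breakers means no LLM providers are configured (noop mode).
  if (states.length === 0) return "degraded";
  if (states.every((s) => s === "closed")) return "healthy";
  if (states.every((s) => s === "open")) return "down";
  return "degraded";
}
```

Treating a half-open circuit as "degraded" rather than "healthy" avoids reporting recovery before the trial request has actually succeeded.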

### What happens if all LLM providers are unavailable?

If no API keys are configured, `detectProvider()` returns the `NoopProvider`, which returns empty strings for all LLM operations, so the memory system keeps running while LLM-dependent features return empty results. If keys are configured but every provider in the fallback chain fails, `tryAll` rethrows the last captured error, so callers should catch it and degrade gracefully (for example, by serving cached data).