# How World Monitor's 4-Tier AI Fallback Chain Prioritizes Local vs Cloud Providers

> Discover World Monitor's 4-tier AI fallback chain. Learn how it intelligently prioritizes local inference or cloud providers for optimal performance and reliability.

- Repository: [Elie Habib/worldmonitor](https://github.com/koala73/worldmonitor)
- Tags: architecture
- Published: 2026-03-09

---

**World Monitor implements a deterministic four-tier pipeline. In standard operation it tries the API chain first (Ollama, then Groq, then OpenRouter) and falls back to the in-browser T5 model only when every API provider fails. In beta mode it runs Browser T5 first whenever that model is already loaded, and uses the API chain as the fallback.**

The **koala73/worldmonitor** open-source project manages AI summarization through a fallback system that balances privacy, latency, and availability. Knowing how this **4-tier AI fallback chain** orders local and cloud providers helps when tuning self-hosted deployments and managing API costs.

## The Four-Tier Provider Architecture

The complete fallback hierarchy consists of four distinct inference layers defined across the codebase. Three providers are declared in [`src/services/summarization.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/summarization.ts) as an ordered array, while the fourth runs independently in a Web Worker.

### API Provider Definitions

The `API_PROVIDERS` constant establishes the order of the API chain (called the cloud chain in this article, even though its first entry, Ollama, runs locally):

```typescript
const API_PROVIDERS: ApiProviderDef[] = [
  { featureId: 'aiOllama',      provider: 'ollama',     label: 'Ollama' },
  { featureId: 'aiGroq',        provider: 'groq',       label: 'Groq AI' },
  { featureId: 'aiOpenRouter',  provider: 'openrouter', label: 'OpenRouter' },
];
```

- **Ollama** – A local HTTP endpoint typically available in desktop environments via `/api/local-*` routes, positioned as the first option in the API chain.
- **Groq** – A fast cloud LLM service acting as the primary remote provider.
- **OpenRouter** – A secondary cloud service providing the final API-based fallback.
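The entry type itself is not shown in this article. As a minimal sketch, assuming `ApiProviderDef` is a plain record with the three fields seen above and that `featureId` maps to a feature flag (the `enabledProviders` helper and its `isFeatureEnabled` callback are hypothetical), the array order alone decides which provider is attempted first:

```typescript
// Hypothetical shape inferred from the three fields in the array above;
// the real ApiProviderDef in the repository may carry more fields.
interface ApiProviderDef {
  featureId: string;
  provider: 'ollama' | 'groq' | 'openrouter';
  label: string;
}

const API_PROVIDERS: ApiProviderDef[] = [
  { featureId: 'aiOllama', provider: 'ollama', label: 'Ollama' },
  { featureId: 'aiGroq', provider: 'groq', label: 'Groq AI' },
  { featureId: 'aiOpenRouter', provider: 'openrouter', label: 'OpenRouter' },
];

// Hypothetical helper: keep only providers whose feature flag is on,
// preserving array order so earlier entries are always tried first.
function enabledProviders(
  providers: ApiProviderDef[],
  isFeatureEnabled: (featureId: string) => boolean,
): ApiProviderDef[] {
  return providers.filter((p) => isFeatureEnabled(p.featureId));
}

// Example: a deployment where the Ollama flag is off.
const order = enabledProviders(API_PROVIDERS, (id) => id !== 'aiOllama')
  .map((p) => p.label);
console.log(order.join(' -> ')); // Groq AI -> OpenRouter
```

Because filtering preserves order, disabling one provider never reshuffles the remaining ones.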

### The Browser T5 Local Tier

Separate from the API array, the **Browser T5** model represents the fourth tier, executing entirely within the client's browser through [`src/services/ml-worker.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/ml-worker.ts). This tier is invoked via `tryBrowserT5()`, and its readiness is checked with `mlWorker.isModelLoaded('summarization-beta')`. Once the model is loaded, inference itself needs no network request.

## Normal Mode: Cloud-First Execution

When `BETA_MODE` is disabled in [`src/config/beta.ts`](https://github.com/koala73/worldmonitor/blob/main/src/config/beta.ts), the system prioritizes remote availability over local processing. The implementation executes the cloud chain before attempting any browser-based inference:

```typescript
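// Excerpt: `chainResult` is assumed to be declared earlier in the function.
// 1. API chain: Ollama → Groq → OpenRouter, in array order.
if (!options?.skipCloudProviders) {
  chainResult = await runApiChain(API_PROVIDERS, …);
}
if (chainResult) return chainResult;

// 2. Browser T5 runs only when the API chain produced no result.
if (!options?.skipBrowserFallback) {
  const browserResult = await tryBrowserT5(headlines);
  if (browserResult) return browserResult;
}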
```

In this configuration, the **Ollama → Groq → OpenRouter** sequence runs first via `runApiChain()`. Only if all three providers fail does the system invoke `tryBrowserT5()` to execute the local T5 model. This approach ensures compatibility in environments where local models aren't configured or available.
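The sequential behavior described here can be sketched in isolation. The following is an illustration under stated assumptions, not the repository's `runApiChain()`: the names `runChainSketch`, `summarizeNormalMode`, `ProviderFn`, and `SummaryResult` are hypothetical, and a thrown error or a `null` result is assumed to count as a failed tier.

```typescript
// Hypothetical result and provider shapes, for illustration only.
interface SummaryResult {
  summary: string;
  provider: string;
}

type ProviderFn = (headlines: string[]) => Promise<string | null>;

// Try each provider in order; the first non-empty summary wins.
// A thrown error and a null result both count as "this tier failed".
async function runChainSketch(
  providers: Array<{ name: string; call: ProviderFn }>,
  headlines: string[],
): Promise<SummaryResult | null> {
  for (const { name, call } of providers) {
    try {
      const summary = await call(headlines);
      if (summary) return { summary, provider: name };
    } catch {
      // Fall through to the next provider.
    }
  }
  return null;
}

// Normal mode: API chain first, browser model last.
async function summarizeNormalMode(
  apiChain: Array<{ name: string; call: ProviderFn }>,
  browserT5: ProviderFn,
  headlines: string[],
): Promise<SummaryResult | null> {
  const chainResult = await runChainSketch(apiChain, headlines);
  if (chainResult) return chainResult;

  const local = await browserT5(headlines);
  return local ? { summary: local, provider: 'browser' } : null;
}
```

Awaiting each provider before moving on keeps the order deterministic, at the cost of accumulating timeouts when several tiers are down.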

## Beta Mode: Local-First Prioritization

Enabling `BETA_MODE` fundamentally restructures the **4-tier AI fallback chain** to favor privacy and offline capability. The chain now evaluates local readiness before initiating cloud requests:

| Step | Provider | Execution Trigger |
|------|----------|-------------------|
| 1 | **Browser T5** (Local) | Runs immediately if `mlWorker.isModelLoaded('summarization-beta')` returns `true` |
| 2 | **Cloud Chain** | Executed only if local inference fails or the model isn't loaded |
| 3 | **Browser T5** (Final Fallback) | Runs after cloud exhaustion if the model finishes loading during API attempts |

The beta implementation in [`src/services/summarization.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/summarization.ts) checks model readiness first:

```typescript
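// Excerpt: the branch taken when the beta model is already loaded.
if (modelReady) {
  // 1. Local inference first.
  if (!options?.skipBrowserFallback) {
    const browserResult = await tryBrowserT5(headlines, 'summarization-beta');
    if (browserResult) { return browserResult; }
  }
  // 2. API chain only if local inference returned nothing.
  if (!options?.skipCloudProviders) {
    const chainResult = await runApiChain(API_PROVIDERS, …);
    if (chainResult) return chainResult;
  }
}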
```

When the model isn't ready, the system starts loading it in the background and attempts the cloud chain at the same time, so model initialization overlaps with the remote requests instead of delaying them.
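The overlap between model loading and the cloud chain can be sketched as follows. This is a hypothetical illustration: `betaNotReadySketch` and its injected `loadModel`, `cloudChain`, and `browserT5` parameters are assumptions, and the sketch assumes the final fallback waits for the load that was started earlier.

```typescript
type Summarize = (headlines: string[]) => Promise<string | null>;

// Hypothetical sketch of the "model not ready" path in beta mode.
async function betaNotReadySketch(
  loadModel: () => Promise<boolean>, // resolves true once the model is usable
  cloudChain: Summarize,
  browserT5: Summarize,
  headlines: string[],
): Promise<{ summary: string; tier: 'cloud' | 'browser' } | null> {
  // Start loading without awaiting, so it overlaps with the cloud requests.
  // A load error is mapped to false to avoid an unhandled rejection.
  const loading = loadModel().catch(() => false);

  const cloud = await cloudChain(headlines).catch(() => null);
  if (cloud) return { summary: cloud, tier: 'cloud' };

  // Cloud exhausted: wait for the load started above, then try locally.
  if (await loading) {
    const local = await browserT5(headlines).catch(() => null);
    if (local) return { summary: local, tier: 'browser' };
  }
  return null;
}
```

Starting the load before awaiting the cloud chain is what makes the two overlap; awaiting `loadModel()` first would serialize them.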

## Configuration Options for Provider Selection

The `generateSummary` function accepts options to bypass automatic tier selection. These parameters override both normal and beta mode behaviors:

**Force Purely Local Processing**

Skip the entire API chain, including the local Ollama provider:

```typescript
const localOnly = await generateSummary(
  ['Headline 1', 'Headline 2'],
  undefined,
  undefined,
  'en',
  { skipCloudProviders: true }
);
```

**Force Cloud-Only Execution**

Prevent Browser T5 initialization entirely:

```typescript
const cloudOnly = await generateSummary(
  ['Headline 1', 'Headline 2'],
  undefined,
  undefined,
  'en',
  { skipBrowserFallback: true }
);
```
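Combined with the mode rules from the earlier sections, the two flags act as filters on the tier order. The helper below is a hypothetical illustration (`planTiers`, `Tier`, and `SummarizeOptions` are not names from the repository) that returns the order in which tiers would be attempted under the rules described in this article:

```typescript
type Tier = 'browser-t5' | 'api-chain';

interface SummarizeOptions {
  skipCloudProviders?: boolean;
  skipBrowserFallback?: boolean;
}

// Hypothetical helper: local-first only when beta mode is on AND the
// model is already loaded; otherwise the API chain goes first.
function planTiers(
  betaMode: boolean,
  modelReady: boolean,
  options: SummarizeOptions = {},
): Tier[] {
  const localFirst = betaMode && modelReady;
  const order: Tier[] = localFirst
    ? ['browser-t5', 'api-chain']
    : ['api-chain', 'browser-t5'];
  // Each skip flag removes its tier without changing the remaining order.
  return order.filter((tier) =>
    tier === 'api-chain' ? !options.skipCloudProviders : !options.skipBrowserFallback,
  );
}

console.log(planTiers(false, false).join(', '));                              // api-chain, browser-t5
console.log(planTiers(true, true).join(', '));                                // browser-t5, api-chain
console.log(planTiers(true, true, { skipCloudProviders: true }).join(', '));  // browser-t5
console.log(planTiers(false, true, { skipBrowserFallback: true }).join(', ')); // api-chain
```

Setting both flags at once leaves no tier to attempt, so callers should expect no summary in that case.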

## Runtime Detection and Analytics

The prioritization logic depends on capabilities detected in [`src/services/runtime.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/runtime.ts) and worker states in [`src/services/ml-worker.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/ml-worker.ts). The `mlWorker.isAvailable` property determines Web Worker support, while `isDesktopRuntime()` identifies Ollama accessibility. The system records successful providers via `trackLLMUsage()` and failures via `trackLLMFailure()` in [`src/services/analytics.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/analytics.ts).
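One way to picture the success and failure tracking is a wrapper around each provider attempt. The sketch below is hypothetical: the `Trackers` callbacks stand in for `trackLLMUsage()` and `trackLLMFailure()`, whose real signatures are not shown in this article, and `withTracking` is an illustrative name.

```typescript
// Hypothetical tracker signatures; the real trackLLMUsage / trackLLMFailure
// in src/services/analytics.ts may take different arguments.
interface Trackers {
  trackUsage: (provider: string) => void;
  trackFailure: (provider: string) => void;
}

// Run one provider attempt and report the outcome exactly once.
async function withTracking<T>(
  provider: string,
  attempt: () => Promise<T | null>,
  trackers: Trackers,
): Promise<T | null> {
  try {
    const result = await attempt();
    if (result !== null) {
      trackers.trackUsage(provider);
      return result;
    }
  } catch {
    // A thrown error is handled like an empty result below.
  }
  trackers.trackFailure(provider);
  return null;
}
```

Returning `null` on failure, rather than rethrowing, lets a surrounding chain move on to the next tier.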

## Summary

- **Normal mode** executes the cloud chain (**Ollama → Groq → OpenRouter**) before falling back to the **Browser T5** local model.
- **Beta mode** checks **Browser T5** readiness first. If the model is loaded, local inference runs immediately; if not, the model loads in the background while the cloud chain is attempted.
- The **`API_PROVIDERS`** array in [`src/services/summarization.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/summarization.ts) lists Ollama first, making it a local provider that is still reached through the API chain.
- Configuration options **`skipCloudProviders`** and **`skipBrowserFallback`** allow explicit control over the **4-tier AI fallback chain**.
- Worker readiness states in **[`src/services/ml-worker.ts`](https://github.com/koala73/worldmonitor/blob/main/src/services/ml-worker.ts)** and the **`BETA_MODE`** flag in [`src/config/beta.ts`](https://github.com/koala73/worldmonitor/blob/main/src/config/beta.ts) control runtime prioritization.

## Frequently Asked Questions

### What determines whether the AI fallback chain uses local or cloud providers first?

The `BETA_MODE` flag in [`src/config/beta.ts`](https://github.com/koala73/worldmonitor/blob/main/src/config/beta.ts) controls the execution order. When disabled, `runApiChain()` executes with cloud providers before `tryBrowserT5()` is called. When enabled, the code checks `mlWorker.isModelLoaded('summarization-beta')` and prioritizes the Browser T5 model if ready.

### Can I force World Monitor to use only local AI models without any cloud calls?

Yes. Pass `{ skipCloudProviders: true }` as the options argument of `generateSummary()`. This bypasses the entire `API_PROVIDERS` chain (Ollama, Groq, and OpenRouter) and attempts only Browser T5 inference in the Web Worker.

### Why does Ollama appear in the cloud chain if it runs locally?

Ollama is treated as an API provider because it exposes an OpenAI-compatible HTTP endpoint accessible via `/api/local-*` routes in desktop runtimes. While physically local, it follows the same request/response pattern as remote services, making it a hybrid tier between fully local Browser T5 inference and external cloud providers.

### How does the system handle cases where the Browser T5 model is still loading?

When `modelReady` is false in beta mode, the system starts `loadModel()` in the background while executing `runApiChain()` with the cloud providers. If the cloud chain is exhausted before loading completes, the system waits for the model and runs `tryBrowserT5()` as a final fallback.