
How World Monitor's 4-Tier AI Fallback Chain Prioritizes Local vs Cloud Providers

March 9, 2026 koala73/worldmonitor ↗

World Monitor implements a deterministic four-tier pipeline. In standard operation it runs the API chain first (Ollama, then Groq, then OpenRouter) and falls back to the in-browser T5 model only when every API provider fails. In beta mode it reverses that priority and runs Browser T5 first whenever the model is already loaded.

The koala73/worldmonitor open-source project manages AI summarization through a sophisticated fallback system that balances privacy, latency, and availability. Understanding how this 4-tier AI fallback chain prioritizes between local and cloud providers is essential for optimizing self-hosted deployments and managing API costs.

The Four-Tier Provider Architecture

The complete fallback hierarchy consists of four distinct inference layers defined across the codebase. Three providers are declared in src/services/summarization.ts as an ordered array, while the fourth runs independently in a WebWorker.

API Provider Definitions

The API_PROVIDERS constant establishes the cloud-chain order:

const API_PROVIDERS: ApiProviderDef[] = [
  { featureId: 'aiOllama',      provider: 'ollama',     label: 'Ollama' },
  { featureId: 'aiGroq',        provider: 'groq',       label: 'Groq AI' },
  { featureId: 'aiOpenRouter',  provider: 'openrouter', label: 'OpenRouter' },
];

Ollama – A local HTTP endpoint typically available in desktop environments via /api/local-* routes, positioned as the first option in the API chain.
Groq – A fast cloud LLM service acting as the primary remote provider.
OpenRouter – A secondary cloud service providing the final API-based fallback.
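
The ordered fallthrough described above can be sketched as a loop that returns the first successful provider. This is a minimal illustration, not the project's runApiChain(): the article elides that function's real signature, so runChainSketch, ProviderCall, and ChainResult are hypothetical names, and treating a thrown error as a miss is an assumption.

```typescript
// Shape copied from the article's API_PROVIDERS entries.
interface ApiProviderDef {
  featureId: string;
  provider: string;
  label: string;
}

const API_PROVIDERS: ApiProviderDef[] = [
  { featureId: 'aiOllama', provider: 'ollama', label: 'Ollama' },
  { featureId: 'aiGroq', provider: 'groq', label: 'Groq AI' },
  { featureId: 'aiOpenRouter', provider: 'openrouter', label: 'OpenRouter' },
];

// Hypothetical: a caller-supplied function that asks one provider for a summary.
// It resolves to null (or throws) when the provider cannot answer.
type ProviderCall = (def: ApiProviderDef) => Promise<string | null>;

// Hypothetical result shape for this sketch.
interface ChainResult {
  summary: string;
  provider: string;
}

async function runChainSketch(
  providers: ApiProviderDef[],
  call: ProviderCall,
): Promise<ChainResult | null> {
  for (const def of providers) {
    try {
      const summary = await call(def);
      // The first non-empty answer wins; later providers are never contacted.
      if (summary) return { summary, provider: def.provider };
    } catch {
      // Assumption: an error counts as a miss, so fall through to the next provider.
    }
  }
  return null; // Every provider missed; the caller decides what to try next.
}
```

Because the array order is the priority order, moving an entry in API_PROVIDERS would be enough to change which provider is tried first in a design like this.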

The Browser T5 Local Tier

Separate from the API array, the Browser T5 model represents the fourth tier, executing entirely within the client's browser through src/services/ml-worker.ts. This tier is invoked via tryBrowserT5() and checked using mlWorker.isModelLoaded('summarization-beta'), offering inference without network requests.
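
A rough sketch of how such a tier might gate on model readiness follows. Only isModelLoaded() and the 'summarization-beta' model id come from the article; the SummarizationWorker interface, its summarize() method, and the choice to return null when the model is not loaded are assumptions made for illustration.

```typescript
// Hypothetical worker surface; only isModelLoaded() is named in the article.
interface SummarizationWorker {
  isModelLoaded(modelId: string): boolean;
  summarize(text: string, modelId: string): Promise<string>; // assumed method
}

// Hypothetical result shape for this sketch.
interface BrowserResult {
  summary: string;
  provider: 'browser';
}

async function tryBrowserT5Sketch(
  worker: SummarizationWorker,
  headlines: string[],
  modelId = 'summarization-beta',
): Promise<BrowserResult | null> {
  // Without a loaded model there is nothing to run, so report a miss.
  if (!worker.isModelLoaded(modelId)) return null;
  try {
    const summary = await worker.summarize(headlines.join('. '), modelId);
    return summary ? { summary, provider: 'browser' } : null;
  } catch {
    // Assumption: an inference error is treated as a miss rather than rethrown.
    return null;
  }
}
```

Returning null instead of throwing lets the caller treat the browser tier exactly like an API provider that missed.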

Normal Mode: Cloud-First Execution

When BETA_MODE is disabled in src/config/beta.ts, the system prioritizes remote availability over local processing. The implementation executes the cloud chain before attempting any browser-based inference:

if (!options?.skipCloudProviders) {
  chainResult = await runApiChain(API_PROVIDERS, …);
}
if (chainResult) return chainResult;

if (!options?.skipBrowserFallback) {
  const browserResult = await tryBrowserT5(headlines);
  if (browserResult) return browserResult;
}

In this configuration, the Ollama → Groq → OpenRouter sequence runs first via runApiChain(). Only if all three providers fail does the system invoke tryBrowserT5() to execute the local T5 model. This approach ensures compatibility in environments where local models aren't configured or available.

Beta Mode: Local-First Prioritization

Enabling BETA_MODE fundamentally restructures the 4-tier AI fallback chain to favor privacy and offline capability. The chain now evaluates local readiness before initiating cloud requests:

Step	Provider	Execution Trigger
1	Browser T5 (Local)	Runs immediately if `mlWorker.isModelLoaded('summarization-beta')` returns `true`
2	Cloud Chain	Executed only if local inference fails or the model isn't loaded
3	Browser T5 (Final Fallback)	Runs after cloud exhaustion if the model finishes loading during API attempts

The beta implementation in src/services/summarization.ts checks model readiness first:

if (modelReady) {
  if (!options?.skipBrowserFallback) {
    const browserResult = await tryBrowserT5(headlines, 'summarization-beta');
    if (browserResult) { return browserResult; }
  }
  if (!options?.skipCloudProviders) {
    const chainResult = await runApiChain(API_PROVIDERS, …);
    if (chainResult) return chainResult;
  }
}

When the model isn't ready, the system initiates background loading while concurrently attempting the cloud chain, maximizing efficiency between model initialization and remote requests.
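
The not-ready path can be sketched as follows: start loading without awaiting it, run the cloud chain in the meantime, and only wait on the load if the cloud chain comes back empty. The BetaDeps interface and summarizeWhileModelLoads are hypothetical names, and the dependencies are injected so the sketch stays self-contained; the project's actual control flow may differ.

```typescript
type Summary = { summary: string; provider: string };

// Hypothetical dependencies standing in for the project's real functions.
interface BetaDeps {
  loadModel(): Promise<void>; // begins model download and initialization
  runCloudChain(): Promise<Summary | null>;
  runBrowserT5(): Promise<Summary | null>;
}

async function summarizeWhileModelLoads(deps: BetaDeps): Promise<Summary | null> {
  // Kick off loading without awaiting it, and record whether it succeeded.
  const loading: Promise<boolean> = deps.loadModel().then(
    () => true,
    () => false,
  );

  // The cloud chain runs while the model loads in the background.
  const cloud = await deps.runCloudChain();
  if (cloud) return cloud;

  // Cloud exhausted: wait for the load to settle, then try the local model.
  const loaded = await loading;
  return loaded ? deps.runBrowserT5() : null;
}
```

Converting the load promise to a boolean up front means a failed load never surfaces as an unhandled rejection, even when the cloud chain succeeds and the result of loading is never inspected.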

Configuration Options for Provider Selection

The generateSummary function accepts options to bypass automatic tier selection. These parameters override both normal and beta mode behaviors:

Force Purely Local Processing

Skip the entire API chain, including the local Ollama endpoint, so that only Browser T5 runs:

const localOnly = await generateSummary(
  ['Headline 1', 'Headline 2'],
  undefined,
  undefined,
  'en',
  { skipCloudProviders: true }
);

Force Cloud-Only Execution

Prevent browser T5 initialization entirely:

const cloudOnly = await generateSummary(
  ['Headline 1', 'Headline 2'],
  undefined,
  undefined,
  'en',
  { skipBrowserFallback: true }
);
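
To make the interaction between the mode flag, model readiness, and the two skip options concrete, here is a small pure function that returns the order in which tiers would be attempted. It is a simplified model of the behavior this article describes, not code from the repository; planTiers and the Tier type are hypothetical, and the background-loading case is collapsed into "API chain first, Browser T5 last".

```typescript
type Tier = 'browser-t5' | 'api-chain';

// Option names match the article; the interface itself is a stand-in.
interface SummaryOptions {
  skipCloudProviders?: boolean;
  skipBrowserFallback?: boolean;
}

// Returns the tiers in the order they would be attempted.
function planTiers(
  betaMode: boolean,
  modelReady: boolean,
  options: SummaryOptions = {},
): Tier[] {
  const useBrowser = !options.skipBrowserFallback;
  const useApi = !options.skipCloudProviders;

  // Local-first applies only in beta mode with a model that is already loaded.
  const localFirst = betaMode && modelReady;
  const order: Tier[] = localFirst
    ? ['browser-t5', 'api-chain']
    : ['api-chain', 'browser-t5'];

  // The skip options remove a tier without changing the relative order.
  return order.filter((tier) => (tier === 'browser-t5' ? useBrowser : useApi));
}
```

Under this model, planTiers(true, true, { skipCloudProviders: true }) leaves only the browser tier, which corresponds to the purely local configuration shown earlier.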

Runtime Detection and Analytics

The prioritization logic depends on capabilities detected in src/services/runtime.ts and worker states in src/services/ml-worker.ts. The mlWorker.isAvailable property determines WebWorker support, while isDesktopRuntime() identifies Ollama accessibility. The system tracks successful providers via trackLLMUsage() and documents failures through trackLLMFailure() in src/services/analytics.ts.
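
One way to picture the tracking step is a wrapper that records a success or a failure around each provider attempt. The article does not show the signatures of trackLLMUsage() or trackLLMFailure(), so the Tracker interface below is a hypothetical stand-in, and counting a null result as a failure is an assumption.

```typescript
// Hypothetical stand-ins for trackLLMUsage() and trackLLMFailure().
interface Tracker {
  trackUsage(provider: string): void;
  trackFailure(provider: string): void;
}

async function withTracking<T>(
  provider: string,
  tracker: Tracker,
  attempt: () => Promise<T | null>,
): Promise<T | null> {
  try {
    const result = await attempt();
    if (result !== null) {
      tracker.trackUsage(provider);
      return result;
    }
  } catch {
    // Fall through to the failure record below.
  }
  // Assumption: both a thrown error and a null result count as a failure.
  tracker.trackFailure(provider);
  return null;
}
```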

Summary

Normal mode executes the cloud chain (Ollama → Groq → OpenRouter) before falling back to the Browser T5 local model.
Beta mode checks Browser T5 readiness first and runs local inference immediately if the model is loaded; otherwise it loads the model in the background while the cloud chain runs.
The API_PROVIDERS array in src/services/summarization.ts places Ollama first: it runs locally but is reached over HTTP like the cloud providers.
Configuration options skipCloudProviders and skipBrowserFallback allow explicit control over the 4-tier AI fallback chain.
Worker readiness states in src/services/ml-worker.ts and the BETA_MODE flag in src/config/beta.ts control runtime prioritization.

Frequently Asked Questions

What determines whether the AI fallback chain uses local or cloud providers first?

The BETA_MODE flag in src/config/beta.ts controls the execution order. When disabled, runApiChain() executes with cloud providers before tryBrowserT5() is called. When enabled, the code checks mlWorker.isModelLoaded('summarization-beta') and prioritizes the Browser T5 model if ready.

Can I force World Monitor to use only local AI models without any cloud calls?

Yes. Pass { skipCloudProviders: true } to generateSummary() options. This bypasses the entire API_PROVIDERS chain including Ollama, Groq, and OpenRouter, attempting only the Browser T5 local inference via the WebWorker.

Why does Ollama appear in the cloud chain if it runs locally?

Ollama is treated as an API provider because it exposes an OpenAI-compatible HTTP endpoint accessible via /api/local-* routes in desktop runtimes. While physically local, it follows the same request/response pattern as remote services, making it a hybrid tier between fully local Browser T5 inference and external cloud providers.

How does the system handle cases where the Browser T5 model is still loading?

When modelReady is false in beta mode, the system initiates loadModel() in the background while simultaneously executing runApiChain() with cloud providers. If the cloud chain exhausts before loading completes, the system waits for the model and runs tryBrowserT5() as a final fallback.
