Summarize Transcriber Options: Whisper.cpp, OpenAI, FAL, and NVIDIA ONNX Explained

February 19, 2026 steipete/summarize ↗

The Summarize CLI supports four transcriber backends: ONNX (Parakeet/Canary), whisper.cpp, OpenAI Whisper, and FAL-AI. In auto mode it falls back from local options to cloud APIs based on your environment configuration.

The steipete/summarize repository provides a powerful command-line tool for transcribing audio and video content. Understanding the available transcriber options helps you optimize for speed, cost, and privacy when converting speech to text.

Available Transcriber Backends in Summarize

Summarize implements four distinct transcription engines, each suited for different deployment scenarios ranging from local GPU inference to serverless cloud APIs.

ONNX (NVIDIA Parakeet and Canary) - Local GPU Acceleration

The ONNX backend provides local transcription using NVIDIA's Parakeet or Canary models through the sherpa-onnx runtime. This option offers the fastest local processing when GPU acceleration is available.

The implementation lives in packages/core/src/transcription/onnx-cli.ts. resolvePreferredOnnxModel (lines 17-23) checks which ONNX command is configured, and ensureModelArtifactsDownloaded (lines 94-115) automatically downloads the required model artifacts to ~/.cache/summarize/onnx.
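To see where those artifacts would land on your machine, you can resolve the cache directory by hand. This is a minimal sketch that assumes the location described in this article: $XDG_CACHE_HOME/summarize/onnx when XDG_CACHE_HOME is set, otherwise ~/.cache/summarize/onnx. The function name onnx_cache_dir is hypothetical and is not part of Summarize.

```shell
# Hypothetical helper, not part of Summarize: print the ONNX cache directory.
# Assumes the layout described in this article:
#   $XDG_CACHE_HOME/summarize/onnx, or ~/.cache/summarize/onnx as the fallback.
onnx_cache_dir() {
  printf '%s/summarize/onnx\n' "${XDG_CACHE_HOME:-$HOME/.cache}"
}
```

Running ls "$(onnx_cache_dir)" after a first ONNX transcription should then show the downloaded files, if the directory layout matches this assumption.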

whisper.cpp - Local CPU/GPU Inference

whisper.cpp serves as the secondary local option, running OpenAI's Whisper models via the whisper-cli binary. This backend works on both CPU and GPU configurations without requiring NVIDIA-specific dependencies.

The readiness check occurs in isWhisperCppReady() within packages/core/src/transcription/whisper/whisper-cpp.ts, called from transcribeMediaWithWhisper in packages/core/src/transcription/whisper/core.ts (lines 31-38). The default model path resolves to ~/.summarize/cache/whisper-cpp/models/ggml-base.bin via resolveWhisperCppModelPath in src/run/transcriber-cli.ts.

OpenAI Whisper - Cloud API

The OpenAI Whisper backend transmits audio to OpenAI's cloud transcription endpoint (/v1/audio/transcriptions). This option requires no local model downloads or GPU resources but incurs API costs and transmits data externally.

Activation requires the OPENAI_API_KEY environment variable, with optional customization via OPENAI_WHISPER_BASE_URL for compatible endpoints.

FAL-AI Whisper - Serverless Cloud Fallback

FAL-AI provides the final cloud fallback using serverless GPU infrastructure. This option activates when local transcribers fail and OpenAI is unavailable or returns errors.

Configuration requires the FAL_KEY environment variable. The selection logic in packages/core/src/transcription/whisper/core.ts attempts FAL-AI only when OpenAI is unavailable or its call fails.
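The "OpenAI first, FAL-AI only on failure" behavior can be pictured as a simple short-circuit. The sketch below is illustrative only: try_openai and try_fal are hypothetical stand-ins that succeed when the matching key is set, whereas the real provider calls live in core.ts and can also fail at request time.

```shell
# Hypothetical stand-ins for the real provider calls (not Summarize's code).
# Each one fails (non-zero status) when its API key is missing.
try_openai() { [ -n "${OPENAI_API_KEY:-}" ] && echo "openai: $1"; }
try_fal()    { [ -n "${FAL_KEY:-}" ] && echo "fal: $1"; }

# Attempt OpenAI first; fall through to FAL-AI only if that attempt fails.
transcribe_cloud() {
  try_openai "$1" || try_fal "$1"
}
```

With both keys set, transcribe_cloud never reaches the FAL-AI branch, which matches the ordering described above.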

Automatic Selection Logic

Summarize implements a deterministic fallback chain when SUMMARIZE_TRANSCRIBER is set to auto (the default):


ONNX (parakeet → canary) → whisper.cpp → OpenAI → FAL-AI

This priority order appears in the summarize transcriber setup command output, implemented in src/run/transcriber-cli.ts (lines 50-55).

The selection process works as follows:

ONNX Check: The system first calls resolvePreferredOnnxModel in packages/core/src/transcription/onnx-cli.ts to detect if SUMMARIZE_ONNX_PARAKEET_CMD or SUMMARIZE_ONNX_CANARY_CMD is configured.
whisper.cpp Check: If no ONNX command exists, isWhisperCppReady() verifies that whisper-cli (or the binary specified in SUMMARIZE_WHISPER_CPP_BINARY) is available and models are downloaded.
Cloud Fallback: Only when local options fail does the system check for OPENAI_API_KEY, then FAL_KEY.
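The three steps above can be approximated in a few lines of shell. Treat this as a sketch of the priority order only, not Summarize's actual code: the function name pick_transcriber and the printed labels are hypothetical, and the whisper.cpp branch checks just the binary, whereas the real isWhisperCppReady() also verifies the model.

```shell
# Sketch of the auto-mode priority order described in this article.
# pick_transcriber and its output labels are hypothetical.
pick_transcriber() {
  if [ -n "${SUMMARIZE_ONNX_PARAKEET_CMD:-}" ]; then
    echo "parakeet"
  elif [ -n "${SUMMARIZE_ONNX_CANARY_CMD:-}" ]; then
    echo "canary"
  elif command -v "${SUMMARIZE_WHISPER_CPP_BINARY:-whisper-cli}" >/dev/null 2>&1; then
    # Simplification: the real readiness check also requires a downloaded model.
    echo "whisper.cpp"
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
  elif [ -n "${FAL_KEY:-}" ]; then
    echo "fal"
  else
    echo "none"
    return 1
  fi
}
```

Calling pick_transcriber in your shell gives a quick guess at which backend auto mode would reach first with your current environment; summarize transcriber setup remains the authoritative check.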

Configuring Local Transcribers

Local transcription keeps data private and eliminates API costs but requires proper binary and model configuration.

Setting Up ONNX (Parakeet or Canary)

Configure the ONNX backend by defining the command template through environment variables:

export SUMMARIZE_ONNX_PARAKEET_CMD='["sherpa-onnx", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'

Or for Canary:

export SUMMARIZE_ONNX_CANARY_CMD='["sherpa-onnx", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'

The binary (sherpa-onnx in this example) must exist in your PATH. Summarize automatically downloads the required model.onnx and vocab.txt files from Hugging Face into ~/.cache/summarize/onnx (or $XDG_CACHE_HOME/summarize/onnx) via the ensureModelArtifactsDownloaded function in packages/core/src/transcription/onnx-cli.ts.
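The {model}, {vocab}, and {input} tokens in the command template are placeholders that get replaced with real paths before the binary runs. How Summarize performs that substitution internally is not shown here; the following is only a sketch of the idea, using a hypothetical expand_placeholder helper built on sed.

```shell
# Hypothetical helper, not Summarize's code: substitute the {model}, {vocab}
# and {input} placeholders in a single template argument.
#   $1 = template argument, $2 = model path, $3 = vocab path, $4 = input wav
# Caveat: paths containing "|" or "&" would need escaping for sed.
expand_placeholder() {
  printf '%s\n' "$1" | sed \
    -e "s|{model}|$2|g" \
    -e "s|{vocab}|$3|g" \
    -e "s|{input}|$4|g"
}
```

Arguments without a placeholder, such as "--tokens", pass through unchanged, so the same helper could be applied to every element of the template array.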

Configuring whisper.cpp

The whisper.cpp backend requires the whisper-cli binary (or an alternative specified via SUMMARIZE_WHISPER_CPP_BINARY):

export SUMMARIZE_WHISPER_CPP_BINARY=/usr/local/bin/whisper-cli

Models download automatically to ~/.summarize/cache/whisper-cpp/models/ggml-base.bin by default. The path resolution occurs in resolveWhisperCppModelPath within src/run/transcriber-cli.ts.
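Before relying on whisper.cpp, you can check both prerequisites yourself. The sketch below mirrors the two conditions this article attributes to isWhisperCppReady(), a resolvable binary and a model file on disk, but it is a hypothetical shell approximation, not the project's TypeScript implementation.

```shell
# Hypothetical readiness probe (not Summarize's code).
# Succeeds when the whisper.cpp binary is resolvable and the model file exists.
#   $1 (optional) = model path; defaults to the location given in this article.
whisper_cpp_ready() {
  command -v "${SUMMARIZE_WHISPER_CPP_BINARY:-whisper-cli}" >/dev/null 2>&1 &&
    [ -f "${1:-$HOME/.summarize/cache/whisper-cpp/models/ggml-base.bin}" ]
}
```

For example, whisper_cpp_ready && echo "whisper.cpp looks usable" prints the message only when both checks pass.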

Configuring Cloud Transcribers

Cloud options require API keys but work immediately without local model downloads.

OpenAI Whisper Setup

Set your API key to enable OpenAI transcription:

export OPENAI_API_KEY=sk-...

Optionally redirect to a compatible endpoint:

export OPENAI_WHISPER_BASE_URL=https://api.example.com/v1
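To confirm that your key and endpoint work independently of Summarize, you can send a file to the transcription endpoint directly with curl. This sketch makes two assumptions that are not confirmed by the source: that OPENAI_WHISPER_BASE_URL replaces the https://api.openai.com/v1 prefix, and that the request uses OpenAI's whisper-1 model name. The function names are hypothetical.

```shell
# Hypothetical helpers for testing the endpoint by hand (not Summarize's code).

# Build the transcription URL. Assumption: the optional override replaces
# the default https://api.openai.com/v1 prefix.
transcription_url() {
  base="${OPENAI_WHISPER_BASE_URL:-https://api.openai.com/v1}"
  # ${base%/} drops one trailing slash so the path is not doubled.
  printf '%s/audio/transcriptions\n' "${base%/}"
}

# Upload one audio file as multipart form data.
# Assumption: "whisper-1" is the model; Summarize may send a different one.
transcribe_with_openai() {
  curl -sS "$(transcription_url)" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "model=whisper-1" \
    -F "file=@$1"
}
```

Calling transcribe_with_openai audio.wav uploads the file, so the same cost and privacy considerations apply as when Summarize uses this backend.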

FAL-AI Setup

Configure the FAL-AI fallback:

export FAL_KEY=...

This activates only if OpenAI fails or is unavailable, as implemented in the selection logic within packages/core/src/transcription/whisper/core.ts.

Usage Examples

Force ONNX Parakeet via CLI

summarize "https://example.com/podcast.mp3" --transcriber parakeet

Switch to whisper.cpp via Environment

export SUMMARIZE_TRANSCRIBER=whisper
export SUMMARIZE_WHISPER_CPP_BINARY=/usr/local/bin/whisper-cli
summarize "https://example.com/lecture.mp4"

Use OpenAI Cloud Transcription

export OPENAI_API_KEY=sk-...
export SUMMARIZE_TRANSCRIBER=auto
summarize "https://example.com/audio.wav"

Because auto mode prefers local backends, this reaches OpenAI only when no ONNX command is configured and whisper.cpp is not ready.

Inspect Automatic Selection Order

summarize transcriber setup

This displays the priority chain: ONNX (parakeet then canary) → whisper.cpp → OpenAI → FAL.

Summary

Four backends: Summarize supports ONNX (Parakeet/Canary), whisper.cpp, OpenAI Whisper, and FAL-AI transcription engines.
Automatic fallback: The default auto mode prioritizes local ONNX models, then whisper.cpp, then cloud APIs (OpenAI before FAL).
Configuration: Local transcribers require either a command template (SUMMARIZE_ONNX_PARAKEET_CMD or SUMMARIZE_ONNX_CANARY_CMD) or a binary path (SUMMARIZE_WHISPER_CPP_BINARY); cloud options need OPENAI_API_KEY or FAL_KEY.
Model management: ONNX and whisper.cpp automatically download required models to ~/.cache/summarize/onnx and ~/.summarize/cache/whisper-cpp/models/ respectively.

Frequently Asked Questions

How does Summarize choose which transcriber to use?

When SUMMARIZE_TRANSCRIBER is set to auto (the default), Summarize checks for available backends in a specific order defined in src/run/transcriber-cli.ts. It first attempts to resolve an ONNX command via resolvePreferredOnnxModel in packages/core/src/transcription/onnx-cli.ts, then checks for whisper.cpp readiness via isWhisperCppReady(), and finally falls back to cloud APIs (OpenAI, then FAL) if local options fail or are unconfigured.

What is the difference between Parakeet and Canary transcriber options?

Both are NVIDIA speech recognition models that Summarize runs locally in ONNX form. They are separate model families, so each has its own command template: SUMMARIZE_ONNX_PARAKEET_CMD for Parakeet and SUMMARIZE_ONNX_CANARY_CMD for Canary. In auto mode, Summarize checks for a Parakeet configuration first, then Canary, before falling back to whisper.cpp.

Can I use a custom whisper.cpp binary or model path?

Yes. Summarize defaults to whisper-cli in your PATH and stores the model at ~/.summarize/cache/whisper-cpp/models/ggml-base.bin. To use a different binary, set SUMMARIZE_WHISPER_CPP_BINARY (e.g., /usr/local/bin/whisper-cli). The model path is resolved by resolveWhisperCppModelPath in src/run/transcriber-cli.ts.

Do I need to manually download models for local transcription?

No. Both ONNX and whisper.cpp backends handle model downloads automatically. For ONNX, the ensureModelArtifactsDownloaded function in packages/core/src/transcription/onnx-cli.ts downloads model.onnx and vocab.txt from Hugging Face into $XDG_CACHE_HOME/summarize/onnx (or ~/.cache/summarize/onnx). For whisper.cpp, models download to ~/.summarize/cache/whisper-cpp/models/ when you first run transcription.
