# Summarize Transcriber Options: Whisper.cpp, OpenAI, FAL, and NVIDIA ONNX Explained

> Explore Summarize CLI transcriber options: whisper.cpp, OpenAI, FAL, and NVIDIA ONNX. Learn how to choose the best local or cloud AI for your transcription needs.

- Repository: [Peter Steinberger/summarize](https://github.com/steipete/summarize)
- Tags: deep-dive
- Published: 2026-02-19

---

**The Summarize CLI supports four transcriber backends (ONNX with Parakeet/Canary, whisper.cpp, OpenAI Whisper, and FAL-AI) and, in `auto` mode, falls back from local options to cloud APIs based on your environment configuration.**

The `steipete/summarize` repository provides a command-line tool that can transcribe audio and video content. Knowing the available transcriber options helps you trade off speed, cost, and privacy when converting speech to text.

## Available Transcriber Backends in Summarize

Summarize implements four distinct transcription engines, each suited for different deployment scenarios ranging from local GPU inference to serverless cloud APIs.

### ONNX (NVIDIA Parakeet and Canary) - Local GPU Acceleration

The **ONNX backend** provides local transcription using NVIDIA's Parakeet or Canary models through the `sherpa-onnx` runtime. This option offers the fastest local processing when GPU acceleration is available.

The implementation lives in [`packages/core/src/transcription/onnx-cli.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/onnx-cli.ts). There, `resolvePreferredOnnxModel` (lines 17-23) checks which ONNX command is configured, and `ensureModelArtifactsDownloaded` (lines 94-115) automatically downloads the required model artifacts to `~/.cache/summarize/onnx`.

### whisper.cpp - Local CPU/GPU Inference

**whisper.cpp** serves as the secondary local option, running OpenAI's Whisper models via the `whisper-cli` binary. This backend works on both CPU and GPU configurations without requiring NVIDIA-specific dependencies.

The readiness check occurs in `isWhisperCppReady()` within [`packages/core/src/transcription/whisper/whisper-cpp.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/whisper/whisper-cpp.ts), called from `transcribeMediaWithWhisper` in [`packages/core/src/transcription/whisper/core.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/whisper/core.ts) (lines 31-38). The default model path resolves to `~/.summarize/cache/whisper-cpp/models/ggml-base.bin` via `resolveWhisperCppModelPath` in [`src/run/transcriber-cli.ts`](https://github.com/steipete/summarize/blob/main/src/run/transcriber-cli.ts).

### OpenAI Whisper - Cloud API

The **OpenAI Whisper** backend transmits audio to OpenAI's cloud transcription endpoint (`/v1/audio/transcriptions`). This option requires no local model downloads or GPU resources but incurs API costs and transmits data externally.

Activation requires the `OPENAI_API_KEY` environment variable, with optional customization via `OPENAI_WHISPER_BASE_URL` for compatible endpoints.

### FAL-AI Whisper - Serverless Cloud Fallback

**FAL-AI** provides the final cloud fallback using serverless GPU infrastructure. This option activates when local transcribers fail and OpenAI is unavailable or returns errors.

Configuration requires the `FAL_KEY` environment variable. The selection logic in [`packages/core/src/transcription/whisper/core.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/whisper/core.ts) attempts FAL-AI only when OpenAI is unavailable or its call fails.
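
The "OpenAI first, FAL-AI second" behavior can be pictured with a small shell sketch. It is only an illustration of the ordering described here, not the logic in `core.ts`: `try_openai`, `try_fal`, and `transcribe_cloud` are hypothetical names, and the two `try_*` functions are stubs that stand in for real API requests.

```bash
# Hypothetical stand-ins for the real cloud calls. Each stub "succeeds"
# only when its API key is set; replace the echo with an actual request.
try_openai() { [ -n "${OPENAI_API_KEY:-}" ] && echo "openai transcript of $1"; }
try_fal()    { [ -n "${FAL_KEY:-}" ] && echo "fal transcript of $1"; }

# Try OpenAI first; fall through to FAL only if OpenAI is unconfigured or fails.
transcribe_cloud() {
  try_openai "$1" && return 0
  try_fal "$1" && return 0
  echo "no cloud transcriber succeeded" >&2
  return 1
}
```

With only `FAL_KEY` exported, `transcribe_cloud episode.mp3` skips the OpenAI stub and prints the FAL stub's line; with neither key set it returns a non-zero status.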

## Automatic Selection Logic

Summarize implements a deterministic fallback chain when `SUMMARIZE_TRANSCRIBER` is set to `auto` (the default):

```
ONNX (parakeet → canary) → whisper.cpp → OpenAI → FAL-AI
```

This priority order appears in the `summarize transcriber setup` command output, implemented in [`src/run/transcriber-cli.ts`](https://github.com/steipete/summarize/blob/main/src/run/transcriber-cli.ts) (lines 50-55).

The selection process works as follows:

1. **ONNX Check**: The system first calls `resolvePreferredOnnxModel` in [`packages/core/src/transcription/onnx-cli.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/onnx-cli.ts) to detect if `SUMMARIZE_ONNX_PARAKEET_CMD` or `SUMMARIZE_ONNX_CANARY_CMD` is configured.

2. **whisper.cpp Check**: If no ONNX command exists, `isWhisperCppReady()` verifies that `whisper-cli` (or the binary specified in `SUMMARIZE_WHISPER_CPP_BINARY`) is available and models are downloaded.

3. **Cloud Fallback**: Only when local options fail does the system check for `OPENAI_API_KEY`, then `FAL_KEY`.
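
The three steps above can be approximated in shell to predict which backend `auto` mode is likely to choose on a given machine. This is a rough sketch under assumptions: `pick_transcriber` and the labels it prints are hypothetical, and the whisper.cpp branch only checks that the binary resolves, whereas the readiness check described above also looks for downloaded models.

```bash
# Print the backend that the documented `auto` order would reach first.
# Illustration only: the real checks live in the Summarize source.
pick_transcriber() {
  if [ -n "${SUMMARIZE_ONNX_PARAKEET_CMD:-}" ]; then
    echo parakeet
  elif [ -n "${SUMMARIZE_ONNX_CANARY_CMD:-}" ]; then
    echo canary
  elif command -v "${SUMMARIZE_WHISPER_CPP_BINARY:-whisper-cli}" >/dev/null 2>&1; then
    # Assumption: model presence is not verified in this sketch.
    echo whisper.cpp
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  elif [ -n "${FAL_KEY:-}" ]; then
    echo fal
  else
    echo none
    return 1
  fi
}

echo "auto would likely pick: $(pick_transcriber)"
```

Running it before `summarize` is a quick way to see whether a leftover environment variable would shadow the backend you expect.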

## Configuring Local Transcribers

Local transcription keeps data private and eliminates API costs but requires proper binary and model configuration.

### Setting Up ONNX (Parakeet or Canary)

Configure the ONNX backend by defining the command template through environment variables:

```bash
export SUMMARIZE_ONNX_PARAKEET_CMD='["sherpa-onnx", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'
```

Or for Canary:

```bash
export SUMMARIZE_ONNX_CANARY_CMD='["sherpa-onnx", "--tokens", "{vocab}", "--offline-ctc-model", "{model}", "--input-wav", "{input}"]'
```

The binary (`sherpa-onnx` in this example) must exist in your `PATH`. Summarize automatically downloads the required `model.onnx` and `vocab.txt` files from Hugging Face into `~/.cache/summarize/onnx` (or `$XDG_CACHE_HOME/summarize/onnx`) via the `ensureModelArtifactsDownloaded` function in [`packages/core/src/transcription/onnx-cli.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/onnx-cli.ts).
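
To make the `{model}`, `{vocab}`, and `{input}` placeholders concrete, the sketch below resolves the cache directory and substitutes paths into a template. It rests on assumptions that the source does not confirm: that both files sit directly inside the cache directory, and that placeholders are filled by plain text substitution. The helper names `escape_sed`, `onnx_cache_dir`, and `expand_onnx_template` are hypothetical.

```bash
# Hypothetical helper: escape characters that are special on the
# replacement side of a sed `s|...|...|` command.
escape_sed() {
  printf '%s' "$1" | sed 's/[&|\\]/\\&/g'
}

# Cache location as described above: $XDG_CACHE_HOME if set, else ~/.cache.
onnx_cache_dir() {
  printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/summarize/onnx"
}

# Fill the placeholders in a command template ($1) for an input file ($2).
# Assumption: model.onnx and vocab.txt live directly in the cache directory.
expand_onnx_template() {
  printf '%s\n' "$1" | sed \
    -e "s|{model}|$(escape_sed "$(onnx_cache_dir)/model.onnx")|g" \
    -e "s|{vocab}|$(escape_sed "$(onnx_cache_dir)/vocab.txt")|g" \
    -e "s|{input}|$(escape_sed "$2")|g"
}

# Show what the currently configured Parakeet template would expand to.
expand_onnx_template "${SUMMARIZE_ONNX_PARAKEET_CMD:-<not set>}" /tmp/audio.wav
```

Printing the expanded command this way can help confirm that a template references every placeholder the binary needs before running a real transcription.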

### Configuring whisper.cpp

The whisper.cpp backend requires the `whisper-cli` binary (or an alternative specified via `SUMMARIZE_WHISPER_CPP_BINARY`):

```bash
export SUMMARIZE_WHISPER_CPP_BINARY=/usr/local/bin/whisper-cli
```

By default, the model downloads automatically to `~/.summarize/cache/whisper-cpp/models/ggml-base.bin`. The path resolution occurs in `resolveWhisperCppModelPath` within [`src/run/transcriber-cli.ts`](https://github.com/steipete/summarize/blob/main/src/run/transcriber-cli.ts).
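
A quick preflight check can show whether both pieces are in place before you run a transcription. This is a minimal sketch, not a port of `isWhisperCppReady()`: the function name `whisper_cpp_ready` is hypothetical, and it assumes the default model path quoted above unless you pass another path as the first argument.

```bash
# Succeed only if the whisper.cpp binary resolves and the model file exists.
# $1 (optional): model path; defaults to the location quoted in this article.
whisper_cpp_ready() {
  command -v "${SUMMARIZE_WHISPER_CPP_BINARY:-whisper-cli}" >/dev/null 2>&1 || return 1
  [ -f "${1:-$HOME/.summarize/cache/whisper-cpp/models/ggml-base.bin}" ]
}

if whisper_cpp_ready; then
  echo "whisper.cpp looks ready"
else
  echo "whisper.cpp is not ready (missing binary or model)"
fi
```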

## Configuring Cloud Transcribers

Cloud options require API keys but work immediately without local model downloads.

### OpenAI Whisper Setup

Set your API key to enable OpenAI transcription:

```bash
export OPENAI_API_KEY=sk-...
```

Optionally redirect to a compatible endpoint:

```bash
export OPENAI_WHISPER_BASE_URL=https://api.example.com/v1
```
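
To check a key or a compatible endpoint independently of Summarize, you can call the transcription endpoint directly with `curl`. The sketch below is an assumption-laden illustration: it assumes the base URL already ends in `/v1` (as in the example above) so that `/audio/transcriptions` is appended to it, and it uses OpenAI's `whisper-1` model name, which may not be what Summarize itself sends. `transcription_url` and `transcribe_with_openai` are hypothetical helper names.

```bash
# Build the endpoint URL. Assumption: the base URL already includes /v1.
transcription_url() {
  _base=${OPENAI_WHISPER_BASE_URL:-https://api.openai.com/v1}
  # Strip one trailing slash, if present, before appending the path.
  printf '%s/audio/transcriptions\n' "${_base%/}"
}

# Upload one audio file ($1) as multipart form data.
# Needs network access and a valid key, so it is only defined here.
transcribe_with_openai() {
  [ -n "${OPENAI_API_KEY:-}" ] || { echo "OPENAI_API_KEY is not set" >&2; return 1; }
  curl --fail --silent --show-error "$(transcription_url)" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -F "file=@$1" \
    -F "model=whisper-1"
}

echo "requests would go to: $(transcription_url)"
```

Calling `transcribe_with_openai sample.mp3` then exercises the same key and base URL outside of Summarize, which helps separate credential problems from tool configuration problems.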

### FAL-AI Setup

Configure the FAL-AI fallback:

```bash
export FAL_KEY=...
```

This activates only if OpenAI fails or is unavailable, as implemented in the selection logic within [`packages/core/src/transcription/whisper/core.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/whisper/core.ts).

## Usage Examples

### Force ONNX Parakeet via CLI

```bash
summarize "https://example.com/podcast.mp3" --transcriber parakeet
```

### Switch to whisper.cpp via Environment

```bash
export SUMMARIZE_TRANSCRIBER=whisper
export SUMMARIZE_WHISPER_CPP_BINARY=/usr/local/bin/whisper-cli
summarize "https://example.com/lecture.mp4"
```

### Use OpenAI Cloud Transcription

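```bash
export OPENAI_API_KEY=sk-...
export SUMMARIZE_TRANSCRIBER=auto
summarize "https://example.com/audio.wav"
```

Because `auto` tries local backends first, OpenAI is used here only when no ONNX command is configured and whisper.cpp is not ready.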

### Inspect Automatic Selection Order

```bash
summarize transcriber setup
```

This displays the priority chain: ONNX (parakeet then canary) → whisper.cpp → OpenAI → FAL.

## Summary

- **Four backends**: Summarize supports ONNX (Parakeet/Canary), whisper.cpp, OpenAI Whisper, and FAL-AI transcription engines.
- **Automatic fallback**: The default `auto` mode prioritizes local ONNX models, then whisper.cpp, then cloud APIs (OpenAI before FAL).
- **Configuration**: Local transcribers are configured through the command templates `SUMMARIZE_ONNX_PARAKEET_CMD` and `SUMMARIZE_ONNX_CANARY_CMD`, or the binary path `SUMMARIZE_WHISPER_CPP_BINARY`; cloud options need `OPENAI_API_KEY` or `FAL_KEY`.
- **Model management**: ONNX and whisper.cpp automatically download required models to `~/.cache/summarize/onnx` and `~/.summarize/cache/whisper-cpp/` respectively.

## Frequently Asked Questions

### How does Summarize choose which transcriber to use?

When `SUMMARIZE_TRANSCRIBER` is set to `auto` (the default), Summarize checks for available backends in a specific order defined in [`src/run/transcriber-cli.ts`](https://github.com/steipete/summarize/blob/main/src/run/transcriber-cli.ts). It first attempts to resolve an ONNX command via `resolvePreferredOnnxModel` in [`packages/core/src/transcription/onnx-cli.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/onnx-cli.ts), then checks for whisper.cpp readiness via `isWhisperCppReady()`, and finally falls back to cloud APIs (OpenAI, then FAL) if local options fail or are unconfigured.

### What is the difference between Parakeet and Canary transcriber options?

Both are ONNX-based local transcription models using the NVIDIA NeMo framework. **Parakeet** and **Canary** represent different model architectures optimized for specific use cases, configured via `SUMMARIZE_ONNX_PARAKEET_CMD` or `SUMMARIZE_ONNX_CANARY_CMD` respectively. When `auto` mode is enabled, Summarize checks for Parakeet configuration first, then Canary, before falling back to whisper.cpp.

### Can I use a custom whisper.cpp binary or model path?

Yes. By default, Summarize looks for `whisper-cli` in your `PATH` and downloads the model to `~/.summarize/cache/whisper-cpp/models/ggml-base.bin`. To use a custom binary, set `SUMMARIZE_WHISPER_CPP_BINARY` (e.g., `/usr/local/bin/whisper-cli`). The model file is located by `resolveWhisperCppModelPath` in [`src/run/transcriber-cli.ts`](https://github.com/steipete/summarize/blob/main/src/run/transcriber-cli.ts).

### Do I need to manually download models for local transcription?

No. Both ONNX and whisper.cpp backends handle model downloads automatically. For ONNX, the `ensureModelArtifactsDownloaded` function in [`packages/core/src/transcription/onnx-cli.ts`](https://github.com/steipete/summarize/blob/main/packages/core/src/transcription/onnx-cli.ts) downloads `model.onnx` and `vocab.txt` from Hugging Face into `$XDG_CACHE_HOME/summarize/onnx` (or `~/.cache/summarize/onnx`). For whisper.cpp, models download to `~/.summarize/cache/whisper-cpp/models/` when you first run transcription.