How to Configure Whisper for Greedy Decoding vs. Beam Search

To configure OpenAI Whisper for greedy decoding, set temperature=0.0 and leave beam_size=None in the DecodingOptions dataclass; for beam search, provide an integer value (e.g., 5) to beam_size and set temperature=0.0 for deterministic results.

The openai/whisper repository provides flexible decoding strategies through the DecodingOptions configuration object in whisper/decoding.py. Whether you need fast, deterministic transcription via greedy decoding or higher accuracy through beam search, understanding these configuration parameters is essential for optimizing Whisper's performance.

Understanding Whisper's Decoding Options

Whisper's decoding behavior is governed by the DecodingOptions dataclass defined in whisper/decoding.py (lines 80-94). This configuration object determines whether the model uses greedy decoding or beam search based on two critical parameters:

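  • beam_size: Controls the number of parallel beams. When set to None (the default in DecodingOptions), Whisper uses greedy decoding or sampling. When set to an integer, it enables beam search; a value above 1 is what makes the result differ meaningfully from greedy decoding.
  • temperature: Controls sampling randomness when beam search is not in use. A value of 0.0 (the default) makes GreedyDecoder pick the highest-probability token deterministically, while a positive value makes it sample. BeamSearchDecoder does not take a temperature argument, so pairing beam search with temperature=0.0 is a convention that keeps the configuration unambiguous rather than a separate requirement.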

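Additional parameters further refine the decoding process. patience adjusts how many finished candidates beam search collects before it stops, and it is only valid together with beam_size. best_of (number of independent samples) applies to sampling with a non-zero temperature and cannot be combined with beam_size.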
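The compatibility rules between these parameters can be sketched in plain Python. This is an illustrative stand-in, not Whisper's code: SketchOptions and check_options are hypothetical names, and the rules encoded here (beam_size and best_of cannot be combined, best_of needs a non-zero temperature, patience needs beam_size) reflect the option checks in whisper/decoding.py as best understood, so verify them against your installed version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for the decoding fields discussed above."""
    temperature: float = 0.0
    beam_size: Optional[int] = None
    best_of: Optional[int] = None
    patience: Optional[float] = None


def check_options(opts: SketchOptions) -> str:
    """Return the decoding mode, or raise ValueError on a conflicting combination."""
    if opts.beam_size is not None and opts.best_of is not None:
        raise ValueError("beam_size and best_of can't be given together")
    if opts.temperature == 0 and opts.best_of is not None:
        raise ValueError("best_of needs a non-zero temperature")
    if opts.patience is not None and opts.beam_size is None:
        raise ValueError("patience requires beam_size")
    if opts.beam_size is not None:
        return "beam"
    return "greedy" if opts.temperature == 0 else "sampling"


print(check_options(SketchOptions()))                            # greedy
print(check_options(SketchOptions(beam_size=5, patience=1.2)))   # beam
print(check_options(SketchOptions(temperature=0.7, best_of=5)))  # sampling
```

Running the check before building real DecodingOptions is one way to catch conflicting settings early, for example when the values come from user input.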

Greedy Decoding Configuration

Greedy decoding selects the single most probable token at each timestep, making it the fastest and most deterministic option. In whisper/decoding.py (lines 45-52), Whisper instantiates a GreedyDecoder when beam_size remains None.

Python API Implementation

To configure greedy decoding programmatically, initialize DecodingOptions with temperature=0.0 and omit beam_size (or explicitly set it to None):

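import whisper

model = whisper.load_model("base")

# load_audio returns a raw waveform; decode() expects a 30-second log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Configure for greedy decoding
options = whisper.DecodingOptions(temperature=0.0)  # beam_size=None by default
result = whisper.decode(model, mel, options)

print(result.text)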

Source reference: DecodingOptions definition in whisper/decoding.py (lines 80-94) and the GreedyDecoder implementation in the same file.

Command Line Interface

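For CLI usage, the whisper command sets its own defaults, which differ from those of DecodingOptions. In current releases, --beam_size defaults to 5, so beam search is active unless you turn it off (check whisper --help for your installed version). To request greedy decoding, pass None for the beam size and keep the temperature at 0:

whisper speech.wav --model base --temperature 0 --beam_size None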

Source reference: Argument parsing in whisper/transcribe.py (lines 40-45).

Beam Search Configuration

Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, often improving accuracy over greedy decoding at the cost of increased computation. When beam_size is anything other than None, whisper/decoding.py instantiates a BeamSearchDecoder (lines 101-138); in practice you will want a value greater than 1.
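To make the trade-off concrete, here is a minimal toy beam search. It is a sketch under stated assumptions, not Whisper's BeamSearchDecoder: TOY_MODEL is a hypothetical lookup table standing in for the network, and the real decoder also handles end-of-text tokens, KV-cache reordering, and patience.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to next-token probabilities.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"z": 0.9, "w": 0.1},
}


def beam_search(beam_size, steps=2):
    """Return the best (sequence, log-probability) pair after a fixed number of steps."""
    beams = [((), 0.0)]  # each beam is (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in TOY_MODEL[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        # keep only the beam_size highest-scoring extensions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_seq, greedy_score = beam_search(beam_size=1)  # a single beam behaves greedily
beam_seq, beam_score = beam_search(beam_size=2)
print(greedy_seq, round(math.exp(greedy_score), 2))  # ('A', 'x') 0.3
print(beam_seq, round(math.exp(beam_score), 2))      # ('B', 'z') 0.36
```

With one beam the search commits to "A" because it is the most probable first token, and ends with a sequence of probability 0.3. With two beams the less likely first token "B" survives long enough to reach the overall best sequence.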

Python API Implementation

To enable beam search, provide an integer value to beam_size and set temperature=0.0 for deterministic results:

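import whisper

model = whisper.load_model("base")

# load_audio returns a raw waveform; decode() expects a 30-second log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Configure for beam search with 5 beams
options = whisper.DecodingOptions(
    beam_size=5,
    temperature=0.0,
    patience=1.2,  # Optional; requires beam_size to be set
)
result = whisper.decode(model, mel, options)

print(result.text)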

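Source reference: BeamSearchDecoder implementation in whisper/decoding.py (lines 101-138). The patience value is consumed by BeamSearchDecoder itself, while MaximumLikelihoodRanker handles the separate length_penalty option when the final sequence is chosen.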

Command Line Interface

Use the --beam_size flag to enable beam search from the command line:

whisper speech.wav --model base --beam_size 5 --temperature 0

Source reference: CLI argument definitions in whisper/transcribe.py (lines 40-45).

Internal Decoder Selection Logic

The transition between greedy and beam search decoding occurs in whisper/decoding.py (lines 45-52) within the DecodingTask class initialization. The logic follows a simple conditional:


# From whisper/decoding.py (lines 45-52)

if options.beam_size is not None:
    self.decoder = BeamSearchDecoder(
        options.beam_size, tokenizer.eot, self.inference, options.patience
    )
else:
    self.decoder = GreedyDecoder(options.temperature, tokenizer.eot)

When beam_size is None, the system instantiates GreedyDecoder, whose update method selects logits.argmax() when temperature == 0 or samples from the categorical distribution over temperature-scaled logits when temperature is non-zero.
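The argmax-or-sample behavior described above can be sketched without PyTorch. This is a minimal illustration on plain Python lists, not the GreedyDecoder code; select_token is a hypothetical helper name.

```python
import math
import random


def select_token(logits, temperature, rng=None):
    """Pick a token index: argmax at temperature 0, otherwise sample."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0]
print(select_token(logits, temperature=0.0))  # 1, always the highest logit
print(select_token(logits, temperature=1.0))  # usually 1, but may be 0 or 2
```

Dividing the logits by the temperature before the softmax is what makes higher temperatures flatten the distribution and lower ones sharpen it.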

When beam_size is provided, the system instantiates BeamSearchDecoder, which manages multiple beam states and rearranges KV-caches for active beams. Once decoding finishes, DecodingTask uses the MaximumLikelihoodRanker to apply length normalization and return the highest-scoring sequence.
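The length normalization step can be sketched as follows. This is a hedged approximation of what MaximumLikelihoodRanker is understood to do: with no length_penalty the summed log-probability is divided by the sequence length, and otherwise by the penalty ((5 + length) / 6) ** length_penalty from the Google NMT paper. The rank function and its inputs are hypothetical.

```python
def rank(sum_logprobs, lengths, length_penalty=None):
    """Return the index of the best candidate after length normalization."""
    scores = []
    for logprob, length in zip(sum_logprobs, lengths):
        if length_penalty is None:
            penalty = length  # plain average log-probability per token
        else:
            penalty = ((5 + length) / 6) ** length_penalty  # GNMT-style penalty
        scores.append(logprob / penalty)
    return max(range(len(scores)), key=lambda i: scores[i])


# Candidate 0: 4 tokens, total log-prob -4.0. Candidate 1: 8 tokens, total log-prob -6.0.
print(rank([-4.0, -6.0], [4, 8]))                      # 1: better per-token average
print(rank([-4.0, -6.0], [4, 8], length_penalty=0.0))  # 0: no normalization applied
```

Without normalization, longer sequences are penalized simply for having more tokens, which is why the ranking flips between the two calls.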

Summary

  • Greedy decoding is the default behavior when beam_size=None, selecting the highest-probability token at each step using GreedyDecoder in whisper/decoding.py.
  • Beam search activates when beam_size is set to an integer (use a value greater than 1 for it to differ from greedy decoding), utilizing BeamSearchDecoder to evaluate multiple candidate sequences simultaneously.
  • Temperature should be set to 0.0 for deterministic results in both modes; within transcribe(), non-zero temperatures drop the beam search settings and switch to sampling.
  • Configuration occurs through the DecodingOptions dataclass in Python or via --beam_size and --temperature flags in the CLI.

Frequently Asked Questions

What is the difference between greedy decoding and beam search in Whisper?

Greedy decoding selects the single most probable token at each timestep, making it faster but potentially suboptimal for complex audio. Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, which often improves transcription accuracy at the cost of increased computation time and memory usage.

Does temperature affect beam search decoding?

Temperature does not change how BeamSearchDecoder itself picks tokens, since the decoder is constructed without a temperature argument. It matters one level up: according to the fallback logic in whisper/transcribe.py (lines 90-99), any non-zero temperature value removes the beam search settings and falls back to sampling-based decoding. For deterministic beam search, always set temperature=0.0.
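The fallback behavior can be sketched as a small function that prunes options per temperature. It is an approximation of the logic in whisper/transcribe.py, not a copy of it, and options_for_temperature is a hypothetical name.

```python
def options_for_temperature(temperature, **decode_options):
    """Return the decode options that remain in effect at a given temperature."""
    kwargs = dict(decode_options)
    if temperature > 0:
        # sampling pass: beam search settings are dropped
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # deterministic pass: best_of only applies to sampling
        kwargs.pop("best_of", None)
    kwargs["temperature"] = temperature
    return kwargs


print(options_for_temperature(0.0, beam_size=5, best_of=5))
# {'beam_size': 5, 'temperature': 0.0}
print(options_for_temperature(0.2, beam_size=5, best_of=5))
# {'best_of': 5, 'temperature': 0.2}
```

This is why a run that starts with beam search at temperature 0 can end up sampling: if the first pass is rejected, later passes at higher temperatures no longer carry beam_size.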

How do I set the patience factor for beam search?

The patience factor is a beam search parameter that controls how many finished candidates are collected before decoding stops; it is distinct from the length_penalty option. Set the patience parameter in DecodingOptions alongside your beam_size (e.g., DecodingOptions(beam_size=5, temperature=0.0, patience=1.2)). This value is passed to the BeamSearchDecoder constructor in whisper/decoding.py, as shown in the decoder selection logic.
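As a rough illustration of what the value does once it reaches BeamSearchDecoder, the sketch below assumes the decoder keeps collecting finished sequences until it has round(beam_size * patience) of them per audio segment, with a missing patience treated as 1.0. Both the formula and the max_candidates helper name are assumptions to verify against your installed version.

```python
def max_candidates(beam_size, patience=None):
    """Number of finished candidates to collect before stopping (assumed formula)."""
    patience = patience or 1.0  # None falls back to a neutral patience of 1.0
    return round(beam_size * patience)


print(max_candidates(5))        # 5
print(max_candidates(5, 1.2))   # 6
print(max_candidates(4, 2.0))   # 8
```

Under this assumption, patience=1.2 with five beams lets the search continue until six finished candidates exist, trading a little extra computation for a wider final choice.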

Which decoding method is faster in Whisper?

Greedy decoding is significantly faster than beam search because it evaluates only a single sequence rather than maintaining and scoring multiple parallel beams. For real-time applications or processing large volumes of audio, greedy decoding with temperature=0.0 provides the best throughput, while beam search offers improved accuracy for challenging audio segments.
