How to Configure Whisper for Greedy Decoding vs. Beam Search
To configure OpenAI Whisper for greedy decoding, set temperature=0.0 and leave beam_size=None in the DecodingOptions dataclass; for beam search, provide an integer value (e.g., 5) to beam_size and set temperature=0.0 for deterministic results.
The openai/whisper repository provides flexible decoding strategies through the DecodingOptions configuration object in whisper/decoding.py. Whether you need fast, deterministic transcription via greedy decoding or higher accuracy through beam search, understanding these configuration parameters is essential for optimizing Whisper's performance.
Understanding Whisper's Decoding Options
Whisper's decoding behavior is governed by the DecodingOptions dataclass defined in whisper/decoding.py (lines 80-94). This configuration object determines whether the model uses greedy decoding or beam search based on two critical parameters:
- beam_size: Controls the number of parallel beams. When set to None (the default), Whisper uses greedy decoding. When set to an integer, it enables beam search.
- temperature: Controls sampling randomness. A value of 0.0 makes greedy decoding pick the highest-probability token at every step. Keep it at 0.0 for beam search as well, because transcribe() drops beam_size at non-zero temperatures.
Additional parameters further refine the decoding process. patience (the beam search patience factor) is only valid together with beam_size. best_of (the number of independent samples) applies to sampling at a non-zero temperature and cannot be combined with beam_size.
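How these parameters constrain each other can be sketched as a small validator. This is a minimal illustration, assuming Whisper's option checks work as follows: best_of cannot be combined with beam_size, best_of needs a non-zero temperature, and patience needs beam_size. The function name check_decoding_options is hypothetical and is not part of Whisper.

```python
from typing import Optional


def check_decoding_options(
    temperature: float = 0.0,
    beam_size: Optional[int] = None,
    best_of: Optional[int] = None,
    patience: Optional[float] = None,
) -> str:
    """Return the decoding mode implied by the options, or raise ValueError.

    Hypothetical helper: not part of Whisper, it only mirrors the
    compatibility rules assumed in the surrounding text.
    """
    if beam_size is not None and best_of is not None:
        raise ValueError("beam_size and best_of can't be given together")
    if temperature == 0 and best_of is not None:
        raise ValueError("best_of needs temperature > 0 (sampling)")
    if patience is not None and beam_size is None:
        raise ValueError("patience requires beam_size")
    if beam_size is not None:
        return "beam search"
    return "greedy" if temperature == 0 else "sampling"


print(check_decoding_options())                            # greedy
print(check_decoding_options(beam_size=5, patience=1.2))   # beam search
print(check_decoding_options(temperature=0.7, best_of=5))  # sampling
```

Running a check like this before building DecodingOptions surfaces conflicting settings early instead of at decode time.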
Greedy Decoding Configuration
Greedy decoding selects the single most probable token at each timestep, making it the fastest option and, at temperature=0.0, fully deterministic. In whisper/decoding.py, the DecodingTask initializer instantiates a GreedyDecoder when beam_size remains None.
Python API Implementation
To configure greedy decoding programmatically, initialize DecodingOptions with temperature=0.0 and omit beam_size (or explicitly set it to None):
import whisper
model = whisper.load_model("base")
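# load_audio returns a raw waveform; decode() expects a 30-second log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)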
# Configure for greedy decoding
options = whisper.DecodingOptions(temperature=0.0) # beam_size=None by default
result = whisper.decode(model, mel, options)
print(result.text)
Source reference: DecodingOptions definition in whisper/decoding.py (lines 80-94) and the GreedyDecoder class in the same file.
Command Line Interface
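Unlike the Python API, the whisper command-line tool defaults to --beam_size 5 at temperature 0, so it runs beam search unless told otherwise. To get greedy decoding, pass --beam_size None and keep the temperature at 0:
whisper speech.wav --model base --beam_size None --temperature 0
Source reference: Argument definitions in the cli() function of whisper/transcribe.py.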
Beam Search Configuration
Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, often improving accuracy over greedy decoding at the cost of increased computation. When beam_size is not None, whisper/decoding.py instantiates a BeamSearchDecoder.
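The difference between the two strategies shows up even on a tiny example. The sketch below is a simplified, standalone beam search over a made-up next-token table, not Whisper's BeamSearchDecoder: it ignores end-of-text handling, patience, and the KV cache. TOY_MODEL and its probabilities are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

# Toy next-token distributions keyed by prefix (hypothetical data, not Whisper output).
TOY_MODEL: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"C": 0.5, "D": 0.5},
    ("B",): {"E": 0.9, "F": 0.1},
}
STEPS = 2


def beam_search(beam_size: int) -> Tuple[Tuple[str, ...], float]:
    """Keep the `beam_size` best prefixes by summed log-probability at each step."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(STEPS):
        candidates = [
            (prefix + (token,), score + math.log(prob))
            for prefix, score in beams
            for token, prob in TOY_MODEL[prefix].items()
        ]
        # Best score first; Python's sort is stable, so ties keep insertion order.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_tokens, greedy_score = beam_search(beam_size=1)  # a beam of 1 is greedy decoding
beam_tokens, beam_score = beam_search(beam_size=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))  # ('A', 'C') 0.3
print(beam_tokens, round(math.exp(beam_score), 2))      # ('B', 'E') 0.36
```

Greedy commits to "A" because it is the best first token, while a beam of two keeps "B" alive long enough to discover the more probable full sequence.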
Python API Implementation
To enable beam search, provide an integer value to beam_size and set temperature=0.0 for deterministic results:
import whisper
model = whisper.load_model("base")
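# load_audio returns a raw waveform; decode() expects a 30-second log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
# Configure for beam search with 5 beams
options = whisper.DecodingOptions(
    beam_size=5,
    temperature=0.0,
    patience=1.2,  # Optional beam search patience factor (requires beam_size)
)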
result = whisper.decode(model, mel, options)
print(result.text)
Source reference: BeamSearchDecoder implementation in whisper/decoding.py, and MaximumLikelihoodRanker for the length normalization applied when ranking finished candidates.
Command Line Interface
Use the --beam_size flag to enable beam search from the command line:
whisper speech.wav --model base --beam_size 5 --temperature 0
Source reference: CLI argument definitions in the cli() function of whisper/transcribe.py.
Internal Decoder Selection Logic
The choice between greedy and beam search decoding is made in whisper/decoding.py, within the DecodingTask class initialization. The logic follows a simple conditional:
# From DecodingTask.__init__ in whisper/decoding.py
if options.beam_size is not None:
    self.decoder = BeamSearchDecoder(
        options.beam_size, tokenizer.eot, self.inference, options.patience
    )
else:
    self.decoder = GreedyDecoder(options.temperature, tokenizer.eot)
When beam_size is None, the system instantiates GreedyDecoder, whose update method selects logits.argmax() when temperature == 0 or samples from a categorical distribution over the temperature-scaled logits when temperature is non-zero.
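The argmax-versus-sampling switch can be illustrated without PyTorch. The following is a minimal sketch using only the standard library; select_next_token is a hypothetical name, and the real decoder operates on batched tensors rather than Python lists.

```python
import math
import random
from typing import List, Optional


def select_next_token(
    logits: List[float], temperature: float, rng: Optional[random.Random] = None
) -> int:
    """Pick a token index: argmax at temperature 0, else sample from softmax(logits / T)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, 1.0]
print(select_next_token(logits, temperature=0.0))  # always index 1, the largest logit
print(select_next_token(logits, temperature=1.0, rng=random.Random(0)))  # a sampled index
```

At temperature 0 the result never changes between runs, which is why greedy decoding is reproducible; any positive temperature makes the output depend on the random number generator.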
When beam_size is provided, the system instantiates BeamSearchDecoder, which manages multiple beam states and rearranges KV-caches for active beams. Once decoding finishes, DecodingTask ranks the finished candidates with MaximumLikelihoodRanker, which applies length normalization, and returns the highest-scoring sequence.
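Length-normalized ranking can be sketched as follows. This is an illustration, assuming the ranker divides each candidate's summed log-probability by its token count when no length_penalty is set, and by ((5 + length) / 6) ** length_penalty (the Google NMT formula) otherwise. rank_candidates is a hypothetical helper, not a Whisper function.

```python
from typing import List, Optional


def rank_candidates(
    sum_logprobs: List[float],
    lengths: List[int],
    length_penalty: Optional[float] = None,
) -> int:
    """Return the index of the candidate with the best length-normalized score."""

    def score(logprob: float, length: int) -> float:
        if length_penalty is None:
            penalty = float(length)  # plain average log-probability per token
        else:
            penalty = ((5 + length) / 6) ** length_penalty
        return logprob / penalty

    scores = [score(lp, n) for lp, n in zip(sum_logprobs, lengths)]
    return max(range(len(scores)), key=lambda i: scores[i])


# A short candidate with total log-prob -4.0 versus a longer one with -5.0.
print(rank_candidates([-4.0, -5.0], [4, 10]))                      # 1
print(rank_candidates([-4.0, -5.0], [4, 10], length_penalty=0.0))  # 0
```

Without normalization the shorter candidate would always look better, since every extra token can only lower the summed log-probability.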
Summary
- Greedy decoding is the default for DecodingOptions (beam_size=None); GreedyDecoder in whisper/decoding.py selects the highest-probability token at each step.
- Beam search activates when beam_size is set to an integer, using BeamSearchDecoder to track multiple candidate sequences in parallel.
- Keep temperature at 0.0 for deterministic results in both modes; in transcribe(), non-zero fallback temperatures drop beam_size and switch to sampling.
- Configuration occurs through the DecodingOptions dataclass in Python or via the --beam_size and --temperature flags in the CLI.
Frequently Asked Questions
What is the difference between greedy decoding and beam search in Whisper?
Greedy decoding selects the single most probable token at each timestep, making it faster but potentially suboptimal for complex audio. Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, which often improves transcription accuracy at the cost of increased computation time and memory usage.
Does temperature affect beam search decoding?
Beam search itself does not sample, so it is meant to run at temperature=0.0. In the temperature fallback loop in whisper/transcribe.py, any attempt at a non-zero temperature drops beam_size and patience and falls back to sampling-based decoding (using best_of if it is set). For deterministic beam search, always set temperature=0.0.
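The option filtering in the fallback loop can be sketched like this. It is a simplified illustration, assuming transcribe() removes beam_size and patience for temperatures above zero and removes best_of at temperature zero. options_for_temperature is a hypothetical helper name.

```python
from typing import Any, Dict


def options_for_temperature(
    decode_options: Dict[str, Any], temperature: float
) -> Dict[str, Any]:
    """Return the decoding options that remain in effect at one fallback temperature."""
    kwargs = dict(decode_options)  # copy so the caller's dict is not modified
    if temperature > 0:
        # Sampling pass: beam search settings no longer apply.
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # Deterministic pass: best_of only makes sense when sampling.
        kwargs.pop("best_of", None)
    kwargs["temperature"] = temperature
    return kwargs


requested = {"beam_size": 5, "best_of": 5, "patience": None}
for t in (0.0, 0.2):
    print(t, options_for_temperature(requested, t))
# 0.0 {'beam_size': 5, 'patience': None, 'temperature': 0.0}
# 0.2 {'best_of': 5, 'temperature': 0.2}
```

This is why a single set of options can carry both beam_size and best_of: each one is only used at the temperatures where it is valid.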
How do I enable the patience factor in Whisper beam search?
The patience factor controls how long beam search keeps collecting finished candidates before it stops; it is not a length penalty. Set the patience parameter in DecodingOptions alongside your beam_size (e.g., DecodingOptions(beam_size=5, temperature=0.0, patience=1.2)). The value is passed to the BeamSearchDecoder constructor in whisper/decoding.py, and patience without beam_size is rejected. Length normalization is a separate setting, length_penalty, which is applied by MaximumLikelihoodRanker during final candidate scoring.
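The effect of patience on the stopping condition can be illustrated with a short calculation. This sketch assumes BeamSearchDecoder derives its candidate limit as round(beam_size * patience), with patience defaulting to 1.0 when unset. max_candidates here is a standalone hypothetical function, not the decoder itself.

```python
from typing import Optional


def max_candidates(beam_size: int, patience: Optional[float] = None) -> int:
    """Number of finished candidates to collect before beam search may stop."""
    limit = round(beam_size * (patience or 1.0))
    if limit <= 0:
        raise ValueError(f"Invalid beam size ({beam_size}) or patience ({patience})")
    return limit


print(max_candidates(5))       # 5
print(max_candidates(5, 1.2))  # 6
print(max_candidates(5, 2.0))  # 10
```

A patience above 1.0 lets the search run longer and consider more finished hypotheses, at the cost of extra decoding steps.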
Which decoding method is faster in Whisper?
Greedy decoding is significantly faster than beam search because it evaluates only a single sequence rather than maintaining and scoring multiple parallel beams. For real-time applications or processing large volumes of audio, greedy decoding with temperature=0.0 provides the best throughput, while beam search offers improved accuracy for challenging audio segments.