# How to Configure Whisper for Greedy Decoding vs. Beam Search

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Published: 2026-02-27

---

**To configure OpenAI Whisper for greedy decoding, set `temperature=0.0` and leave `beam_size=None` in the `DecodingOptions` dataclass; for beam search, provide an integer value (e.g., `5`) to `beam_size` and set `temperature=0.0` for deterministic results.**

The `openai/whisper` repository provides flexible decoding strategies through the `DecodingOptions` configuration object in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py). Whether you need fast, deterministic transcription via greedy decoding or higher accuracy through beam search, understanding these configuration parameters is essential for optimizing Whisper's performance.

## Understanding Whisper's Decoding Options

Whisper's decoding behavior is governed by the **`DecodingOptions`** dataclass defined in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) (lines 80-94). This configuration object determines whether the model uses greedy decoding or beam search based on two critical parameters:

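- **`beam_size`**: Controls the number of parallel beams. When left at `None` (the dataclass default), Whisper uses `GreedyDecoder`. Any integer value selects `BeamSearchDecoder`, because the check is `beam_size is not None`; in practice a value greater than 1 (e.g., `5`) is used.
- **`temperature`**: Controls sampling randomness in the greedy path. At `0.0` (the default), `GreedyDecoder` deterministically picks the highest-probability token; at higher values it samples. `BeamSearchDecoder` is constructed without a temperature, and `transcribe()` keeps `beam_size` only for attempts at temperature `0`, so pair beam search with `temperature=0.0`.

Additional parameters further refine the decoding process. **`patience`** (beam search patience, which scales how many finished candidates are collected before the search stops) requires `beam_size`. **`best_of`** (number of independent samples) applies only when sampling with a non-zero temperature and cannot be combined with `beam_size`. Length penalty is a separate option, `length_penalty`.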
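
The compatibility rules between these parameters can be sketched in plain Python. This is a minimal illustration modeled on the option checks in `whisper/decoding.py`, not the library's code: `SketchOptions` and `check_options` are hypothetical names, and only four of the real dataclass fields are represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical stand-in for the few DecodingOptions fields discussed here."""
    temperature: float = 0.0
    beam_size: Optional[int] = None
    best_of: Optional[int] = None
    patience: Optional[float] = None


def check_options(options: SketchOptions) -> str:
    """Return the decoding mode implied by the options, or raise on a bad combination."""
    if options.beam_size is not None and options.best_of is not None:
        raise ValueError("beam_size and best_of can't be given together")
    if options.temperature == 0 and options.best_of is not None:
        raise ValueError("best_of needs a non-zero temperature")
    if options.patience is not None and options.beam_size is None:
        raise ValueError("patience requires beam_size")
    if options.beam_size is not None:
        return "beam search"
    return "greedy" if options.temperature == 0 else "sampling"


print(check_options(SketchOptions()))                            # greedy
print(check_options(SketchOptions(beam_size=5, patience=1.2)))   # beam search
print(check_options(SketchOptions(temperature=0.7, best_of=5)))  # sampling
```

Running a check like this before building the real `DecodingOptions` surfaces invalid combinations early, before any audio is processed.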

## Greedy Decoding Configuration

Greedy decoding selects the single most probable token at each timestep (when `temperature` is `0.0`), making it the fastest and most deterministic option. In [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py), `DecodingTask` instantiates a **`GreedyDecoder`** when `beam_size` remains `None`.
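
The per-step choice can be illustrated without the model. The sketch below is a simplified pure-Python analogue of greedy selection versus temperature sampling; `select_token` is a hypothetical helper and the logits are made-up values, whereas Whisper itself operates on PyTorch tensors.

```python
import math
import random


def select_token(logits, temperature, rng=None):
    """Pick the next token id from a list of logits (one per vocabulary entry)."""
    if temperature == 0:
        # Greedy: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Sampling: rescale the logits by the temperature, then draw from the softmax.
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax numerator
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.5, 0.2, 2.9]  # toy values, not real model output
print(select_token(logits, temperature=0))  # 1
print(select_token(logits, temperature=1.0, rng=random.Random(0)))
```

At temperature `0` the result is the same on every call; with a non-zero temperature it depends on the random draw.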

### Python API Implementation

To configure greedy decoding programmatically, initialize `DecodingOptions` with `temperature=0.0` and omit `beam_size` (or explicitly set it to `None`):

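```python
import whisper

model = whisper.load_model("base")

# load_audio returns a raw waveform; pad/trim it to 30 seconds and convert to a log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Configure for greedy decoding
options = whisper.DecodingOptions(temperature=0.0)  # beam_size=None by default

result = whisper.decode(model, mel, options)
print(result.text)
```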

*Source reference:* `DecodingOptions` definition in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) (lines 80-94) and the `GreedyDecoder` class in the same file.

### Command Line Interface

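Unlike the Python API, the `whisper` CLI does not default to greedy decoding: `--beam_size` defaults to `5`, so a plain invocation at temperature `0` runs beam search. To get greedy decoding, pass `--beam_size None` explicitly and keep the temperature at `0`:

```bash
whisper speech.wav --model base --temperature 0 --beam_size None
```

With the default `--temperature_increment_on_fallback 0.2`, segments that fail the quality thresholds are retried at higher temperatures using sampling. Pass `--temperature_increment_on_fallback None` to keep every attempt at the requested temperature.

*Source reference:* Argument definitions in the `cli()` function of [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py).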

## Beam Search Configuration

Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, often improving accuracy over greedy decoding at the cost of increased computation. When `beam_size` is set to an integer (typically greater than 1), [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) instantiates a **`BeamSearchDecoder`**.
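
A toy example shows why keeping more than one hypothesis can pay off. The sketch below is a generic fixed-length beam search over made-up probabilities, not Whisper's `BeamSearchDecoder`: the `NEXT` table and `beam_search` function are hypothetical, and end-of-text handling is omitted for brevity.

```python
import math

# Toy next-token probabilities keyed by the prefix decoded so far (made-up numbers).
NEXT = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}


def beam_search(beam_size, steps=2):
    """Return (sequence, log-probability) of the best hypothesis after `steps` tokens."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (prefix + (token,), score + math.log(prob))
            for prefix, score in beams
            for token, prob in NEXT[prefix].items()
        ]
        # Keep only the `beam_size` highest-scoring hypotheses.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]


greedy_seq, greedy_score = beam_search(beam_size=1)  # in this toy, one beam acts greedily
beam_seq, beam_score = beam_search(beam_size=2)
print(greedy_seq, round(math.exp(greedy_score), 2))  # ('A', 'x') 0.33
print(beam_seq, round(math.exp(beam_score), 2))      # ('B', 'x') 0.36
```

With a single hypothesis the search commits to `A` because it looks best at the first step, while two beams keep `B` alive long enough to discover the more probable full sequence.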

### Python API Implementation

To enable beam search, provide an integer value to `beam_size` and set `temperature=0.0` for deterministic results:

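```python
import whisper

model = whisper.load_model("base")

# load_audio returns a raw waveform; pad/trim it to 30 seconds and convert to a log-Mel spectrogram
audio = whisper.pad_or_trim(whisper.load_audio("speech.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Configure for beam search with 5 beams
options = whisper.DecodingOptions(
    beam_size=5,
    temperature=0.0,
    patience=1.2,  # Optional beam search patience (not a length penalty)
)

result = whisper.decode(model, mel, options)
print(result.text)
```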

*Source reference:* `BeamSearchDecoder` implementation in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py); `MaximumLikelihoodRanker` in the same file ranks the finished candidates and applies the separate `length_penalty` option.

### Command Line Interface

The CLI already defaults to `--beam_size 5`. Set the flag explicitly to change the number of beams or to make the choice visible in scripts:

```bash
whisper speech.wav --model base --beam_size 5 --temperature 0
```

*Source reference:* CLI argument definitions in the `cli()` function of [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py).

## Internal Decoder Selection Logic

The choice between greedy and beam search decoding is made in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) within the `DecodingTask` class initialization. The logic follows a simple conditional:

```python
# From DecodingTask.__init__ in whisper/decoding.py
if options.beam_size is not None:
    self.decoder = BeamSearchDecoder(
        options.beam_size, tokenizer.eot, self.inference, options.patience
    )
else:
    self.decoder = GreedyDecoder(options.temperature, tokenizer.eot)
```

When `beam_size` is `None`, the system instantiates **`GreedyDecoder`**, whose `update` method selects `logits.argmax(dim=-1)` when `temperature == 0` or samples from a categorical distribution over `logits / temperature` when the temperature is non-zero.

When `beam_size` is provided, the system instantiates **`BeamSearchDecoder`**, which manages multiple beam states and rearranges KV-caches for active beams. Once decoding finishes, `DecodingTask` ranks the candidates with `MaximumLikelihoodRanker`, which applies length normalization (or `length_penalty`, if set), and returns the highest-scoring sequence.
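
The final ranking step can be sketched as follows. This is an approximation modeled on `MaximumLikelihoodRanker`, assuming it divides each candidate's summed log-probability by its length, or by a Google NMT style penalty when `length_penalty` is set; `rank_score` and `pick_best` are hypothetical names and the candidate values are invented.

```python
from typing import List, Optional


def rank_score(sum_logprob: float, length: int, length_penalty: Optional[float] = None) -> float:
    """Length-adjusted score for one candidate; higher is better."""
    if length_penalty is None:
        penalty = float(length)  # plain per-token average
    else:
        penalty = ((5 + length) / 6) ** length_penalty  # Google NMT style penalty
    return sum_logprob / penalty


def pick_best(sum_logprobs: List[float], lengths: List[int],
              length_penalty: Optional[float] = None) -> int:
    """Index of the candidate with the highest length-adjusted score."""
    scores = [rank_score(lp, n, length_penalty) for lp, n in zip(sum_logprobs, lengths)]
    return max(range(len(scores)), key=scores.__getitem__)


# Two made-up candidates: a short one and a longer one with a lower total log-probability.
sum_logprobs = [-4.0, -6.0]
lengths = [4, 8]
print(pick_best(sum_logprobs, lengths))  # 1, since -6/8 = -0.75 beats -4/4 = -1.0
```

Without some form of length adjustment, shorter candidates would win almost by default, because every extra token adds a negative log-probability to the total.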

## Summary

- **Greedy decoding** is the default in the Python API (`beam_size=None`), selecting the highest-probability token at each step using `GreedyDecoder` in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py). The CLI instead defaults to `--beam_size 5`, so greedy decoding there requires `--beam_size None`.
- **Beam search** activates when `beam_size` is set to an integer (typically greater than 1), utilizing `BeamSearchDecoder` to evaluate multiple candidate sequences simultaneously.
- **Temperature** should be `0.0` for deterministic results in both modes; in `transcribe()` and the CLI, attempts at a non-zero temperature drop `beam_size` and use sampling instead.
- **Configuration** occurs through the `DecodingOptions` dataclass in Python or via `--beam_size` and `--temperature` flags in the CLI.

## Frequently Asked Questions

### What is the difference between greedy decoding and beam search in Whisper?

Greedy decoding selects the single most probable token at each timestep, making it faster but potentially suboptimal for complex audio. Beam search maintains multiple candidate sequences (beams) and selects the highest-scoring complete sequence, which often improves transcription accuracy at the cost of increased computation time and memory usage.

### Does temperature affect beam search decoding?

Beam search itself does not use temperature: `BeamSearchDecoder` is constructed without it. What temperature changes is whether beam search runs at all in `transcribe()`. In the fallback logic in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) (the `decode_with_fallback` helper), any attempt at a non-zero temperature drops `beam_size` and `patience` and falls back to sampling-based decoding. For deterministic beam search, always set `temperature=0.0`.
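
The way each attempt's options depend on its temperature can be sketched in a few lines. This is a simplified model of the fallback behavior in `transcribe()`, not the library's code; `options_for_temperature` is a hypothetical helper and the requested values are examples.

```python
def options_for_temperature(decode_options: dict, temperature: float) -> dict:
    """Return the keyword arguments a single decoding attempt would use."""
    kwargs = dict(decode_options)  # copy so the caller's dict is left untouched
    if temperature > 0:
        # Sampling attempt: beam search settings no longer apply.
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # Deterministic attempt: best_of only makes sense when sampling.
        kwargs.pop("best_of", None)
    kwargs["temperature"] = temperature
    return kwargs


requested = {"beam_size": 5, "best_of": 5, "patience": 1.2}
for t in (0.0, 0.2, 0.4):
    print(t, options_for_temperature(requested, t))
```

Only the first attempt, at temperature `0.0`, keeps `beam_size` and `patience`; the later attempts keep `best_of` instead.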

### How do I enable the patience factor in Whisper beam search?

Patience is a beam search parameter, not a length penalty. Set the `patience` parameter in `DecodingOptions` alongside your `beam_size` (e.g., `DecodingOptions(beam_size=5, temperature=0.0, patience=1.2)`); it is rejected when `beam_size` is not given. This value is passed to the `BeamSearchDecoder` constructor in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py), where it scales how many finished candidates are collected before the search stops. Length penalty is configured separately through `length_penalty`, which `MaximumLikelihoodRanker` applies when ranking the finished candidates.
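
As a rough illustration of what the patience value does numerically, the sketch below assumes the candidate budget is `round(beam_size * patience)`, with a missing patience treated as `1.0`, which matches how `BeamSearchDecoder` appears to compute it. The function name `max_candidates` is used here as a hypothetical standalone helper.

```python
from typing import Optional


def max_candidates(beam_size: int, patience: Optional[float] = None) -> int:
    """Number of finished hypotheses to collect before beam search stops."""
    effective_patience = 1.0 if patience is None else patience
    count = round(beam_size * effective_patience)
    if count <= 0:
        raise ValueError(f"Invalid beam size ({beam_size}) or patience ({patience})")
    return count


print(max_candidates(5))       # 5
print(max_candidates(5, 1.2))  # 6
print(max_candidates(5, 2.0))  # 10
```

A patience above `1.0` lets the search gather more finished candidates than there are beams before stopping, at the cost of extra decoding steps.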

### Which decoding method is faster in Whisper?

Greedy decoding is significantly faster than beam search because it evaluates only a single sequence rather than maintaining and scoring multiple parallel beams. For real-time applications or processing large volumes of audio, greedy decoding with `temperature=0.0` provides the best throughput, while beam search offers improved accuracy for challenging audio segments.