# Whisper DecodingOptions Sampling Parameters: A Complete Guide to Stochastic Generation

> Master Whisper decoding with our guide to sampling parameters. Learn how temperature, best_of, beam_size, patience, and length_penalty shape the output, and how transcribe() falls back when a decode looks unreliable.

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Tags: deep-dive
- Published: 2026-02-27

---

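**The key generation parameters in `whisper.DecodingOptions` are `temperature` for controlling randomness, `best_of` for choosing among sampled candidates, `beam_size` and `patience` for beam search, and `length_penalty` for ranking candidates. The quality-control thresholds `compression_ratio_threshold`, `logprob_threshold`, and `no_speech_threshold` are not `DecodingOptions` fields. They are arguments of `transcribe()`, which uses them to decide when to retry a segment at a higher temperature.**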

OpenAI Whisper uses the `DecodingOptions` dataclass in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) to collect the settings that control how the model generates text from encoded audio. When `temperature > 0`, the decoder samples tokens stochastically instead of taking the greedy argmax or running deterministic beam search. In that mode these parameters set the trade-off between transcription reliability and output diversity.

## Core Sampling Parameters in DecodingOptions

### Temperature and Stochastic Control

The **`temperature`** parameter scales logits before the softmax operation. Lower values sharpen the probability distribution, making the model more deterministic, while higher values inject randomness.

- **0.0** → Greedy decoding (most deterministic)
- **0.7-1.0** → Moderate diversity
- **>1.0** → High randomness, more varied outputs

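In [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py), `GreedyDecoder` takes the argmax of the logits when `temperature` is 0. Otherwise it samples the next token from a categorical distribution over `logits / temperature`. The log-probabilities it accumulates for ranking come from the unscaled logits, so temperature changes which tokens get picked but not how the picked tokens are scored.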
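To make the scaling concrete, here is a minimal pure-Python sketch of that per-step rule. It takes the argmax at `T = 0` and otherwise draws from `softmax(logits / T)`. The toy logits and the helper names `softmax_with_temperature` and `pick_token` are illustrative assumptions, not Whisper APIs. Whisper applies the same idea to PyTorch tensors.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature); temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Greedy argmax at T=0, otherwise sample from the tempered distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    # lower T concentrates probability on token 0, higher T flattens it
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(pick_token(logits, 0.0, random.Random(0)))  # greedy: always token 0
```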

### Best-of Sampling and Candidate Selection

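The **`best_of`** parameter draws `best_of` independent sample trajectories and returns the top-ranked one. With the default `length_penalty=None`, that is the candidate with the highest average log-probability per token. It only applies when `temperature > 0`. Combining it with `temperature=0` or with `beam_size` raises a `ValueError`.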

This trades compute for quality:

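```python
import whisper

model = whisper.load_model("base")

# decode() works on a single 30-second window, so pad/trim the audio first
audio = whisper.pad_or_trim(whisper.load_audio("audio.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Draw 5 samples at T=0.8 and keep the best-ranked one
options = whisper.DecodingOptions(
    temperature=0.8,
    best_of=5,
    fp16=False,  # FP32 so this also runs on CPU; FP16 targets CUDA
)
result = model.decode(mel, options)
print(result.text)
```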
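The selection step can be sketched without loading a model. This toy version assumes a fixed per-step token distribution and uses hypothetical helpers `sample_sequence` and `best_of_n`. It draws `n` independent sequences and keeps the one with the highest average log-probability, which is the default ranking Whisper applies to `best_of` candidates. For simplicity it samples and scores with the same distribution. Whisper, by contrast, samples from temperature-scaled logits and scores with unscaled log-probabilities.

```python
import math
import random


def sample_sequence(step_probs, rng):
    """Sample one token per step; return (tokens, summed log-probability)."""
    tokens, sum_logprob = [], 0.0
    for probs in step_probs:
        tok = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(tok)
        sum_logprob += math.log(probs[tok])
    return tokens, sum_logprob


def best_of_n(step_probs, n, seed=0):
    """Draw n independent samples; keep the highest average log-probability."""
    rng = random.Random(seed)
    candidates = [sample_sequence(step_probs, rng) for _ in range(n)]
    return max(candidates, key=lambda c: c[1] / len(c[0]))


# Toy model: four steps, each with the same three-token distribution
step_probs = [[0.6, 0.3, 0.1]] * 4
tokens, sum_lp = best_of_n(step_probs, n=200)
print(tokens, round(sum_lp / len(tokens), 3))
```

More candidates raise the chance of finding a high-likelihood sequence, at a proportional cost in decoding work.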

### Patience and Length Penalty

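The **`patience`** parameter applies only to beam search. It requires `beam_size`, and decoding raises a `ValueError` if `patience` is set without it. `transcribe()` also drops it for attempts with `temperature > 0`. The beam search stops once it has collected `round(beam_size * patience)` finished hypotheses. The default of `1.0` is conventional beam search. Values greater than `1.0` keep the search running longer, so more finished candidates compete in the final ranking.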
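A small sketch of that stopping rule follows. It assumes a hypothetical trace of how many hypotheses reach end-of-text at each step; a real beam search generates this as it runs. `max_candidates` reproduces the `round(beam_size * patience)` target. `steps_until_stop` is an illustrative helper, not part of Whisper.

```python
def max_candidates(beam_size, patience=None):
    """Finished hypotheses to collect before stopping (patience defaults to 1.0)."""
    return round(beam_size * (patience or 1.0))


def steps_until_stop(finished_per_step, beam_size, patience=None):
    """Return the step at which enough finished hypotheses have been collected.

    finished_per_step[i] is a hypothetical count of hypotheses that emitted
    end-of-text at step i. Returns None if the trace runs out first (Whisper's
    loop is also capped by sample_len).
    """
    target = max_candidates(beam_size, patience)
    collected = 0
    for step, n_finished in enumerate(finished_per_step, start=1):
        collected = min(target, collected + n_finished)  # list is capped at target
        if collected >= target:
            return step
    return None


trace = [1, 2, 1, 2, 3, 1]
print(steps_until_stop(trace, beam_size=5))                # target of 5 finished
print(steps_until_stop(trace, beam_size=5, patience=1.5))  # target of 8 finished
```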

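The **`length_penalty`** controls how finished candidates (beams or `best_of` samples) of different lengths are compared. Each candidate's summed log-probability is divided by a length-dependent penalty:

- **`None`** (the default) divides by the token count, so candidates are ranked by average log-probability
- **A value α between 0 and 1** divides by `((5 + length) / 6) ** α`, the length penalty from Google's NMT paper; because summed log-probabilities are negative, larger α favors longer transcriptions
- **0.0** makes the penalty 1, disabling length normalization, so raw summed log-probabilities are compared and shorter outputs are favored

Values outside `[0, 1]` raise a `ValueError`.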
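The ranking rule used by `MaximumLikelihoodRanker` in `whisper/decoding.py` divides each candidate's summed log-probability by its token count when `length_penalty` is `None`. Otherwise it divides by `((5 + length) / 6) ** length_penalty`. The sketch below re-expresses that rule in plain Python with made-up candidate scores to show how the choice of penalty flips which candidate wins. The helpers `score` and `pick` are hypothetical.

```python
def score(sum_logprob, length, length_penalty=None):
    """Length-normalized score of one finished candidate."""
    if length_penalty is None:
        penalty = length  # simple normalization: average log-prob per token
    else:
        if not 0 <= length_penalty <= 1:
            raise ValueError("length_penalty (alpha) should be between 0 and 1")
        penalty = ((5 + length) / 6) ** length_penalty  # Google NMT penalty
    return sum_logprob / penalty


def pick(candidates, length_penalty=None):
    """candidates: list of (token_count, summed_logprob); return the best index."""
    scores = [score(lp, n, length_penalty) for n, lp in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


short = (5, -4.0)   # made-up: 5 tokens, average -0.8
long = (20, -9.0)   # made-up: 20 tokens, average -0.45
for alpha in (None, 0.0, 0.5, 1.0):
    print(alpha, ["short", "long"][pick([short, long], alpha)])
```

With these numbers, the default and `length_penalty=1.0` pick the longer candidate, while `0.0` and `0.5` pick the shorter one.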

## Fallback and Quality Control Parameters

### Compression Ratio and Log Probability Thresholds

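Whisper's automatic fallback is implemented in `transcribe()` ([`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py)), not in `DecodingOptions` or `model.decode()`. `transcribe()` accepts a tuple of temperatures, `(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)` by default. When a segment's decode fails the quality checks below, it decodes that segment again at the next temperature to break out of repetitive or low-confidence output. On the command line, `--temperature_increment_on_fallback` (default `0.2`) builds this tuple by stepping from `--temperature` up to `1.0`.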

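The **`compression_ratio_threshold`** (default `2.4`) is compared against the compression ratio of the decoded text: its UTF-8 byte length divided by its zlib-compressed length. Repetitive text compresses very well. A ratio above the threshold therefore marks the decode as failed and triggers a retry at the next temperature.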
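The ratio is easy to reproduce. This sketch mirrors `compression_ratio` in `whisper/utils.py`, which divides the UTF-8 byte length by the length of the `zlib`-compressed bytes. The CLI help text calls it a gzip ratio. The two sample strings are made up.

```python
import zlib


def compression_ratio(text: str) -> float:
    """UTF-8 byte length divided by zlib-compressed byte length."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


looping = "thank you " * 30  # the kind of loop a stuck decoder produces
normal = "The quick brown fox jumps over the lazy dog."
print(round(compression_ratio(looping), 2), round(compression_ratio(normal), 2))
```

Short, non-repetitive text barely compresses, so its ratio stays near or below 1, while the loop lands far above the 2.4 default.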

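The **`logprob_threshold`** (default `-1.0`) is the minimum acceptable average log-probability per token, reported as `avg_logprob` on the decoding result. A decode scoring below it also counts as failed and is retried at the next temperature. If every temperature fails, the result of the last attempt is kept.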
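The retry loop can be sketched in plain Python. The version below roughly follows the nested `decode_with_fallback` helper in `transcribe()`: it tries each temperature in order and stops at the first attempt that passes both checks. `Attempt`, `fake_decode`, and `fallback_schedule` are hypothetical stand-ins. `fallback_schedule` imitates how the CLI turns `--temperature_increment_on_fallback` into a tuple. The real loop's no-speech short-circuit is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Attempt:
    """Hypothetical stand-in for the fields transcribe() reads from a decode."""
    text: str
    avg_logprob: float
    compression_ratio: float
    temperature: float


def fallback_schedule(start: float = 0.0, increment: Optional[float] = 0.2,
                      stop: float = 1.0) -> Tuple[float, ...]:
    """start, start + increment, ... up to stop (inclusive)."""
    if increment is None:
        return (start,)  # no fallback: a single temperature
    if increment <= 0:
        raise ValueError("increment must be positive")
    temps, i = [], 0
    while start + i * increment <= stop + 1e-6:
        temps.append(round(start + i * increment, 6))
        i += 1
    return tuple(temps)


def decode_with_fallback(
    decode: Callable[[float], Attempt],
    temperatures: Sequence[float],
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
) -> Attempt:
    """Try each temperature in turn until an attempt passes both checks."""
    result = None
    for t in temperatures:
        result = decode(t)
        too_repetitive = (compression_ratio_threshold is not None
                          and result.compression_ratio > compression_ratio_threshold)
        too_unlikely = (logprob_threshold is not None
                        and result.avg_logprob < logprob_threshold)
        if not (too_repetitive or too_unlikely):
            break
    return result  # the last attempt is kept even if it still fails


def fake_decode(t: float) -> Attempt:
    # Made-up behavior: low temperatures loop, T >= 0.4 gives clean text
    if t < 0.4:
        return Attempt("la la la la la la", -0.3, 5.0, t)
    return Attempt("hello world", -0.4, 1.1, t)


result = decode_with_fallback(fake_decode, fallback_schedule())
print(result.temperature, result.text)
```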

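```python
import whisper

model = whisper.load_model("base")

# Fallback is configured on transcribe(), not DecodingOptions
result = model.transcribe(
    "audio.wav",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # start at T=0, step by 0.2
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
    beam_size=5,   # patience requires beam_size
    patience=1.5,  # beam_size and patience are dropped for the T > 0 retries
)
```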

### No-Speech Detection

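The **`no_speech_threshold`** (default `0.6`) is another `transcribe()` argument. For each 30-second window, `model.decode()` reports `no_speech_prob`, the probability the model assigns to the `<|nospeech|>` token at the first decoding step. `transcribe()` treats the window as silence and skips it, emitting no text, when `no_speech_prob` exceeds the threshold. The exception is a decode whose average log-probability is above `logprob_threshold`. This prevents hallucinated text during silent audio segments.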
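The skip decision fits in a few lines. This sketch assumes the rule used by `transcribe()`: a window is skipped when `no_speech_prob` exceeds `no_speech_threshold`, unless its `avg_logprob` is above `logprob_threshold`. The function name `should_skip_window` and the sample probabilities are hypothetical.

```python
from typing import Optional


def should_skip_window(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: Optional[float] = 0.6,
    logprob_threshold: Optional[float] = -1.0,
) -> bool:
    """Decide whether a 30-second window should be treated as silence."""
    if no_speech_threshold is None:
        return False  # silence detection disabled
    skip = no_speech_prob > no_speech_threshold
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False  # confident text overrides the no-speech signal
    return skip


print(should_skip_window(0.9, -1.5))  # likely silence, low-confidence text
print(should_skip_window(0.9, -0.3))  # high no-speech prob, but confident text
```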

### Token Suppression

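The **`suppress_tokens`** parameter accepts a list of token IDs (or a comma-separated string) whose logits are set to negative infinity at every step, so those tokens have zero probability and never appear in the output. The default, `"-1"`, expands to `tokenizer.non_speech_tokens`, a set of symbol tokens that leaves common punctuation alone. Whisper also always suppresses a few special tokens, such as the task and start-of-transcript markers. Common use cases include suppressing specific punctuation or formatting tokens.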

The tokenizer in [`whisper/tokenizer.py`](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) does not provide a per-token suppression constant. Instead, encode the text you want to block to get its token IDs. Keep `-1` in the list if you still want the default suppression set:

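```python
import whisper

model = whisper.load_model("base")
tokenizer = whisper.tokenizer.get_tokenizer(
    model.is_multilingual, num_languages=model.num_languages
)

# -1 keeps the default set (tokenizer.non_speech_tokens); the rest blocks
# the token(s) that " um" encodes to
suppress = [-1] + tokenizer.encode(" um")

options = whisper.DecodingOptions(
    temperature=0.7,
    suppress_tokens=suppress,
)
```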
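Under the hood, suppression is a logit mask applied before each token is chosen. This sketch imitates what the `SuppressTokens` logit filter in `whisper/decoding.py` does on tensors, using a toy four-token vocabulary and hypothetical helper names. Once a logit is `-inf`, its softmax probability is exactly zero, so neither greedy decoding nor sampling can pick it.

```python
import math


def suppress(logits, token_ids):
    """Return a copy of logits with the given token ids set to -inf."""
    masked = list(logits)
    for tok in token_ids:
        masked[tok] = -math.inf
    return masked


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 2.5, 0.1, -1.0]  # toy vocabulary of four tokens
masked = suppress(logits, [0])  # block token 0, the original favorite
probs = softmax(masked)
print(max(range(len(masked)), key=masked.__getitem__), probs[0])
```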

## Practical Implementation Examples

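When working with the high-level API in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py), you can pass `DecodingOptions` fields such as `best_of`, `beam_size`, `patience`, and `length_penalty` directly to `model.transcribe()` as keyword arguments. The fallback settings (a temperature tuple, `compression_ratio_threshold`, `logprob_threshold`, and `no_speech_threshold`) are `transcribe()`'s own parameters: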

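```python
import whisper

model = whisper.load_model("base")

# High-quality configuration, similar to the CLI defaults:
# the T=0 attempt uses beam search (beam_size, patience), and
# fallback attempts at T>0 use best-of-N sampling (best_of)
result = model.transcribe(
    "audio.wav",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    beam_size=5,
    patience=1.2,
    best_of=5,
    length_penalty=0.2,
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
    no_speech_threshold=0.6,
)
print(result["text"])
```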

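For direct decoder access (`model.decode`, bound to the `Whisper` class in [`whisper/model.py`](https://github.com/openai/whisper/blob/main/whisper/model.py)), instantiate `DecodingOptions` explicitly. `decode()` handles a single 30-second window and performs no temperature fallback: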

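```python
import whisper

model = whisper.load_model("base")

# decode() handles a single 30-second window: pad/trim first
audio = whisper.pad_or_trim(whisper.load_audio("audio.wav"))
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Conservative, deterministic decoding: T=0 with beam search
options = whisper.DecodingOptions(
    temperature=0.0,
    beam_size=5,
    patience=1.0,        # requires beam_size; 1.0 = conventional beam search
    suppress_tokens=[],  # empty list turns off the default "-1" symbol set
    fp16=False,          # FP32 so this also runs on CPU
)

result = model.decode(mel, options)
print(result.text)
```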

## Summary

- **`temperature`** controls randomness: lower values produce deterministic output, higher values increase diversity.
- **`best_of`** draws several samples when `temperature > 0` and returns the top-ranked candidate (highest average log-probability by default).
- **`beam_size`** and **`patience`** configure deterministic beam search, and **`length_penalty`** sets how candidates of different lengths are ranked.
- **Fallback settings** (a tuple of temperatures, `compression_ratio_threshold`, `logprob_threshold`, and the CLI's `--temperature_increment_on_fallback`) belong to `transcribe()`, which retries failed segments at higher temperatures.
- **`no_speech_threshold`**, also a `transcribe()` argument, skips windows judged silent to prevent hallucinations.
- **`suppress_tokens`** blocks specific token IDs from appearing in the output.

## Frequently Asked Questions

### What is the difference between temperature and best_of in Whisper sampling?

**Temperature** divides the logits before softmax to control randomness within a single generation pass. **best_of** draws several independent samples and keeps the one with the highest average log-probability (under the default `length_penalty`). `best_of` requires `temperature > 0`, so you can combine them by setting `temperature=0.8` and `best_of=5` to generate diverse candidates and keep the best one.

### How does the compression_ratio_threshold prevent repetitive output?

The **compression_ratio_threshold** is compared against the compression ratio of the generated text: UTF-8 bytes divided by zlib-compressed bytes. Repetitive sequences compress extremely well. If the ratio exceeds the threshold (default 2.4), `transcribe()` treats the decode as failed and retries the segment at the next temperature. The ratio is computed in [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py), and the retry loop lives in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py).

### When should I use patience versus temperature for controlling generation?

Use **temperature** when you want to adjust the randomness of token selection: lower for accurate transcription, higher for more varied output. Use **patience** (values >1.0, together with `beam_size`) to let deterministic beam search collect more finished candidates before stopping. This can improve quality at the cost of speed without adding token-level randomness. Patience has no effect on sampled decoding; `transcribe()` drops it for attempts with `temperature > 0`.

### What happens when no_speech_threshold is exceeded during decoding?

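`model.decode()` does not return an empty transcription on its own; it reports `no_speech_prob`, the probability of the `<|nospeech|>` token at the first decoding step, alongside the text. During `transcribe()` (in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py)), a window whose `no_speech_prob` exceeds **no_speech_threshold** is skipped as silence unless its average log-probability is above `logprob_threshold`. This keeps the model from hallucinating text during silent audio segments. The reported `no_speech_prob` can also serve as a coarse per-window silence signal.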