
Whisper DecodingOptions Sampling Parameters: A Complete Guide to Stochastic Generation

February 27, 2026 openai/whisper ↗

The key sampling parameters in whisper.DecodingOptions are temperature, which controls randomness, and best_of, which sets how many samples are drawn and ranked. beam_size, patience, and length_penalty govern beam search and hypothesis ranking. The fallback thresholds compression_ratio_threshold, logprob_threshold, and no_speech_threshold are arguments of transcribe() rather than fields of DecodingOptions.

OpenAI Whisper uses the DecodingOptions dataclass in whisper/decoding.py to collect the options that control how the model generates text from one 30-second window of encoded audio. With a temperature above zero the decoder samples tokens stochastically instead of decoding greedily or with beam search, and these parameters set the trade-off between reliability and diversity of the transcription. The retry logic that reacts to low-quality output lives one level up, in whisper/transcribe.py.

Core Sampling Parameters in DecodingOptions

Temperature and Stochastic Control

The temperature parameter scales logits before the softmax operation. Lower values sharpen the probability distribution, making the model more deterministic, while higher values inject randomness.

0.0 → Greedy decoding (most deterministic)
0.7-1.0 → Moderate diversity
>1.0 → High randomness, more varied outputs

In whisper/decoding.py, the decoder takes the argmax of the logits at each step when temperature is 0, and otherwise samples from a categorical distribution over logits / temperature.
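The effect of temperature on a distribution is easy to see on a toy example. The following is a minimal standard-library sketch, not Whisper's implementation: the function names and the three-token logits are made up for illustration, and Whisper itself does the equivalent with PyTorch tensors.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng=random):
    """Return a token index: argmax at temperature 0, a random draw otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for a three-token vocabulary
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter
greedy_choice = pick_token(logits, 0)
```

At temperature 0.5 most of the probability mass sits on the first token, while at 2.0 the three tokens are much closer together, which is why higher temperatures produce more varied output.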

Best-of Sampling and Candidate Selection

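The best_of parameter works in conjunction with temperature to draw multiple independent samples. It requires a temperature above 0 (combining it with greedy decoding raises an error) and cannot be used together with beam_size. The decoder generates best_of candidate transcriptions and, with the default length normalization, returns the one with the highest average log-probability.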

This trades compute for quality:

import whisper

model = whisper.load_model("base")
audio = whisper.load_audio("audio.wav")
audio = whisper.pad_or_trim(audio)  # decode() expects exactly one 30-second window
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Draw 5 samples and pick the best

options = whisper.DecodingOptions(
    temperature=0.8,
    best_of=5
)
result = model.decode(mel, options)
print(result.text)
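The selection step can be sketched without a model. This is an illustrative stand-in that assumes each candidate comes with per-token log-probabilities; the candidate texts, the numbers, and the helper names are invented for the example.

```python
def average_logprob(token_logprobs):
    """Mean log-probability per token of one candidate."""
    return sum(token_logprobs) / len(token_logprobs)


def pick_best(candidates):
    """Return the text of the candidate with the highest average log-probability.

    candidates is a list of (text, per-token log-probabilities) pairs.
    """
    return max(candidates, key=lambda c: average_logprob(c[1]))[0]


# Three hypothetical samples drawn for the same audio window
candidates = [
    ("the cat sat", [-0.2, -0.3, -0.1]),
    ("the cat sad", [-0.2, -0.3, -1.9]),
    ("a cat sat down", [-0.9, -0.3, -0.1, -0.7]),
]
best = pick_best(candidates)
```

Averaging per token, rather than summing, keeps a longer candidate from losing simply because it has more tokens to accumulate negative log-probability.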

Patience and Length Penalty

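The patience parameter applies only to beam search, so it requires beam_size and is rejected without it. The beam search decoder keeps going until it has collected round(beam_size * patience) finished candidates, so values greater than 1.0 delay stopping and let the search explore longer. Leaving it unset behaves like 1.0, which is conventional beam search.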

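The length_penalty (alpha) controls how candidate scores are normalized by length when hypotheses are ranked. It must lie between 0 and 1; values outside that range raise a ValueError:

None (default) divides the summed log-probability by the token count, giving a plain per-token average
Values between 0 and 1 divide by ((5 + length) / 6) ** length_penalty, the penalty from the Google NMT paper
0.0 disables length normalization, which favors shorter hypotheses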
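The arithmetic behind both settings is small enough to write out. This sketch assumes the ranking in whisper/decoding.py divides the summed log-probability by the token count when length_penalty is unset and by the Google NMT penalty ((5 + length) / 6) ** length_penalty otherwise, and that beam search stops after round(beam_size * patience) finished candidates. The function names are hypothetical.

```python
def length_normalized_score(sum_logprob, length, length_penalty=None):
    """Score a hypothesis by its summed log-probability, normalized for length."""
    if length_penalty is None:
        penalty = length  # plain per-token average
    else:
        penalty = ((5 + length) / 6) ** length_penalty  # Google NMT style
    return sum_logprob / penalty


def max_finished_candidates(beam_size, patience=1.0):
    """How many finished candidates beam search waits for before stopping."""
    return round(beam_size * patience)


default_score = length_normalized_score(-6.0, 3)               # -6.0 / 3
unnormalized = length_normalized_score(-6.0, 3, 0.0)           # penalty is 1.0
full_penalty = length_normalized_score(-6.0, 7, 1.0)           # (5 + 7) / 6 = 2.0
standard_beam = max_finished_candidates(5)                     # patience 1.0
patient_beam = max_finished_candidates(5, patience=2.0)
```

With patience 2.0 and a beam of 5, the search waits for twice as many finished candidates, which costs time but does not add any randomness.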

Fallback and Quality Control Parameters

Compression Ratio and Log Probability Thresholds

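Whisper retries automatically when a decoding pass produces low-quality output. This fallback is implemented by transcribe() in whisper/transcribe.py, not by DecodingOptions: transcribe() accepts a tuple of temperatures, by default (0.0, 0.2, 0.4, 0.6, 0.8, 1.0), and moves to the next one each time a result fails a quality check. temperature_increment_on_fallback is a command-line flag only; the CLI uses it to build that tuple from the starting temperature.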

The compression_ratio_threshold (default 2.4) is an upper limit on the compression ratio of the generated text, computed as its UTF-8 byte length divided by its zlib-compressed length. Exceeding the threshold indicates repetitive output, triggering a retry with higher temperature.
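A quick way to build intuition for this check is to compute the ratio on ordinary and on looping text. The sketch below uses zlib from the standard library and assumes the ratio is the UTF-8 byte length divided by the compressed length; the two sample strings are made up.

```python
import zlib


def compression_ratio(text):
    """Raw UTF-8 size divided by zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


normal = "The quick brown fox jumps over the lazy dog near the river bank."
looping = "thank you " * 40  # the kind of loop a stuck decoder can produce

normal_ratio = compression_ratio(normal)
looping_ratio = compression_ratio(looping)
```

Ordinary sentences stay well under the 2.4 default, while a repeated phrase compresses to a small fraction of its size and lands far above it.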

The logprob_threshold (default -1.0) sets the minimum average log-probability per token. A result scoring below this value is rejected, prompting transcribe() to decode the segment again at the next temperature.


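# Configure fallback behavior through transcribe(), which owns the retry loop

result = model.transcribe(
    "audio.wav",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # start greedy, step up on each retry
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
    beam_size=5,   # used only at temperature 0
    patience=1.5,  # requires beam_size; dropped once temperature > 0
)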
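The retry logic itself can be sketched in a few lines. This is a simplified stand-in rather than the code in whisper/transcribe.py: decode_once is a hypothetical callable that returns a dict with compression_ratio and avg_logprob, fake_decode simulates a decoder that loops at low temperatures, and the schedule helper mimics how a start value and an increment expand into a tuple of temperatures.

```python
def temperature_schedule(start=0.0, increment=0.2, stop=1.0):
    """Temperatures from start to stop (inclusive) in steps of increment."""
    count = int((stop - start) / increment + 1e-6) + 1
    return tuple(round(start + i * increment, 10) for i in range(count))


def decode_with_fallback(decode_once, temperatures,
                         compression_ratio_threshold=2.4,
                         logprob_threshold=-1.0):
    """Try each temperature in order and stop at the first acceptable result."""
    result = None
    for t in temperatures:
        result = decode_once(t)
        too_repetitive = result["compression_ratio"] > compression_ratio_threshold
        too_unlikely = result["avg_logprob"] < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break
    return result  # the last attempt is returned even if every check failed


def fake_decode(temperature):
    """Hypothetical decoder: repetitive output until the temperature reaches 0.4."""
    if temperature < 0.4:
        return {"temperature": temperature, "compression_ratio": 5.0, "avg_logprob": -0.3}
    return {"temperature": temperature, "compression_ratio": 1.6, "avg_logprob": -0.5}


schedule = temperature_schedule()
result = decode_with_fallback(fake_decode, schedule)
```

Here the first two attempts fail the compression check, so the loop settles on the third temperature in the schedule.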

No-Speech Detection

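The no_speech_threshold (default 0.6) is a transcribe() argument for skipping silent segments. The decoder reports no_speech_prob, the probability of the <|nospeech|> token at the start-of-transcript position. transcribe() skips a segment when that probability exceeds the threshold and the segment's average log-probability also falls below logprob_threshold. This prevents hallucinated text during silent audio segments.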
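The skip decision combines two signals, which is easy to miss. A minimal sketch of the rule, assuming a segment is dropped only when the no-speech probability is high and the decoded text is also low-confidence (the function name is hypothetical):

```python
def should_skip_segment(no_speech_prob, avg_logprob,
                        no_speech_threshold=0.6, logprob_threshold=-1.0):
    """Decide whether a segment should be treated as silence."""
    if no_speech_threshold is None:
        return False  # silence detection disabled
    skip = no_speech_prob > no_speech_threshold
    # A confident transcription overrides the no-speech signal
    if logprob_threshold is not None and avg_logprob > logprob_threshold:
        skip = False
    return skip


silent = should_skip_segment(no_speech_prob=0.9, avg_logprob=-1.5)
confident_speech = should_skip_segment(no_speech_prob=0.9, avg_logprob=-0.2)
clear_speech = should_skip_segment(no_speech_prob=0.1, avg_logprob=-1.5)
```

Only the first case is skipped: the second has a high no-speech probability but a confident transcription, and the third never crosses the no-speech threshold.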

Token Suppression

The suppress_tokens parameter accepts a list of token IDs, or a comma-separated string of IDs, whose logits are set to negative infinity at every decoding step, so those tokens can never be chosen. The default, "-1", suppresses the tokenizer's built-in set of non-speech symbols; an empty list turns that default off. Common use cases include suppressing specific punctuation or formatting tokens.

The tokenizer in whisper/tokenizer.py can turn text into token IDs with encode(), and including -1 in the list keeps the default non-speech set alongside your custom IDs:


# Suppress specific tokens

tokenizer = whisper.tokenizer.get_tokenizer(multilingual=True)  # must match the model
blocked = tokenizer.encode(" um")  # IDs for a filler word, used here only as an illustration
options = whisper.DecodingOptions(
    temperature=0.7,
    suppress_tokens=[-1, *blocked]  # -1 keeps the default non-speech suppression
)
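What suppression does to the logits can be shown on a plain list. This sketch is illustrative only: the five-entry vocabulary and the helper name are invented, and Whisper applies the same idea to PyTorch tensors.

```python
import math


def apply_suppression(logits, suppress_ids):
    """Return a copy of the logits with the suppressed token IDs set to -inf."""
    masked = list(logits)
    for token_id in suppress_ids:
        masked[token_id] = -math.inf
    return masked


logits = [0.3, 2.5, 1.1, -0.4, 0.9]  # hypothetical scores; token 1 would win
masked = apply_suppression(logits, suppress_ids=[1, 3])

winner_before = max(range(len(logits)), key=lambda i: logits[i])
winner_after = max(range(len(masked)), key=lambda i: masked[i])
```

Because exp(-inf) is 0, a suppressed token gets zero probability after the softmax, so it cannot be picked by argmax or by sampling at any temperature.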

Practical Implementation Examples

When working with the high-level API in whisper/transcribe.py, you can pass DecodingOptions fields to model.transcribe() as keyword arguments, alongside its own fallback thresholds:

import whisper

model = whisper.load_model("base")

# High-quality sampling configuration

result = model.transcribe(
    "audio.wav",
    temperature=0.8,  # a single value, so there is no fallback schedule
    best_of=5,
    length_penalty=0.2,
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
    no_speech_threshold=0.6
)
print(result["text"])

For direct decoder access as implemented in whisper/model.py, instantiate DecodingOptions explicitly:

audio = whisper.load_audio("audio.wav")
audio = whisper.pad_or_trim(audio)  # decode() works on one 30-second window
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Conservative decoding for accurate transcription

options = whisper.DecodingOptions(
    temperature=0.0,  # Greedy
    suppress_tokens=[]  # turns off the default non-speech suppression
)

result = model.decode(mel, options)
print(result.text)

Summary

temperature controls randomness: lower values produce deterministic output, higher values increase diversity.
best_of enables multiple sampling runs, returning the highest probability candidate.
patience (together with beam_size) controls how long beam search runs, and length_penalty sets how candidate scores are normalized by length.
Fallback settings on transcribe() (a tuple of temperatures, compression_ratio_threshold, logprob_threshold) automatically retry low-quality generations at higher temperatures; temperature_increment_on_fallback is the command-line equivalent.
no_speech_threshold detects silent audio segments to prevent hallucinations.
suppress_tokens blocks specific token IDs from appearing in the output.

Frequently Asked Questions

What is the difference between temperature and best_of in Whisper sampling?

Temperature scales the logits before softmax to control randomness within a single generation pass, while best_of runs multiple independent sampling passes and selects the candidate with the highest average log-probability. You can combine them by setting temperature=0.8 and best_of=5 to generate diverse candidates and keep the best one.

How does the compression_ratio_threshold prevent repetitive output?

The compression_ratio_threshold is compared against the compression ratio of the generated text. Repetitive sequences compress extremely well, so if the ratio exceeds the threshold (default 2.4), Whisper treats the result as "stuck" and retries with a higher temperature. This retry loop lives in transcribe() in whisper/transcribe.py and needs more than one temperature to fall back to.

When should I use patience versus temperature for controlling generation?

Use temperature when you want to adjust the randomness of token selection: lower for accurate transcription, higher for more varied output. Use patience (values >1.0) together with beam_size when decoding with beam search at temperature 0. It lets the search collect more finished candidates before stopping, which can improve quality at the cost of speed without adding token-level randomness.

What happens when no_speech_threshold is exceeded during decoding?

When the probability of the <|nospeech|> token exceeds no_speech_threshold and the segment's average log-probability is also below logprob_threshold, transcribe() skips that segment and emits no text for it. As implemented in whisper/transcribe.py, this prevents the model from hallucinating text during silent audio segments. The check runs once per 30-second window, so it is a coarse silence filter rather than a frame-level voice activity detector.
