# Whisper model.transcribe() Advanced Parameters: Temperature, Thresholds, and Decoding Options

> Explore advanced Whisper model.transcribe() parameters like temperature and thresholds. Optimize transcription accuracy with detailed sampling and validation controls.

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Tags: deep-dive
- Published: 2026-02-27

---

**The** `model.transcribe()` **function in OpenAI Whisper exposes advanced parameters—including temperature scheduling, compression-ratio thresholds, and** `DecodingOptions`**—that control sampling strategies, quality validation, and fallback loops to optimize transcription accuracy.**

The `model.transcribe()` method in the openai/whisper repository serves as the primary Python interface for speech-to-text inference. While basic usage requires only an audio path or tensor, the function signature in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) accepts over a dozen advanced parameters that govern decoding behavior, prompt conditioning, and timestamp granularity. Mastering these parameters allows developers to suppress hallucinations, handle noisy audio, and extract word-level alignments.

## Temperature Schedules and Quality Thresholds

The transcription pipeline implements an automatic fallback mechanism that sequentially retries decoding with different temperatures until quality checks pass.

### Configuring Temperature Sampling

The `temperature` parameter accepts either a single float or a tuple of floats. When a tuple is provided (the default is `(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)`), Whisper tries each value in order until the output passes the validation thresholds. The fallback helper in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) also filters the decoding options for each attempt: `beam_size` and `patience` are dropped when `temperature > 0` (sampling mode), and `best_of` is dropped when `temperature == 0` (greedy or beam-search mode).

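```python
import whisper

model = whisper.load_model("base")  # any available model size works here
audio_path = "audio.mp3"  # placeholder path to an audio file

result = model.transcribe(
    audio_path,
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback schedule, tried in order
    compression_ratio_threshold=2.5,  # slightly more lenient than the 2.4 default
    logprob_threshold=-1.2,  # slightly more lenient than the -1.0 default
)
```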
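The per-temperature option filtering described above can be sketched in plain Python, without loading a model. The helper name `options_for_temperature` is hypothetical and exists only for illustration; it mirrors the described behavior rather than reproducing the library's code.

```python
from typing import Any, Dict


def options_for_temperature(decode_options: Dict[str, Any], t: float) -> Dict[str, Any]:
    """Return the decode options that stay active at temperature t (hypothetical helper)."""
    kwargs = dict(decode_options)  # copy so the caller's dict is left untouched
    if t > 0:
        # Sampling mode: beam-search settings are dropped.
        kwargs.pop("beam_size", None)
        kwargs.pop("patience", None)
    else:
        # Greedy or beam-search mode: best-of sampling is dropped.
        kwargs.pop("best_of", None)
    kwargs["temperature"] = t
    return kwargs


user_options = {"beam_size": 5, "patience": 1.5, "best_of": 5}
for t in (0.0, 0.2):
    print(t, options_for_temperature(user_options, t))
```

One consequence of this design is that a single call can carry both beam-search and sampling settings: the first attempt at `0.0` uses the beam, and later fallback attempts use `best_of`.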

### Compression Ratio and Log-Probability Thresholds

Two primary thresholds filter low-quality generations:

- **`compression_ratio_threshold`** (default `2.4`): Maximum allowed gzip-compression ratio of the decoded text. Repetitive or "stuck" output compresses well and therefore has a high ratio; exceeding the threshold triggers a fallback to the next temperature.
- **`logprob_threshold`** (default `-1.0`): Minimum average log-probability per token. If the average falls below this value, the decode is treated as failed and the next temperature is tried.
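To see why a compression ratio flags looping output, here is a minimal sketch that assumes the ratio is the raw UTF-8 byte count divided by the zlib-compressed byte count. The function and sample strings are illustrative and not taken from the repository.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


varied = "The quick brown fox jumps over the lazy dog near the quiet river bank."
looping = "thank you for watching. " * 20  # typical shape of a stuck transcription

print(f"varied:  {compression_ratio(varied):.2f}")
print(f"looping: {compression_ratio(looping):.2f}")
```

The varied sentence stays well under the `2.4` default, while the looping string lands far above it.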

### No-Speech Detection

The `no_speech_threshold` (default `0.6`) is the probability of the special `<|nospeech|>` token above which a segment is considered silent. The check only takes effect in combination with `logprob_threshold`: when the no-speech probability exceeds the threshold and the average log-probability is also below `logprob_threshold`, the fallback is cancelled and the segment is treated as silence. This prevents the loop from retrying actual silence at ever higher temperatures.
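The interaction between the three thresholds can be sketched as a small decision function. This is a simplified model under stated assumptions: `Attempt` is a hypothetical stand-in for a decode result, and `pick_attempt` walks a list of pre-computed attempts instead of calling a model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    """Hypothetical stand-in for one decode result at a given temperature."""
    temperature: float
    compression_ratio: float
    avg_logprob: float
    no_speech_prob: float


def needs_fallback(
    r: Attempt,
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
) -> bool:
    retry = False
    if compression_ratio_threshold is not None and r.compression_ratio > compression_ratio_threshold:
        retry = True  # output is too repetitive
    if logprob_threshold is not None and r.avg_logprob < logprob_threshold:
        retry = True  # average confidence is too low
    if (
        no_speech_threshold is not None
        and r.no_speech_prob > no_speech_threshold
        and logprob_threshold is not None
        and r.avg_logprob < logprob_threshold
    ):
        retry = False  # looks like silence, so retrying would not help
    return retry


def pick_attempt(attempts: List[Attempt]) -> Attempt:
    """Return the first passing attempt, or the last one if none passes."""
    chosen = attempts[0]
    for attempt in attempts:
        chosen = attempt
        if not needs_fallback(attempt):
            break
    return chosen


attempts = [
    Attempt(0.0, 3.1, -0.4, 0.1),  # too repetitive, so retry
    Attempt(0.2, 1.6, -0.5, 0.1),  # passes both checks
    Attempt(0.4, 1.5, -0.3, 0.1),  # never reached
]
print(pick_attempt(attempts).temperature)
```

Note that the silence rule overrides the other two checks, and that when every attempt fails the last one is still returned rather than an error being raised.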

## Prompt Conditioning and Context Management

Whisper maintains context across audio windows through prompt conditioning, which can be tuned or disabled depending on the use case.

### Initial Prompts and Carry Behavior

The `initial_prompt` parameter injects domain-specific text at the start of the prompt, biasing the model toward specialized vocabulary (e.g., medical or legal terminology). By default it is supplied as a prompt for the first window only. When `carry_initial_prompt=True` (default `False`), the text is prepended to the prompt of *every* internal `decode()` call, with the remaining context trimmed when there is not enough space. This logic resides in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py).

```python
result = model.transcribe(
    audio_path,
    initial_prompt="Medical terminology: ECG, arrhythmia, cardiology.",
    carry_initial_prompt=True,  # repeat the prompt for every window
    temperature=0.0,  # beam search only applies at temperature 0
    beam_size=5,
)
```

### Conditioning on Previous Text

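By default, `condition_on_previous_text=True` feeds the transcription of the previous window back as a prompt for the next window. Disabling this parameter makes the model less prone to getting "stuck" in failure loops, such as repeated phrases or timestamps drifting out of sync, at the cost of occasional inconsistency between windows. Even with conditioning enabled, the carried-over context is reset after a window that had to be decoded at a temperature above `0.5`, since that text is less trustworthy.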
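The two prompt options in this section can be pictured with a toy model of how the prompt for one window might be assembled. Everything here is an assumption made for illustration: `build_prompt` is a hypothetical function, tokens are plain integers, and `budget` stands in for whatever prompt length the decoder allows. It is a simplification, not the repository's implementation.

```python
from typing import List


def build_prompt(
    initial: List[int],
    decoded_so_far: List[int],
    *,
    carry_initial_prompt: bool,
    condition_on_previous_text: bool,
    budget: int,
) -> List[int]:
    """Toy prompt assembly for one window (hypothetical, simplified)."""
    first_window = not decoded_so_far
    history = decoded_so_far if condition_on_previous_text else []
    if carry_initial_prompt:
        # The initial prompt always survives; history fills the remaining room.
        room = max(budget - len(initial), 0)
        return initial + (history[-room:] if room else [])
    if condition_on_previous_text or first_window:
        # The initial prompt is just the oldest context and is truncated away over time.
        return (initial + history)[-budget:]
    return []


initial = [1, 2]
decoded = [10, 11, 12, 13]
for carry in (True, False):
    prompt = build_prompt(
        initial,
        decoded,
        carry_initial_prompt=carry,
        condition_on_previous_text=True,
        budget=4,
    )
    print(f"carry={carry}: {prompt}")
```

With carrying enabled, the domain vocabulary stays in view for every window at the price of less room for recent context.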

## Timestamp and Segmentation Controls

For applications requiring precise alignment or processing of specific audio clips, several parameters control segmentation boundaries.

### Word-Level Timestamps

Setting `word_timestamps=True` enables extraction of word-level timestamps using cross-attention weights and dynamic time-warping, with boundaries refined in [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py). When enabled, the `prepend_punctuations` and `append_punctuations` parameters (defaulting to `"\"'“¿([{-"` and `"\"'.。,，!！?？:：”)]}、"` respectively) determine which punctuation characters merge with adjacent words.

```python
result = model.transcribe(
    audio_path,
    word_timestamps=True,
    prepend_punctuations="\"'“([{-",
    append_punctuations="\"'.!?,;:）】}",
    hallucination_silence_threshold=0.5,
)
```

The `hallucination_silence_threshold` (default `None`) takes a duration in seconds and only has an effect when `word_timestamps=True`. When a possible hallucination is detected, silent periods longer than this value are skipped instead of being transcribed.
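Once word timestamps are enabled, the per-word data has to be read back out of the result. The sketch below assumes each segment carries a `words` list whose entries have `word`, `start`, and `end` keys; the `sample_result` dictionary is hand-written for illustration and is not real model output.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(result: Dict[str, Any]) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) for every word in a transcription result."""
    for segment in result.get("segments", []):
        # Segments without word-level data are skipped silently.
        for word in segment.get("words", []):
            yield word["start"], word["end"], word["word"].strip()


# Hand-written example shaped like a result with word timestamps enabled.
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": " Hello, world.",
            "words": [
                {"word": " Hello,", "start": 0.0, "end": 0.5, "probability": 0.98},
                {"word": " world.", "start": 0.6, "end": 1.2, "probability": 0.95},
            ],
        }
    ]
}

for start, end, text in iter_words(sample_result):
    print(f"[{start:5.2f} -> {end:5.2f}] {text}")
```

The trailing comma and period stay attached to their words here, which is the effect `append_punctuations` is meant to produce.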

### Clip Timestamps

The `clip_timestamps` parameter accepts a comma-separated string of `start,end,start,end,...` times in seconds (e.g., `"30,45"`), or a list of floats, to restrict transcription to specific audio segments. The default is `"0"`, and a missing final end time means "until the end of the file". The parsing logic in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) converts these values into frame indices before inference.
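The conversion from timestamps to frame ranges can be sketched as follows. The function name `parse_clips` is hypothetical, and the constant of 100 frames per second (10 ms per frame) is an assumption about the spectrogram resolution.

```python
from typing import List, Tuple, Union

FRAMES_PER_SECOND = 100  # assumption: one frame per 10 ms of audio


def parse_clips(
    clip_timestamps: Union[str, List[float]], content_frames: int
) -> List[Tuple[int, int]]:
    """Turn start/end timestamps in seconds into (start_frame, end_frame) pairs."""
    if isinstance(clip_timestamps, str):
        clip_timestamps = (
            [float(ts) for ts in clip_timestamps.split(",")] if clip_timestamps else []
        )
    seek_points = [round(ts * FRAMES_PER_SECOND) for ts in clip_timestamps]
    if not seek_points:
        seek_points.append(0)  # no timestamps: start from the beginning
    if len(seek_points) % 2 == 1:
        seek_points.append(content_frames)  # open-ended last clip runs to the end
    return list(zip(seek_points[::2], seek_points[1::2]))


# A 60-second file has 6000 frames under the assumption above.
print(parse_clips("30,45", content_frames=6000))     # [(3000, 4500)]
print(parse_clips("0", content_frames=6000))         # [(0, 6000)]
print(parse_clips("10,20,50", content_frames=6000))  # [(1000, 2000), (5000, 6000)]
```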

## DecodingOptions and Beam Search

All additional keyword arguments passed to `model.transcribe()` are forwarded as `**decode_options` to the `DecodingOptions` dataclass defined in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) (lines 80-112). This provides low-level control over the inference strategy:

- **`beam_size`**: Number of beams for beam search (active when `temperature=0`).
- **`best_of`**: Number of candidates to sample when using non-zero temperature.
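- **`patience`**: Multiplier on `beam_size` that sets how many finished candidates beam search collects before stopping; values above `1.0` let the search run longer. Requires `beam_size`.
- **`length_penalty`**: Exponent of the Google NMT length penalty used when ranking candidates. When unset, scores are normalized by plain sequence length.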
- **`suppress_tokens`**: Comma-separated list of token IDs to suppress (e.g., `"50259,50260"`).

```python
result = model.transcribe(
    audio_path,
    temperature=0.0,
    beam_size=8,
    patience=1.5,
    length_penalty=0.6,
    suppress_tokens="-1,50258",
)
```
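To make the effect of `length_penalty` concrete, the sketch below assumes candidates are ranked by their summed log-probability divided by a penalty term: plain length when `length_penalty` is unset, and the Google NMT form `((5 + length) / 6) ** alpha` otherwise. The function `ranking_score` and the two candidates are illustrative only.

```python
from typing import Optional


def ranking_score(sum_logprob: float, length: int, length_penalty: Optional[float] = None) -> float:
    """Score a candidate; higher is better (illustrative sketch)."""
    if length_penalty is None:
        penalty = float(length)  # simple length normalization
    else:
        penalty = ((5 + length) / 6) ** length_penalty  # Google NMT length penalty
    return sum_logprob / penalty


short = (-4.0, 8)   # (summed log-probability, token count)
long = (-7.0, 20)

for alpha in (None, 0.6):
    s = ranking_score(*short, length_penalty=alpha)
    l = ranking_score(*long, length_penalty=alpha)
    winner = "long" if l > s else "short"
    print(f"length_penalty={alpha}: short={s:.3f} long={l:.3f} -> {winner}")
```

With these numbers, plain length normalization prefers the longer candidate, while an exponent of `0.6` rewards length less strongly and the shorter candidate wins.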

## Summary

- **Temperature scheduling**: Pass a tuple like `(0.0, 0.2, 0.4)` to automatically retry with higher sampling temperatures if quality checks fail.
- **Quality thresholds**: Pass `compression_ratio_threshold`, `logprob_threshold`, and `no_speech_threshold` to `model.transcribe()` to control when repetitive or low-confidence output triggers a retry and when a window is treated as silence.
- **Prompt control**: Use `initial_prompt` with `carry_initial_prompt=True` to bias every window toward domain-specific vocabulary.
- **Temporal precision**: Enable `word_timestamps=True` for sub-segment alignment and use `clip_timestamps` to process specific audio intervals.
- **Decoding strategies**: Pass beam search parameters (`beam_size`, `patience`) or sampling parameters (`best_of`) via `**decode_options` to the underlying `DecodingOptions` class.

## Frequently Asked Questions

### What is the difference between temperature and beam_size in Whisper?

**`temperature`** controls the randomness of token sampling, where `0.0` is deterministic and `1.0` is highly random. **`beam_size`** activates beam search, which is only used when `temperature=0`; the source code in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) automatically disables beam search when temperature is greater than zero to ensure compatible decoding strategies.

### How does the fallback mechanism work when multiple temperatures are provided?

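When `temperature` is a tuple, `model.transcribe()` iterates through the values in order for each audio window. After each attempt it compares the decode's compression ratio against `compression_ratio_threshold` and its average log-probability against `logprob_threshold`. If either check fails, the loop moves on to the next temperature, unless the window looks like silence (no-speech probability above `no_speech_threshold` together with a low average log-probability), in which case retrying stops. The first decode that passes is used; if every temperature fails, the result of the last attempt is kept.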

### Why would I disable condition_on_previous_text?

Setting `condition_on_previous_text=False` prevents the model from using prior transcription windows as context for the current window. This is useful when you want to avoid context contamination or "stuck" loops where the model repeats phrases across windows, though it may reduce coherence at segment boundaries.

### Can I use word timestamps and beam search simultaneously?

Yes. `word_timestamps=True` runs as a post-processing step in [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py) regardless of the decoding strategy, so it works with both beam search and sampling. Beam search itself only applies at `temperature=0`: for any attempt with a non-zero temperature, the implementation switches to sampling mode and ignores `beam_size`.