How to Use `initial_prompt` and `condition_on_previous_text` for Context in OpenAI Whisper

Use initial_prompt to inject static text at the start of transcription, and enable condition_on_previous_text (default: True) to carry decoded output from previous audio windows into subsequent decoding steps for contextual continuity.

OpenAI Whisper processes long audio files with a sliding 30-second window, decoding one window at a time in sequence. The initial_prompt and condition_on_previous_text parameters in whisper/transcribe.py control how contextual information flows between these windows, allowing you to guide the model with domain-specific vocabulary, speaker names, or formatting hints.

Understanding Whisper's Windowed Transcription Architecture

Whisper handles long-form audio by processing it in chunks. For each window, the decoder can receive a prompt—a list of token IDs that the model treats as preceding text. This mechanism prevents the model from treating every audio segment as an isolated utterance.

The transcription loop in whisper/transcribe.py manages two distinct prompt sources:

  • Static prompts provided by the user via initial_prompt
  • Dynamic prompts generated from previously decoded text when condition_on_previous_text=True
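To make "preceding text" concrete at the token level, the sketch below assembles a decoder input: a start-of-previous marker, the tail of the prompt, then the usual start-of-transcript sequence. The token IDs and the helper name build_decoder_input are invented for illustration. The n_ctx // 2 - 1 truncation and the 448-token text context are taken from a reading of whisper/decoding.py and the released model dimensions, so treat them as assumptions to verify against your installed version.

```python
from typing import List

# Hypothetical token IDs, chosen only for illustration (not Whisper's real IDs).
SOT_PREV = 9001                      # stands in for <|startofprev|>
SOT_SEQUENCE = [9002, 9003, 9004]    # stands in for <|startoftranscript|>, language, task


def build_decoder_input(prompt_tokens: List[int], n_ctx: int = 448) -> List[int]:
    """Place the (possibly truncated) prompt before the start-of-transcript sequence."""
    if not prompt_tokens:
        # No prompt: the decoder starts directly from the start-of-transcript sequence.
        return list(SOT_SEQUENCE)
    max_prompt = n_ctx // 2 - 1      # assumed cap on prompt tokens
    # Only the most recent prompt tokens survive truncation.
    return [SOT_PREV] + prompt_tokens[-max_prompt:] + SOT_SEQUENCE
```

Because the truncation keeps the end of the prompt, older context is the first thing to be dropped when the prompt grows too long.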

How initial_prompt Injects Static Context

The initial_prompt parameter accepts an optional string that gets tokenized once at the beginning of transcription. According to the source code in whisper/transcribe.py (lines 48-52), these tokens are prepared before the main decoding loop begins.

Single Window Context (Default Behavior)

By default, carry_initial_prompt=False, meaning the initial_prompt tokens are only prepended to the very first audio window (lines 239-244). This is ideal for one-time context such as a topic line or speaker identification that should influence the opening of the transcript but not constrain subsequent segments. The spoken language itself is selected with the separate language option, not with the prompt.

import whisper

model = whisper.load_model("base")
# whisper.transcribe is the function itself; model.transcribe("audio.mp3", ...) is equivalent
result = whisper.transcribe(
    model,
    "audio.mp3",
    initial_prompt="Speaker: Dr. Smith\nTopic: Cardiology",
    carry_initial_prompt=False,  # Only first window sees this
    condition_on_previous_text=True,
)

Persistent Context with carry_initial_prompt

When you set carry_initial_prompt=True, the code prepends the initial_prompt tokens to every internal decode() call (lines 288-291). This ensures that every window receives the same leading context, which helps maintain consistent formatting or domain-specific language modeling throughout the transcription.

# Assumes `model` was loaded with whisper.load_model(...) as shown earlier
result = whisper.transcribe(
    model,
    "audio.mp3",
    initial_prompt="Medical Transcription - Patient ID 12345:",
    carry_initial_prompt=True,   # Prepended to every window
    condition_on_previous_text=True,
)
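The difference between the two modes can be pictured with a small simulation of which prompt each window receives. This is a simplified model of the bookkeeping, not the library's code: the function name prompts_per_window and the integer "tokens" are hypothetical, temperature-based resets are ignored, and the default budget of 223 assumes a 448-token text context.

```python
from typing import List


def prompts_per_window(
    initial_prompt_tokens: List[int],
    window_outputs: List[List[int]],
    carry_initial_prompt: bool,
    max_prompt: int = 223,  # assumed n_text_ctx // 2 - 1 for a 448-token text context
) -> List[List[int]]:
    """Return the prompt each window would see, with previous-text conditioning on."""
    all_tokens = list(initial_prompt_tokens)  # the initial prompt seeds the history
    prompts = []
    for output in window_outputs:
        if carry_initial_prompt:
            # Skip the seeded copy, then re-attach the initial prompt in front
            # of whatever recent history still fits in the remaining budget.
            history = all_tokens[len(initial_prompt_tokens):]
            budget = max_prompt - len(initial_prompt_tokens)
            tail = history[-budget:] if budget > 0 else []
            prompt = initial_prompt_tokens + tail
        else:
            # Only the most recent tokens survive, so the initial prompt
            # eventually scrolls out of view.
            prompt = all_tokens[-max_prompt:]
        prompts.append(prompt)
        all_tokens.extend(output)
    return prompts
```

With a tiny budget such as max_prompt=4, the default mode loses the initial prompt after a window or two, while carry_initial_prompt=True keeps it at the cost of fewer history tokens.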

How condition_on_previous_text Maintains Continuity

The condition_on_previous_text boolean (default: True) controls whether Whisper feeds the decoded output of the previous window into the next one as a prompt. When enabled, the model receives a running context that prevents it from "resetting" between windows (line 503).

This dynamic conditioning is crucial for maintaining consistency in proper nouns, acronyms, and speaking style across long recordings. However, if the model encounters a difficult segment and produces an error, this error can propagate forward because subsequent windows are conditioned on the mistaken text (line 550).
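A minimal sketch of this running context follows, using made-up integer tokens. The name simulate_conditioning is hypothetical. The reset after a window decoded at a temperature above 0.5 reflects a reading of the loop in whisper/transcribe.py, where high-temperature fallbacks are not fed forward, and should be verified against your installed version.

```python
from typing import List, Tuple


def simulate_conditioning(
    windows: List[Tuple[List[int], float]],
    condition_on_previous_text: bool = True,
) -> List[List[int]]:
    """windows holds (decoded tokens, temperature used) per window; values are illustrative."""
    all_tokens: List[int] = []
    prompt_reset_since = 0  # index into all_tokens where usable context begins
    prompts = []
    for tokens, temperature in windows:
        # The prompt is everything decoded since the last reset.
        prompts.append(all_tokens[prompt_reset_since:])
        all_tokens.extend(tokens)
        if not condition_on_previous_text or temperature > 0.5:
            # Drop the context: the next window starts without a prompt.
            prompt_reset_since = len(all_tokens)
    return prompts
```

In this model, a window that needed a high-temperature fallback cuts the chain, which limits how far a bad segment can influence later windows even when conditioning is enabled.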

Interaction Between Prompt Options

When both carry_initial_prompt=True and condition_on_previous_text=True are active, the prompt for each window contains both the static initial_prompt tokens and the dynamic previous-window text. The code constructs this combined prompt in whisper/transcribe.py before passing it via decode_options["prompt"] to the model's decode() method.

Critical Trade-off: The decoder keeps only a bounded number of prompt tokens (n_text_ctx // 2 - 1 in whisper/decoding.py), so prepending a long initial_prompt to every window leaves less room for the dynamic context from the previous window. The source code explicitly warns about this limitation (lines 548-550), noting that carrying the initial prompt may reduce the effectiveness of condition_on_previous_text.
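The arithmetic behind this trade-off is simple enough to check by hand. The helper below is a hypothetical illustration, assuming a text context of 448 tokens and a prompt cap of n_text_ctx // 2 - 1; the real token count of a prompt depends on Whisper's tokenizer.

```python
def remaining_dynamic_budget(initial_prompt_len: int, n_text_ctx: int = 448) -> int:
    """Tokens left for previous-window text once the initial prompt is carried."""
    max_prompt = n_text_ctx // 2 - 1  # assumed cap on total prompt tokens
    return max(0, max_prompt - initial_prompt_len)


# Illustrative prompt lengths in tokens (not measured from real prompts).
budgets = {n: remaining_dynamic_budget(n) for n in (0, 23, 100, 300)}
```

Under these assumptions, a short header of a couple dozen tokens leaves most of the budget for previous-window text, while a prompt near or above the cap leaves essentially none.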

Practical Implementation Examples

These patterns demonstrate common configurations using the Python API or command-line interface:

Isolate Difficult Segments

Disable conditioning when processing noisy or unrelated audio sections where context propagation causes hallucinations:

result = whisper.transcribe(
    model,
    "noisy_audio.mp3",
    initial_prompt=None,
    condition_on_previous_text=False,  # Each window decoded independently
)
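One rough way to judge whether a recording needs this setting is to look for unusually repetitive segments in a finished transcript. The sketch below relies on the id and compression_ratio fields that Whisper attaches to each entry in result["segments"]; the helper name suspicious_segments is hypothetical, and the 2.4 threshold simply mirrors the default compression_ratio_threshold rather than a tuned value.

```python
from typing import Dict, List


def suspicious_segments(segments: List[Dict], ratio_threshold: float = 2.4) -> List[int]:
    """Return ids of segments whose text compresses unusually well (likely repetition)."""
    return [
        seg["id"]
        for seg in segments
        if seg.get("compression_ratio", 0.0) > ratio_threshold
    ]
```

Calling suspicious_segments(result["segments"]) after a run with conditioning enabled gives a quick list of places to listen to before deciding to re-run with condition_on_previous_text=False.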

Combine Static and Dynamic Context

Use a short persistent header while maintaining window-to-window continuity:

whisper audio.mp3 --initial_prompt "Court Proceedings:" --carry_initial_prompt True --condition_on_previous_text True

Domain-Specific Vocabulary Priming

Prime the model with technical terms at the start without consuming context window space throughout:

result = whisper.transcribe(
    model,
    "tech_talk.mp3",
    initial_prompt="Kubernetes, Docker, microservices",
    carry_initial_prompt=False,
    condition_on_previous_text=True,
)

Key Source Files and Implementation Details

  • whisper/transcribe.py: Contains the core transcription loop and prompt assembly logic. Relevant sections include parameter definition (lines 48-52), first-window initialization (lines 239-244), carry_initial_prompt handling (lines 288-291), and condition_on_previous_text application (line 503).
  • whisper/tokenizer.py: Handles the tokenizer.encode call that converts initial_prompt strings into token IDs used by the decoder.
  • whisper/__main__.py: Entry point for python -m whisper. It calls cli() in whisper/transcribe.py, which defines the CLI arguments including --initial_prompt, --carry_initial_prompt, and --condition_on_previous_text.

Summary

  • initial_prompt provides static context tokenized at the start of transcription, useful for domain vocabulary or formatting hints.
  • carry_initial_prompt=True prepends the initial prompt to every decode window, but reduces space available for condition_on_previous_text context.
  • condition_on_previous_text (default enabled) feeds previous window output forward, maintaining consistency across long audio files.
  • These parameters interact in the prompt construction logic in whisper/transcribe.py, where the combined token list is passed to decode_options["prompt"].
  • Disable condition_on_previous_text when the model gets stuck in error loops on difficult audio segments.

Frequently Asked Questions

What is the difference between initial_prompt and condition_on_previous_text?

initial_prompt is user-provided text that stays fixed for the whole run: it is applied to the first window only, or to every window when carry_initial_prompt=True. condition_on_previous_text instead feeds the model's own output from previous audio windows into subsequent ones. The former provides static guidance you control; the latter provides dynamic continuity the model generates.

Should I enable carry_initial_prompt for long-form transcription?

Only if every audio window requires the same leading context, such as a mandatory header or consistent speaker label. Be aware that in whisper/transcribe.py (lines 548-550), the code warns that persistent initial prompts consume token budget that would otherwise carry previous-window context, potentially harming transcription coherence across window boundaries.

Why would I disable condition_on_previous_text?

Disable this option when the model enters a failure loop—repeatedly hallucinating the same incorrect text across multiple windows—because the error propagates forward via the prompt. Setting condition_on_previous_text=False makes each window independent, allowing the model to recover from localized audio corruption or ambiguous speech.

How do I use these options from the command line?

Whisper's CLI exposes --initial_prompt as a string argument, and both --carry_initial_prompt (default False) and --condition_on_previous_text (default True) as boolean options that take an explicit True or False value. For example: whisper audio.mp3 --initial_prompt "Interview transcript:" --carry_initial_prompt True.
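The sketch below illustrates how boolean options that expect a literal True or False behave under argparse. It is a stand-alone approximation, assuming the CLI parses these options with a small string-to-bool converter. The parser here, named whisper-sketch, is not the real whisper command and only mirrors the three options discussed in this article.

```python
import argparse


def str2bool(value: str) -> bool:
    """Accept only the literal strings "True" and "False"."""
    mapping = {"True": True, "False": False}
    if value not in mapping:
        raise ValueError(f"Expected one of {set(mapping)}, got {value}")
    return mapping[value]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the options covered here.
    parser = argparse.ArgumentParser(prog="whisper-sketch")
    parser.add_argument("audio", nargs="+", type=str)
    parser.add_argument("--initial_prompt", type=str, default=None)
    parser.add_argument("--carry_initial_prompt", type=str2bool, default=False)
    parser.add_argument("--condition_on_previous_text", type=str2bool, default=True)
    return parser


args = build_parser().parse_args(
    ["audio.mp3", "--initial_prompt", "Interview transcript:", "--carry_initial_prompt", "True"]
)
```

With a converter like this, passing --carry_initial_prompt without a value is rejected by argparse, which is why the explicit True matters on the command line.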
