# whisper | OpenAI | Knowledge Base | Instagit

Robust Speech Recognition via Large-Scale Weak Supervision

GitHub Stars: 95.2k

Repository: https://github.com/openai/whisper

---

## Articles

### [How to Extract Word-Level Timing Information in Whisper: A Complete Guide](/openai/whisper/how-extract-word-level-timing-information-whisper)

Extract word-level timing in Whisper using the `word_timestamps` flag or API setting. Learn how to align audio with text for accurate timing information.

- Tags: how-to-guide
- Published: 2026-02-27

### [How OpenAI Whisper Handles Audio Processing and Mel Spectrogram Generation](/openai/whisper/how-whisper-handle-audio-processing-mel-spectrogram)

Discover how OpenAI Whisper processes audio and generates mel spectrograms using its four-stage pipeline, including ffmpeg loading, STFT computation, and mel filterbank projection.

- Tags: internals
- Published: 2026-02-27

### [The Role of KV Caching in Whisper's Performance: Architecture and Implementation](/openai/whisper/role-kv-caching-whisper-performance)

**KV caching reduces the per-step cost of Whisper's autoregressive decoding from O(T²) to O(T) by reusing previously computed key and value tensors across generation steps, eliminating redundant attention calculations during long audio t...

- Published: 2026-02-27
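
To make the caching idea in this summary concrete, here is a minimal pure-Python sketch of single-head attention decoded with and without a key/value cache. It is a toy under stated assumptions, not Whisper's implementation: `project`, `decode_without_cache`, and `decode_with_cache` are hypothetical names, and the "projection" is a simple scaling that stands in for learned weights.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project(token, factor):
    """Hypothetical stand-in for a learned key or value projection."""
    return [factor * x for x in token]


def decode_without_cache(tokens):
    """Re-project keys and values for the whole prefix at every step."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(tok, 0.5) for tok in prefix]
        values = [project(tok, 2.0) for tok in prefix]
        projections += 2 * step
        outputs.append(attend(prefix[-1], keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Project only the newest token and append it to the cache."""
    outputs, projections = [], 0
    key_cache, value_cache = [], []
    for tok in tokens:
        key_cache.append(project(tok, 0.5))
        value_cache.append(project(tok, 2.0))
        projections += 2
        outputs.append(attend(tok, key_cache, value_cache))
    return outputs, projections
```

Both functions produce the same attention outputs. The cached version performs two projections per step, while the uncached version repeats the projections for every earlier token.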

### [How to Enable and Customize Word-Level Timestamps in Whisper](/openai/whisper/how-enable-customize-word-level-timestamps-whisper)

Get granular control over Whisper's output by learning how to enable and customize word-level timestamps using the Python API or CLI. Fine-tune punctuation and silence settings for precise transcriptions.

- Tags: how-to-guide
- Published: 2026-02-27

### [How to Use Whisper for Language Detection Without Transcription: 2 Methods Explained](/openai/whisper/how-use-whisper-language-detection-without-transcription)

**You can detect the spoken language in an audio file using Whisper by calling `model.detect_language()` for a lightweight check without full decoding, or by setting `task="lang_id"` in `DecodingOptions` to use the high-level decoding A...

- Published: 2026-02-27

### [OpenAI Whisper model.transcribe() Result Dictionary Structure Explained](/openai/whisper/what-structure-result-dictionary-returned-model-transcribe)

**The `model.transcribe()` method returns a Python dictionary containing three top-level keys: `text` (the full transcription string), `segments` (a list of per-chunk dictionaries with timestamps and metadata), and `language` (...

- Published: 2026-02-27
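
As a rough illustration of the structure this summary describes, the sketch below walks the `segments` list of a result-shaped dictionary and prints one line per segment. The sample data is hypothetical, `format_segments` is an illustrative helper rather than part of Whisper, and real segments carry more metadata fields than the `start`, `end`, and `text` keys used here.

```python
def format_segments(result):
    """Render each segment as a '[start -> end] text' line."""
    lines = []
    for segment in result["segments"]:
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:6.2f} -> {end:6.2f}] {segment['text'].strip()}")
    return lines


# Hypothetical sample shaped like the dictionary described above.
sample_result = {
    "text": " Hello world. This is a test.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello world."},
        {"id": 1, "start": 1.5, "end": 3.2, "text": " This is a test."},
    ],
    "language": "en",
}

for line in format_segments(sample_result):
    print(line)
```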

### [How to Use `initial_prompt` and `condition_on_previous_text` for Context in OpenAI Whisper](/openai/whisper/how-to-use-initial-prompt-condition-on-previous-text)

**Use `initial_prompt` to inject static text at the start of transcription, and enable `condition_on_previous_text` (default: True) to carry decoded output from previous audio windows into subsequent decoding steps for contextu...

- Published: 2026-02-27

### [Whisper model.transcribe() Advanced Parameters: Temperature, Thresholds, and Decoding Options](/openai/whisper/advanced-parameters-model-transcribe-function)

Explore advanced Whisper model.transcribe() parameters like temperature and thresholds. Optimize transcription accuracy with detailed sampling and validation controls.

- Tags: deep-dive
- Published: 2026-02-27

### [How Whisper model.transcribe() Works: A Deep Dive into the Transcription Pipeline](/openai/whisper/how-high-level-model-transcribe-function-work)

**The `model.transcribe()` function is a high-level wrapper that orchestrates audio preprocessing, language detection, windowed decoding with temperature fallback, and timestamp extraction to convert speech into structured text...

- Published: 2026-02-27

### [How to Enable and Configure Timestamp Generation in Whisper](/openai/whisper/how-enable-configure-timestamp-generation-whisper)

**Whisper automatically generates segment-level timestamps for every transcription, and you can activate word-level precision by passing `word_timestamps=True` to the `transcribe()` function or using the `--word_timestamps` CLI...

- Published: 2026-02-27

### [How to Suppress Specific Tokens or Blank Outputs During Whisper Decoding](/openai/whisper/how-suppress-tokens-blank-outputs-whisper-decoding)

Learn how to suppress specific tokens or blank outputs in Whisper decoding. Configure `DecodingOptions` with `suppress_blank` and `suppress_tokens` for cleaner results.

- Published: 2026-02-27

### [How to Configure Whisper for Greedy Decoding vs. Beam Search](/openai/whisper/how-configure-whisper-greedy-decoding-vs-beam-search)

**To configure OpenAI Whisper for greedy decoding, set `temperature=0.0` and leave `beam_size=None` in the `DecodingOptions` dataclass; for beam search, provide an integer value (e.g., `5`) to `beam_size` and set `temperature=0...

- Published: 2026-02-27

### [Whisper DecodingOptions Sampling Parameters: A Complete Guide to Stochastic Generation](/openai/whisper/key-parameters-whisper-decodingoptions-sampling)

Master Whisper decoding with our guide to sampling parameters. Learn how `temperature`, `best_of`, `patience`, and `length_penalty` shape candidate generation and selection.

- Tags: deep-dive
- Published: 2026-02-27

### [How `whisper.load_model()` Downloads and Verifies Model Checkpoints in OpenAI Whisper](/openai/whisper/how-whisper-load-model-handles-downloading-verification)

**`whisper.load_model()` automatically downloads model checkpoints to `~/.cache/whisper`, verifies their integrity using SHA-256 hashes embedded in the download URL, and returns a ready-to-use `Whisper` instance on the specifie...

- Published: 2026-02-27
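
The verification step in this summary can be sketched with the standard library alone. This is a minimal sketch, assuming the hex digest is the second-to-last path segment of the download URL; the URL, payload, and helper names below are hypothetical and are not taken from Whisper's source.

```python
import hashlib


def expected_sha256_from_url(url):
    """Return the second-to-last path segment, assumed to hold the hex digest."""
    return url.rstrip("/").split("/")[-2]


def verify_checkpoint(data, url):
    """Check that the SHA-256 of the downloaded bytes matches the digest in the URL."""
    return hashlib.sha256(data).hexdigest() == expected_sha256_from_url(url)


# Hypothetical payload and URL, for illustration only.
payload = b"fake checkpoint bytes"
digest = hashlib.sha256(payload).hexdigest()
url = f"https://example.com/models/{digest}/tiny.pt"

print(verify_checkpoint(payload, url))
```

A mismatch between the computed and expected digests would indicate a corrupted or incomplete download, which is the case a loader would want to reject before deserializing the file.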

### [Whisper Turbo Model vs Large Model: Architecture, Speed, and Translation Differences](/openai/whisper/how-turbo-whisper-model-differs-large-model)

**The Whisper turbo model is a pruned, fine-tuned variant of the large-v3 architecture that reduces parameters from ~1.55B to ~809M, cuts VRAM usage from ~10GB to ~6GB, and delivers approximately 8× faster inference on A100 GPUs...

- Published: 2026-02-27

### [Whisper Model Sizes: VRAM Requirements and Speed Trade-offs Explained](/openai/whisper/what-are-available-whisper-model-sizes)

**OpenAI Whisper provides six model variants (tiny, base, small, medium, large, and turbo) that range from 39 million to 1.55 billion parameters, requiring between ~1 GB and ~10 GB of GPU VRAM and offering relative inference spee...

- Published: 2026-02-27

### [SDPA (Scaled Dot Product Attention) in OpenAI Whisper: Implementation and Usage](/openai/whisper/what-is-sdpa-how-whisper-uses-it)

**SDPA (Scaled Dot Product Attention) is the core mathematical operation powering Whisper's Transformer attention layers, which OpenAI implements using PyTorch's fused `scaled_dot_product_attention` kernel for performance while...

- Published: 2026-02-27

### [Core Components of the Whisper Model: AudioEncoder and TextDecoder Explained](/openai/whisper/what-are-core-components-whisper-model)

**The Whisper model consists of two primary neural components: an AudioEncoder that converts mel-spectrograms into latent embeddings, and a TextDecoder that generates transcription tokens via cross-attention to those embeddings.**

- Published: 2026-02-27

### [How the Whisper Transformer Architecture Works: A Deep Dive into OpenAI's Speech Recognition Model](/openai/whisper/how-does-whisper-transformer-architecture-work)

**Whisper uses an encoder-decoder Transformer architecture in which a text decoder attends to the output of an audio encoder through cross-attention to convert mel-spectrograms into transcribed text.**

- Published: 2026-02-27

