# Whisper Model Sizes: VRAM Requirements and Speed Trade-offs Explained

> **OpenAI Whisper provides six model variants—tiny, base, small, medium, large, and turbo—that range from 39 million to 1.55 billion parameters, requiring between ~1 GB and ~10 GB of GPU VRAM and offering relative inference spee...

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Tags: 
- Published: 2026-02-27

---

**OpenAI Whisper provides six model variants—tiny, base, small, medium, large, and turbo—that range from 39 million to 1.55 billion parameters, requiring between ~1 GB and ~10 GB of GPU VRAM and offering relative inference speeds varying by up to 10× on an NVIDIA A100 GPU.**

OpenAI's Whisper is an open-source automatic speech recognition (ASR) system that ships with multiple pre-trained checkpoints to accommodate different hardware constraints and latency requirements. Understanding the available Whisper model sizes and their associated VRAM requirements and speed trade-offs enables developers to select the optimal variant for edge devices, consumer GPUs, or high-performance data center hardware. The `openai/whisper` repository implements these variants using a unified Transformer-based encoder-decoder architecture defined in [`whisper/model.py`](https://github.com/openai/whisper/blob/main/whisper/model.py), where each size differs only in layer depth and hidden dimension configurations.

## Available Whisper Model Sizes and Specifications

The Whisper ecosystem provides six official model sizes, four of which offer English-only variants (denoted by the `.en` suffix) optimized for monolingual speech recognition. According to the official documentation in [`README.md`](https://github.com/openai/whisper/blob/main/README.md) (lines 64-71), the specifications break down as follows:

- **tiny**: 39 M parameters, ~1 GB VRAM, ~10× relative speed, available as `tiny.en` (English-only) and `tiny` (multilingual)
- **base**: 74 M parameters, ~1 GB VRAM, ~7× relative speed, available as `base.en` and `base`
- **small**: 244 M parameters, ~2 GB VRAM, ~4× relative speed, available as `small.en` and `small`
- **medium**: 769 M parameters, ~5 GB VRAM, ~2× relative speed, available as `medium.en` and `medium`
- **large**: 1,550 M parameters, ~10 GB VRAM, 1× relative speed (baseline), multilingual only (`large`)
- **turbo**: 809 M parameters, ~6 GB VRAM, ~8× relative speed, multilingual only (`turbo`)

The **large** model serves as the accuracy baseline but requires the most memory, while **turbo** offers a compelling middle ground with near-large accuracy at significantly reduced latency.

## Architecture and Memory Consumption

All Whisper variants share the identical Transformer encoder-decoder architecture implemented in the `Whisper` class within [`whisper/model.py`](https://github.com/openai/whisper/blob/main/whisper/model.py). The model loads checkpoint metadata to construct a dimension specification object (`model.dims`) that defines the width (hidden units) and depth (number of layers) for that specific size. Larger models allocate proportionally more tensors for self-attention weights and feed-forward layers, directly increasing GPU memory consumption during the forward pass.

The **turbo** variant represents a specialized optimization of the large-v3 architecture. While maintaining approximately 809 M parameters, it applies architectural tweaks including fused linear layers and reduced precision computations to achieve roughly 8× the speed of the standard large model with only ~6 GB VRAM usage—compared to the large model's ~10 GB requirement.

## Loading Models and Inspecting Hardware Requirements

The `load_model` function in [`whisper/__init__.py`](https://github.com/openai/whisper/blob/main/whisper/__init__.py) (lines 17-23) resolves model names to download URLs via the internal `_MODELS` dictionary and handles checkpoint caching. You can programmatically inspect a model's architectural dimensions to verify resource requirements before transcription.

```python
import whisper

# Load a lightweight model for low-VRAM environments (~1 GB)

model = whisper.load_model("base")  # 74 M parameters

# Load the fastest multilingual option (~6 GB VRAM)

turbo = whisper.load_model("turbo")  # 809 M parameters, ~8× speed

# Inspect architectural dimensions that determine memory usage

print(f"Encoder layers: {model.dims.n_encoder_layers}")
print(f"Decoder layers: {model.dims.n_decoder_layers}")
print(f"Hidden dimension: {model.dims.n_audio_state}")

# Transcribe audio

result = model.transcribe("audio.wav")
print(result["text"])

```

## Performance Benchmarks and Trade-offs

When benchmarked on an NVIDIA A100 GPU using English speech transcription, the relative speed differences between Whisper model sizes reveal significant trade-offs between accuracy and latency:

- **tiny**: ~10× faster than large, suitable for real-time prototyping on resource-constrained devices
- **base**: ~7× faster than large, ideal for batch processing on consumer GPUs
- **small**: ~4× faster than large, balancing accuracy and speed for production deployments
- **medium**: ~2× faster than large, offering near-large accuracy with half the latency
- **large**: Baseline 1× speed, maximum accuracy, requires ~10 GB VRAM
- **turbo**: ~8× faster than large, optimized for low-latency applications requiring multilingual support

## Summary

- Whisper provides six model sizes ranging from 39 M to 1.55 B parameters, with VRAM requirements scaling from ~1 GB to ~10 GB.
- English-only variants (`.en`) exist for tiny, base, small, and medium sizes, offering slightly better accuracy on monolingual English audio.
- The **turbo** model delivers ~8× the speed of large with only ~6 GB VRAM by applying architectural optimizations to the large-v3 checkpoint.
- Model dimensions are accessible via `model.dims` after loading, exposing layer counts and hidden states that correlate directly with memory consumption.
- The `load_model` function in [`whisper/__init__.py`](https://github.com/openai/whisper/blob/main/whisper/__init__.py) automatically downloads and caches checkpoints based on the `_MODELS` registry.

## Frequently Asked Questions

### Which Whisper model size should I use for real-time transcription on limited hardware?

For devices with only 2 GB of VRAM, use the **small** model (244 M parameters), which runs at approximately 4× the speed of the large baseline while maintaining acceptable word-error rates. If VRAM is constrained to ~1 GB, the **base** or **tiny** models provide the fastest inference (~7× to ~10× speed) at the cost of reduced accuracy on noisy audio or accented speech.

### What is the difference between .en models and multilingual models?

The `.en` variants (tiny.en, base.en, small.en, medium.en) are trained exclusively on English audio and typically achieve lower word-error rates on monolingual English speech compared to their multilingual counterparts. The **large** and **turbo** models are multilingual only and support 99 languages, making them necessary for non-English transcription or automatic language detection workflows.

### How does the turbo model achieve faster speeds than large with similar parameter counts?

The **turbo** model applies architectural optimizations to the large-v3 checkpoint, including fused linear layer operations and reduced precision computations, allowing it to process audio at ~8× the speed of the standard large model while using only ~6 GB of VRAM versus the large model's ~10 GB requirement. These optimizations maintain most of the large model's accuracy while significantly reducing computational overhead.

### How can I verify VRAM requirements before running transcription?

After loading a model with `whisper.load_model()`, inspect the `model.dims` object to view the encoder and decoder layer counts and hidden dimensions. While PyTorch does not provide a native VRAM calculator, you can estimate memory usage by monitoring GPU allocation with `torch.cuda.memory_allocated()` immediately after model instantiation, or reference the documented requirements (~1 GB for tiny/base, ~2 GB for small, ~5 GB for medium, ~6 GB for turbo, and ~10 GB for large).