# How to Enable and Configure Timestamp Generation in Whisper

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Published: 2026-02-27

---

**Whisper automatically generates segment-level timestamps for every transcription, and you can activate word-level precision by passing `word_timestamps=True` to the `transcribe()` function or `--word_timestamps True` on the command line.**

The openai/whisper repository includes a timestamp pipeline that always produces segment boundaries and can optionally add word-level alignment. The options of `transcribe()` in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) and the result writers in [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) let you produce SRT/VTT subtitles and structured TSV/JSON output for video editing workflows.

## Understanding Whisper's Timestamp Architecture

The timestamp generation system consists of several coordinated components across the codebase:

| Component | Role | Source Location |
| --- | --- | --- |
| **`transcribe()`** | Core entry point that accepts the `word_timestamps` flag and orchestrates the pipeline. | [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) |
| **`add_word_timestamps()`** | Computes per-word timing by applying dynamic time warping to cross-attention weights when word-level timestamps are enabled. | [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py) |
| **`format_timestamp()`** | Converts floating-point seconds into strings of the form `[hh:]mm:ss.mmm` (millisecond precision) for subtitle outputs. | [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) |
| **Subtitle writers** | `SubtitlesWriter` subclasses (`WriteSRT`, `WriteVTT`) that render timestamps with configurable formatting options. | [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) |
| **`get_writer()`** | Factory function that returns a writer for `txt`, `vtt`, `srt`, `tsv`, or `json`, or a function that writes all five formats for `all`. | [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) |
| **Tokenizer** | Defines the special timestamp tokens (`<\|0.00\|>` through `<\|30.00\|>`, in 0.02 s steps) that the model emits during decoding. | [`whisper/tokenizer.py`](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) |
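
To make the tokenizer row concrete: timestamp tokens are spaced 0.02 s apart, so a token's time is its offset from `<|0.00|>` multiplied by 0.02, plus the start of the 30-second window being decoded. The helpers below are a hypothetical, standalone sketch of that arithmetic, not Whisper functions. In the library, the ID of `<|0.00|>` is exposed as `tokenizer.timestamp_begin`; the value 1000 used here is only a placeholder.

```python
import re

# Timestamp tokens run from <|0.00|> to <|30.00|> in 0.02-second steps
TIME_PRECISION = 0.02


def token_id_to_seconds(token_id: int, timestamp_begin: int, window_offset: float = 0.0) -> float:
    """Convert a timestamp token ID to seconds on the audio timeline.

    timestamp_begin: ID of the <|0.00|> token (a placeholder value in the examples).
    window_offset: start time, in seconds, of the 30-second window being decoded.
    """
    if token_id < timestamp_begin:
        raise ValueError("not a timestamp token")
    return round(window_offset + (token_id - timestamp_begin) * TIME_PRECISION, 2)


def token_text_to_seconds(token_text: str) -> float:
    """Parse the text form of a timestamp token, such as '<|1.24|>'."""
    match = re.fullmatch(r"<\|(\d+\.\d{2})\|>", token_text)
    if match is None:
        raise ValueError(f"not a timestamp token: {token_text!r}")
    return float(match.group(1))
```

With the placeholder `timestamp_begin` of 1000, token 1062 maps to 1.24 s, or to 31.24 s if it was decoded in the window that starts at 30 s.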

## Enabling Word-Level Timestamps

By default, Whisper returns **segment-level timestamps** (`start` and `end` fields for each decoded segment of text). To obtain **word-level timestamps**, you must explicitly enable the feature.

### Python API Method

Pass `word_timestamps=True` to the `transcribe()` function. You can also control punctuation attachment using `prepend_punctuations` and `append_punctuations`:

```python
from whisper import load_model, transcribe

model = load_model("base")
audio_path = "example.wav"

# Enable word-level timing; both punctuation strings shown are the library defaults
result = transcribe(
    model,
    audio_path,
    word_timestamps=True,
    prepend_punctuations="\"'“¿([{-",
    append_punctuations="\"'.。,，!！?？:：”)]}、",
)
```

When enabled, each segment in `result["segments"]` gains a `words` list. Each entry has `word`, `start`, `end`, and `probability` fields computed by `add_word_timestamps()` in [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py).
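
Reading those fields back is a plain dictionary walk. Here is a minimal sketch, assuming a `result` produced with `word_timestamps=True`; `iter_word_timings` is a hypothetical helper name, not part of Whisper.

```python
from typing import Dict, List, Tuple


def iter_word_timings(result: Dict) -> List[Tuple[str, float, float]]:
    """Flatten (word, start, end) tuples from a transcribe() result.

    Segments without a "words" list (word timestamps disabled) are skipped.
    """
    timings = []
    for segment in result["segments"]:
        for w in segment.get("words", []):
            # Whisper words usually carry a leading space, so strip it for display
            timings.append((w["word"].strip(), w["start"], w["end"]))
    return timings
```

With a real result, `for word, start, end in iter_word_timings(result): print(f"{start:7.2f} {end:7.2f} {word}")` prints one line per word.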

### Command-Line Interface

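Use `--word_timestamps True`. The option is parsed as a boolean value rather than as a bare switch, so it needs an explicit `True`. The argument parser lives in the `cli()` function of [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py), which [`whisper/__main__.py`](https://github.com/openai/whisper/blob/main/whisper/__main__.py) calls, and it rejects `--highlight_words`, `--max_line_count`, `--max_line_width`, and `--max_words_per_line` unless `--word_timestamps True` is also given: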

```bash
whisper audio.mp3 \
  --model base \
  --word_timestamps True \
  --output_format srt
```
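
Why the explicit `True` matters can be reproduced with `argparse` alone. The sketch below is a simplified, hypothetical reimplementation (`parse_cli` is not a Whisper function) that mimics a strict `"True"`/`"False"` converter and the word-option check; where Whisper reports the violation through the parser and exits, this sketch raises `ValueError` so the behavior is easy to test.

```python
import argparse
from typing import Dict, List

WORD_OPTIONS = ["highlight_words", "max_line_count", "max_line_width", "max_words_per_line"]


def str2bool(value: str) -> bool:
    # Only the exact strings "True" and "False" are accepted
    mapping = {"True": True, "False": False}
    if value not in mapping:
        raise ValueError(f"Expected one of {set(mapping)}, got {value}")
    return mapping[value]


def parse_cli(argv: List[str]) -> Dict:
    parser = argparse.ArgumentParser(prog="whisper-sketch")
    parser.add_argument("audio", nargs="+")
    # type=str2bool makes the option consume a value; a bare flag is a usage error
    parser.add_argument("--word_timestamps", type=str2bool, default=False)
    parser.add_argument("--highlight_words", type=str2bool, default=False)
    parser.add_argument("--max_line_width", type=int, default=None)
    parser.add_argument("--max_line_count", type=int, default=None)
    parser.add_argument("--max_words_per_line", type=int, default=None)
    args = vars(parser.parse_args(argv))

    # Layout options are rejected unless word timestamps are switched on
    if not args["word_timestamps"]:
        for option in WORD_OPTIONS:
            if args[option]:
                raise ValueError(f"--{option} requires --word_timestamps True")
    return args
```

Calling `parse_cli(["audio.mp3", "--word_timestamps"])` exits with a usage error because the option expects a value, which is the same failure a bare `--word_timestamps` produces.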

## Configuring Timestamp Output Formats

The subtitle writers expose several formatting knobs that control how timestamps appear in generated files:

- **`always_include_hours`** – Forces the `hh:` prefix even when hours are zero (on by default for SRT, off for VTT).
- **`decimal_marker`** – Sets the separator between seconds and milliseconds (`,` by default for SRT, `.` for VTT).
- **`highlight_words`** – Emits one cue per word in which the currently spoken word is wrapped in `<u>` tags (SRT and VTT).
- **`max_line_width`**, **`max_line_count`**, **`max_words_per_line`** – Control line breaking in `SubtitlesWriter.iterate_result()` in [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py). `max_line_count` has no effect without `max_line_width`.

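`always_include_hours` and `decimal_marker` are attributes of the writer object, so you can override them on the instance. The line-layout options are not attributes: pass them in the `options` dictionary when you call the writer. They take effect only when `result` contains word timestamps: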

```python
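import os

from whisper.utils import get_writer

os.makedirs("out", exist_ok=True)  # the writer opens its output file directly, so create the folder first
srt_writer = get_writer("srt", output_dir="out")
srt_writer.always_include_hours = True  # already the WriteSRT default
srt_writer.decimal_marker = ","  # already the WriteSRT default
# Line-layout options go in the options dict (they need word timestamps in `result`)
srt_writer(result, audio_path, {"max_line_width": 42, "max_line_count": 2})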
```
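
The `max_line_width` and `max_line_count` options listed above drive a greedy line-packing pass over the word timings. The following is a simplified, standalone sketch of that idea (the `pack_subtitles` name is hypothetical). The real `SubtitlesWriter.iterate_result()` also accounts for long pauses, segment boundaries, and `max_words_per_line`, so treat this as an approximation of the behavior rather than the library's algorithm.

```python
from typing import Dict, List


def pack_subtitles(words: List[Dict], max_line_width: int, max_line_count: int) -> List[Dict]:
    """Greedily pack word timings into cues of at most max_line_count lines.

    Simplified sketch: ignores pauses and segment boundaries.
    """
    cues: List[Dict] = []
    lines: List[List[Dict]] = []
    line: List[Dict] = []
    line_len = 0

    def flush_cue() -> None:
        # Turn the buffered lines into one cue spanning their first and last word
        if lines:
            cue_words = [w for ln in lines for w in ln]
            text = "\n".join(" ".join(w["word"].strip() for w in ln) for ln in lines)
            cues.append({"start": cue_words[0]["start"], "end": cue_words[-1]["end"], "text": text})
            lines.clear()

    for w in words:
        token = w["word"].strip()
        added = len(token) + (1 if line else 0)  # +1 for the joining space
        if line and line_len + added > max_line_width:
            lines.append(line)  # the current line is full
            line, line_len = [], 0
            if len(lines) >= max_line_count:
                flush_cue()  # the cue has reached its line limit
            added = len(token)
        line.append(w)
        line_len += added

    if line:
        lines.append(line)
    flush_cue()
    return cues
```

For five short words with `max_line_width=9` and `max_line_count=2`, this yields one two-line cue (`"one two\nthree"`) followed by a one-line cue (`"four five"`).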

## Processing Specific Time Ranges

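To generate timestamps for only a portion of the audio, use the `clip_timestamps` parameter. In the Python API, pass either a string of comma-separated seconds (`"start,end"`, or several ranges as `"start,end,start,end"`) or a list of floats. If the final `end` is omitted, the last clip runs to the end of the file. In the CLI, use `--clip_timestamps`: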

```bash
whisper audio.wav \
  --model small \
  --output_format vtt \
  --clip_timestamps "30,45"
```

The `transcribe()` function in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) parses this value and converts the seconds to frame indices on the full audio, so the resulting timestamps stay relative to the original audio timeline.
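
That parsing step can be paraphrased in a few lines. This is a sketch under stated assumptions, not the library source: it assumes Whisper's 100 mel frames per second (16 kHz audio with a 160-sample hop), and `clip_timestamps_to_seek_clips` is a hypothetical name. Because the frame indices are computed against the whole file, any timestamps derived from them remain on the original timeline.

```python
from typing import List, Tuple, Union

FRAMES_PER_SECOND = 100  # assumption: 16 kHz audio / 160-sample hop = 100 mel frames per second


def clip_timestamps_to_seek_clips(
    clip_timestamps: Union[str, List[float]], content_frames: int
) -> List[Tuple[int, int]]:
    """Turn 'start,end,start,end,...' seconds into (start_frame, end_frame) pairs."""
    if isinstance(clip_timestamps, str):
        clip_timestamps = [float(ts) for ts in clip_timestamps.split(",") if ts.strip()]
    seek_points = [round(ts * FRAMES_PER_SECOND) for ts in clip_timestamps]
    if not seek_points:
        seek_points.append(0)  # nothing given: start at the beginning
    if len(seek_points) % 2 == 1:
        seek_points.append(content_frames)  # an open-ended clip runs to the end of the audio
    return list(zip(seek_points[::2], seek_points[1::2]))
```

For example, `"30,45"` becomes the frame range `(3000, 4500)`, and `"120"` on a 300-second file (30,000 frames) becomes `(12000, 30000)`.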

## Practical Implementation Examples

### Example 1: Python API with Full Word-Level Control

```python
import os

from whisper import load_model, transcribe
from whisper.utils import get_writer

model = load_model("base")
audio_path = "interview.wav"

# Enable word timestamps; the punctuation strings are the defaults, edit them to customize
result = transcribe(
    model,
    audio_path,
    word_timestamps=True,
    prepend_punctuations="\"'“¿([{-",
    append_punctuations="\"'.。,，!！?？:：”)]}、",
)

# Configure the SRT writer (hours field and comma decimal marker are also its defaults)
os.makedirs("./subtitles", exist_ok=True)
srt_writer = get_writer("srt", output_dir="./subtitles")
srt_writer.always_include_hours = True
srt_writer.decimal_marker = ","
srt_writer(result, audio_path)
```

### Example 2: CLI with Highlighting and Line Constraints

```bash
whisper podcast.mp3 \
  --model medium \
  --output_dir ./output \
  --output_format srt \
  --word_timestamps True \
  --highlight_words True \
  --max_line_width 42 \
  --max_line_count 2
```

This command enables word-level timestamps, produces SRT cues in which the currently spoken word is underlined, and wraps each subtitle to at most two lines of up to 42 characters.
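
The underlining is karaoke-style: within each subtitle, the writer emits one cue per word with only that word wrapped in `<u>`, plus plain cues that cover gaps between words. Below is a simplified sketch of that idea. `karaoke_cues` is a hypothetical helper, and it compares raw seconds, whereas the writer compares formatted timestamps and works subtitle by subtitle.

```python
import re
from typing import Dict, List, Tuple


def karaoke_cues(words: List[Dict]) -> List[Tuple[float, float, str]]:
    """Emit one (start, end, text) cue per word with that word underlined.

    Gaps between consecutive words get a cue showing the plain text.
    """
    texts = [w["word"] for w in words]
    plain = "".join(texts)
    cues = []
    last = words[0]["start"] if words else 0.0
    for i, w in enumerate(words):
        if w["start"] > last:
            cues.append((last, w["start"], plain))  # silence between words
        highlighted = "".join(
            # Keep any leading space outside the tag: " Hello" -> " <u>Hello</u>"
            re.sub(r"^(\s*)(.*)$", r"\1<u>\2</u>", t) if j == i else t
            for j, t in enumerate(texts)
        )
        cues.append((w["start"], w["end"], highlighted))
        last = w["end"]
    return cues
```

Two words separated by a 0.2 s pause therefore produce three cues: the first word underlined, the plain text during the pause, and the second word underlined.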

### Example 3: Extracting a Specific Clip with VTT Output

```python
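import os

import whisper
from whisper.utils import get_writer

model = whisper.load_model("base")
# Transcribe only 120-300 s; timestamps stay on the original file's timeline
result = model.transcribe(
    "lecture.mp3",
    clip_timestamps="120,300",
    word_timestamps=True,
)

os.makedirs("./clips", exist_ok=True)
writer = get_writer("vtt", "./clips")
writer(result, "lecture.mp3")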
```
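
Because the timestamps in `result` stay on `lecture.mp3`'s original timeline, the first cue from that example starts at or after 120 s rather than near 0. If a downstream editor expects clip-relative times, a small post-processing step can re-base them. `shift_to_clip_time` is a hypothetical helper, not part of Whisper.

```python
import copy
from typing import Dict


def shift_to_clip_time(result: Dict, clip_start: float) -> Dict:
    """Return a copy of a transcribe() result with times re-based to clip_start."""
    shifted = copy.deepcopy(result)  # leave the original result untouched

    def rebase(t: float) -> float:
        return round(max(0.0, t - clip_start), 2)

    for segment in shifted["segments"]:
        segment["start"] = rebase(segment["start"])
        segment["end"] = rebase(segment["end"])
        for word in segment.get("words", []):
            word["start"] = rebase(word["start"])
            word["end"] = rebase(word["end"])
    return shifted
```

Passing `shift_to_clip_time(result, 120.0)` to the VTT writer instead of `result` would then produce cues that start near 0 s.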

## Summary

- Whisper inherently produces **segment-level timestamps** (`start` and `end`) for every transcription without requiring special configuration.
- **Word-level timestamps** require explicit activation via `word_timestamps=True` (API) or `--word_timestamps True` (CLI), triggering the `add_word_timestamps()` algorithm in [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py).
- **Formatting control** is handled by writer classes in [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py), supporting customization of decimal markers, hour fields, and line-breaking rules.
- **Clip extraction** uses `clip_timestamps` to process specific audio intervals while maintaining timestamp alignment with the original source.
- The pipeline runs from the timestamp tokens defined in [`whisper/tokenizer.py`](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) and emitted during decoding, through the transcription logic in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py), to final output formatting via `format_timestamp()` and the writer classes.

## Frequently Asked Questions

### Does Whisper generate timestamps by default?

Yes. According to the source code in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py), every transcription result automatically includes `start` and `end` timestamps for each segment. These segment-level timestamps require no special flags and are always available in the output dictionary.

### How do I get word-level timestamps in Whisper?

Pass `word_timestamps=True` to the `transcribe()` function in Python, or use `--word_timestamps True` on the command line. This invokes `add_word_timestamps()` in [`whisper/timing.py`](https://github.com/openai/whisper/blob/main/whisper/timing.py), which applies dynamic time warping to cross-attention weights to estimate start and end times for individual words within each segment. The CLI help labels this option as experimental.
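
The alignment step at the core of this is classic dynamic time warping. Roughly, `whisper/timing.py` builds a token-by-frame matrix from the cross-attention weights of the model's alignment heads and runs DTW on the negated matrix, where each audio frame covers 0.02 s. The pure-Python sketch below is not Whisper's accelerated implementation: `dtw_path` and `token_start_times` are hypothetical helpers that find the lowest-cost monotonic path through a cost matrix and read off the first frame of each token.

```python
from typing import List, Tuple


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic alignment between rows (tokens) and columns (frames)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance token and frame
                acc[i - 1][j],  # advance token only
                acc[i][j - 1],  # advance frame only
            )
    # Backtrack from the bottom-right corner to recover the path
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min(
            (acc[i - 1][j - 1], (i - 1, j - 1)),
            (acc[i - 1][j], (i - 1, j)),
            (acc[i][j - 1], (i, j - 1)),
        )
    return path[::-1]


def token_start_times(path: List[Tuple[int, int]], seconds_per_frame: float = 0.02) -> List[float]:
    """First frame reached by each token, converted to seconds."""
    starts = {}
    for token, frame in path:
        starts.setdefault(token, frame * seconds_per_frame)
    return [starts[t] for t in sorted(starts)]
```

For a toy cost matrix where token 0 is cheap on frames 0 and 1 and token 1 is cheap on frames 2 and 3, the path is `[(0, 0), (0, 1), (1, 2), (1, 3)]`, so token 1 starts at frame 2 (0.04 s).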

### Can I customize the timestamp format in SRT files?

Yes. The `WriteSRT` class in [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) exposes `always_include_hours` and `decimal_marker` attributes. `always_include_hours=True` ensures the hours field appears in every timestamp, and `decimal_marker=","` uses the standard SRT comma between seconds and milliseconds. Both are already the `WriteSRT` defaults; setting them explicitly matters mainly for other writers such as `WriteVTT`, whose defaults are `False` and `"."`.
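
Both attributes are ultimately passed to `format_timestamp()`. The sketch below is a standalone re-implementation written for illustration, not imported from Whisper, and shows how each setting changes the rendered string.

```python
def format_timestamp(
    seconds: float, always_include_hours: bool = False, decimal_marker: str = "."
) -> str:
    """Render seconds as [hh:]mm:ss<marker>mmm, rounded to the nearest millisecond."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    # The hours field appears when forced or when the time is at least one hour
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"
```

With VTT-style defaults, 3.5 s renders as `00:03.500`. With the SRT settings (`always_include_hours=True`, `decimal_marker=","`), it becomes `00:00:03,500`.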

### What is the purpose of the `clip_timestamps` parameter?

The `clip_timestamps` parameter lets you transcribe only part of an audio file, for example `"30,45"` for seconds 30 to 45. Several ranges can be given as `"start,end,start,end"`, and if the final `end` is omitted the last clip runs to the end of the file. The `transcribe()` function in [`whisper/transcribe.py`](https://github.com/openai/whisper/blob/main/whisper/transcribe.py) converts the provided seconds into frame indices, processes only those portions of the audio, and returns timestamps relative to the original file's timeline.