# How to Suppress Specific Tokens or Blank Outputs During Whisper Decoding

> Learn how to suppress specific tokens or blank outputs in Whisper decoding. Configure DecodingOptions with suppress_blank and suppress_tokens for cleaner results.

- Repository: [OpenAI/whisper](https://github.com/openai/whisper)
- Published: 2026-02-27

---

**You can suppress specific tokens or blank outputs during Whisper decoding by configuring the `suppress_blank` and `suppress_tokens` parameters in `DecodingOptions`, which apply logit filters to mask unwanted tokens before sampling.**

OpenAI's Whisper uses a flexible logit-filter pipeline that lets you control which tokens the decoder is allowed to emit. By setting options in `DecodingOptions`, you can prevent blank outputs at the start of transcription or permanently block specific token IDs throughout the decoding process.

## Understanding Whisper's Logit Filter Pipeline

The suppression mechanism operates inside [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) through four distinct stages:

1. **Option Parsing** – When you instantiate `DecodingOptions`, the fields `suppress_blank` (default `True`) and `suppress_tokens` (default `"-1"`) are stored.

2. **Token List Resolution** – The `DecodingTask._get_suppress_tokens()` method converts your input into concrete token IDs. If the input contains `-1`, the method replaces it with every token returned by `Tokenizer.non_speech_tokens`. It then always appends the control tokens `transcribe`, `translate`, `sot`, `sot_prev`, and `sot_lm`, plus `no_speech` when the tokenizer defines it, so these are suppressed regardless of what you pass.

3. **Filter Setup** – `DecodingTask.__init__` instantiates the `SuppressBlank` and `SuppressTokens` classes once, when the task is created, and appends them to `self.logit_filters`.

4. **Logit Masking** – In the main loop, `SuppressBlank.apply()` masks the space token and `eot` to `-∞` **only on the first step** (`tokens.shape[1] == self.sample_begin`). Meanwhile, `SuppressTokens.apply()` masks the resolved token IDs on **every** step. After these filters run, the decoder samples from the modified logits.
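The masking stage can be illustrated without loading a model. The following is a minimal pure-Python sketch of the idea, not the library's code: the class names `SuppressBlankSketch` and `SuppressTokensSketch`, the helper `filtered_logits`, and the six-token vocabulary with its IDs are all hypothetical, and plain lists stand in for tensors.

```python
# Simplified stand-ins for the filter classes; not the library's actual code.
NEG_INF = float("-inf")


class SuppressBlankSketch:
    """Masks the blank and end-of-text tokens on the first sampled step only."""

    def __init__(self, blank_ids, eot_id, sample_begin):
        self.masked = list(blank_ids) + [eot_id]
        self.sample_begin = sample_begin

    def apply(self, logits, tokens):
        # tokens holds everything decoded so far, including the prompt
        if len(tokens) == self.sample_begin:
            for i in self.masked:
                logits[i] = NEG_INF


class SuppressTokensSketch:
    """Masks a fixed set of token IDs on every step."""

    def __init__(self, suppress_ids):
        self.suppress_ids = list(suppress_ids)

    def apply(self, logits, tokens):
        for i in self.suppress_ids:
            logits[i] = NEG_INF


def filtered_logits(logits, tokens, filters):
    """Return a copy of logits after running every filter in order."""
    out = list(logits)
    for f in filters:
        f.apply(out, tokens)
    return out


# Toy vocabulary of 6 tokens: ID 0 is "blank", ID 5 is "eot" (hypothetical IDs).
filters = [
    SuppressBlankSketch(blank_ids=[0], eot_id=5, sample_begin=3),
    SuppressTokensSketch(suppress_ids=[2]),
]
prompt = [10, 11, 12]  # three prompt tokens, so sampling begins at length 3
first_step = filtered_logits([0.1] * 6, prompt, filters)
later_step = filtered_logits([0.1] * 6, prompt + [1], filters)
```

In this sketch, `first_step` has IDs 0, 2, and 5 masked, while `later_step` has only ID 2 masked, which mirrors the first-step-only versus every-step distinction described above.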

## Suppressing Blank Outputs

### How suppress_blank Works

When `suppress_blank=True` (the default), Whisper prevents the model from emitting a bare space token or the end-of-text token as the first sampled token, which would otherwise yield a blank transcription. This is handled by the `SuppressBlank` class in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py). At the first sampling step, it forces the logits of the space token and the end-of-text token to negative infinity. Tokens that merely begin with a space, such as a word with a leading space, are not affected.

```python
import whisper
from whisper import DecodingOptions, decode

model = whisper.load_model("base")
mel = ...  # your log-mel spectrogram input

# suppress_blank is True by default, so no options are needed here
result = decode(model, mel)
print(result.text)  # the first sampled token is never a bare space or eot
```

### Disabling Blank Suppression

If you want the decoder to be free to emit a bare space or `eot` as its first token, for example so that silent or partial audio chunks can produce an empty result, set `suppress_blank=False`:

```python
options = DecodingOptions(suppress_blank=False)
result = decode(model, mel, options=options)
```

## Suppressing Specific Tokens

### Using Token IDs

To block specific characters or words, pass a list of token IDs to `suppress_tokens`. You can obtain these IDs using the Whisper tokenizer:

```python
from whisper.tokenizer import get_tokenizer

# Match the tokenizer to the loaded model ("base" is multilingual)
tokenizer = get_tokenizer(multilingual=model.is_multilingual)
comma_id = tokenizer.encode(",")[0]
period_id = tokenizer.encode(".")[0]

options = DecodingOptions(
    suppress_blank=False,
    suppress_tokens=[comma_id, period_id],
)
result = decode(model, mel, options=options)
```

### Suppressing Non-Speech Tokens with -1

The most common pattern is passing `"-1"` (or `[-1]`), which is also the default. It expands to all tokens in `Tokenizer.non_speech_tokens`: symbols that tend to appear in speaker tags and non-speech annotations, such as brackets and musical notes. Basic punctuation such as commas, periods, and question marks is kept:

```python
options = DecodingOptions(
    suppress_blank=True,
    suppress_tokens="-1",  # Expands to all non-speech tokens
)
result = decode(model, mel, options=options)
```

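According to the source code in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py), when `-1` is detected, `_get_suppress_tokens()` drops it and adds the full set of `non_speech_tokens`. Independently of your input, the method also appends control tokens such as `sot` (start-of-transcript), `sot_prev`, `sot_lm`, `transcribe`, and `translate`, plus `no_speech` when it exists, so they can never be sampled mid-sequence. The `eot` (end-of-transcript) token is not added, because the decoder needs it to finish.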
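The resolution step can be approximated in a few lines. This is a simplified sketch rather than the library's method: `resolve_suppress_tokens` is a hypothetical helper, `always_suppressed` stands in for the control tokens that the source appears to append on every call, the token IDs are invented, and skipping empty string pieces is a choice made for this sketch.

```python
def resolve_suppress_tokens(option, non_speech_tokens, always_suppressed):
    """Sketch of expanding a suppress_tokens option into concrete token IDs."""
    if option is None:
        ids = []
    elif isinstance(option, str):
        # Accepts "-1" or "11,13" style strings; empty pieces are skipped
        # (a choice made for this sketch).
        ids = [int(t) for t in option.split(",") if t.strip()]
    else:
        ids = list(option)

    if -1 in ids:
        # -1 is a placeholder: drop it and add the non-speech set instead
        ids = [t for t in ids if t >= 0]
        ids.extend(non_speech_tokens)

    # Control tokens are suppressed regardless of the caller's input
    ids.extend(always_suppressed)
    return tuple(sorted(set(ids)))


NON_SPEECH = [40, 41, 42]  # invented IDs for illustration
CONTROL = [100, 101]       # invented IDs for illustration

default_ids = resolve_suppress_tokens("-1", NON_SPEECH, CONTROL)
custom_ids = resolve_suppress_tokens([7, 8], NON_SPEECH, CONTROL)
mixed_ids = resolve_suppress_tokens("-1,7", NON_SPEECH, CONTROL)
```

Note that `mixed_ids` contains both the custom ID and the non-speech set, so `-1` can be combined with explicit IDs in one option value.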

### Combining Suppression Strategies

You can combine both options to fine-tune output. For example, to allow a blank first token while still suppressing all non-speech tokens:

```python
options = DecodingOptions(
    suppress_blank=False,  # Allow a bare space or eot as the first token
    suppress_tokens="-1",  # But still suppress non-speech symbols
)
result = decode(model, mel, options=options)
```

## Key Implementation Files

The suppression logic is distributed across these critical files in the OpenAI Whisper repository:

| File | Purpose |
|------|---------|
| [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) | Contains `DecodingOptions`, `SuppressBlank`, `SuppressTokens`, and `DecodingTask._get_suppress_tokens()`. This is the primary implementation file. |
| [`whisper/tokenizer.py`](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) | Defines `Tokenizer.non_speech_tokens`, which provides the token list used when `suppress_tokens="-1"` is specified. |
| [`whisper/utils.py`](https://github.com/openai/whisper/blob/main/whisper/utils.py) | Provides auxiliary helpers such as `compression_ratio` used in final `DecodingResult` calculations. |
| [`whisper/__main__.py`](https://github.com/openai/whisper/blob/main/whisper/__main__.py) | Entry point for `python -m whisper`. It calls `cli()` in `whisper/transcribe.py`, which defines the `--suppress_tokens` flag and forwards it to the underlying `DecodingOptions`. |

## Summary

- **Suppress blank outputs** by keeping `suppress_blank=True` (the default) in `DecodingOptions`, which prevents the model from emitting a bare space or `eot` as the first token.
- **Suppress specific tokens** by passing token IDs to `suppress_tokens`; use `"-1"` to automatically block all non-speech tokens defined in `Tokenizer.non_speech_tokens`.
- **Implementation location**: The logic resides in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) within the `SuppressBlank` and `SuppressTokens` classes, applied during each step of `DecodingTask._main_loop`.
- **CLI support**: Pass `--suppress_tokens` when running `python -m whisper`. The CLI does not appear to expose a flag for `suppress_blank`, so set that option through the Python API.

## Frequently Asked Questions

### What is the difference between suppress_blank and suppress_tokens?

`suppress_blank` is a boolean that only affects the first decoding step, preventing the model from outputting a bare space token or `eot` (a blank result) at the beginning of the transcription. `suppress_tokens` accepts a list of token IDs (or a string such as `"-1"`) that are masked to negative infinity on **every** decoding step, allowing you to block specific characters, punctuation, or non-speech symbols throughout the entire sequence.

### How do I find the token ID for a specific character or word?

Use the `get_tokenizer` function from the `whisper.tokenizer` module to access the tokenizer, then call `encode()` on your target string. For example, `tokenizer.encode(",")[0]` returns the integer ID for the comma token. Note that Whisper uses a Byte Pair Encoding (BPE) tokenizer, so some words split into multiple token IDs, and taking only the first ID of such a word would also block every other word that starts with the same piece.
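Because of that multi-token caveat, it can help to check each string before adding it to a suppression list. The helper below is a hypothetical sketch: `single_token_ids` is not part of Whisper, and `toy_encode` is a stand-in encoder used only so the example runs without a model.

```python
def single_token_ids(encode, strings):
    """Separate strings that map to exactly one token ID from those that do not."""
    single, multi = {}, {}
    for s in strings:
        ids = list(encode(s))
        if len(ids) == 1:
            single[s] = ids[0]
        else:
            multi[s] = ids
    return single, multi


# Toy encoder for demonstration only: one ID per character.
def toy_encode(text):
    return [ord(ch) for ch in text]


single, multi = single_token_ids(toy_encode, [",", ".", "um"])
safe_to_suppress = sorted(single.values())
```

With a real tokenizer you would pass `tokenizer.encode` in place of `toy_encode`. BPE vocabularies of this kind usually treat a string with a leading space as a different token from the same string without one, so it may be worth checking both variants.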

### Can I suppress tokens after decoding has started?

No, the `suppress_tokens` and `suppress_blank` options must be configured before decoding begins via `DecodingOptions`. The suppression filters are instantiated once during `DecodingTask.__init__` in [`whisper/decoding.py`](https://github.com/openai/whisper/blob/main/whisper/decoding.py) and applied consistently throughout the `_main_loop`. To change suppression behavior mid-stream, you would need to stop decoding and restart with new options.

### Does suppressing tokens affect Whisper's performance or accuracy?

Suppressing tokens has negligible computational overhead because it simply sets specific logit values to `-∞` before the softmax operation. However, it can significantly impact **accuracy** depending on what you suppress. Blocking common punctuation may produce more continuous text but can also merge sentences incorrectly or remove important structural cues. Always validate output quality when using aggressive suppression lists.
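To see why a `-∞` logit removes a token entirely, consider a small softmax computed by hand. This is an illustrative sketch with made-up logits, not code taken from Whisper; the `softmax` helper is defined here for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax that tolerates -inf entries."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, 1.0]
masked = list(logits)
masked[0] = float("-inf")  # suppress the most likely token

before = softmax(logits)
after = softmax(masked)
```

The suppressed token ends up with probability exactly zero, and its probability mass is redistributed across the remaining tokens in proportion to their original weights. That redistribution is why an aggressive suppression list can change which text the decoder produces.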