How to Suppress Specific Tokens or Blank Outputs During Whisper Decoding

You can suppress specific tokens or blank outputs during Whisper decoding by configuring the suppress_blank and suppress_tokens parameters in DecodingOptions, which apply logit filters to mask unwanted tokens before sampling.

OpenAI's Whisper uses a flexible logit-filter pipeline that lets you control which tokens the decoder is allowed to emit. By setting options in DecodingOptions, you can prevent blank outputs at the start of transcription or permanently block specific token IDs throughout the decoding process.

Understanding Whisper's Logit Filter Pipeline

The suppression mechanism operates inside whisper/decoding.py through four distinct stages:

  1. Option Parsing – When you instantiate DecodingOptions, the fields suppress_blank (default True) and suppress_tokens (default "-1") are stored. These are defined at lines 104–108 in whisper/decoding.py.

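  2. Token List Resolution – The DecodingTask._get_suppress_tokens() method converts your input into concrete token IDs. A string is split on commas and parsed as integers. If the result contains -1, that entry is dropped and every ID in Tokenizer.non_speech_tokens is added in its place. The method also appends the control tokens transcribe, translate, sot, sot_prev, and sot_lm (plus no_speech when the tokenizer defines it) to the returned set. eot is not added, so decoding can still terminate.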

  3. Filter Construction – Once per decoding task, inside DecodingTask.__init__, the library instantiates SuppressBlank (when suppress_blank is set) and SuppressTokens (when suppress_tokens is non-empty). These are appended to self.logit_filters.

  4. Logit Masking – In the main loop, every filter's apply() runs on the logits at each step. SuppressBlank.apply() masks the space token and eot to -∞ only on the first sampled step (tokens.shape[1] == self.sample_begin). SuppressTokens.apply() masks your specified token IDs on every step. After these filters run, the decoder selects the next token from the modified logits.
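
The two filters described above can be sketched in plain Python. This is a simplified illustration and not the library's code: it uses a flat list of floats in place of a batched tensor, and the class names, the toy six-token vocabulary, and greedy_step are all assumptions made for the example.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


class SuppressBlankSketch:
    """Masks the bare-space token(s) and end-of-text, but only on the first sampled step."""

    def __init__(self, blank_ids: Sequence[int], eot_id: int, sample_begin: int):
        self.masked_ids = list(blank_ids) + [eot_id]
        self.sample_begin = sample_begin

    def apply(self, logits: List[float], tokens: Sequence[int]) -> None:
        # Active only while the sequence still contains nothing but the prompt
        if len(tokens) == self.sample_begin:
            for token_id in self.masked_ids:
                logits[token_id] = NEG_INF


class SuppressTokensSketch:
    """Masks a fixed set of token IDs on every step."""

    def __init__(self, suppress_ids: Sequence[int]):
        self.suppress_ids = list(suppress_ids)

    def apply(self, logits: List[float], tokens: Sequence[int]) -> None:
        for token_id in self.suppress_ids:
            logits[token_id] = NEG_INF


def greedy_step(raw_logits: Sequence[float], tokens: Sequence[int], filters) -> int:
    """Apply every filter to a copy of the logits, then pick the highest-scoring token."""
    logits = list(raw_logits)
    for logit_filter in filters:
        logit_filter.apply(logits, tokens)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary (hypothetical IDs): 0 = " ", 1 = <eot>, 2 = "♪", 3 = "hello", 4 = "world", 5 = <sot>
filters = [
    SuppressBlankSketch(blank_ids=[0], eot_id=1, sample_begin=1),
    SuppressTokensSketch(suppress_ids=[2]),
]
raw_logits = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]

first_token = greedy_step(raw_logits, tokens=[5], filters=filters)     # IDs 0, 1, 2 masked -> 3
later_token = greedy_step(raw_logits, tokens=[5, 3], filters=filters)  # only ID 2 masked -> 0
print(first_token, later_token)
```

With identical raw logits, the first step skips the blank and end-of-text tokens while a later step is free to choose them. Token 2 stays masked on both steps.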

Suppressing Blank Outputs

How suppress_blank Works

When suppress_blank=True (the default), Whisper prevents the model from emitting a bare space token or the end-of-text token as its first sampled token, either of which would produce a blank or empty transcription. This is handled by the SuppressBlank class in whisper/decoding.py. At the first sampling step, it sets the logits of the space token and the end-of-text token to negative infinity.

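import whisper
from whisper import DecodingOptions, decode

model = whisper.load_model("base")

# "audio.wav" is a placeholder path; substitute your own file
audio = whisper.pad_or_trim(whisper.load_audio("audio.wav"))
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Default behavior: suppress_blank is True, so no options are needed
result = decode(model, mel)
print(result.text)  # First sampled token is never a bare space or end-of-text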

Disabling Blank Suppression

If you need to allow a leading space token or an immediately empty result, for example when concatenating chunks or processing partial audio, set suppress_blank=False:

options = DecodingOptions(suppress_blank=False)
result = decode(model, mel, options=options)

Suppressing Specific Tokens

Using Token IDs

To block specific characters or words, pass a list of token IDs to suppress_tokens. You can obtain these IDs using the Whisper tokenizer:

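from whisper.tokenizer import get_tokenizer

# Match the tokenizer to the model: token IDs differ between the
# English-only and multilingual vocabularies
tokenizer = get_tokenizer(multilingual=model.is_multilingual)
comma_id = tokenizer.encode(",")[0]
period_id = tokenizer.encode(".")[0]

options = DecodingOptions(
    suppress_blank=False,
    # No -1 here, so the non-speech set is not added; include -1 to keep it
    suppress_tokens=[comma_id, period_id],
)
result = decode(model, mel, options=options)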

Suppressing Non-Speech Tokens with -1

The default, and the most common pattern, is "-1" (or [-1]), which expands to every ID in Tokenizer.non_speech_tokens. That set targets speaker tags and non-speech annotations such as bracketed text and musical notes, while leaving basic punctuation like commas, periods, and question marks available:

options = DecodingOptions(
    suppress_blank=True,
    suppress_tokens="-1",  # Expands to all non-speech tokens
)
result = decode(model, mel, options=options)

In whisper/decoding.py, when -1 is detected, _get_suppress_tokens() drops the -1 entry and adds the full set of non_speech_tokens. It also appends the control tokens transcribe, translate, sot (start-of-transcript), sot_prev, and sot_lm, plus no_speech when the tokenizer defines it, so none of these can be sampled. eot (end-of-transcript) is not added, which lets decoding terminate normally.
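
The -1 expansion can be sketched as a small standalone function. This is a simplified illustration under stated assumptions, not the library's implementation: resolve_suppress_tokens, the NON_SPEECH_IDS values, and the empty-string handling are hypothetical, and the sketch deliberately leaves out any handling of special control tokens.

```python
from typing import Iterable, List, Optional, Tuple, Union


def resolve_suppress_tokens(
    suppress_tokens: Optional[Union[str, Iterable[int]]],
    non_speech_ids: Iterable[int],
) -> Tuple[int, ...]:
    """Turn a suppress_tokens-style option into a sorted, de-duplicated tuple of IDs."""
    if suppress_tokens is None:
        ids: List[int] = []
    elif isinstance(suppress_tokens, str):
        # "7,8,-1" -> [7, 8, -1]; empty pieces are skipped in this sketch
        ids = [int(part) for part in suppress_tokens.split(",") if part.strip()]
    else:
        ids = list(suppress_tokens)

    if -1 in ids:
        # -1 is a placeholder: drop it and splice in the non-speech set
        ids = [token_id for token_id in ids if token_id >= 0]
        ids.extend(non_speech_ids)

    return tuple(sorted(set(ids)))


# Made-up IDs standing in for Tokenizer.non_speech_tokens
NON_SPEECH_IDS = [10, 11, 12]

default_ids = resolve_suppress_tokens("-1", NON_SPEECH_IDS)       # (10, 11, 12)
combined_ids = resolve_suppress_tokens([7, -1], NON_SPEECH_IDS)   # (7, 10, 11, 12)
explicit_ids = resolve_suppress_tokens([7], NON_SPEECH_IDS)       # (7,)
print(default_ids, combined_ids, explicit_ids)
```

The last call shows the practical consequence: a list without -1 replaces the non-speech set rather than extending it.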

Combining Suppression Strategies

You can combine both options to fine-tune output. For example, allowing initial spaces but suppressing all non-speech tokens:

options = DecodingOptions(
    suppress_blank=False,  # Allow a bare space or end-of-text as the first token
    suppress_tokens="-1",  # But still suppress the non-speech token set
)
result = decode(model, mel, options=options)

Key Implementation Files

The suppression logic is distributed across these critical files in the OpenAI Whisper repository:

File | Purpose
whisper/decoding.py | Contains DecodingOptions, SuppressBlank, SuppressTokens, and DecodingTask._get_suppress_tokens(). This is the primary implementation file.
whisper/tokenizer.py | Defines Tokenizer.non_speech_tokens, which provides the token list used when suppress_tokens="-1" is specified.
whisper/utils.py | Provides auxiliary helpers such as compression_ratio used in final DecodingResult calculations.
whisper/__main__.py | CLI entry point. It delegates to cli() in whisper/transcribe.py, which exposes the --suppress_tokens flag and forwards it to the underlying DecodingOptions.

Summary

  • Suppress blank outputs by setting suppress_blank=True (default) in DecodingOptions to prevent the model from emitting a space as the first token.
  • Suppress specific tokens by passing token IDs to suppress_tokens; use "-1" to automatically block all non-speech tokens defined in Tokenizer.non_speech_tokens.
  • Implementation location: The logic resides in whisper/decoding.py within the SuppressBlank and SuppressTokens classes, applied during each step of DecodingTask._main_loop.
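  • CLI support: Pass --suppress_tokens when running python -m whisper. In Python, suppress_blank can be set through DecodingOptions or as a keyword argument to transcribe(), which forwards decoding options.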

Frequently Asked Questions

What is the difference between suppress_blank and suppress_tokens?

suppress_blank is a boolean that only affects the first decoding step, preventing the model from outputting a bare space token (blank) or the end-of-text token at the beginning of the transcription. suppress_tokens accepts a list of token IDs (or a comma-separated string such as "-1") that are masked to negative infinity on every decoding step, allowing you to block specific characters, punctuation, or non-speech markers throughout the entire sequence.

How do I find the token ID for a specific character or word?

Import get_tokenizer from whisper.tokenizer, build a tokenizer that matches your model (multilingual or English-only), then call encode() on your target string. For example, tokenizer.encode(",")[0] returns the integer ID for the comma token. Note that Whisper uses a Byte Pair Encoding (BPE) tokenizer, so some words split into multiple token IDs, and taking [0] from such a word gives only its first piece.
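
Because a multi-token word cannot be blocked by a single ID, it can help to separate strings that map to exactly one token from those that do not before building a suppression list. The helper below is a hypothetical sketch: single_token_ids is not part of Whisper, and toy_encode with its tiny vocabulary only stands in for a real tokenizer's encode method.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Toy vocabulary with made-up IDs, used only to make the example self-contained
TOY_VOCAB = {",": 11, ".": 13, "un": 500, "believ": 501, "able": 502}


def toy_encode(text: str) -> List[int]:
    """Greedy longest-match encoder over TOY_VOCAB (a stand-in for a BPE tokenizer)."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r} with the toy vocabulary")
    return ids


def single_token_ids(
    encode: Callable[[str], List[int]], strings: Iterable[str]
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Return IDs for strings that encode to one token, plus the strings that do not."""
    safe_ids: List[int] = []
    skipped: Dict[str, List[int]] = {}
    for text in strings:
        ids = encode(text)
        if len(ids) == 1:
            safe_ids.append(ids[0])
        else:
            skipped[text] = ids  # suppressing any one piece would also affect other words
    return safe_ids, skipped


safe_ids, skipped = single_token_ids(toy_encode, [",", ".", "unbelievable"])
print(safe_ids)  # [11, 13]
print(skipped)   # {'unbelievable': [500, 501, 502]}
```

With a real tokenizer, passing tokenizer.encode as the first argument should work the same way, since the helper only assumes a callable that maps a string to a list of integer IDs.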

Can I suppress tokens after decoding has started?

No, the suppress_tokens and suppress_blank options must be configured before decoding begins via DecodingOptions. The suppression filters are instantiated once during DecodingTask.__init__ in whisper/decoding.py and applied consistently throughout the _main_loop. To change suppression behavior mid-stream, you would need to stop decoding and restart with new options.

Does suppressing tokens affect Whisper's performance or accuracy?

Suppressing tokens has negligible computational overhead because it simply sets specific logit values to -∞ before the softmax operation. However, it can significantly impact accuracy depending on what you suppress. Blocking common punctuation or the no_speech token may produce more continuous text but could also merge sentences incorrectly or remove important structural cues. Always validate output quality when using aggressive suppression lists.
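
The effect of a -∞ logit on the resulting distribution can be checked with a few lines of standard-library Python. This is an illustrative sketch with made-up logit values, not code taken from Whisper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; exp(-inf) evaluates to 0.0, so masked entries vanish."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]
masked_logits = [2.0, -math.inf, 0.0]  # token 1 is suppressed

probs = softmax(logits)
masked_probs = softmax(masked_logits)

print(masked_probs[1])               # 0.0: the suppressed token can never be chosen
print(masked_probs[0] > probs[0])    # True: its probability mass moves to the remaining tokens
```

The masked token's probability is exactly zero and the rest renormalize to sum to one, which is why suppression changes what the model can say without adding measurable cost.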
