How to Suppress Specific Tokens or Blank Outputs During Whisper Decoding
You can suppress specific tokens or blank outputs during Whisper decoding by configuring the suppress_blank and suppress_tokens parameters in DecodingOptions, which apply logit filters to mask unwanted tokens before sampling.
OpenAI's Whisper uses a flexible logit-filter pipeline that lets you control which tokens the decoder is allowed to emit. By setting options in DecodingOptions, you can prevent blank outputs at the start of transcription or permanently block specific token IDs throughout the decoding process.
Understanding Whisper's Logit Filter Pipeline
The suppression mechanism operates inside whisper/decoding.py through four distinct stages:
- Option Parsing – When you instantiate DecodingOptions, the fields suppress_blank (default True) and suppress_tokens (default "-1") are stored. These are defined at lines 104–108 in whisper/decoding.py.
- Token List Resolution – The DecodingTask._get_suppress_tokens() method (lines 15–42) converts your input into concrete token IDs. If the input contains -1, the method replaces it with all tokens returned by Tokenizer.non_speech_tokens. It also always appends control tokens such as sot, sot_prev, sot_lm, transcribe, translate, and no_speech, so the model cannot emit them mid-transcript. eot is left unsuppressed so that decoding can terminate.
- Filter Construction – Inside DecodingTask.__init__ (lines 55–60), the library instantiates the SuppressBlank and SuppressTokens classes once, before decoding starts, and appends them to self.logit_filters.
- Logit Masking – In the main loop, SuppressBlank.apply() (lines 28–31) masks the space token and eot to -∞ only on the first step (tokens.shape[1] == self.sample_begin). Meanwhile, SuppressTokens.apply() (lines 34–38) masks your specified token IDs on every step. After these filters run, the decoder picks the next token from the modified logits.
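The difference between first-step masking and every-step masking can be illustrated without loading a model. The following is a minimal pure-Python sketch, not the library's code: the class names ToySuppressBlank and ToySuppressTokens, the token IDs, and the list-based logits are all assumptions made for illustration (Whisper itself operates on PyTorch tensors).

```python
import math

NEG_INF = -math.inf


class ToySuppressBlank:
    """Masks the blank and end-of-text tokens, but only on the first sampled step."""

    def __init__(self, blank_ids, eot_id, sample_begin):
        self.masked = list(blank_ids) + [eot_id]
        self.sample_begin = sample_begin  # prompt length before sampling starts

    def apply(self, logits, tokens):
        if len(tokens) == self.sample_begin:
            for i in self.masked:
                logits[i] = NEG_INF


class ToySuppressTokens:
    """Masks a fixed set of token IDs on every step."""

    def __init__(self, suppress_ids):
        self.suppress_ids = list(suppress_ids)

    def apply(self, logits, tokens):
        for i in self.suppress_ids:
            logits[i] = NEG_INF


def filtered_logits(logits, tokens, filters):
    """Return a copy of `logits` after running every filter in order."""
    out = list(logits)
    for f in filters:
        f.apply(out, tokens)
    return out


# Hypothetical 6-token vocabulary: 0 = blank, 5 = end-of-text, 3 = a token to block.
filters = [
    ToySuppressBlank(blank_ids=[0], eot_id=5, sample_begin=2),
    ToySuppressTokens([3]),
]

# Two prompt tokens so far: this is the first sampled position.
first_step = filtered_logits([1.0] * 6, tokens=[10, 11], filters=filters)
# One token already sampled: only the every-step filter still applies.
later_step = filtered_logits([1.0] * 6, tokens=[10, 11, 2], filters=filters)
```

On the first step, tokens 0, 3, and 5 are masked; on later steps only token 3 stays masked, which mirrors the split between suppress_blank and suppress_tokens.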
Suppressing Blank Outputs
How suppress_blank Works
When suppress_blank=True (the default), Whisper prevents the model from emitting a bare space token, or ending the transcript immediately, at the first sampled position. This is handled by the SuppressBlank class in whisper/decoding.py. At the first sampling step, it forces the logits of the space token and the end-of-text token to negative infinity.
import whisper
from whisper import DecodingOptions, decode

model = whisper.load_model("base")
mel = ...  # your log-mel spectrogram input
# Default behavior: suppress_blank is True
result = decode(model, mel)
print(result.text)  # first sampled token is never a bare space or end-of-text
Disabling Blank Suppression
If you want to allow a blank first token (or an immediately empty result), for example when concatenating chunks or processing partial audio, set suppress_blank=False:
options = DecodingOptions(suppress_blank=False)
result = decode(model, mel, options=options)
Suppressing Specific Tokens
Using Token IDs
To block specific characters or words, pass a list of token IDs to suppress_tokens. You can obtain these IDs using the Whisper tokenizer:
from whisper.tokenizer import get_tokenizer

# Match the tokenizer to the loaded model (the "base" checkpoint is multilingual)
tokenizer = get_tokenizer(multilingual=model.is_multilingual)
comma_id = tokenizer.encode(",")[0]
period_id = tokenizer.encode(".")[0]
options = DecodingOptions(
    suppress_blank=False,
    suppress_tokens=[comma_id, period_id],
)
result = decode(model, mel, options=options)
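Taking element [0] of an encoding silently drops the rest when a string spans several tokens, and a BPE vocabulary often holds separate entries for a string with and without a leading space. The helper below is a hedged sketch of a safer lookup. single_token_ids is a hypothetical function, and fake_encode with its made-up IDs is a stand-in so the example runs without Whisper installed.

```python
def single_token_ids(encode, strings):
    """Map each string to its token ID, rejecting strings that span several tokens."""
    ids = {}
    for s in strings:
        encoded = encode(s)
        if len(encoded) != 1:
            raise ValueError(f"{s!r} encodes to {len(encoded)} tokens: {encoded}")
        ids[s] = encoded[0]
    return ids


# Stand-in vocabulary with made-up IDs (not real Whisper token IDs).
FAKE_VOCAB = {",": 11, " ,": 837, ".": 13}


def fake_encode(text):
    # Unknown strings fall back to one made-up ID per character.
    if text in FAKE_VOCAB:
        return [FAKE_VOCAB[text]]
    return [1000 + ord(c) for c in text]


# Look up both the bare and the space-prefixed variant of the comma.
ids = single_token_ids(fake_encode, [",", " ,", "."])
suppress_list = sorted(ids.values())
```

With the real tokenizer you would pass tokenizer.encode as the encode argument and hand the resulting list to suppress_tokens.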
Suppressing Non-Speech Tokens with -1
The most common pattern is passing "-1" (or [-1]), which is also the default and expands to all tokens defined in Tokenizer.non_speech_tokens. This set covers symbols that tend to appear in speaker tags and non-speech annotations, such as brackets, parentheses, and musical notes, while keeping basic punctuation like commas, periods, and question marks:
options = DecodingOptions(
    suppress_blank=True,
    suppress_tokens="-1",  # Expands to all non-speech tokens
)
result = decode(model, mel, options=options)
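According to the source code in whisper/decoding.py (lines 15–42), when -1 is detected, the method replaces it with the full set of non_speech_tokens. Regardless of your input, it then appends control tokens such as sot (start-of-transcript), sot_prev, sot_lm, transcribe, translate, and no_speech, so they cannot be sampled mid-transcript. eot (end-of-transcript) is not added, because suppressing it would leave the decoder unable to finish.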
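The resolution step can be sketched in plain Python. This is a simplified illustration rather than the library's code: resolve_suppress_tokens is a hypothetical name, the token IDs are made up, and always_suppressed stands in for any control tokens that are suppressed regardless of user input (an assumption of this sketch).

```python
def resolve_suppress_tokens(suppress_tokens, non_speech_tokens, always_suppressed):
    """Turn a user-supplied option into a sorted tuple of unique token IDs."""
    if isinstance(suppress_tokens, str):
        # "-1" or "11,13" style strings become lists of ints; "" becomes empty.
        suppress_tokens = [int(t) for t in suppress_tokens.split(",") if t.strip()]
    ids = list(suppress_tokens or [])

    if -1 in ids:
        # -1 is a placeholder: drop it and add the non-speech set instead.
        ids = [t for t in ids if t >= 0]
        ids.extend(non_speech_tokens)

    # Control tokens that should never be sampled mid-transcript.
    ids.extend(always_suppressed)
    return tuple(sorted(set(ids)))


NON_SPEECH = [7, 8, 9]   # made-up stand-in for Tokenizer.non_speech_tokens
CONTROL = [100, 101]     # made-up stand-in for always-suppressed control tokens

default_ids = resolve_suppress_tokens("-1", NON_SPEECH, CONTROL)
custom_ids = resolve_suppress_tokens([11, 13], NON_SPEECH, CONTROL)
mixed_ids = resolve_suppress_tokens("-1,11", NON_SPEECH, CONTROL)
```

The mixed case shows that -1 can be combined with explicit IDs: the placeholder is expanded and the extra IDs are kept alongside it.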
Combining Suppression Strategies
You can combine both options to fine-tune output. For example, allowing initial spaces but suppressing all non-speech tokens:
options = DecodingOptions(
    suppress_blank=False,  # Allow a blank or end-of-text first token
    suppress_tokens="-1",  # But still block non-speech symbols
)
result = decode(model, mel, options=options)
Key Implementation Files
The suppression logic is distributed across these critical files in the OpenAI Whisper repository:
| File | Purpose |
|---|---|
| whisper/decoding.py | Contains DecodingOptions, SuppressBlank, SuppressTokens, and DecodingTask._get_suppress_tokens() (lines 15–42, 55–60, 104–108). This is the primary implementation file. |
| whisper/tokenizer.py | Defines Tokenizer.non_speech_tokens, which provides the token list used when suppress_tokens="-1" is specified. |
| whisper/utils.py | Provides auxiliary helpers such as compression_ratio used in final DecodingResult calculations. |
| whisper/__main__.py | CLI entry point. The argument parser it invokes lives in whisper/transcribe.py and defines a --suppress_tokens flag that is forwarded to DecodingOptions. suppress_blank does not appear to have a CLI flag upstream, so set it through the Python API. |
Summary
- Suppress blank outputs by setting suppress_blank=True (default) in DecodingOptions to prevent the model from emitting a bare space or end-of-text as the first token.
- Suppress specific tokens by passing token IDs to suppress_tokens; use "-1" to automatically block all non-speech tokens defined in Tokenizer.non_speech_tokens.
- Implementation location: The logic resides in whisper/decoding.py within the SuppressBlank and SuppressTokens classes, applied during each step of DecodingTask._main_loop.
- CLI support: Pass --suppress_tokens when running python -m whisper. suppress_blank does not appear to have a CLI flag upstream, so set it through DecodingOptions.
Frequently Asked Questions
What is the difference between suppress_blank and suppress_tokens?
suppress_blank is a boolean that only affects the first decoding step, preventing the model from outputting a bare space token (blank) or end-of-text at the beginning of the transcription. suppress_tokens accepts a list of token IDs (or a comma-separated string such as "-1") that are masked to negative infinity on every decoding step, allowing you to block specific characters, punctuation, or non-speech symbols throughout the entire sequence.
How do I find the token ID for a specific character or word?
Use the get_tokenizer function from the whisper.tokenizer module to access the tokenizer, then call encode() on your target string. For example, tokenizer.encode(",")[0] returns the integer ID for the comma token. Note that Whisper uses a Byte Pair Encoding (BPE) tokenizer, so some words may split into multiple token IDs, and a string with a leading space may map to a different ID than the same string without one.
Can I suppress tokens after decoding has started?
No, the suppress_tokens and suppress_blank options must be configured before decoding begins via DecodingOptions. The suppression filters are instantiated once during DecodingTask.__init__ (lines 55–60 in whisper/decoding.py) and applied consistently throughout the _main_loop. To change suppression behavior mid-stream, you would need to stop decoding and restart with new options.
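Because the options are fixed for the duration of a decode call, changing suppression means building a new options object and decoding again. The sketch below assumes, as in the upstream source, that DecodingOptions is declared as a frozen dataclass. ToyOptions is a hypothetical stand-in with only the two fields discussed here, so the example runs without Whisper installed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class ToyOptions:
    """Stand-in for DecodingOptions with only the two suppression fields."""

    suppress_blank: bool = True
    suppress_tokens: Optional[Union[str, Iterable[int]]] = "-1"


base = ToyOptions()

# A frozen dataclass rejects in-place changes.
try:
    base.suppress_blank = False
    mutated = True
except FrozenInstanceError:
    mutated = False

# Derive a modified copy instead, then pass it to a fresh decode call.
relaxed = replace(base, suppress_blank=False)
```

The same dataclasses.replace pattern should work on a real DecodingOptions instance if it is a dataclass as assumed.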
Does suppressing tokens affect Whisper's performance or accuracy?
Suppressing tokens has negligible computational overhead because it simply sets specific logit values to -∞ before the softmax operation. However, it can significantly impact accuracy depending on what you suppress. Blocking common punctuation may produce more continuous text but could also merge sentences incorrectly or remove important structural cues. Always validate output quality when using aggressive suppression lists.
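To see why a -∞ logit removes a token entirely, consider what softmax does with it. The following is a small standalone sketch using made-up logits. It is not Whisper code, only the arithmetic behind the masking.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]   # made-up scores for a 3-token vocabulary
masked = list(logits)
masked[0] = -math.inf      # suppress token 0

probs_before = softmax(logits)
probs_after = softmax(masked)
```

The masked token ends up with probability exactly zero and its share is redistributed over the remaining tokens, so it cannot be chosen whether the decoder takes the argmax or samples.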