# How WatermarkingConfig Enables AI-Generated Text Detection in Transformers

> Discover how WatermarkingConfig in Hugging Face Transformers detects AI-generated text. Learn about statistical watermarks and deterministic green-list hashing for text verification.

- Repository: [huggingface/transformers](https://github.com/huggingface/transformers)
- Tags: deep-dive
- Published: 2026-02-22

---

**`WatermarkingConfig` acts as a unified configuration object that drives the `WatermarkLogitsProcessor` to embed statistical watermarks during text generation and the `WatermarkDetector` to verify them using deterministic green-list hashing based on context tokens.**

The `WatermarkingConfig` class, defined in [`src/transformers/generation/configuration_utils.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/generation/configuration_utils.py), serves as the central contract between generation and detection in the Hugging Face Transformers library. This configuration encapsulates the hyperparameters required to deterministically generate "green lists" of favored tokens during generation, which the detector later reconstructs to compute statistical confidence scores. Passing the same `WatermarkingConfig` values to both phases ensures that the detector rebuilds exactly the green lists that were used at generation time.

## WatermarkingConfig Parameters and Validation

The `WatermarkingConfig` dataclass stores five parameters that control both the strength and detectability of the watermark.

```python
from transformers import WatermarkingConfig

wm_config = WatermarkingConfig(
    greenlist_ratio=0.25,
    bias=2.0,
    hashing_key=15485863,
    seeding_scheme="lefthash",
    context_width=1,
)
```

- **`greenlist_ratio`**: The fraction of the vocabulary designated as "green" tokens for each context (default 0.25). It is also the green rate expected in unwatermarked text, so the detector uses it as the null-hypothesis baseline.
- **`bias`**: The additive logit boost applied exclusively to green tokens (default 2.0). This value controls watermark strength. The logits of non-green tokens are left untouched, although their probabilities shrink once the softmax renormalizes the distribution.
- **`hashing_key`**: An integer key mixed into the deterministic hash that selects the green list (the default, 15485863, is the millionth prime). Using the same key at detection time is what makes the green lists reproducible across runs.
- **`seeding_scheme`**: The algorithm for computing hashes:
  - `"lefthash"`: Uses the previous token(s) as hash input (Algorithm 2 from the original paper).
  - `"selfhash"`: Uses the current token itself (Algorithm 3, computationally slower).
- **`context_width`**: The number of preceding tokens used to seed the hash (default 1). The library documentation describes a longer context as making the watermark more robust; the trade-off is that each edited token then falls inside more of the n-grams scored at detection time.

The configuration validates these parameters via the `validate()` method and constructs the generation processor through `construct_processor()`, which instantiates `WatermarkLogitsProcessor` with the specified hyperparameters.
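To make the kind of checks involved concrete, here is a minimal standalone sketch. `ToyWatermarkConfig` is a hypothetical class written for this article, not the library's implementation, and the exact bounds it enforces are assumptions inferred from the parameter descriptions above.

```python
from dataclasses import dataclass


@dataclass
class ToyWatermarkConfig:
    """Hypothetical stand-in that mirrors the five documented parameters."""

    greenlist_ratio: float = 0.25
    bias: float = 2.0
    hashing_key: int = 15485863
    seeding_scheme: str = "lefthash"
    context_width: int = 1

    def validate(self) -> None:
        # Assumed rules: a known scheme, a ratio strictly between 0 and 1,
        # and at least one context token.
        if self.seeding_scheme not in ("lefthash", "selfhash"):
            raise ValueError(f"unknown seeding_scheme: {self.seeding_scheme!r}")
        if not 0.0 < self.greenlist_ratio < 1.0:
            raise ValueError("greenlist_ratio must be between 0 and 1")
        if self.context_width < 1:
            raise ValueError("context_width must be at least 1")


ToyWatermarkConfig().validate()  # defaults pass silently

try:
    ToyWatermarkConfig(seeding_scheme="righthash").validate()
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early on a bad configuration matters here because a silently wrong value would produce green lists that the detector can never reproduce.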

## Generation-Time Watermarking with WatermarkLogitsProcessor

During text generation, the `WatermarkLogitsProcessor`, located in [`src/transformers/generation/logits_process.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/generation/logits_process.py), modifies the model's next-token scores to favor green-list tokens without changing the underlying model weights.

For each generation step, the processor executes three operations:

1. **Green List Derivation**: Using the `seeding_scheme` and `context_width`, it hashes the recent context tokens with the `hashing_key` to deterministically select which vocabulary indices belong to the current green list.
2. **Logit Biasing**: It adds the `bias` value (default 2.0) to the logits of all green-list tokens before the softmax operation.
3. **Token Selection**: The decoding strategy (sampling or greedy search) picks the next token from the modified distribution, producing text that statistically over-represents green tokens.

Because the hash function depends only on the configuration parameters and the local context, the same `WatermarkingConfig` can regenerate identical green lists during detection, enabling verification without access to the original model outputs.
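The three steps can be sketched with the standard library alone. This is a toy illustration under stated assumptions: `toy_green_list` seeds Python's `random` module with `hashing_key * (prev_token + 1)` and shuffles the vocabulary, which imitates a lefthash-style scheme but is not the hash the library actually uses.

```python
import math
import random


def toy_green_list(prev_token, vocab_size, greenlist_ratio=0.25, hashing_key=15485863):
    """Deterministically pick the green token ids for one context token."""
    # Assumed seeding rule: key times (previous token id + 1).
    rng = random.Random(hashing_key * (prev_token + 1))
    permutation = list(range(vocab_size))
    rng.shuffle(permutation)
    return set(permutation[: int(greenlist_ratio * vocab_size)])


def add_bias(logits, green_ids, bias=2.0):
    """Add `bias` to green-token logits; all other logits stay unchanged."""
    return [x + bias if i in green_ids else x for i, x in enumerate(logits)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


vocab_size = 20
logits = [0.0] * vocab_size  # flat toy distribution over 20 tokens
green = toy_green_list(prev_token=7, vocab_size=vocab_size)
biased = add_bias(logits, green)

mass_before = sum(p for i, p in enumerate(softmax(logits)) if i in green)
mass_after = sum(p for i, p in enumerate(softmax(biased)) if i in green)
print(f"green probability mass: {mass_before:.3f} -> {mass_after:.3f}")
```

Calling `toy_green_list` again with the same arguments returns the same set, which is the property the detector relies on.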

## Detection with WatermarkDetector

The `WatermarkDetector` class in [`src/transformers/generation/watermarking.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/generation/watermarking.py) performs statistical hypothesis testing to determine whether a given text contains the watermark.

The detector builds its own `WatermarkLogitsProcessor` from the same `WatermarkingConfig`, so it derives the same green lists that were used at generation time. The detection algorithm proceeds as follows:

1. **N-gram Extraction**: The detector slides a window across the input token sequence, extracting n-grams of length `context_width + 1`.
2. **Green Token Scoring**: For each n-gram, it computes the green list for the prefix (all tokens except the last) and checks whether the target token appears in that list via `_get_ngram_score`.
3. **Statistical Aggregation**: It counts the total number of green tokens and computes the `green_fraction` (observed green rate). Using the expected rate (`greenlist_ratio`) and the number of scored tokens, it calculates a **z-score** measuring how many standard deviations the observed count deviates from the null hypothesis.
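4. **Decision**: The detector returns a `WatermarkDetectorOutput` containing the number of scored and green tokens, the green fraction, the z-score, the p-value, a binary prediction (whether `z_score > z_threshold`, where `z_threshold` is an argument of the detector call), and a confidence value.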
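The statistical aggregation step amounts to a one-proportion z-test: under the null hypothesis each scored token is green with probability `greenlist_ratio`. A minimal sketch of that arithmetic follows; the function names are illustrative, and the p-value uses the exact normal tail via `math.erfc`, whereas a library may use an approximation instead.

```python
import math


def z_score(num_green, num_scored, greenlist_ratio):
    """Standard deviations between the observed green count and its null expectation."""
    expected = greenlist_ratio * num_scored
    std = math.sqrt(num_scored * greenlist_ratio * (1.0 - greenlist_ratio))
    return (num_green - expected) / std


def one_sided_p_value(z):
    """Probability of a z-score at least this large under the null hypothesis."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy numbers: 50 of 100 scored tokens are green, with 25 expected by chance.
z = z_score(num_green=50, num_scored=100, greenlist_ratio=0.25)
p = one_sided_p_value(z)
print(f"green_fraction={50 / 100:.2f}  z={z:.2f}  p={p:.1e}")
```

With these toy numbers the observed count sits 25 tokens above the expected 25, and the standard deviation is about 4.33, so the z-score is roughly 5.77.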

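Detection tolerates minor edits because each token is scored only against its local context: a substituted or inserted token changes only the handful of n-grams that contain it (at most `context_width + 1`), while every other n-gram still contributes to the z-score. Heavier paraphrasing disturbs more n-grams and weakens the signal accordingly.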
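One way to see how local a single edit is: count the sliding windows that contain the edited position. The helper below is a hypothetical illustration that assumes windows of length `context_width + 1`, as in the n-gram extraction step above.

```python
def affected_ngrams(seq_len, edit_pos, context_width):
    """Start indices of the n-gram windows that contain position `edit_pos`."""
    n = context_width + 1
    first = max(0, edit_pos - n + 1)
    last = min(edit_pos, seq_len - n)
    return list(range(first, last + 1))


seq_len = 30
for context_width in (1, 4):
    total_windows = seq_len - (context_width + 1) + 1
    hit = affected_ngrams(seq_len, edit_pos=10, context_width=context_width)
    print(f"context_width={context_width}: {len(hit)} of {total_windows} n-grams affected")
```

The count grows with `context_width`, so a wider context makes each individual edit touch more of the scored n-grams.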

## Complete Implementation Example

The following example demonstrates the end-to-end workflow using `WatermarkingConfig` for both generation and detection:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    WatermarkDetector,
    WatermarkingConfig,
)

# Initialize model and tokenizer
model_id = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Configure watermarking parameters
wm_config = WatermarkingConfig(
    greenlist_ratio=0.25,
    bias=2.5,
    seeding_scheme="selfhash",
    context_width=1,
)

# Generate watermarked text (greedy decoding over the biased logits)
inputs = tokenizer(["The secret is"], return_tensors="pt")
output_ids = model.generate(
    **inputs,
    watermarking_config=wm_config,
    do_sample=False,
    max_length=30,
)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(f"Generated: {generated_text}")

# Detect the watermark using the identical configuration
detector = WatermarkDetector(
    model_config=model.config,
    device="cpu",
    watermarking_config=wm_config,
)
result = detector(output_ids, return_dict=True)

# The output fields hold one value per sequence in the batch, so index the first
print(f"Green fraction: {result.green_fraction[0]:.3f}")
print(f"Z-score: {result.z_score[0]:.3f}")
print(f"Prediction: {result.prediction[0]}")  # True if watermarked
```

## Summary

- **`WatermarkingConfig`** serves as the single source of truth for both generation and detection, ensuring perfect alignment of green-list selection through deterministic hashing.
- **Generation** uses `WatermarkLogitsProcessor` to bias logits toward green-list tokens based on context hashes, embedding an invisible statistical signal without modifying model weights.
- **Detection** employs `WatermarkDetector` to reconstruct the same green lists, compute green-token fractions, and derive z-scores and p-values to determine whether text carries the watermark.
- The system relies on shared hyperparameters (`hashing_key`, `greenlist_ratio`, `seeding_scheme`, and `context_width`) to ensure that detection reproduces the exact generation-time conditions required for verification.

## Frequently Asked Questions

### What happens if I use different WatermarkingConfig settings for detection than for generation?

Detection will fail to reconstruct the correct green lists, so the observed green fraction falls back toward `greenlist_ratio`, the z-score drops to chance levels (approximately 0), and the prediction will typically be `False`. The `hashing_key`, `greenlist_ratio`, `seeding_scheme`, and `context_width` must be identical between generation and detection to ensure the deterministic hash function produces matching green lists. The `bias` value only shapes generation; detection counts green tokens and does not apply it.
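The effect of a mismatched key can be reproduced with a toy simulation. Everything here is an assumption made for illustration: the seeding rule, the 1000-token vocabulary, and a generator that always emits a green token (an idealized, maximally strong watermark).

```python
import random

VOCAB_SIZE = 1000
GREENLIST_RATIO = 0.25


def toy_green_list(prev_token, hashing_key):
    # Assumed lefthash-style rule: seed a PRNG from the key and the previous token.
    rng = random.Random(hashing_key * (prev_token + 1))
    permutation = list(range(VOCAB_SIZE))
    rng.shuffle(permutation)
    return set(permutation[: int(GREENLIST_RATIO * VOCAB_SIZE)])


def generate_all_green(length, hashing_key, seed=0):
    """Toy generator that always emits a token from the current green list."""
    picker = random.Random(seed)
    tokens = [1]
    while len(tokens) < length:
        green = sorted(toy_green_list(tokens[-1], hashing_key))
        tokens.append(picker.choice(green))
    return tokens


def green_fraction(tokens, hashing_key):
    hits = sum(
        tokens[i] in toy_green_list(tokens[i - 1], hashing_key)
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)


tokens = generate_all_green(length=401, hashing_key=15485863)
matched = green_fraction(tokens, hashing_key=15485863)
mismatched = green_fraction(tokens, hashing_key=12345)
print(f"same key: {matched:.2f}, different key: {mismatched:.2f}")
```

With the matching key every scored token is green; with a different key the fraction lands near the 0.25 expected by chance, which is what an unwatermarked text would also produce.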

### How does the bias parameter affect text quality and detection accuracy?

The `bias` parameter controls the strength of the watermark by adding a constant value to the logits of green-list tokens. Higher values (e.g., 3.0-4.0) make the watermark easier to detect (higher z-scores) but may distort the model's natural output distribution, potentially reducing coherence. Lower values (e.g., 1.0-1.5) preserve text quality but require longer sequences for reliable detection.
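A back-of-the-envelope sketch of this trade-off, assuming an idealized model whose logits are completely flat (the most favorable case for watermarking; real text has lower entropy and needs more tokens). The function names and the target z-score of 4.0 are illustrative choices, not library values.

```python
import math


def green_probability(bias, greenlist_ratio=0.25):
    """Chance of picking a green token when all logits are equal before the bias."""
    boosted = greenlist_ratio * math.exp(bias)
    return boosted / (boosted + (1.0 - greenlist_ratio))


def tokens_needed(bias, z_target=4.0, greenlist_ratio=0.25):
    """Scored tokens needed for the expected z-score to reach `z_target` (bias > 0)."""
    gap = green_probability(bias, greenlist_ratio) - greenlist_ratio
    std_per_token = math.sqrt(greenlist_ratio * (1.0 - greenlist_ratio))
    return math.ceil((z_target * std_per_token / gap) ** 2)


for bias in (1.0, 2.0, 4.0):
    print(
        f"bias={bias}: P(green)={green_probability(bias):.3f}, "
        f"tokens for z>=4: {tokens_needed(bias)}"
    )
```

The required length shrinks quickly as the bias grows, which is the detection side of the trade-off; the cost is that a larger share of each step's probability mass is forced onto green tokens.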

### Can the watermark survive paraphrasing or minor text edits?

To a degree. Each token is scored against a green list derived from a short window of neighboring tokens, so an isolated synonym substitution or insertion only changes the few n-gram scores that include the edited position, and the remaining tokens still carry the signal. Extensive rewriting replaces most n-grams, and heavy truncation leaves fewer tokens to score; both lower the z-score and degrade detection performance.

### What is the difference between lefthash and selfhash seeding schemes?

The `lefthash` scheme (Algorithm 2) hashes the previous token(s) to determine the green list for the current position, making it computationally efficient and suitable for standard autoregressive generation: one green list is computed per step. The `selfhash` scheme (Algorithm 3) incorporates the current token itself into the hash computation. Because a candidate's green status then depends on the candidate, the processor has to evaluate candidate next tokens individually instead of computing a single green list before token selection, which makes it slower than `lefthash`.
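The cost difference can be illustrated with a toy comparison that counts how many green lists each scheme derives for one generation step. The seeding rules below are assumptions chosen for readability and do not reproduce the library's hashing.

```python
import random

VOCAB_SIZE = 50
GREENLIST_RATIO = 0.25
HASHING_KEY = 15485863


def green_list(seed):
    rng = random.Random(seed)
    permutation = list(range(VOCAB_SIZE))
    rng.shuffle(permutation)
    return set(permutation[: int(GREENLIST_RATIO * VOCAB_SIZE)])


def lefthash_green(prev_token):
    """One green list per step, seeded from the previous token only."""
    return green_list(HASHING_KEY * (prev_token + 1)), 1


def selfhash_green(prev_token, candidates):
    """Each candidate is checked against a green list seeded by the candidate itself."""
    green = set()
    for cand in candidates:
        seed = HASHING_KEY * (prev_token + 1) * (cand + 1)
        if cand in green_list(seed):
            green.add(cand)
    return green, len(candidates)


left_green, left_cost = lefthash_green(prev_token=7)
self_green, self_cost = selfhash_green(prev_token=7, candidates=range(VOCAB_SIZE))
print(f"lefthash: {left_cost} green-list derivation, {len(left_green)} green tokens")
print(f"selfhash: {self_cost} green-list derivations, {len(self_green)} green tokens")
```

In this sketch the selfhash cost scales with the number of candidates considered, which is why restricting the check to a short list of likely tokens is a natural optimization.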