Using the Seed Parameter for Reproducible Outputs in OpenAI's API

The seed parameter asks the API to sample deterministically on a best-effort basis, so repeated requests with the same seed and the same parameters should mostly return the same output. The system_fingerprint response field lets you detect backend changes that can break this reproducibility.

OpenAI's chat completion models are nondeterministic by default: they sample tokens from probability distributions, so identical prompts can yield different sequences. The openai/openai-cookbook repository demonstrates how to use the seed parameter, introduced with the gpt-4-1106-preview and gpt-3.5-turbo-1106 models, to minimize sampling variance for testing, benchmarking, and controlled prompt engineering experiments.

How the Seed Parameter Works

When you supply a seed value (e.g., seed=12345), the API makes a best-effort attempt to sample deterministically for that request. Conceptually, the seed fixes the pseudo-random draws used to pick each token from the distribution that remains after temperature scaling and top-p truncation; those two steps are themselves deterministic functions of the logits and the request parameters. Repeated calls with the same seed and the same parameters should therefore follow the same sampling path and usually produce the same token sequence, although determinism is not guaranteed.
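
OpenAI does not document how the seed is consumed server-side, so the following is only a local toy analogy and not the API's implementation. It assumes a made-up four-token vocabulary and a hypothetical helper, sample_tokens, and illustrates that temperature scaling is a deterministic transformation while a seeded generator makes the random draws repeatable:

```python
import math
import random

def sample_tokens(logits, temperature, seed, n=5):
    """Draw n token indices from softmax(logits / temperature) with a seeded RNG."""
    rng = random.Random(seed)  # private RNG; leaves the global random state alone
    scaled = [x / temperature for x in logits]  # deterministic temperature scaling
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=n)

toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for a 4-token vocabulary
first = sample_tokens(toy_logits, temperature=0.7, seed=12345)
second = sample_tokens(toy_logits, temperature=0.7, seed=12345)
print(first == second)  # same seed and same inputs give the same draws
```

Changing the seed, the logits, or the temperature in this toy changes the inputs to the draw, which mirrors why the API only promises consistency for otherwise identical requests.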

This behavior is described in examples/Reproducible_outputs_with_the_seed_parameter.ipynb, which explains that while the seed controls sampling randomness, it cannot account for changes in model weights, inference hardware, or system updates.

Detecting System Changes with system_fingerprint

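To help you identify when reproducibility may break due to external factors, the API returns a system_fingerprint field in the response. This string identifies the backend configuration that served the request; treat it as an opaque identifier to compare across runs rather than something to parse. The Python SDK types the field as optional, so handle a missing value.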

If the fingerprint changes between runs, you should expect potentially different completions even when using the identical seed, indicating that the underlying system environment has shifted.
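
As a minimal sketch of that check, the hypothetical helper below (compare_runs is not part of the cookbook or the SDK) takes pairs of fingerprint and output text collected from repeated calls and reports whether a mismatch coincides with a fingerprint change. The fingerprint strings and texts are made-up values:

```python
def compare_runs(runs):
    """Classify repeated runs that used the same seed and request parameters.

    `runs` is a list of (system_fingerprint, output_text) pairs.
    """
    fingerprints = {fingerprint for fingerprint, _ in runs}
    outputs = {text for _, text in runs}
    if len(outputs) == 1:
        return "outputs match"
    if len(fingerprints) > 1:
        return "outputs differ and system_fingerprint changed"
    return "outputs differ with the same system_fingerprint"

# Made-up values for illustration; real pairs would come from API responses.
runs = [("fp_aaa", "Dawn spills gold."), ("fp_bbb", "Light wakes the hills.")]
status = compare_runs(runs)
print(status)
```

The third outcome is worth logging separately, since it points at residual nondeterminism rather than a backend update.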

Implementing Reproducible Outputs in Python

The cookbook provides concrete implementation patterns across multiple files, demonstrating both API-level seeding and local randomness control for evaluation pipelines.

Basic API Requests with Seed

The pattern shown in examples/Using_logprobs.ipynb wraps the OpenAI client to inject the seed parameter into the request payload:

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_completion(messages, seed=12345, **kwargs):
    """
    Wrapper that injects the `seed` parameter into a chat completion request.
    """
    payload = {
        "model": "gpt-4-1106-preview",
        "messages": messages,
        "temperature": 0.7,
        "seed": seed,
        **kwargs,
    }
    response = client.chat.completions.create(**payload)
    # Log the fingerprint so backend changes can be tracked across runs
    print("system_fingerprint:", response.system_fingerprint)
    return response

# Example usage
messages = [
    {"role": "system", "content": "You are a concise poet."},
    {"role": "user", "content": "Write a haiku about sunrise."},
]

print(get_completion(messages).choices[0].message.content)

The seed argument asks the API to follow the same sampling path each time you run the script, while system_fingerprint lets you verify that the backend has not changed between runs.

Isolating Prompt Variables

When iterating on prompts, fixing the seed makes each prompt's output repeatable, so differences between outputs are more likely to reflect the prompt changes than run-to-run sampling variance:

prompt_a = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of 'Pride and Prejudice' in 2 sentences."},
]

prompt_b = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give a 2-sentence overview of 'Pride and Prejudice'."},
]

seed = 98765
resp_a = get_completion(prompt_a, seed=seed).choices[0].message.content
resp_b = get_completion(prompt_b, seed=seed).choices[0].message.content

print("A:", resp_a)
print("B:", resp_b)

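Because the seed and every other parameter remain constant, the comparison is repeatable, and variation between resp_a and resp_b can be attributed mainly to the prompt wording, provided system_fingerprint also matches across the two calls.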
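
Before trusting such a comparison, it can help to confirm that each prompt is actually stable under the chosen seed. The sketch below assumes a hypothetical helper, is_stable, that calls a zero-argument function several times and checks that every output is identical; a stand-in generator is used so the example runs without network access:

```python
def is_stable(generate, repeats=3):
    """Call `generate` several times and report whether all outputs are identical."""
    outputs = [generate() for _ in range(repeats)]
    return len(set(outputs)) == 1, outputs

# Stand-in generator so the sketch runs offline. With the wrapper above it would be:
#   lambda: get_completion(prompt_a, seed=seed).choices[0].message.content
def fake_generate():
    return "A witty heroine and a proud gentleman overcome first impressions."

stable, outputs = is_stable(fake_generate)
print("stable:", stable)
```

If a prompt is not stable across repeats, differences between prompt variants cannot be separated from sampling noise, and averaging over several runs is the safer comparison.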

Local Determinism for Evaluation Pipelines

For workflows that require reproducible synthetic data generation alongside API calls, examples/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py demonstrates seeding Python's local RNG:

import random
import json
from pathlib import Path

# Seed the Python RNG for deterministic synthetic data
random.seed(1337)

def generate_fake_corpus(num_sentences=100):
    vocab = [f"word{i}" for i in range(5000)]
    corpus = []
    for _ in range(num_sentences):
        length = random.randint(5, 15)
        sentence = " ".join(random.choice(vocab) for _ in range(length))
        corpus.append(sentence)
    return corpus

corpus = generate_fake_corpus()
Path("synthetic_corpus.json").write_text(json.dumps(corpus, indent=2))
print("Synthetic corpus generated with deterministic seed 1337")

This approach ensures that preprocessing steps, evaluation datasets, and augmentation pipelines remain consistent across experimental runs, complementing the API-level seeding strategy.
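
One possible refinement, which is an assumption here and not taken from the cookbook script, is to use a private random.Random instance instead of seeding the global generator. That way other code calling the random module between runs cannot shift the sequence:

```python
import random

def generate_fake_corpus(num_sentences=100, seed=1337):
    """Build a synthetic corpus from a private RNG instead of the global one."""
    rng = random.Random(seed)  # isolated from other code that calls random.*
    vocab = [f"word{i}" for i in range(5000)]
    corpus = []
    for _ in range(num_sentences):
        length = rng.randint(5, 15)
        corpus.append(" ".join(rng.choice(vocab) for _ in range(length)))
    return corpus

first = generate_fake_corpus()
random.random()  # unrelated use of the global RNG does not disturb the result
second = generate_fake_corpus()
print(first == second)
```

Passing the seed as an argument also makes it easy to record alongside the generated dataset.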

When to Use the Seed Parameter

Different development scenarios require different reproducibility strategies:

  • Unit tests and CI pipelines: Fix a constant seed (e.g., 123) to keep generated prompts, summaries, or classifications as stable as the API allows across test runs, reducing flaky assertions. Because determinism is best-effort, prefer assertions that tolerate an occasional mismatch.

  • Scientific benchmarking: Store the seed alongside experiment metadata; rerun with the exact seed to reproduce numeric results (e.g., evaluation metrics) and verify findings.

  • Prompt engineering iteration: Keep the seed constant while varying prompt templates or instructions to isolate the effect of prompt changes from sampling noise.

  • Production services: Generally avoid fixed seeds unless deterministic output is a strict requirement; allowing natural variability typically produces more diverse and robust user experiences.
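
For the benchmarking case above, a minimal sketch of storing the seed with experiment metadata might look like the following. The helper name experiment_record, the record layout, and the placeholder fingerprint and output are all assumptions for illustration:

```python
import hashlib
import json

def experiment_record(request, system_fingerprint, output):
    """Bundle what is needed to rerun and compare a seeded request."""
    canonical = json.dumps(request, sort_keys=True, ensure_ascii=False)
    return {
        "request": request,
        "request_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "system_fingerprint": system_fingerprint,
        "output": output,
    }

request = {
    "model": "gpt-4-1106-preview",
    "messages": [{"role": "user", "content": "Write a haiku about sunrise."}],
    "temperature": 0.7,
    "seed": 123,
}
# "fp_example" and the output text are placeholders, not real API values.
record = experiment_record(request, "fp_example", "placeholder output")
print(json.dumps(record, indent=2))
```

Hashing the full request, not just the seed, makes it easy to confirm that a rerun used identical parameters before comparing outputs.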

Summary

  • The seed parameter, introduced with gpt-4-1106-preview and gpt-3.5-turbo-1106, requests best-effort deterministic sampling so that identical requests mostly produce consistent token sequences.

  • The system_fingerprint response field identifies the backend configuration that served a request, alerting you to environmental changes that might affect reproducibility despite using the same seed.

  • Implementation examples in examples/Using_logprobs.ipynb demonstrate practical Python wrappers that inject seed into the chat.completions.create() payload.

  • For complete pipeline reproducibility, combine API seeding with local RNG seeding (as shown in examples/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py) to control both model outputs and synthetic data generation.

  • Fixed seeds are ideal for testing, benchmarking, and controlled experiments, but should be avoided in production scenarios where response diversity is preferred.

Frequently Asked Questions

Which OpenAI models support the seed parameter?

The seed parameter is supported in gpt-4-1106-preview, gpt-3.5-turbo-1106, and subsequent model versions. Earlier model snapshots do not expose this field, so supplying it to them will not produce deterministic results.

Why do my outputs still vary when using the same seed?

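If system_fingerprint changes between requests, OpenAI has updated the backend configuration, which can alter outputs despite identical seeds. Changing any other request parameter, such as the prompt, temperature, top_p, or max_tokens, also breaks consistency, because the seed only reproduces results for otherwise identical requests. Even when the seed, parameters, and fingerprint all match, determinism is best-effort, so occasional differences remain possible.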

How does system_fingerprint differ from seed?

The seed controls the random sampling process during token generation, while system_fingerprint is a read-only identifier returned by the API that represents the backend configuration the model ran with. You supply the seed; the API supplies the fingerprint.

Can I use the seed parameter in production applications?

While technically possible, fixing a seed in production is generally discouraged unless strict determinism is required (such as for regulatory compliance or specific caching strategies). Natural sampling variance typically provides better user experiences and more robust coverage of possible responses.
