How to Use the Responses API for Reasoning Models with Reasoning Items

March 2, 2026 openai/openai-cookbook ↗

The Responses API returns reasoning items and assistant messages as separate entries in the output array. You can carry chain-of-thought across turns either by passing previous_response_id or by including the earlier reasoning items (by ID or as encrypted content) in the next request's input.

The OpenAI Cookbook repository provides comprehensive examples for working with reasoning models like o3 and o4-mini through the Responses API. Unlike traditional chat completions, this API exposes the model's internal reasoning process as discrete items that can be cached, encrypted, or manually threaded through multi-turn conversations to improve intelligence, reduce latency, and lower costs.

Understanding the Responses API Structure for Reasoning

When you invoke a reasoning model through client.responses.create, the returned payload contains an output array with distinct element types. This architecture enables precise control over how chain-of-thought tokens flow through your application.

The Output Array Anatomy

According to the implementation in examples/responses_api/reasoning_items.ipynb, the output array contains:

A reasoning element exposing the model's internal deliberation (via an ID or encrypted payload), identified by prefixes like rs_…
One or more message elements containing the user-facing output_text

This separation lets you store or forward a reasoning item by its ID without ever handling the raw chain-of-thought tokens; the API resolves the ID (or decrypts the encrypted payload) when the item is passed back.
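A minimal sketch of how the two item kinds could be separated once a response comes back. The plain dictionaries are stand-ins for the SDK's output objects, the IDs are made up, and split_output is a hypothetical helper name rather than part of the openai package. It assumes each item carries a type field of "reasoning" or "message".

```python
from typing import Any


def split_output(output: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Separate reasoning items from message items by their `type` field."""
    reasoning = [item for item in output if item.get("type") == "reasoning"]
    messages = [item for item in output if item.get("type") == "message"]
    return reasoning, messages


# Stand-in data shaped like a response's output array (IDs are made up).
sample_output = [
    {"type": "reasoning", "id": "rs_example", "summary": []},
    {
        "type": "message",
        "id": "msg_example",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "Here is a joke."}],
    },
]

reasoning_items, message_items = split_output(sample_output)
reasoning_ids = [item["id"] for item in reasoning_items]
```

With real SDK objects the same filter would read item.type instead of item.get("type"). Filtering by type avoids depending on the position of the reasoning item in the array.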

Stateful Conversation Management

The Responses API is inherently stateful. Once a reasoning item is generated, you can persist it across subsequent calls using two distinct patterns:

Automatic persistence: Pass previous_response_id referencing the prior response, and the API automatically includes all reasoning items from that turn
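Manual threading: Explicitly add the prior turn's output items (including the reasoning element) to your next input list. Use this pattern when you manage conversation state yourself, for example when responses are not stored; it is also the one the notebook uses when inserting function results between turns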
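The two patterns above differ only in which arguments the next request carries. The sketch below builds those arguments as plain dictionaries so the difference is easy to see. followup_kwargs is a hypothetical helper, and the IDs and messages are invented for illustration.

```python
from typing import Any, Optional


def followup_kwargs(
    model: str,
    new_input: list[dict[str, Any]],
    previous_response_id: Optional[str] = None,
    prior_context: Optional[list[Any]] = None,
) -> dict[str, Any]:
    """Build keyword arguments for the next responses.create call."""
    if previous_response_id is not None and prior_context is not None:
        raise ValueError("choose one pattern: previous_response_id or prior_context")
    if previous_response_id is not None:
        # Automatic persistence: the API looks up the prior turn by its ID.
        return {
            "model": model,
            "input": list(new_input),
            "previous_response_id": previous_response_id,
        }
    # Manual threading: resend earlier items, reasoning included, then the new input.
    return {"model": model, "input": list(prior_context or []) + list(new_input)}


question = [{"role": "user", "content": "Why is that funny?"}]

auto_request = followup_kwargs("o4-mini", question, previous_response_id="resp_example")

earlier_turn = [
    {"role": "user", "content": "Tell me a joke"},
    {"type": "reasoning", "id": "rs_example", "summary": []},
    {"role": "assistant", "content": "Why did the chicken cross the road?"},
]
manual_request = followup_kwargs("o4-mini", question, prior_context=earlier_turn)
```

Either dictionary could then be unpacked into the call, as in client.responses.create(**auto_request).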

Basic Implementation with Reasoning Items

To retrieve a reasoning item from o4-mini, initialize the client and inspect the structured output:

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.responses.create(
    model="o4-mini",
    input="Tell me a joke"
)

print(response.output)          # List containing reasoning and message items
print(response.output[0].id)    # Reasoning item ID, e.g., rs_…

As shown in lines 71-78 of reasoning_items.ipynb, the first element of response.output is the reasoning item, while subsequent elements contain the actual response content. Capture this ID to avoid recomputing the reasoning chain in future turns.

Advanced Patterns for Production Use

For complex workflows involving tool usage or compliance-sensitive data, you must explicitly manage how reasoning items traverse the conversation state.

Function Calling with Reasoning Persistence

When using function calling with reasoning models, always forward the reasoning item alongside the function output. This prevents the model from regenerating its chain-of-thought and ensures continuity in its reasoning process:

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for provided coordinates.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": False
    },
    "strict": True
}]

import json

# First turn: the model reasons and decides to call the function
context = [{"role": "user", "content": "What's the weather in Paris?"}]

resp = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)

# Keep the full output, including the reasoning item, in the running context
context += resp.output

# Find the function_call item by type rather than by position
tool_call = next(item for item in resp.output if item.type == "function_call")

# Execute the function locally (get_weather is your own implementation)
args = json.loads(tool_call.arguments)
weather = get_weather(args["latitude"], args["longitude"])

# Append the function result to the context
context.append({
    "type": "function_call_output",
    "call_id": tool_call.call_id,
    "output": str(weather)
})

# Second turn: send user message + reasoning + function call + function result
resp2 = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)

print(resp2.output_text)      # Final answer using persisted reasoning

This pattern, documented in lines 96-129 of reasoning_items.ipynb, lets the model keep its original reasoning context when it processes the tool output, so it does not have to re-derive conclusions it already reached.
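When a turn can contain more than one function call, the "execute locally, then append" step can be factored into a small dispatcher. The sketch below is one way to do that under stated assumptions: run_function_calls and fake_get_weather are hypothetical names, and SimpleNamespace objects stand in for the SDK's output items so the example runs without an API call.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def run_function_calls(
    output: list[Any], registry: dict[str, Callable[..., Any]]
) -> list[dict[str, str]]:
    """Execute each function_call item and build its function_call_output item."""
    results = []
    for item in output:
        if getattr(item, "type", None) != "function_call":
            continue  # reasoning and message items are not executed
        handler = registry[item.name]
        args = json.loads(item.arguments)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(handler(**args)),
        })
    return results


def fake_get_weather(latitude: float, longitude: float) -> float:
    """Hypothetical stand-in that returns a fixed temperature."""
    return 18.5


# Stand-in output: a reasoning item followed by a function call (IDs are made up).
sample_output = [
    SimpleNamespace(type="reasoning", id="rs_example"),
    SimpleNamespace(
        type="function_call",
        name="get_weather",
        call_id="call_example",
        arguments='{"latitude": 48.8566, "longitude": 2.3522}',
    ),
]

tool_outputs = run_function_calls(sample_output, {"get_weather": fake_get_weather})
```

The returned items would be appended to the running context after the model's own output, so the reasoning item, the function call, and its result reach the next request in order.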

Zero-Data-Retention with Encrypted Reasoning

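For compliance-sensitive workloads under ZDR (Zero Data Retention), the API can return reasoning items in encrypted form. Set store=False and add reasoning.encrypted_content to include; each reasoning item then carries an encrypted_content field that you pass back on the next request. OpenAI does not persist the reasoning tokens: the encrypted payload is decrypted in memory for the duration of the request and then discarded: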

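context = [{"role": "user", "content": "Weather in Paris?"}]

resp = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,                         # Do not store the response server-side
    include=["reasoning.encrypted_content"]
)

# Look up the reasoning item by type and inspect its encrypted chain-of-thought
reasoning_item = next(item for item in resp.output if item.type == "reasoning")
print(reasoning_item.encrypted_content)

# Re-use the encrypted content in the next call: forward the full output,
# then answer any function call the model made (requires `import json`)
context += resp.output
for item in resp.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        context.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(get_weather(args["latitude"], args["longitude"]))
        })

resp2 = client.responses.create(
    model="o3",
    input=context,                     # User message + encrypted reasoning + tool result
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"]
)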

As detailed in the cookbook's safeguard guides and demonstrated in lines 126-136 of reasoning_items.ipynb, encrypted reasoning items enable sensitive use-cases while maintaining the performance benefits of cached reasoning across conversation turns.
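In a stateless flow the encrypted payload is the only copy of the reasoning, so it can be worth checking that every reasoning item actually carries one before forwarding the context. The check below is a sketch using plain dictionaries as stand-ins for output items. missing_encrypted_reasoning is a hypothetical helper and the payload string is a placeholder, not real encrypted content.

```python
from typing import Any


def missing_encrypted_reasoning(output: list[dict[str, Any]]) -> list[str]:
    """Return the IDs of reasoning items that lack an encrypted_content value."""
    return [
        item.get("id", "<no id>")
        for item in output
        if item.get("type") == "reasoning" and not item.get("encrypted_content")
    ]


# Stand-in items (IDs and payload are placeholders).
with_payload = [
    {"type": "reasoning", "id": "rs_ok", "encrypted_content": "placeholder-ciphertext"},
    {"type": "message", "id": "msg_example", "role": "assistant", "content": []},
]
without_payload = [
    {"type": "reasoning", "id": "rs_missing", "encrypted_content": None},
]

ok_result = missing_encrypted_reasoning(with_payload)
bad_result = missing_encrypted_reasoning(without_payload)
```

A non-empty result most likely means the request was sent without reasoning.encrypted_content in include, which is the first thing to check.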

Summary

The Responses API returns reasoning items separately from message content in the output array, accessible via IDs or encrypted payloads
Stateful persistence via previous_response_id automatically maintains reasoning context, while manual input threading gives you explicit control and is the pattern shown for function-calling workflows
With manual threading, function calling means appending the function output to a context that already holds the user message and the model's output items (including the reasoning item), which preserves the model's chain-of-thought
Encrypted reasoning items with store=False enable ZDR compliance while still allowing reasoning reuse across turns
Implementation examples are available in examples/responses_api/reasoning_items.ipynb within the openai/openai-cookbook repository

Frequently Asked Questions

What is the difference between reasoning items and message items?

Reasoning items contain the model's internal chain-of-thought tokens, exposed only via an ID or encrypted content for privacy and efficiency. Message items contain the user-facing output_text that answers the query. The Responses API returns these as separate elements in the output array to enable granular control over context management.

How do I maintain context across multiple turns?

You can maintain context by either setting previous_response_id to the prior response's ID (automatic persistence) or explicitly including the prior output items (including reasoning items) in the next request's input parameter. Use the latter when you manage state yourself, for example with store=False, or when you want to modify the conversation flow between turns.

When should I use encrypted reasoning items?

Use encrypted reasoning items when processing compliance-sensitive data that requires Zero Data Retention (ZDR). Set store=False and request reasoning.encrypted_content in the include parameter. This ensures OpenAI never stores the reasoning tokens, which remain encrypted throughout the round-trip and reside only in your application's memory.

Does using reasoning items reduce API costs?

It can. Passing prior reasoning items back, by ID or as encrypted content, means the model does not have to re-derive conclusions it already reached, and a stable conversation prefix is more likely to hit the prompt cache. Earlier items still count as input tokens, so measure the effect on your own workload rather than assuming a fixed saving.
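One way to check whether reuse is paying off is to read the usage block returned with each response. The sketch below works on a plain dictionary with made-up numbers. The field names (input_tokens_details.cached_tokens and output_tokens_details.reasoning_tokens) are assumptions about the usage object's shape and should be verified against the API reference, and usage_summary is a hypothetical helper.

```python
from typing import Any


def usage_summary(usage: dict[str, Any]) -> dict[str, float]:
    """Report the cached share of input tokens and the reasoning token count."""
    input_tokens = usage.get("input_tokens", 0)
    cached = usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "cached_fraction": cached / input_tokens if input_tokens else 0.0,
        "reasoning_tokens": float(reasoning),
    }


# Made-up numbers shaped like a usage block (field names are assumptions).
sample_usage = {
    "input_tokens": 2000,
    "input_tokens_details": {"cached_tokens": 1500},
    "output_tokens": 400,
    "output_tokens_details": {"reasoning_tokens": 256},
}

summary = usage_summary(sample_usage)
```

Comparing these figures for the same conversation with and without forwarded reasoning items gives a direct, workload-specific answer.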
