How to Use the Responses API for Reasoning Models with Reasoning Items
The Responses API returns reasoning items separately from assistant messages in its output array, so you can carry chain-of-thought context across turns either by passing previous_response_id or by including the reasoning items (by ID or as encrypted content) in the next request's input.
The OpenAI Cookbook repository provides comprehensive examples for working with reasoning models like o3 and o4-mini through the Responses API. Unlike traditional chat completions, this API exposes the model's internal reasoning process as discrete items that can be cached, encrypted, or manually threaded through multi-turn conversations to improve intelligence, reduce latency, and lower costs.
Understanding the Responses API Structure for Reasoning
When you invoke a reasoning model through client.responses.create, the returned payload contains an output array with distinct element types. This architecture enables precise control over how chain-of-thought tokens flow through your application.
The Output Array Anatomy
According to the implementation in examples/responses_api/reasoning_items.ipynb, the output array contains:
- A reasoning element exposing the model's internal deliberation (via an ID or encrypted payload), identified by prefixes like rs_…
- One or more message elements containing the user-facing output_text
This separation lets you store and forward a reasoning item by its ID without ever handling the raw chain-of-thought tokens yourself.
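Because item order is not guaranteed by anything in this article, it can help to select items by type rather than by position. The following is a minimal sketch, not code from the notebook: group_output_by_type is a hypothetical helper, and sample_output is stand-in data shaped like the items described above.

```python
from collections import defaultdict


def group_output_by_type(output_items):
    """Group output items by their type, preserving the original order."""
    grouped = defaultdict(list)
    for item in output_items:
        # SDK objects expose .type; plain dicts use the "type" key.
        item_type = item["type"] if isinstance(item, dict) else item.type
        grouped[item_type].append(item)
    return dict(grouped)


# Hypothetical stand-in data; a real response.output holds SDK objects.
sample_output = [
    {"type": "reasoning", "id": "rs_example"},
    {
        "type": "message",
        "id": "msg_example",
        "content": [{"type": "output_text", "text": "Hi"}],
    },
]

grouped = group_output_by_type(sample_output)
reasoning_ids = [item["id"] for item in grouped.get("reasoning", [])]
print(reasoning_ids)
```

The same helper should work on a real response.output list, since it only reads the type attribute of each item.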
Stateful Conversation Management
The Responses API is stateful by default. Once a reasoning item is generated, you can persist it across subsequent calls using two distinct patterns:
- Automatic persistence: Pass previous_response_id referencing the prior response, and the API automatically includes all reasoning items from that turn
- Manual threading: Explicitly prepend the full output array (including the reasoning element) to your next input list, which is required when inserting function results or other intermediate steps
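The two patterns above can be sketched as functions that build the keyword arguments for the follow-up request. This is an illustrative sketch only: automatic_turn and manual_turn are hypothetical helper names, and the IDs and items are placeholder data rather than real API output.

```python
def automatic_turn(model, previous_response_id, user_text):
    """Request kwargs that rely on server-side state from the prior response."""
    return {
        "model": model,
        "previous_response_id": previous_response_id,
        "input": [{"role": "user", "content": user_text}],
    }


def manual_turn(model, history, prior_output, new_items):
    """Request kwargs that replay the prior output, reasoning item included."""
    return {
        "model": model,
        "input": [*history, *prior_output, *new_items],
    }


# Hypothetical stand-ins for a first-turn exchange.
history = [{"role": "user", "content": "What's the weather in Paris?"}]
prior_output = [
    {"type": "reasoning", "id": "rs_example", "summary": []},
    {
        "type": "function_call",
        "call_id": "call_example",
        "name": "get_weather",
        "arguments": "{}",
    },
]
tool_result = {
    "type": "function_call_output",
    "call_id": "call_example",
    "output": "18",
}

auto_kwargs = automatic_turn("o4-mini", "resp_example", "And tomorrow?")
manual_kwargs = manual_turn("o4-mini", history, prior_output, [tool_result])

# Show the order of items that the manual pattern would send.
print([item.get("type", item.get("role")) for item in manual_kwargs["input"]])
```

Either dictionary could then be unpacked into client.responses.create(**kwargs). The manual variant keeps the reasoning item directly ahead of the function call it produced, which is the ordering the function-calling workflow relies on.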
Basic Implementation with Reasoning Items
To retrieve a reasoning item from o4-mini, initialize the client and inspect the structured output:
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.responses.create(
    model="o4-mini",
    input="Tell me a joke"
)
print(response.output)  # List containing reasoning and message items
print(response.output[0].id)  # Reasoning item ID, e.g., rs_…
As shown in lines 71-78 of reasoning_items.ipynb, the first element of response.output is the reasoning item, while subsequent elements contain the actual response content. Capture this ID to avoid recomputing the reasoning chain in future turns.
Advanced Patterns for Production Use
For complex workflows involving tool usage or compliance-sensitive data, you must explicitly manage how reasoning items traverse the conversation state.
Function Calling with Reasoning Persistence
When using function calling with reasoning models, always forward the reasoning item alongside the function output. This prevents the model from regenerating its chain-of-thought and ensures continuity in its reasoning process:
import json

# Placeholder for illustration only; substitute a real weather lookup.
def get_weather(latitude, longitude):
    return 18.0  # made-up temperature

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for provided coordinates.",
    "parameters": {
        "type": "object",
        "properties": {
            "latitude": {"type": "number"},
            "longitude": {"type": "number"}
        },
        "required": ["latitude", "longitude"],
        "additionalProperties": False
    },
    "strict": True
}]

# Keep the user message in the context so the second turn still sees the question
context = [{"role": "user", "content": "What's the weather in Paris?"}]

# First turn: Model generates reasoning and decides to call the function
resp = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)

# Add the full output, including the reasoning item, to the context
context += resp.output

# Locate the function_call item by type rather than by position
tool_call = next(item for item in resp.output if item.type == "function_call")
args = json.loads(tool_call.arguments)
weather = get_weather(args["latitude"], args["longitude"])

# Append function result to context (maintaining reasoning item)
context.append({
    "type": "function_call_output",
    "call_id": tool_call.call_id,
    "output": str(weather)
})

# Second turn: Provide question + reasoning + function result back to model
resp2 = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
)
print(resp2.output_text)  # Final answer using persisted reasoning
This pattern, documented in lines 96-129 of reasoning_items.ipynb, ensures the model retains its original reasoning context when processing tool outputs, eliminating redundant token generation.
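When a turn may contain more than one function call, the execute-and-append step can be factored into a small dispatcher. The sketch below is an assumption-laden illustration rather than the notebook's code: run_function_calls, LOCAL_TOOLS, and the stub weather function are hypothetical, and SimpleNamespace objects stand in for the SDK's output items.

```python
import json
from types import SimpleNamespace


def _stub_get_weather(latitude, longitude):
    """Hypothetical stub; a real implementation would query a weather service."""
    return 21.5


# Maps tool names (as declared in the tools list) to local callables.
LOCAL_TOOLS = {"get_weather": _stub_get_weather}


def run_function_calls(output_items, registry):
    """Execute every function_call item and build function_call_output items."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # reasoning and message items pass through untouched
        handler = registry[item.name]
        args = json.loads(item.arguments)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(handler(**args)),
        })
    return results


# Stand-in for resp.output; real items are SDK objects with the same attributes.
fake_output = [
    SimpleNamespace(type="reasoning", id="rs_example"),
    SimpleNamespace(
        type="function_call",
        name="get_weather",
        call_id="call_example",
        arguments='{"latitude": 48.8566, "longitude": 2.3522}',
    ),
]

tool_outputs = run_function_calls(fake_output, LOCAL_TOOLS)
print(tool_outputs)
```

The returned list can be appended to the context after the original output items, so each reasoning item stays ahead of the call it produced.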
Zero-Data-Retention with Encrypted Reasoning
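For compliance-sensitive workloads requiring ZDR (Zero Data Retention), the API supports encrypted reasoning items via reasoning.encrypted_content. Set store=False and add reasoning.encrypted_content to include, and each reasoning item is returned with an encrypted_content field that your application holds and sends back on the next request, so OpenAI does not need to persist the reasoning tokens between calls: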
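import json

# Assumes tools and get_weather from the function-calling example
context = [{"role": "user", "content": "Weather in Paris?"}]
resp = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,  # Enforced for ZDR compliance
    include=["reasoning.encrypted_content"]
)
print(resp.output[0].encrypted_content)  # Encrypted chain-of-thought

# Forward full output with encrypted reasoning, then answer any function call
context += resp.output
for item in resp.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        context.append({"type": "function_call_output", "call_id": item.call_id,
                        "output": str(get_weather(args["latitude"], args["longitude"]))})

# Re-use the encrypted content in the next call
resp2 = client.responses.create(
    model="o3",
    input=context,
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"]
)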
As detailed in the cookbook's safeguard guides and demonstrated in lines 126-136 of reasoning_items.ipynb, encrypted reasoning items enable sensitive use-cases while maintaining the performance benefits of cached reasoning across conversation turns.
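With store=False there is no server-side copy to fall back on, so a reasoning item that reaches the next request without its encrypted payload carries nothing the model can reuse. A pre-flight check along the following lines may help catch that early. It is a sketch under stated assumptions: reasoning_missing_encryption is a hypothetical helper, and the context entries are placeholder data.

```python
def _field(item, name):
    """Read a field from either a plain dict or an SDK-style object."""
    if isinstance(item, dict):
        return item.get(name)
    return getattr(item, name, None)


def reasoning_missing_encryption(items):
    """Return IDs of reasoning items that have no encrypted_content."""
    return [
        _field(item, "id")
        for item in items
        if _field(item, "type") == "reasoning"
        and not _field(item, "encrypted_content")
    ]


# Placeholder context; the encrypted string is not a real payload.
sample_context = [
    {"role": "user", "content": "Weather in Paris?"},
    {"type": "reasoning", "id": "rs_ok", "encrypted_content": "placeholder-ciphertext"},
    {"type": "reasoning", "id": "rs_bare"},
]

missing = reasoning_missing_encryption(sample_context)
print(missing)
```

A non-empty result suggests that the earlier request omitted reasoning.encrypted_content from include, or that the field was dropped while the items were being serialized.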
Summary
- The Responses API returns reasoning items separately from message content in the output array, accessible via IDs or encrypted payloads
- Stateful persistence via previous_response_id automatically maintains reasoning context, while manual input threading is required for function-calling workflows
- Function calling requires appending outputs to the original output array (including the reasoning item) to preserve the model's chain-of-thought
- Encrypted reasoning items with store=False enable ZDR compliance while still allowing reasoning reuse across turns
- Implementation examples are available in examples/responses_api/reasoning_items.ipynb within the openai/openai-cookbook repository
Frequently Asked Questions
What is the difference between reasoning items and message items?
Reasoning items contain the model's internal chain-of-thought tokens, exposed only via an ID or encrypted content for privacy and efficiency. Message items contain the user-facing output_text that answers the query. The Responses API returns these as separate elements in the output array to enable granular control over context management.
How do I maintain context across multiple turns?
You can maintain context by either setting previous_response_id to the prior response's ID (automatic persistence) or explicitly including the full output array (including reasoning items) in the next request's input parameter. The latter is required when you need to insert function results or modify the conversation flow between turns.
When should I use encrypted reasoning items?
Use encrypted reasoning items when processing compliance-sensitive data that requires Zero Data Retention (ZDR). Set store=False and request reasoning.encrypted_content in the include parameter. This ensures OpenAI never stores the reasoning tokens, which remain encrypted throughout the round-trip and reside only in your application's memory.
Does using reasoning items reduce API costs?
Often, yes. Carrying reasoning items across turns, by ID or as encrypted content, means the model does not have to regenerate the same chain-of-thought, and the repeated context is more likely to be served from cache. Both effects can reduce token costs and latency in multi-turn conversations, though the size of the saving depends on the workload.
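One way to check whether the savings materialize is to total the token counts reported on each response. The sketch below assumes each response's usage object has been converted to a dictionary (for example with model_dump(), which is an assumption about your SDK version); summarize_usage is a hypothetical helper, and the numbers are invented for illustration, not measurements.

```python
def summarize_usage(usages):
    """Sum input, cached input, output, and reasoning tokens across turns."""
    totals = {"input": 0, "cached_input": 0, "output": 0, "reasoning": 0}
    for usage in usages:
        totals["input"] += usage.get("input_tokens", 0)
        totals["cached_input"] += usage.get("input_tokens_details", {}).get(
            "cached_tokens", 0
        )
        totals["output"] += usage.get("output_tokens", 0)
        totals["reasoning"] += usage.get("output_tokens_details", {}).get(
            "reasoning_tokens", 0
        )
    return totals


# Made-up usage records for two turns; not real measurements.
turns = [
    {
        "input_tokens": 100,
        "input_tokens_details": {"cached_tokens": 0},
        "output_tokens": 300,
        "output_tokens_details": {"reasoning_tokens": 256},
    },
    {
        "input_tokens": 450,
        "input_tokens_details": {"cached_tokens": 384},
        "output_tokens": 120,
        "output_tokens_details": {"reasoning_tokens": 64},
    },
]

totals = summarize_usage(turns)
print(totals)
```

Comparing these totals for the same conversation with and without forwarded reasoning items gives a direct view of the difference in reasoning and cached tokens.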