# Implementing Multi-Tool Orchestration with RAG Using the OpenAI Responses API

> Implement multi-tool orchestration with RAG to dynamically route queries across vector stores, web search, and file indexes for advanced information retrieval.

- Repository: [OpenAI/openai-cookbook](https://github.com/openai/openai-cookbook)
- Tags: how-to-guide
- Published: 2026-03-02

---

**Implementing multi-tool orchestration with RAG enables large language models to dynamically route queries to vector stores, web search, or internal file indexes, executing sequential retrieval steps before synthesizing a final answer.**

The openai/openai-cookbook repository demonstrates a complete implementation of this pattern, combining the **Responses API** with multiple retrieval tools to build agents that autonomously decide which knowledge sources to query. This approach eliminates brittle conditional logic by letting the model itself determine when to retrieve context, which tool to use, and how to chain multiple retrieval operations.

## Architecture of Dynamic Multi-Tool RAG

The cookbook implements a **model-driven routing** architecture where the LLM acts as the orchestrator. Instead of hard-coding retrieval paths, you provide a catalog of tools and instructions, allowing the model to select the optimal retrieval strategy for each query.

### The Responses API as the Central Router

At the core of the implementation is the `responses.create()` method of the OpenAI Python SDK, invoked in `examples/responses_api/responses_api_tool_orchestration.ipynb`. This endpoint accepts a `tools` list containing JSON schema definitions for each available function and an `instructions` parameter that guides the model's routing decisions. When the model decides that a tool is required, the response's `output` list contains structured `function_call` items that your application executes. Conversation state is either resent as `input` on the next request or chained through `previous_response_id`.

### Built-in vs. Custom Retrieval Tools

The implementation combines three distinct retrieval mechanisms:

- **`file_search`**: A built-in tool that performs vector search over documents uploaded to an OpenAI-hosted vector store. The API executes it directly, so no custom dispatch code is required.
- **`search_vector_store`**: A custom Python function that queries an external vector database (e.g., Pinecone) for domain-specific documents, registered via a JSON schema in the tools list.
- **`web_search`**: A custom function implementing Google Custom Search or similar, enabling the model to fetch up-to-date public information when internal knowledge bases are insufficient.
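The three mechanisms can sit side by side in a single `tools` list. The sketch below assumes the flat function-tool schema used by the Responses API and a built-in `file_search` entry that takes a list of vector store IDs. The helper names `function_tool` and `build_tool_catalog` and the placeholder vector store ID are illustrative, not taken from the cookbook.

```python
def function_tool(name: str, description: str) -> dict:
    """Build a function-tool entry that takes a single `query` string (hypothetical helper)."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


def build_tool_catalog(vector_store_id: str) -> list[dict]:
    """Combine the built-in file_search tool with two custom function tools."""
    return [
        # Built-in tool: executed by the API, so no local dispatch code is needed.
        {"type": "file_search", "vector_store_ids": [vector_store_id]},
        # Custom tools: the API only returns the call; your code runs the function.
        function_tool("search_vector_store", "Search a domain-specific vector DB (e.g., Pinecone)."),
        function_tool("web_search", "Run a live web search for up-to-date information."),
    ]


catalog = build_tool_catalog("vs_placeholder")  # Placeholder ID; substitute your own
print([tool["type"] for tool in catalog])  # ['file_search', 'function', 'function']
```

Only the two `function` entries need a matching Python implementation in your dispatch code.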

### Sequential Tool Execution Flow

The orchestration loop supports **multiple sequential tool calls**. The model can request a web search, examine the results, then decide to query a vector store for additional context, or vice versa. Your application inspects the response's `output` list for `function_call` items, executes each function, appends each result to the conversation input as a `function_call_output` item keyed by `call_id`, and resubmits to the API until the model returns a final message with no further function calls.

## Python Implementation of the Orchestration Loop

The following implementation mirrors the pattern found in `examples/responses_api/responses_api_tool_orchestration.ipynb`, demonstrating how to wire custom retrieval functions into the Responses API.

```python
import json

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment; avoid hard-coding keys.
client = OpenAI()

# ----------------------------------------------------
# 1️⃣  Tool implementations and definitions (JSON-schema style)
# ----------------------------------------------------

def search_vector_store(query: str) -> str:
    """
    Retrieve relevant chunks from an external Pinecone index.
    Returns a JSON string with a list of `documents`.
    """
    # Insert your Pinecone client code here
    return json.dumps({"documents": [f"Result 1 for {query}", f"Result 2 for {query}"]})

def web_search(query: str) -> str:
    """Perform a Google Custom Search and return the top result."""
    # Insert your Google-CSE request here
    return json.dumps({"title": "Example", "snippet": f"Result for {query}"})

# The Responses API expects a flat function schema (no nested "function" key).
tools = [
    # Built-in tool, executed by the API itself. Uncomment and supply your own
    # vector store ID so the `file_search` instruction below applies:
    # {"type": "file_search", "vector_store_ids": ["<your-vector-store-id>"]},
    {
        "type": "function",
        "name": "search_vector_store",
        "description": "Search a domain-specific vector DB (e.g., Pinecone).",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "web_search",
        "description": "Run a live web search for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

# Map tool names to their Python implementations
TOOL_IMPLS = {"search_vector_store": search_vector_store, "web_search": web_search}

# ----------------------------------------------------
# 2️⃣  Orchestration loop
# ----------------------------------------------------

def run_multi_tool_rag(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        response = client.responses.create(
            model="gpt-4o-mini",
            input=messages,
            tools=tools,
            instructions=(
                "Use `file_search` for internal docs, "
                "`search_vector_store` for domain knowledge, "
                "and `web_search` for fresh public info. "
                "If multiple tools are needed, call them sequentially."
            ),
        )

        # Custom tools show up as `function_call` items; built-in tools
        # (e.g. file_search) are executed by the API and need no dispatch.
        function_calls = [item for item in response.output if item.type == "function_call"]

        # No function calls left → the model has produced its final answer
        if not function_calls:
            return response.output_text

        # Keep the model's own output (including its calls) in the history
        messages += response.output

        for call in function_calls:
            # Arguments arrive as a JSON string; never eval() model output
            args = json.loads(call.arguments)
            impl = TOOL_IMPLS.get(call.name)
            if impl is None:
                result = json.dumps({"error": f"Unknown tool: {call.name}"})
            else:
                result = impl(**args)

            # Append the tool result so the model can see it on the next turn
            messages.append(
                {
                    "type": "function_call_output",
                    "call_id": call.call_id,
                    "output": result,
                }
            )

# ----------------------------------------------------
# 3️⃣  Example usage
# ----------------------------------------------------

if __name__ == "__main__":
    # Requires OPENAI_API_KEY to be set in the environment
    query = "What are the latest pricing changes for Azure OpenAI?"
    answer = run_multi_tool_rag(query)
    print("\nFinal Answer:\n", answer)
```

### Defining Tool Schemas for RAG

Each custom retrieval tool requires a JSON schema definition in the `tools` list. The schema for `search_vector_store` specifies a single `query` string parameter, enabling the model to extract appropriate search terms from the user's intent. This declarative approach allows the LLM to understand tool capabilities without exposure to implementation details.
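One way to keep each schema next to its implementation is a small registry. The decorator below is a hypothetical helper (not part of the cookbook or the OpenAI SDK) that records a flat function schema and the Python callable together, assuming every tool takes a single `query` string. The `search_policies` tool is a made-up example.

```python
from typing import Callable

TOOL_SCHEMAS: list[dict] = []                     # Passed as `tools` to the API
TOOL_IMPLS: dict[str, Callable[..., str]] = {}    # Used for local dispatch


def retrieval_tool(description: str):
    """Register a single-`query` retrieval function and its schema (hypothetical helper)."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TOOL_SCHEMAS.append({
            "type": "function",
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
                "additionalProperties": False,
            },
        })
        TOOL_IMPLS[func.__name__] = func
        return func
    return decorator


@retrieval_tool("Search internal policy documents.")
def search_policies(query: str) -> str:
    # Stub result; a real implementation would query a document store.
    return f"policy results for {query}"


print([schema["name"] for schema in TOOL_SCHEMAS])  # ['search_policies']
```

With this layout, adding a retrieval source means writing one decorated function; the schema list and the dispatch table stay in sync automatically.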

### Handling Tool Call Sequences

The `while True` loop implements the orchestration logic. When the response's `output` contains `function_call` items, the code extracts each function name and its JSON-encoded arguments, dispatches to the corresponding Python implementation, and appends the result to the `messages` list as a `function_call_output` item that carries the matching `call_id`. This pattern supports **chained retrieval**: the model can issue a second tool call based on evidence gathered from the first, creating complex multi-source RAG workflows.
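To see chaining without calling the API, the same loop can be exercised against a scripted stand-in for the model. Everything here is an illustrative assumption: `fake_create` replays canned outputs shaped loosely like Responses API items (`function_call` with `name`, `arguments`, and `call_id`), the final message is simplified to a single `text` attribute, and the two tool functions return stub strings.

```python
import json
from types import SimpleNamespace


def search_vector_store(query: str) -> str:
    return json.dumps({"documents": [f"doc about {query}"]})


def web_search(query: str) -> str:
    return json.dumps({"snippet": f"news about {query}"})


TOOL_IMPLS = {"search_vector_store": search_vector_store, "web_search": web_search}


def call(name: str, query: str, call_id: str) -> SimpleNamespace:
    """Build a stand-in for a `function_call` output item."""
    return SimpleNamespace(type="function_call", name=name,
                           arguments=json.dumps({"query": query}), call_id=call_id)


# Scripted turns: a web search, then a vector-store lookup, then a final message.
SCRIPT = [
    [call("web_search", "pricing update", "call_1")],
    [call("search_vector_store", "internal pricing notes", "call_2")],
    [SimpleNamespace(type="message", text="Final answer from both sources.")],
]


def fake_create(turn: int) -> list:
    """Stand-in for the API call; a real call would send `history` as `input`."""
    return SCRIPT[turn]


def run_loop(max_turns: int = 5):
    history, trace = [], []
    for turn in range(max_turns):
        output = fake_create(turn)
        calls = [item for item in output if item.type == "function_call"]
        if not calls:
            return output[0].text, trace  # No calls left: final answer
        history.extend(output)
        for item in calls:
            result = TOOL_IMPLS[item.name](**json.loads(item.arguments))
            history.append({"type": "function_call_output",
                            "call_id": item.call_id, "output": result})
            trace.append(item.name)
    raise RuntimeError("No final answer within max_turns")


answer, trace = run_loop()
print(trace)   # ['web_search', 'search_vector_store']
print(answer)  # Final answer from both sources.
```

The `max_turns` cap is a design choice of this sketch: it bounds the number of round trips if the model keeps requesting tools.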

## Source Files and Production Patterns

The cookbook provides reference implementations and deployment patterns beyond the basic notebook.

### Primary Notebook Reference

The file **`examples/responses_api/responses_api_tool_orchestration.ipynb`** contains the complete walkthrough, including Pinecone integration setup, exact JSON schemas for each tool, and interactive examples of the orchestration loop. The notebook is registered in [`registry.yaml`](https://github.com/openai/openai-cookbook/blob/main/registry.yaml) (lines 680-686) under the entry "Multi-Tool Orchestration with RAG approach using OpenAI's Responses API."

### MCP Micro-Service Deployment Pattern

For production scaling, the cookbook demonstrates exposing RAG tools as microservices using the **Model Context Protocol (MCP)**. The file **[`examples/partners/mcp_powered_voice_agents/search_server.py`](https://github.com/openai/openai-cookbook/blob/main/examples/partners/mcp_powered_voice_agents/search_server.py)** shows how to wrap `search_vector_store` and `web_search` as independent MCP servers, enabling horizontal scaling, independent versioning of retrieval components, and integration with voice agents. This pattern decouples the retrieval infrastructure from the main orchestration logic.
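If the retrieval tools run as a remote MCP server, one option is to reference that server from the `tools` list instead of dispatching locally. The sketch below assumes the Responses API's remote MCP tool type (`"type": "mcp"` with `server_label`, `server_url`, `allowed_tools`, and `require_approval`). The label and URL are placeholders, and this wiring is an illustration rather than a description of what `search_server.py` does.

```python
def mcp_tool(server_label: str, server_url: str, allowed_tools: list[str]) -> dict:
    """Build a remote-MCP tool entry for the `tools` list (hypothetical helper)."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        # Restrict the model to the retrieval tools this agent actually needs.
        "allowed_tools": allowed_tools,
        # "never" skips per-call approval; only appropriate for servers you trust.
        "require_approval": "never",
    }


retrieval_server = mcp_tool(
    server_label="retrieval",                # Placeholder label
    server_url="https://example.com/mcp",    # Placeholder URL
    allowed_tools=["search_vector_store", "web_search"],
)
print(retrieval_server["type"], retrieval_server["allowed_tools"])
```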

### Supporting Documentation

- **[`examples/vector_databases/pinecone/README.md`](https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/pinecone/README.md)**: Provides helper functions for Pinecone connectivity used by the `search_vector_store` implementation.
- **[`AGENTS.md`](https://github.com/openai/openai-cookbook/blob/main/AGENTS.md)**: Contains general architectural guidance for building agentic systems that the multi-tool orchestration pattern extends.

## Summary

- **Model-driven routing** eliminates hard-coded if/else logic by letting the LLM select between `file_search`, vector stores, and web search based on query context.
- **Sequential execution** enables complex RAG workflows where the model chains multiple retrieval operations before generating an answer.
- **Schema-based registration** allows you to add new retrieval sources by defining JSON schemas and Python functions, without modifying the core orchestration loop.
- **Production scalability** is achieved through MCP-based microservice patterns demonstrated in [`search_server.py`](https://github.com/openai/openai-cookbook/blob/main/examples/partners/mcp_powered_voice_agents/search_server.py), allowing independent scaling of vector search and web search components.

## Frequently Asked Questions

### What is multi-tool orchestration in RAG systems?

**Multi-tool orchestration** is an architectural pattern where the language model autonomously selects and sequences multiple retrieval tools—such as vector databases, web search, and file indexes—to gather evidence before generating a response. Unlike static RAG pipelines, this approach allows the model to decide which knowledge sources are relevant for each specific query.

### How does the Responses API handle multiple sequential tool calls?

The Responses API supports **multi-turn tool use**. Built-in tools such as `file_search` run on the API side, while custom functions are returned to your application: when you submit a request with a `tools` list, the response's `output` can contain `function_call` items instead of a final answer. Your application executes these functions, appends the results as `function_call_output` items, and sends the updated conversation back to the API (or references the earlier turn through `previous_response_id`). The model can then request additional tools based on the new context, repeating until it produces a final message.
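As an alternative to resending the full history on every turn, the Responses API accepts a `previous_response_id`, so a follow-up request carries only the new tool outputs. The helper below is a hypothetical sketch that builds the keyword arguments for such a follow-up call; it makes no network request, and the response ID and result string are made up.

```python
def follow_up_kwargs(previous_response_id: str, tool_results: dict[str, str],
                     tools: list[dict], model: str = "gpt-4o-mini") -> dict:
    """Build kwargs for a follow-up `responses.create` call (hypothetical helper)."""
    return {
        "model": model,
        # Links this request to the turn that issued the function calls
        "previous_response_id": previous_response_id,
        "tools": tools,
        # One function_call_output item per executed call, keyed by call_id
        "input": [
            {"type": "function_call_output", "call_id": call_id, "output": output}
            for call_id, output in tool_results.items()
        ],
    }


kwargs = follow_up_kwargs("resp_123", {"call_1": '{"documents": []}'}, tools=[])
print(kwargs["input"][0]["call_id"])  # call_1
```

In real use the dictionary would be unpacked into `client.responses.create(**kwargs)`. Because `instructions` from the earlier response are not carried over when `previous_response_id` is used, pass them again if the routing guidance should persist.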

### Can I combine the built-in file_search tool with custom vector databases?

Yes. The implementation in `examples/responses_api/responses_api_tool_orchestration.ipynb` demonstrates combining the built-in `file_search` tool (for internal document indexes) with custom functions like `search_vector_store` (for external Pinecone indexes) and `web_search`. You include all available tools in the `tools` array and use the `instructions` parameter to guide the model on when to use each source.

### Where can I find the reference implementation for production deployment?

The primary reference is **`examples/responses_api/responses_api_tool_orchestration.ipynb`** in the openai-cookbook repository. For production deployment patterns using microservices, examine **[`examples/partners/mcp_powered_voice_agents/search_server.py`](https://github.com/openai/openai-cookbook/blob/main/examples/partners/mcp_powered_voice_agents/search_server.py)**, which demonstrates how to expose these tools as MCP-compatible servers for scalable, decoupled architectures.