# RAG Workflow in Open Notebook's ask.py: Search and Synthesis Explained

> Explore the RAG workflow in Open Notebook's ask.py. Understand how multi-step search and sequential synthesis generate unified answers from relevant data.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: deep-dive
- Published: 2026-06-06

---

**Open Notebook's [`ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) implements a search-then-synthesize RAG pipeline: a language model first plans a set of searches, vector retrieval fetches relevant chunks for each one, and two sequential synthesis stages turn the retrieved text into a unified final answer.**

The `lfnovo/open-notebook` repository uses this **RAG workflow** inside a **LangGraph** state machine to answer user questions with grounded, retrieved context. The entire pipeline is defined in [`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) and orchestrates model-driven strategy generation, parallel vector search, and focused answer aggregation.

## How the RAG Pipeline Works in ask.py

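The [`ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) module compiles a LangGraph state graph that transforms a user question into a polished answer. The walkthrough below covers it in five steps: the three nodes (`agent`, `provide_answer`, and `write_final_answer`), the conditional edge that fans out between them, and the wiring that ties everything together.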

### Step 1: Strategy Generation

The pipeline begins with the `agent` node calling `call_model_with_messages`. This function builds a system prompt from the `ask/entry` template and invokes `provision_langchain_model` to generate a JSON-encoded **strategy**. The strategy contains up to five search objects, each specifying a *term* and *instructions* for downstream synthesis. The parsed `Strategy` object is stored in `state["strategy"]` before the graph proceeds.

In [`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py), this logic occupies the entry-point node that converts natural language intent into a structured retrieval plan.
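To make the shape of that retrieval plan concrete, here is a minimal sketch of parsing a JSON strategy into typed objects. The `term` and `instructions` fields come from the description above. The `searches` key, the dataclass names, the `parse_strategy` helper, and the way the five-item cap is enforced are assumptions made for illustration and may not match the repository's actual schema.

```python
import json
from dataclasses import dataclass, field

MAX_SEARCHES = 5  # the text above describes "up to five" searches


@dataclass
class Search:
    term: str          # query text passed to vector search
    instructions: str  # guidance for the downstream synthesis step


@dataclass
class Strategy:
    searches: list[Search] = field(default_factory=list)


def parse_strategy(raw_json: str) -> Strategy:
    """Parse model output into a Strategy, keeping at most MAX_SEARCHES items."""
    payload = json.loads(raw_json)
    items = payload.get("searches", [])  # the "searches" key is an assumption
    searches = [Search(term=i["term"], instructions=i["instructions"]) for i in items]
    return Strategy(searches=searches[:MAX_SEARCHES])


# Hypothetical model output, used only to exercise the parser
raw = json.dumps({
    "searches": [
        {"term": "entanglement definition", "instructions": "Define the concept."},
        {"term": "Bell test experiments", "instructions": "Summarize key evidence."},
    ]
})
strategy = parse_strategy(raw)
print([s.term for s in strategy.searches])
```

Keeping the plan as structured data rather than free text is what lets the next step create one retrieval branch per search item.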

### Step 2: Parallel Retrieval with Dynamic Queries

Once the strategy is materialized, LangGraph evaluates a conditional edge from the `agent` node. For every search item in the strategy, the edge emits a `Send` object that forwards the original question, the search term, and its instructions to the `provide_answer` node. This design executes multiple retrieval branches in parallel according to the model's own search plan.

The dynamic expansion is defined in the state-graph wiring inside [`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py).
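The fan-out can be sketched as a function that returns one message per planned search. To keep the example runnable without extra dependencies, `Send` below is a small stand-in that mirrors the `(node, arg)` shape of LangGraph's `Send`. The function name `trigger_queries`, the state layout, and the payload keys are assumptions for illustration, not a copy of the repository's code.

```python
from typing import Any, NamedTuple


class Send(NamedTuple):
    """Stand-in mirroring the (node, arg) shape of LangGraph's Send."""
    node: str
    arg: dict[str, Any]


def trigger_queries(state: dict[str, Any]) -> list[Send]:
    """Return one Send per planned search, each carrying its own payload."""
    return [
        Send("provide_answer", {
            "question": state["question"],
            "term": search["term"],
            "instructions": search["instructions"],
        })
        for search in state["strategy"]["searches"]
    ]


# Hypothetical state after the planning step
state = {
    "question": "How does quantum entanglement work?",
    "strategy": {"searches": [
        {"term": "entanglement definition", "instructions": "Define the concept."},
        {"term": "Bell test experiments", "instructions": "Summarize key evidence."},
    ]},
}
sends = trigger_queries(state)
for s in sends:
    print(s.node, "<-", s.arg["term"])
```

Because each message carries its own payload, every branch runs against a private slice of state instead of competing for a shared search term.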

### Step 3: Per-Search Synthesis

Each `provide_answer` node performs a **vector similarity search** by calling `vector_search` (defined in [`open_notebook/domain/notebook.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/notebook.py)) using the term supplied by the strategy. The function returns the top-k relevant documents as `results`. These raw excerpts are combined with the search-specific instructions and fed to a second language model configured with the `ask/query_process` prompt template. The model returns a concise, focused answer that is cleaned and stored as a partial result.

This step realizes the first synthesis layer: turning retrieved chunks into digestible evidence.
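The retrieve-then-prompt flow of this step can be illustrated with a toy in-memory index. This sketch swaps the project's SurrealDB-backed `vector_search` for a plain cosine-similarity ranking over hand-written three-dimensional vectors. The helper names, the chunk layout, and the prompt wording are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, instructions: str, results: list[dict]) -> str:
    """Combine the retrieved excerpts with the search-specific instructions."""
    excerpts = "\n".join(f"- {r['text']}" for r in results)
    return f"Question: {question}\nInstructions: {instructions}\nExcerpts:\n{excerpts}"


# Toy corpus with made-up embeddings
chunks = [
    {"text": "Entangled particles share a joint quantum state.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "LangGraph builds stateful agent graphs.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Bell tests rule out local hidden variables.", "embedding": [0.8, 0.3, 0.1]},
]
results = top_k([1.0, 0.2, 0.0], chunks, k=2)
prompt = build_prompt("How does entanglement work?", "Define the concept.", results)
print(prompt)
```

In the real pipeline the prompt would be rendered from the `ask/query_process` template and sent to a model. Here it is only printed, so the example stays self-contained.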

### Step 4: Final Answer Aggregation

After all parallel `provide_answer` nodes complete, the `write_final_answer` node receives the accumulated state: the original question, the chosen strategy, and the list of partial answers. A third language model call, using the `ask/final_answer` template, stitches these pieces into a single, coherent response. The final output is written to `state["final_answer"]`, concluding the pipeline.
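A short sketch of the aggregation step: each branch contributes a one-item list, the lists are concatenated, and the result is folded into a single prompt. LangGraph supports list-concatenating reducers such as `operator.add` on a state field. Whether `ask.py` accumulates answers exactly this way is an assumption, as are the helper names and the prompt wording.

```python
import operator
from functools import reduce


def merge_branch_updates(updates: list[dict]) -> list[str]:
    """Concatenate each branch's 'answers' list, as an operator.add reducer would."""
    return reduce(operator.add, (u["answers"] for u in updates), [])


def build_final_prompt(question: str, answers: list[str]) -> str:
    """Number the partial answers and place them under the original question."""
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(answers, start=1))
    return (
        f"Question: {question}\n\n"
        f"Partial answers:\n{numbered}\n\n"
        "Write one coherent answer."
    )


# Hypothetical updates returned by two parallel branches
updates = [
    {"answers": ["Entanglement links the states of two particles."]},
    {"answers": ["Bell tests confirm the correlations experimentally."]},
]
answers = merge_branch_updates(updates)
prompt = build_final_prompt("How does quantum entanglement work?", answers)
print(prompt)
```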

### Step 5: LangGraph State Machine Wiring

The compiled graph follows a linear chain: `START → agent → (conditional provide_answer) → write_final_answer → END`. The conditional edge after `agent` dynamically expands into as many `provide_answer` instances as the generated strategy specifies. This wiring is visible in the graph construction at the bottom of [`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py).
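The control flow of that chain can be imitated with plain `asyncio`, which makes the fan-out and fan-in points easy to see. This is a simulation with stubbed nodes, not LangGraph code: the three coroutines reuse the node names from the text, but their bodies, the state keys, and `run_pipeline` are hypothetical.

```python
import asyncio


async def agent(state: dict) -> dict:
    """Stub planner: a real node would call an LLM to produce the strategy."""
    return {**state, "strategy": [
        {"term": "topic overview", "instructions": "Give a definition."},
        {"term": "topic evidence", "instructions": "List supporting facts."},
    ]}


async def provide_answer(payload: dict) -> str:
    """Stub branch: a real node would run vector search plus an LLM call."""
    await asyncio.sleep(0)  # yield control so branches can interleave
    return f"[{payload['term']}] {payload['instructions']}"


async def write_final_answer(state: dict) -> dict:
    """Stub aggregator: a real node would ask an LLM to merge the answers."""
    return {**state, "final_answer": " | ".join(state["answers"])}


async def run_pipeline(question: str) -> dict:
    state = await agent({"question": question})
    payloads = [{"question": question, **s} for s in state["strategy"]]
    # Fan-out: one provide_answer task per planned search, run concurrently
    answers = await asyncio.gather(*(provide_answer(p) for p in payloads))
    # Fan-in: aggregation starts only after every branch has returned
    return await write_final_answer({**state, "answers": list(answers)})


result = asyncio.run(run_pipeline("What is the topic?"))
print(result["final_answer"])
```

`asyncio.gather` returns results in the order the tasks were passed in, so the partial answers line up with the strategy even when branches finish at different times.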

## Key Source Files and Functions

Several modules collaborate to realize the RAG workflow in [`ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py):

- **[`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py)**: Defines the core state graph, including the `agent`, `provide_answer`, and `write_final_answer` nodes.
- **[`open_notebook/domain/notebook.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/notebook.py)**: Implements `vector_search`, which executes similarity search against SurrealDB embeddings.
- **[`open_notebook/ai/provision.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/ai/provision.py)**: Exports `provision_langchain_model`, the factory that instantiates the configured LLM across providers.
- **`templates/ask/entry.jinja`**: Prompt template that instructs the model to emit a JSON search strategy.
- **`templates/ask/query_process.jinja`**: Prompt template for synthesizing a focused answer from the chunks retrieved for each search.
- **`templates/ask/final_answer.jinja`**: Prompt template that merges partial answers into the final unified response.
- **[`api/routers/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/ask.py)**: Thin FastAPI wrapper that exposes the compiled graph via an HTTP endpoint.

## Running the RAG Workflow

You can invoke the pipeline directly from Python using the compiled `graph` object, or through the project's FastAPI server.

### Direct Graph Invocation

```python
import asyncio
from open_notebook.graphs.ask import graph

async def ask_question(q: str):
    # Invoke the compiled LangGraph with the user question
    result = await graph.ainvoke({"question": q})
    print("🧠 Final answer:", result["final_answer"])

asyncio.run(ask_question("How does quantum entanglement work?"))
```

### FastAPI Endpoint

```python
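import asyncio

import httpx


async def ask_via_api(q: str):
    # Assumes the API server is running locally on port 5055.
    # The generous timeout is a precaution: the pipeline chains several
    # model calls and can exceed httpx's default 5-second limit.
    async with httpx.AsyncClient(base_url="http://localhost:5055", timeout=120.0) as client:
        resp = await client.post("/ask", json={"question": q})
        resp.raise_for_status()
        print(resp.json()["final_answer"])


# Example
asyncio.run(ask_via_api("Explain the benefits of RAG in AI assistants"))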
```

The `/ask` route simply forwards the payload to the compiled graph. See [`api/routers/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/ask.py) for the thin wrapper implementation.

## Summary

- **[`ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) implements a multi-step retrieval-augmented generation pipeline** inside a LangGraph state machine.
- **Strategy generation** uses an entry model to plan up to five searches, each with a term and synthesis instructions.
- **Parallel retrieval** executes vector searches via `vector_search` in [`open_notebook/domain/notebook.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/notebook.py).
- **Two-stage synthesis** first produces one focused answer per search (`ask/query_process`), then aggregates them into a final response (`ask/final_answer`).
- **The graph** runs as a linear chain with a dynamic conditional edge that scales `provide_answer` nodes to match the strategy.

## Frequently Asked Questions

### What makes the RAG workflow in ask.py different from a single-shot retrieval system?

Unlike a single-shot approach that performs one search and answers immediately, [`ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) uses a **model-generated strategy** to spawn targeted, parallel searches. Each branch retrieves and synthesizes its own evidence before a dedicated aggregation step writes the final answer, which is intended to improve coverage and coherence for complex questions.

### Which prompt templates drive the three model calls in the pipeline?

The pipeline relies on three Jinja templates: `templates/ask/entry.jinja` for the initial JSON strategy, `templates/ask/query_process.jinja` for per-search synthesis, and `templates/ask/final_answer.jinja` for final answer aggregation. Each template is rendered at a distinct stage of the graph, so planning, evidence synthesis, and final reasoning are separated into discrete prompting layers.

### How does ask.py handle parallel execution of multiple searches?

The LangGraph state machine defines a conditional edge after the `agent` node that emits a `Send` for every search defined in the strategy. LangGraph schedules these `provide_answer` nodes concurrently, and the graph pauses at `write_final_answer` until all partial results are collected.

### Where is the vector search executed in the open-notebook codebase?

The similarity search is performed by `vector_search`, implemented in [`open_notebook/domain/notebook.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/notebook.py). The `provide_answer` node in [`open_notebook/graphs/ask.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/ask.py) calls this function using the term supplied by the strategy-generated search plan.