# How the Transformation Graph Applies Custom Transformations to Source Content in Open‑Notebook

> Open Notebooks transformation graph uses LangGraph to apply custom LLM transformations to source text. Clean and store output as searchable insights.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: internals
- Published: 2026-06-10

---

**Open‑Notebook leverages a dedicated LangGraph workflow—the transformation graph—to execute user‑defined LLM prompts against raw source text, automatically cleaning the output and persisting it as a searchable insight.**

Open‑Notebook is an open‑source knowledge management system that processes articles, PDFs, and video transcripts through composable AI workflows. The **transformation graph** serves as the core engine that applies **custom transformations**—user‑defined prompts—to raw text, converting unstructured source content into structured, queryable insights. This architecture separates content ingestion from enrichment, enabling parallel processing and reusable AI operations.

## The Two‑Graph Architecture

Open‑Notebook orchestrates content processing through two distinct but interconnected LangGraph workflows. The **source graph** handles ingestion and storage, while the **transformation graph** manages the actual LLM‑driven text transformations.

### Source Graph Responsibilities

The source graph, defined in [`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py), manages the initial pipeline. It extracts text via `content_process`, persists records through `save_source`, and conditionally dispatches work to the transformation graph. According to the source code, this graph includes a conditional edge that only triggers transformations when the `apply_transformations` list is non‑empty.

### Transformation Graph Responsibilities

The transformation graph, compiled in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py) (lines 71‑76), is a dedicated subgraph that handles prompt construction, model invocation, and response cleaning. It accepts a `TransformationState` containing the raw `input_text` and a `Transformation` object, then returns cleaned output ready for persistence.

## Step‑by‑Step Execution Flow

When a user requests custom processing, the system executes a precise sequence across both graphs:

### 1. Source Ingestion and Persistence

First, the source graph extracts and stores the content. The `content_process` node extracts raw text from the uploaded file or URL, then `save_source` persists the `Source` record to the database. This ensures the transformation graph always works with stored, retrievable content.

### 2. Transformation Detection and Dispatch

The `trigger_transformations` node (lines 30‑46 in [`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py)) checks the request payload for transformation IDs. If `apply_transformations` contains entries, the node creates a `Send` object for each transformation ID, targeting the `transform_content` node. This enables parallel execution of multiple transformations against the same source.

### 3. Prompt Construction and LLM Invocation

The `transform_content` node (lines 49‑67) receives both the `Source` and `Transformation` objects, then invokes the transformation graph via `await transform_graph.ainvoke(state)`. Inside the subgraph, the `run_transformation` node (lines 33‑40) merges the transformation’s `prompt` with optional default instructions, appends an `"# INPUT"` marker, and renders the final template using `Prompter`.

The system then calls `provision_langchain_model` (lines 45‑52) to create a LangChain chain for the configured model (e.g., OpenAI or Anthropic). The chain receives a `SystemMessage` containing the rendered prompt and a `HumanMessage` containing the source’s `full_text`.

### 4. Response Cleaning and Insight Storage

Raw LLM outputs often contain reasoning traces or "thinking" content. The transformation graph sanitizes these through `extract_text_content` (line 55) and `clean_thinking_content` (line 56) to isolate the actual response text.

Finally, the cleaned output is persisted via `source.add_insight(transformation.title, cleaned_content)` (line 58), making the transformed text searchable and visible in the Open‑Notebook UI. The node returns `{"output": cleaned_content}`, and the source graph aggregates results from all parallel transformations.

## Key Implementation Files

Understanding the codebase structure is essential for customization:

- **[`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py)** – Defines `TransformationState`, implements the `run_transformation` node, and compiles the transformation graph.
- **[`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py)** – Orchestrates content extraction, source persistence, and conditional dispatch to the transformation graph via `trigger_transformations`.
- **[`open_notebook/domain/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/transformation.py)** – Pydantic models defining the `Transformation` object (name, prompt template, and metadata).
- **[`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py)** – Public REST endpoint (`execute_transformation`) that triggers the workflow.
- **[`api/transformations_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/transformations_service.py)** – Service layer converting API payloads to domain `Transformation` objects.

## Programmatic Usage Examples

You can interact with the transformation system both directly in Python and via the REST API.

### Running a Transformation Directly

To apply a transformation programmatically without the full source graph:

```python
import asyncio
from open_notebook.domain.transformation import Transformation
from open_notebook.graphs.transformation import graph as transform_graph

async def apply_custom_transformation(source_text: str, trans: Transformation):
    # Build the state expected by the transformation graph

    state = {
        "input_text": source_text,
        "transformation": trans,
        "source": None,  # optional – only needed for insight storage

    }
    # Run the graph asynchronously

    result = await transform_graph.ainvoke(state)
    return result["output"]

# Usage example

# trans = await Transformation.get("transformation:my_summarizer")

# cleaned = asyncio.run(apply_custom_transformation(raw_text, trans))

```

This bypasses the source graph and invokes `run_transformation` directly, returning the cleaned LLM output without persisting it as an insight.

### Triggering via REST API

To process a stored source through the complete pipeline:

```http
POST /transformations/execute HTTP/1.1
Content-Type: application/json

{
  "source_id": "source:123",
  "transformation_id": "transformation:summary",
  "model_id": "openai:gpt-4o"
}

```

The endpoint in [`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py) fetches the `Source` and `Transformation` entities, then executes the source graph, which internally invokes the transformation graph as described above.

## Summary

- **Dual‑graph architecture**: The source graph handles ingestion and dispatch, while the transformation graph manages LLM execution and cleaning.
- **Parallel processing**: Multiple transformations are dispatched simultaneously via `Send` objects created in `trigger_transformations`.
- **Prompt engineering**: The `run_transformation` node merges user prompts with input markers using the `Prompter` class before LLM invocation.
- **Output sanitization**: Built‑in cleaners remove reasoning traces via `extract_text_content` and `clean_thinking_content` before storage.
- **Insight integration**: Results are automatically attached to sources via `add_insight`, making them searchable within the Open‑Notebook interface.

## Frequently Asked Questions

### What is the difference between the source graph and the transformation graph?

The **source graph** ([`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py)) manages the lifecycle of content ingestion, storage, and decision‑making about which transformations to run. The **transformation graph** ([`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py)) is a specialized subgraph that exclusively handles the execution of LLM prompts, response cleaning, and formatting. The source graph dispatches work to the transformation graph when `apply_transformations` is non‑empty.

### How does Open‑Notebook clean LLM responses before storing them?

After the LLM returns output, the transformation graph applies two cleaning functions: `extract_text_content` strips non‑text elements from the response, and `clean_thinking_content` removes internal reasoning or deliberation text (such as `<thinking>` tags). This ensures only the final, relevant content is stored via `source.add_insight`.

### Can I run a transformation without saving the result as an insight?

Yes. By invoking `transform_graph.ainvoke()` directly with a state object containing `input_text` and a `Transformation` model, you bypass the source graph’s persistence logic. Set `"source": None` in the state to process text without attaching the output to a stored source record.

### Where are transformation prompts defined and stored?

Transformation definitions are Pydantic models located in [`open_notebook/domain/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/transformation.py). Each `Transformation` object includes a `prompt` field containing the user‑defined template. These objects are stored in the database and retrieved by ID when `trigger_transformations` dispatches work to the transformation graph.