How the Transformation Graph Applies Custom Transformations to Source Content in Open‑Notebook

Open‑Notebook leverages a dedicated LangGraph workflow—the transformation graph—to execute user‑defined LLM prompts against raw source text, automatically cleaning the output and persisting it as a searchable insight.

Open‑Notebook is an open‑source knowledge management system that processes articles, PDFs, and video transcripts through composable AI workflows. The transformation graph serves as the core engine that applies custom transformations—user‑defined prompts—to raw text, converting unstructured source content into structured, queryable insights. This architecture separates content ingestion from enrichment, enabling parallel processing and reusable AI operations.

The Two‑Graph Architecture

Open‑Notebook orchestrates content processing through two distinct but interconnected LangGraph workflows. The source graph handles ingestion and storage, while the transformation graph manages the actual LLM‑driven text transformations.

Source Graph Responsibilities

The source graph, defined in open_notebook/graphs/source.py, manages the initial pipeline. It extracts text via content_process, persists records through save_source, and conditionally dispatches work to the transformation graph. According to the source code, this graph includes a conditional edge that only triggers transformations when the apply_transformations list is non‑empty.

Transformation Graph Responsibilities

The transformation graph, compiled in open_notebook/graphs/transformation.py (lines 71‑76), is a dedicated subgraph that handles prompt construction, model invocation, and response cleaning. It accepts a TransformationState containing the raw input_text and a Transformation object, then returns cleaned output ready for persistence.

Step‑by‑Step Execution Flow

When a user requests custom processing, the system executes a precise sequence across both graphs:

1. Source Ingestion and Persistence

First, the source graph extracts and stores the content. The content_process node extracts raw text from the uploaded file or URL, then save_source persists the Source record to the database. This ensures the transformation graph always works with stored, retrievable content.

2. Transformation Detection and Dispatch

The trigger_transformations node (lines 30‑46 in open_notebook/graphs/source.py) checks the request payload for transformation IDs. If apply_transformations contains entries, the node creates a Send object for each transformation ID, targeting the transform_content node. This enables parallel execution of multiple transformations against the same source.

3. Prompt Construction and LLM Invocation

The transform_content node (lines 49‑67) receives both the Source and Transformation objects, then invokes the transformation graph via await transform_graph.ainvoke(state). Inside the subgraph, the run_transformation node (lines 33‑40) merges the transformation’s prompt with optional default instructions, appends an "# INPUT" marker, and renders the final template using Prompter.

The system then calls provision_langchain_model (lines 45‑52) to create a LangChain chain for the configured model (e.g., OpenAI or Anthropic). The chain receives a SystemMessage containing the rendered prompt and a HumanMessage containing the source’s full_text.

4. Response Cleaning and Insight Storage

Raw LLM outputs often contain reasoning traces or "thinking" content. The transformation graph sanitizes these through extract_text_content (line 55) and clean_thinking_content (line 56) to isolate the actual response text.

Finally, the cleaned output is persisted via source.add_insight(transformation.title, cleaned_content) (line 58), making the transformed text searchable and visible in the Open‑Notebook UI. The node returns {"output": cleaned_content}, and the source graph aggregates results from all parallel transformations.

Key Implementation Files

Understanding the codebase structure is essential for customization:

Programmatic Usage Examples

You can interact with the transformation system both directly in Python and via the REST API.

Running a Transformation Directly

To apply a transformation programmatically without the full source graph:

import asyncio
from open_notebook.domain.transformation import Transformation
from open_notebook.graphs.transformation import graph as transform_graph

async def apply_custom_transformation(source_text: str, trans: Transformation):
    # Build the state expected by the transformation graph

    state = {
        "input_text": source_text,
        "transformation": trans,
        "source": None,  # optional – only needed for insight storage

    }
    # Run the graph asynchronously

    result = await transform_graph.ainvoke(state)
    return result["output"]

# Usage example

# trans = await Transformation.get("transformation:my_summarizer")

# cleaned = asyncio.run(apply_custom_transformation(raw_text, trans))

This bypasses the source graph and invokes run_transformation directly, returning the cleaned LLM output without persisting it as an insight.

Triggering via REST API

To process a stored source through the complete pipeline:

POST /transformations/execute HTTP/1.1
Content-Type: application/json

{
  "source_id": "source:123",
  "transformation_id": "transformation:summary",
  "model_id": "openai:gpt-4o"
}

The endpoint in api/routers/transformations.py fetches the Source and Transformation entities, then executes the source graph, which internally invokes the transformation graph as described above.

Summary

  • Dual‑graph architecture: The source graph handles ingestion and dispatch, while the transformation graph manages LLM execution and cleaning.
  • Parallel processing: Multiple transformations are dispatched simultaneously via Send objects created in trigger_transformations.
  • Prompt engineering: The run_transformation node merges user prompts with input markers using the Prompter class before LLM invocation.
  • Output sanitization: Built‑in cleaners remove reasoning traces via extract_text_content and clean_thinking_content before storage.
  • Insight integration: Results are automatically attached to sources via add_insight, making them searchable within the Open‑Notebook interface.

Frequently Asked Questions

What is the difference between the source graph and the transformation graph?

The source graph (open_notebook/graphs/source.py) manages the lifecycle of content ingestion, storage, and decision‑making about which transformations to run. The transformation graph (open_notebook/graphs/transformation.py) is a specialized subgraph that exclusively handles the execution of LLM prompts, response cleaning, and formatting. The source graph dispatches work to the transformation graph when apply_transformations is non‑empty.

How does Open‑Notebook clean LLM responses before storing them?

After the LLM returns output, the transformation graph applies two cleaning functions: extract_text_content strips non‑text elements from the response, and clean_thinking_content removes internal reasoning or deliberation text (such as <thinking> tags). This ensures only the final, relevant content is stored via source.add_insight.

Can I run a transformation without saving the result as an insight?

Yes. By invoking transform_graph.ainvoke() directly with a state object containing input_text and a Transformation model, you bypass the source graph’s persistence logic. Set "source": None in the state to process text without attaching the output to a stored source record.

Where are transformation prompts defined and stored?

Transformation definitions are Pydantic models located in open_notebook/domain/transformation.py. Each Transformation object includes a prompt field containing the user‑defined template. These objects are stored in the database and retrieved by ID when trigger_transformations dispatches work to the transformation graph.

Have a question about this repo?

These articles cover the highlights, but your codebase questions are specific. Give your agent direct access to the source. Share this with your agent to get started:

Share the following with your agent to get started:
curl -s "https://instagit.com/install.md"

Works with
Claude Codex Cursor VS Code OpenClaw Any MCP Client

Maintain an open-source project? Get it listed too →