
How Content Transformations Work in Open Notebook: A LangChain Pipeline Guide

June 5, 2026 lfnovo/open-notebook ↗

Open Notebook implements content transformations as reusable LangChain workflows that take raw text, render a configurable prompt template, invoke an LLM, and return cleaned output that can optionally be stored as a source insight.

Open Notebook treats a content transformation as a first-class, model-agnostic operation for processing raw text through large language models. The entire pipeline is orchestrated by a LangGraph state machine and exposed through a REST API, so any client that can reach the API can run summarization, rewriting, extraction, or a custom text operation. At its core, the system combines domain models, graph-based execution logic, and a thin API layer to turn stored prompt templates into repeatable transformations.

Three-Layer Architecture

The implementation spans three coordinated layers: domain models that define the metadata and prompt templates, a LangGraph state machine that handles execution, and a REST API that validates requests and returns results.

Domain Models in `open_notebook/domain/transformation.py`

The foundation is built on two Pydantic models. The Transformation class stores the name, title, description, prompt template, and a boolean flag indicating whether it should be applied by default. The DefaultPrompts class holds an optional global instruction that can be prepended to every transformation prompt, giving administrators a single lever to influence all text operations.
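
To make the shape of these two records concrete, here is a minimal sketch. It uses standard-library dataclasses as a stand-in for the Pydantic models so it runs without dependencies. The field name `apply_default` is a hypothetical label for the "applied by default" flag, and the example values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transformation:
    """Stand-in for the Pydantic model; field names are illustrative."""
    name: str
    title: str
    description: str
    prompt: str                  # prompt template sent to the LLM
    apply_default: bool = False  # hypothetical name for the "applied by default" flag


@dataclass
class DefaultPrompts:
    """Stand-in for the record holding the optional global instruction."""
    transformation_instructions: Optional[str] = None


# Example records (values are made up for this sketch).
summarize = Transformation(
    name="summarize",
    title="Summary",
    description="Condense the text into a short summary.",
    prompt="Summarize the text below in three sentences.",
    apply_default=True,
)
defaults = DefaultPrompts(transformation_instructions="Answer in plain English.")
```

Keeping the global instruction in its own record, rather than copying it into each transformation, is what gives administrators the single lever described above.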

LangGraph Execution in `open_notebook/graphs/transformation.py`

The heavy lifting lives inside a state graph defined in open_notebook/graphs/transformation.py. The TransformationState carries the input text, the selected Transformation object, an optional source record, and the eventual output. The graph's central node, run_transformation, chains together prompt resolution, template rendering, model provisioning, LLM invocation, and post-processing.
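
The state described above can be pictured as a typed dictionary. The sketch below is an assumption about its shape, not the project's definition: the keys `input_text`, `transformation`, and `output` match the usage shown elsewhere in this article, while the `source` key name and the stub node are hypothetical. It also illustrates the usual LangGraph convention that a node returns only the keys it updates.

```python
from typing import Any, Optional, TypedDict


class TransformationState(TypedDict, total=False):
    input_text: str        # raw text to transform
    transformation: Any    # the selected Transformation object
    source: Optional[Any]  # optional source record (key name assumed)
    output: str            # filled in by the node


def run_transformation_stub(state: TransformationState) -> dict:
    """Hypothetical node: returns a partial update instead of a full state."""
    text = state["input_text"]
    return {"output": text.upper()}  # placeholder for the real LLM call


# Merge the node's partial update into the state, as a graph runtime would.
state: TransformationState = {"input_text": "hello", "transformation": None}
state.update(run_transformation_stub(state))
```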

API Router in `api/routers/transformations.py`

The client-facing surface is a FastAPI router in api/routers/transformations.py. It exposes standard CRUD endpoints plus POST /transformations/execute, which validates that the requested transformation and model exist before handing control to the graph.
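
The validate-then-delegate order can be sketched without FastAPI. Everything here is hypothetical: the `NotFound` exception stands in for an HTTP 404 response, and the two dictionaries stand in for database lookups. The point is only that both records are checked before any LLM work starts.

```python
class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 response."""


def validate_execute_request(body: dict, transformations: dict, models: dict) -> tuple:
    """Check that both referenced records exist, then return them."""
    transformation = transformations.get(body["transformation_id"])
    if transformation is None:
        raise NotFound("Transformation not found")
    model = models.get(body["model_id"])
    if model is None:
        raise NotFound("Model not found")
    return transformation, model


# In-memory registries standing in for the database.
known_transformations = {"open_notebook:transformation:summarize": "summarize-record"}
known_models = {"open_notebook:model:gpt-4": "gpt-4-record"}

request_body = {
    "input_text": "Lorem ipsum",
    "transformation_id": "open_notebook:transformation:summarize",
    "model_id": "open_notebook:model:gpt-4",
}
records = validate_execute_request(request_body, known_transformations, known_models)
```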

Inside the Content Transformation Pipeline

When a transformation runs, run_transformation executes a strict sequence of steps. Understanding this flow is key to debugging or extending Open Notebook content transformations.

Prompt Assembly and Rendering

First, the node resolves the final LLM prompt by combining the stored transformation.prompt with any global instruction from DefaultPrompts. The merged template is then rendered through Prompter, a Jinja-style engine, using the current graph state. The rendered result includes the system instructions and an # INPUT marker where the source text is injected.
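
A rough sketch of this assembly step follows. It is an approximation under stated assumptions: `assemble_prompt` and `render` are hypothetical helpers, the regex substitution is a tiny stand-in for a Jinja-style engine rather than the Prompter class itself, and the exact separators in the real prompt may differ.

```python
import re
from typing import Optional


def assemble_prompt(transformation_prompt: str, default_instructions: Optional[str]) -> str:
    """Prepend the global instructions (if any) and append the input marker."""
    parts = []
    if default_instructions:
        parts.append(default_instructions)
    parts.append(transformation_prompt)
    parts.append("# INPUT")
    return "\n\n".join(parts)


def render(template: str, state: dict) -> str:
    """Stand-in renderer: replaces {{ key }} with state[key], or '' if missing."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(state.get(match.group(1), "")),
        template,
    )


template = assemble_prompt("Summarize for a {{ audience }} reader.", "Be concise.")
system_prompt = render(template, {"audience": "technical"})
```

With these inputs, `system_prompt` starts with the global instruction, continues with the rendered transformation prompt, and ends with the `# INPUT` marker, ready for the source text to follow.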

Model Provisioning and Invocation

A LangChain model is provisioned via provision_langchain_model in open_notebook/ai/provision.py. The caller supplies a model_id through graph configuration, and the provisioner instantiates an appropriate chain with a generous max_tokens=8192 budget. The node then calls chain.ainvoke(payload) asynchronously to generate the raw output.
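
The provisioning and invocation flow can be illustrated with a fake model. This is a sketch only: `EchoModel` and `provision_stub` are hypothetical stand-ins, and the real `provision_langchain_model` signature may differ. What the sketch preserves is the sequence of reading `model_id` from the configurable settings, awaiting the provisioner, and then awaiting `ainvoke`.

```python
import asyncio


class EchoModel:
    """Hypothetical stand-in for a provisioned LangChain model."""

    def __init__(self, model_id: str, max_tokens: int):
        self.model_id = model_id
        self.max_tokens = max_tokens

    async def ainvoke(self, payload: list) -> str:
        # A real model would call the provider here.
        return f"[{self.model_id}] received {len(payload)} messages"


async def provision_stub(model_id: str, max_tokens: int = 8192) -> EchoModel:
    """Illustrative provisioner; not the project's actual function."""
    return EchoModel(model_id, max_tokens)


async def main() -> str:
    # The caller chooses the model through the graph's configurable settings.
    config = {"configurable": {"model_id": "open_notebook:model:gpt-4"}}
    chain = await provision_stub(config["configurable"]["model_id"])
    payload = ["system prompt ...", "input text ..."]
    return await chain.ainvoke(payload)


result = asyncio.run(main())
```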

Post-Processing and Insight Storage

After the LLM responds, the pipeline runs two cleanup routines from open_notebook/utils/text_utils.py. The function extract_text_content strips non-text artifacts, and clean_thinking_content removes internal "thinking" sections that some LangChain integrations insert, such as Assistant: ... prefixes. If the state included a Source object, the cleaned text is saved as an insight linked to that source so the UI can surface the transformed content later.
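
The two cleanup passes might look roughly like the sketch below. Both functions are simplified, hypothetical versions written for this article, not the code in `text_utils.py`. In particular, the `<think>...</think>` delimiter is an assumption about how a thinking section is marked, and the real function may target different markers.

```python
import re
from typing import Any


def extract_text(content: Any) -> str:
    """Flatten a model response into plain text.

    Assumes content is either a string or a list of parts, where text parts
    are strings or dicts with a "text" key. Other parts are dropped.
    """
    if isinstance(content, str):
        return content
    pieces = []
    for part in content:
        if isinstance(part, str):
            pieces.append(part)
        elif isinstance(part, dict) and "text" in part:
            pieces.append(str(part["text"]))
    return "".join(pieces)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> spans (the delimiter is an assumption)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


raw = [
    {"type": "text", "text": "<think>plan the answer</think>Final summary."},
    {"type": "image_url", "image_url": "ignored"},
]
cleaned = strip_thinking(extract_text(raw))
```

Running the extraction first matters: the thinking filter works on a single string, so list-shaped responses have to be flattened before it can match anything.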

Error Handling

Any LLM failures are caught inside the graph node and re-raised as OpenNotebookError, keeping the API surface consistent and making client-side error handling predictable.
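
A minimal sketch of this catch-and-re-raise pattern is shown below. The `OpenNotebookError` class here is a local stand-in defined for the example, and `failing_llm_call` and the error message format are hypothetical. Chaining with `from exc` is a design choice of the sketch that keeps the root cause in the traceback.

```python
import asyncio


class OpenNotebookError(Exception):
    """Local stand-in for the project's error type."""


async def failing_llm_call(prompt: str) -> str:
    """Hypothetical LLM call that always fails, to exercise the handler."""
    raise TimeoutError("provider did not respond")


async def run_node(prompt: str) -> str:
    try:
        return await failing_llm_call(prompt)
    except OpenNotebookError:
        raise  # already the expected type, pass it through unchanged
    except Exception as exc:
        # Chain the original exception so the root cause stays in the traceback.
        raise OpenNotebookError(f"Transformation failed: {exc}") from exc


try:
    asyncio.run(run_node("Summarize ..."))
    outcome, cause = "ok", None
except OpenNotebookError as err:
    outcome, cause = str(err), err.__cause__
```

Callers then only need to handle one exception type, whichever provider raised the underlying error.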

How to Run Content Transformations

You can trigger a transformation through the HTTP API or by calling the LangGraph directly from Python.

Executing via the HTTP API

The simplest entry point is POST /transformations/execute, defined in api/routers/transformations.py. The endpoint validates the transformation and model IDs, invokes the graph, and returns the processed text together with provenance metadata.

POST http://localhost:5055/transformations/execute
Content-Type: application/json

{
  "input_text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
  "transformation_id": "open_notebook:transformation:summarize",
  "model_id": "open_notebook:model:gpt-4"
}

Response:

{
  "output": "A concise summary of the provided paragraph.",
  "transformation_id": "open_notebook:transformation:summarize",
  "model_id": "open_notebook:model:gpt-4"
}
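
The same call can be made from Python with only the standard library. This is a sketch that assumes a local deployment reachable over plain HTTP on port 5055. `build_execute_request` and `execute` are hypothetical helper names, and the only response field relied on is `output`, as in the example response above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5055"  # assumed local deployment


def build_execute_request(input_text: str, transformation_id: str,
                          model_id: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({
        "input_text": input_text,
        "transformation_id": transformation_id,
        "model_id": model_id,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/transformations/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(request: urllib.request.Request) -> str:
    """Send the request and return the "output" field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["output"]


request = build_execute_request(
    "Lorem ipsum dolor sit amet.",
    "open_notebook:transformation:summarize",
    "open_notebook:model:gpt-4",
)
# execute(request) needs a running server, so it is not called here.
```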

Invoking the Graph Directly from Python

For server-side scripts or custom integrations, you can bypass the HTTP layer and call the same graph logic used by the API.

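import asyncio

from open_notebook.domain.transformation import Transformation
from open_notebook.graphs.transformation import graph as transformation_graph


async def main() -> None:
    # Assume a Transformation record already exists in the DB
    transformation = await Transformation.get("open_notebook:transformation:rewrite")

    # The initial state carries the text and the transformation;
    # the model is selected through the graph's configurable settings.
    result = await transformation_graph.ainvoke(
        {
            "input_text": "Original document text …",
            "transformation": transformation,
        },
        config={"configurable": {"model_id": "open_notebook:model:claude-2"}},
    )

    print(result["output"])  # the transformed text


# "await" is only valid inside a coroutine, so run everything from main()
asyncio.run(main())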

Configuring the Global Default Prompt

Administrators can influence every transformation by updating the DefaultPrompts record. The default instructions are automatically prepended to every transformation prompt inside run_transformation.

import httpx


async def set_global_prompt(new_instructions: str) -> None:
    async with httpx.AsyncClient(base_url="http://localhost:5055") as client:
        response = await client.put(
            "/transformations/default-prompt",
            json={"transformation_instructions": new_instructions},
        )
        # Raise on a 4xx/5xx status instead of silently ignoring a failed update
        response.raise_for_status()

Summary

Content transformations in Open Notebook are reusable LangChain workflows defined by the Transformation model and executed through a LangGraph state machine.
The pipeline lives in open_notebook/graphs/transformation.py and follows a clear sequence: assemble the prompt, render it with Prompter, provision an LLM via provision_langchain_model, invoke it, then clean the result with extract_text_content and clean_thinking_content.
Clients trigger transformations through POST /transformations/execute in api/routers/transformations.py, or by calling the graph directly from Python.
When a source document is supplied, the cleaned output is persisted as an insight linked to that source for later retrieval.

Frequently Asked Questions

What is a content transformation in Open Notebook?

A content transformation is a configurable, reusable LLM workflow that takes raw text and returns a processed version. It is defined by a Transformation record containing a prompt template and executed by a LangGraph pipeline, making it suitable for summarization, rewriting, extraction, or any custom text operation.

How does the prompt template get rendered?

The run_transformation node merges the stored transformation.prompt with any global instructions from DefaultPrompts. It then passes the combined string to the Prompter engine, which renders the template against the current TransformationState and injects the source text after an # INPUT marker.

What happens to the output after the LLM generates it?

The raw response is passed through extract_text_content to strip non-text artifacts, then through clean_thinking_content to remove internal reasoning prefixes. If the request included a Source object, the final text is stored as an insight linked to that source; otherwise it is returned directly to the caller.

Can I use any model for content transformations?

Yes, within limits. The graph reads a model_id from its configurable settings and delegates provisioning to provision_langchain_model in open_notebook/ai/provision.py. As long as the model is registered in the system and supported by LangChain, the transformation pipeline will use it.
