# How Content Transformations Work in Open Notebook: A LangChain Pipeline Guide

> Discover how Open Notebook uses LangChain workflows for content transformations. This guide explains how raw text becomes cleaned output via LLM processing and prompt templates.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: deep-dive
- Published: 2026-06-05

---

**Open Notebook implements content transformations as reusable LangChain workflows that take raw text, render a configurable prompt template, invoke an LLM, and return cleaned output that can optionally be stored as a source insight.**

Open Notebook treats a *content transformation* as a first-class, model-agnostic operation for processing raw text through large language models. A LangGraph state machine orchestrates the pipeline and a REST API exposes it, so summarization, rewriting, extraction, or any custom text operation runs through the same code path. The system combines domain models, graph-based execution logic, and a thin API layer to turn prompt templates into reusable transformations.

## Three-Layer Architecture

The implementation spans three coordinated layers: domain models that define the metadata and prompt templates, a LangGraph state machine that handles execution, and a REST API that validates requests and returns results.

### Domain Models in [`open_notebook/domain/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/transformation.py)

The foundation is built on two Pydantic models. The `Transformation` class stores the name, title, description, prompt template, and a boolean flag indicating whether it should be applied by default. The `DefaultPrompts` class holds an optional global instruction that can be prepended to every transformation prompt, giving administrators a single lever to influence all text operations.
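To make the shape of these records concrete, here is a minimal sketch that uses plain dataclasses as stand-ins for the Pydantic models. The field name `apply_default` is hypothetical, and the other field names are assumptions based on the description above; the real classes may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transformation:
    """Stand-in for the Pydantic model; field names are assumed."""
    name: str
    title: str
    description: str
    prompt: str                  # the prompt template text
    apply_default: bool = False  # hypothetical name for the "apply by default" flag


@dataclass
class DefaultPrompts:
    """Optional global instruction prepended to every transformation prompt."""
    transformation_instructions: Optional[str] = None


summarize = Transformation(
    name="summarize",
    title="Summary",
    description="Condense the text into a short summary.",
    prompt="Summarize the following text in three sentences.",
)
defaults = DefaultPrompts(transformation_instructions="Always answer in English.")
```

Keeping the global instruction in its own record means it can change without touching any individual transformation.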

### LangGraph Execution in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py)

The heavy lifting lives inside a state graph defined in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py). The `TransformationState` carries the input text, the selected `Transformation` object, an optional source record, and the eventual output. The graph's central node, `run_transformation`, chains together prompt resolution, template rendering, model provisioning, LLM invocation, and post-processing.
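The state-and-node pattern can be sketched without LangGraph itself. In the sketch below, the `source` key, the stub node, and the `run_single_node_graph` helper are assumptions for illustration; the only behavior borrowed from LangGraph is that a node returns a partial update that gets merged into the state.

```python
from typing import Any, Optional, TypedDict


class TransformationState(TypedDict, total=False):
    input_text: str
    transformation: Any    # the selected Transformation object
    source: Optional[Any]  # optional source record (assumed key name)
    output: str            # filled in by the node


def run_transformation_stub(state: TransformationState) -> dict:
    """Placeholder node: upper-cases the input instead of calling an LLM."""
    return {"output": state["input_text"].upper()}


def run_single_node_graph(state: TransformationState) -> dict:
    """Mimic a one-node graph run by merging the node's partial update."""
    update = run_transformation_stub(state)
    return {**state, **update}
```

Calling `run_single_node_graph({"input_text": "hello", "transformation": None})` yields a state whose `output` key holds the node's result while the original keys are preserved.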

### API Router in [`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py)

The client-facing surface is a FastAPI router in [`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py). It exposes standard CRUD endpoints plus `POST /transformations/execute`, which validates that the requested transformation and model exist before handing control to the graph.
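The validate-then-execute behavior might look roughly like the following. This is a framework-free sketch: the in-memory registries, the `NotFoundError` class, and the example IDs are all hypothetical stand-ins for database lookups and HTTP error responses.

```python
class NotFoundError(Exception):
    """Hypothetical error; an HTTP layer would typically map it to a 404."""


# Hypothetical in-memory registries standing in for database lookups.
TRANSFORMATIONS = {"transformation:summarize": "Summarize the text."}
MODELS = {"model:demo"}


def validate_execute_request(body: dict) -> dict:
    """Reject unknown IDs before any graph execution happens."""
    if body["transformation_id"] not in TRANSFORMATIONS:
        raise NotFoundError("Transformation not found")
    if body["model_id"] not in MODELS:
        raise NotFoundError("Model not found")
    return body
```

Failing fast here keeps invalid requests from ever reaching the LLM provider.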

## Inside the Content Transformation Pipeline

When a transformation runs, `run_transformation` executes a strict sequence of steps. Understanding this flow is key to debugging or extending Open Notebook content transformations.

### Prompt Assembly and Rendering

First, the node resolves the final LLM prompt by combining the stored `transformation.prompt` with any global instruction from `DefaultPrompts`. The merged template is then rendered through `Prompter`, a Jinja-style engine, using the current graph state. The rendered result includes the system instructions and an `# INPUT` marker where the source text is injected.
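The assembly step can be approximated as below. This sketch assumes the global instruction is joined to the prompt with blank lines and that the `# INPUT` marker is appended at the end. The `render` function is a tiny regex-based stand-in for `Prompter`, not its actual implementation.

```python
import re
from typing import Optional


def assemble_template(prompt: str, default_instructions: Optional[str]) -> str:
    """Prepend the global instruction (if any) and append the input marker."""
    parts = []
    if default_instructions:
        parts.append(default_instructions)
    parts.append(prompt)
    parts.append("# INPUT")
    return "\n\n".join(parts)


def render(template: str, data: dict) -> str:
    """Replace {{ key }} placeholders with values from data (missing keys become empty)."""
    def substitute(match):
        return str(data.get(match.group(1), ""))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = assemble_template("Rewrite for {{ audience }}.", "Be concise.")
system_prompt = render(template, {"audience": "engineers"})
```

Here `system_prompt` contains the global instruction, the rendered transformation prompt, and the trailing marker, in that order.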

### Model Provisioning and Invocation

A LangChain model is provisioned via `provision_langchain_model` in [`open_notebook/ai/provision.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/ai/provision.py). The caller supplies a `model_id` through graph configuration, and the provisioner instantiates an appropriate chain with an output budget of `max_tokens=8192`. The node then calls `chain.ainvoke(payload)` asynchronously to generate the raw output.
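The provisioning and invocation flow can be illustrated with a fake model. Everything in this sketch is a stand-in: `EchoModel`, `provision_model`, and `invoke` are hypothetical names, and the real `provision_langchain_model` signature may differ. Only the `configurable` lookup and the `ainvoke` call mirror the description above.

```python
import asyncio


class EchoModel:
    """Fake chat model exposing the async `ainvoke` method the node relies on."""

    def __init__(self, model_id: str, max_tokens: int):
        self.model_id = model_id
        self.max_tokens = max_tokens

    async def ainvoke(self, payload: list) -> str:
        # A real model would call a provider API; this one echoes the last message.
        return f"[{self.model_id}] {payload[-1]}"


async def provision_model(model_id: str, max_tokens: int = 8192) -> EchoModel:
    """Hypothetical stand-in for `provision_langchain_model`."""
    return EchoModel(model_id, max_tokens)


async def invoke(config: dict, payload: list) -> str:
    """Read the model ID from the run configuration, provision, then invoke."""
    model_id = config["configurable"]["model_id"]
    chain = await provision_model(model_id)
    return await chain.ainvoke(payload)
```

Running `asyncio.run(invoke({"configurable": {"model_id": "demo"}}, ["system prompt", "hello"]))` exercises the full path with the fake model.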

### Post-Processing and Insight Storage

After the LLM responds, the pipeline runs two cleanup routines from [`open_notebook/utils/text_utils.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/text_utils.py). The function `extract_text_content` strips non-text artifacts, and `clean_thinking_content` removes internal "thinking" sections that some models emit ahead of the final answer, such as `<think>...</think>` blocks. If the state included a `Source` object, the cleaned text is saved as an insight linked to that source so the UI can surface the transformed content later.
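The two cleanup passes could be sketched as follows. This is not the code from `text_utils.py`: the function names are hypothetical, and the sketch assumes that thinking sections are delimited by `<think>...</think>` tags and that a response is either a string or a list of content parts, as LangChain message content can be.

```python
import re

# Assumed delimiter for reasoning sections; the real pattern may differ.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_text(content) -> str:
    """Flatten a response that is either a string or a list of content parts."""
    if isinstance(content, str):
        return content
    texts = []
    for part in content:
        if isinstance(part, str):
            texts.append(part)
        elif isinstance(part, dict) and part.get("type") == "text":
            texts.append(part.get("text", ""))
        # Any other part type (images, tool calls) is dropped.
    return "".join(texts)


def strip_thinking(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()
```

Applying `strip_thinking(extract_text(raw))` keeps only the answer text, which is the value that would then be stored as an insight when a source is present.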

### Error Handling

Any LLM failures are caught inside the graph node and re-raised as `OpenNotebookError`, keeping the API surface consistent and making client-side error handling predictable.
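The catch-and-re-raise pattern might look like this. It is a sketch under assumptions: `OpenNotebookError` is defined locally as a stand-in for the project's exception class, and `failing_llm_call` is a hypothetical function that simulates a provider failure.

```python
import asyncio


class OpenNotebookError(Exception):
    """Local stand-in for the project's exception class."""


async def failing_llm_call(payload: str) -> str:
    """Hypothetical LLM call that always fails, to exercise the error path."""
    raise TimeoutError("provider timed out")


async def guarded_node(payload: str) -> dict:
    """Wrap any failure in a single error type while preserving the cause."""
    try:
        return {"output": await failing_llm_call(payload)}
    except Exception as exc:
        raise OpenNotebookError(f"Transformation failed: {exc}") from exc
```

Using `raise ... from exc` keeps the original exception available as `__cause__`, so logs still show the provider-level failure.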

## How to Run Content Transformations

You can trigger a transformation through the HTTP API or by calling the LangGraph directly from Python.

### Executing via the HTTP API

The simplest entry point is `POST /transformations/execute`, defined in [`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py). The endpoint validates the transformation and model IDs, invokes the graph, and returns the processed text together with provenance metadata.

```http
POST http://localhost:5055/transformations/execute
Content-Type: application/json

{
  "input_text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
  "transformation_id": "open_notebook:transformation:summarize",
  "model_id": "open_notebook:model:gpt-4"
}
```

**Response:**

```json
{
  "output": "A concise summary of the provided paragraph.",
  "transformation_id": "open_notebook:transformation:summarize",
  "model_id": "open_notebook:model:gpt-4"
}
```
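The same request can be issued from Python using only the standard library. In this sketch the base URL `http://localhost:5055` and the helper names `build_execute_request` and `execute` are assumptions; the path and JSON keys mirror the example above.

```python
import json
import urllib.request


def build_execute_request(
    base_url: str, input_text: str, transformation_id: str, model_id: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the execute endpoint."""
    body = json.dumps(
        {
            "input_text": input_text,
            "transformation_id": transformation_id,
            "model_id": model_id,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/transformations/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect or unit test without a running server.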

### Invoking the Graph Directly from Python

For server-side scripts or custom integrations, you can bypass the HTTP layer and call the same graph logic used by the API.

```python
import asyncio

from open_notebook.domain.transformation import Transformation
from open_notebook.graphs.transformation import graph as transformation_graph


async def main() -> None:
    # Assume a Transformation record already exists in the DB
    transformation = await Transformation.get("open_notebook:transformation:rewrite")

    # The model is selected through the graph's configurable settings
    result = await transformation_graph.ainvoke(
        {
            "input_text": "Original document text …",
            "transformation": transformation,
        },
        config={"configurable": {"model_id": "open_notebook:model:claude-2"}},
    )
    print(result["output"])  # → transformed text


# `await` is only valid inside a coroutine, so run everything in an event loop
asyncio.run(main())
```

### Configuring the Global Default Prompt

Administrators can influence every transformation by updating the `DefaultPrompts` record. The default instructions are automatically prepended to every transformation prompt inside `run_transformation`.

```python
import httpx


async def set_global_prompt(new_instructions: str) -> None:
    async with httpx.AsyncClient(base_url="http://localhost:5055") as client:
        response = await client.put(
            "/transformations/default-prompt",
            json={"transformation_instructions": new_instructions},
        )
        # Surface 4xx/5xx responses instead of silently ignoring them
        response.raise_for_status()
```

## Summary

- **Content transformations** in Open Notebook are reusable LangChain workflows defined by the `Transformation` model and executed through a LangGraph state machine.
- The pipeline lives in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py) and follows a clear sequence: assemble the prompt, render it with `Prompter`, provision an LLM via `provision_langchain_model`, invoke it, then clean the result with `extract_text_content` and `clean_thinking_content`.
- Clients trigger transformations through `POST /transformations/execute` in [`api/routers/transformations.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/transformations.py), or by calling the graph directly from Python.
- When a source document is supplied, the cleaned output is persisted as an insight linked to that source for later retrieval.

## Frequently Asked Questions

### What is a content transformation in Open Notebook?

A content transformation is a configurable, reusable LLM workflow that takes raw text and returns a processed version. It is defined by a `Transformation` record containing a prompt template and executed by a LangGraph pipeline, making it suitable for summarization, rewriting, extraction, or any custom text operation.

### How does the prompt template get rendered?

The `run_transformation` node merges the stored `transformation.prompt` with any global instructions from `DefaultPrompts`. It then passes the combined string to the `Prompter` engine, which renders the template against the current `TransformationState` and injects the source text after an `# INPUT` marker.

### What happens to the output after the LLM generates it?

The raw response is passed through `extract_text_content` to strip non-text artifacts, then through `clean_thinking_content` to remove internal "thinking" sections. If the graph state included a `Source` object, the cleaned text is also stored as an insight linked to that source. In both cases the text is returned to the caller as the graph's output.

### Can I use any model for content transformations?

Yes. The graph accepts a `model_id` through the `configurable` section of its run configuration and delegates provisioning to `provision_langchain_model` in [`open_notebook/ai/provision.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/ai/provision.py). As long as the model is registered in the system and supported by LangChain, the transformation pipeline will use it.