# LangGraph Workflow in transformation.py: Content Processing Pipeline Explained

> Explore the LangGraph workflow in transformation.py for efficient LLM-driven content processing and automatic result persistence as source insights. Understand the pipeline.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: internals
- Published: 2026-06-06

---

**The Open Notebook project implements a single-node LangGraph workflow in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py) that orchestrates LLM-driven content transformations and automatically persists results as source insights.**

This workflow encapsulates content-processing logic inside a reusable LangGraph state machine, enabling declarative orchestration of AI transformations within the Open Notebook knowledge management system. The single node covers prompt construction, model invocation, post-processing, persistence, and error classification, and it runs asynchronously from start to finish.

## Understanding the TransformationState Schema

### State Structure and TypedDict Definition

The workflow begins with a typed state container. `TransformationState` is defined as a `TypedDict` that carries all necessary context for the transformation node (lines 16-21 in [`open_notebook/graphs/transformation.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/transformation.py)):

- `input_text`: The raw text content to process
- `source`: The originating `Source` record from SurrealDB
- `transformation`: The `Transformation` definition containing the user-defined prompt
- `output`: A placeholder string that the node populates with the LLM result

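This schema gives static type checkers and readers a single description of the state's shape, and it gives the node access to both the content and the metadata needed for persistence. As with any `TypedDict`, the annotations are not enforced at runtime.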
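As a rough sketch of what such a state definition might look like, the block below uses placeholder `Source` and `Transformation` classes (stand-ins invented for this example; the real models live elsewhere in Open Notebook). The `Optional` annotations are an assumption based on the field descriptions above, and the exact annotations in the repository may differ.

```python
from typing import Optional, TypedDict


class Source:
    """Placeholder for Open Notebook's Source model (assumption)."""


class Transformation:
    """Placeholder for Open Notebook's Transformation model (assumption)."""


class TransformationState(TypedDict):
    input_text: Optional[str]       # raw text; may be None when a source is supplied
    source: Optional[Source]        # originating record, if any
    transformation: Transformation  # carries the user-defined prompt
    output: str                     # filled in by the node


# TypedDict is a static-typing aid: at runtime the state is a plain dict.
state: TransformationState = {
    "input_text": "Some text to transform",
    "source": None,
    "transformation": Transformation(),
    "output": "",
}
```

Because the state is an ordinary dictionary at runtime, a type checker such as mypy or pyright is what actually flags a missing or misspelled key.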

## The run_transformation Node Implementation

The graph consists of a single node named `run_transformation` that handles the complete LLM interaction lifecycle.

### Input Validation and Prompt Construction

The node first validates that either a source or raw text is present (assertion on line 27), then constructs the system prompt by merging default transformation instructions with the transformation's specific prompt template (lines 34-37). It appends a "# INPUT" marker (line 38) to clearly delimit the prompt from the content to be processed.

```python
# From open_notebook/graphs/transformation.py (simplified)
system_prompt = f"{default_instructions}\n\n{transformation.prompt}\n\n# INPUT"
```
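The validation and prompt-assembly steps can be sketched as two small functions. This is an illustration rather than the repository's code: `FakeSource`, `resolve_content`, and `build_system_prompt` are hypothetical names, and the fallback order (explicit text first, then the source's `full_text`) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSource:
    """Stand-in for a Source record; only full_text matters here."""
    full_text: str


def resolve_content(input_text: Optional[str], source: Optional[FakeSource]) -> str:
    # Mirrors the described assertion: at least one input must be present.
    assert source is not None or input_text, "No content to transform"
    # Assumed fallback order: explicit text first, then the source's full text.
    return input_text if input_text else source.full_text


def build_system_prompt(default_instructions: str, transformation_prompt: str) -> str:
    # Default instructions, then the user's prompt, then the "# INPUT" delimiter.
    return f"{default_instructions}\n\n{transformation_prompt}\n\n# INPUT"


content = resolve_content(None, FakeSource(full_text="Text stored on the source"))
prompt = build_system_prompt("Follow the instructions below.", "Summarize the text.")
```

Keeping the delimiter at the very end of the system prompt means the content that follows is read as input to process, not as further instructions.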

### LLM Invocation and Post-Processing

The node creates a LangChain payload combining `SystemMessage` and `HumanMessage` objects (lines 40-45), then provisions the model via `provision_langchain_model` (line 46). After invoking the model asynchronously (line 52), it applies a three-stage post-processing pipeline (lines 55-60):

1. **Extract text content** using `extract_text_content` to handle various LLM response formats
2. **Clean thinking artifacts** via `clean_thinking_content` to remove internal reasoning markers
3. **Persist insights** by calling `source.add_insight()` if a source record exists

The node returns a dictionary containing the cleaned `output` (lines 61-63).
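The three post-processing stages can be sketched as follows. Every name here is a simplified stand-in: `extract_text` and `strip_thinking` only approximate what `extract_text_content` and `clean_thinking_content` might do, the `<think>...</think>` marker format is an assumption, and the `add_insight(title, content)` signature on `FakeSource` is hypothetical.

```python
import asyncio
import re
from typing import Any, Optional


def extract_text(content: Any) -> str:
    """Illustrative stand-in: flatten string or list-of-parts responses."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and part.get("type") == "text":
                parts.append(part.get("text", ""))
        return "".join(parts)
    return str(content)


def strip_thinking(text: str) -> str:
    """Illustrative stand-in: drop <think>...</think> blocks (assumed format)."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


class FakeSource:
    """In-memory stand-in for a Source record (hypothetical signature)."""

    def __init__(self) -> None:
        self.insights: list[tuple[str, str]] = []

    async def add_insight(self, title: str, content: str) -> None:
        self.insights.append((title, content))


async def post_process(raw: Any, source: Optional[FakeSource], title: str) -> dict:
    cleaned = strip_thinking(extract_text(raw))
    if source is not None:  # persistence is skipped for text-only runs
        await source.add_insight(title, cleaned)
    return {"output": cleaned}


source = FakeSource()
result = asyncio.run(post_process("<think>draft</think>Final summary", source, "Summary"))
```

Returning only the `output` key matches how LangGraph nodes typically report partial state updates, leaving the other fields untouched.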

### Error Handling Strategy

Domain-specific errors are re-raised immediately to preserve stack traces, while unexpected exceptions are captured and wrapped into user-friendly `OpenNotebookError` instances with appropriate classification (lines 64-68). This ensures callers receive actionable error messages without exposing internal implementation details.
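A minimal sketch of this re-raise-or-wrap pattern is shown below. The `OpenNotebookError` class is redefined locally as a stand-in, and `classify` and `guarded` are hypothetical helpers; the project's actual classification logic and messages are not shown in this article.

```python
import asyncio


class OpenNotebookError(Exception):
    """Stand-in for the project's base domain exception."""


def classify(exc: Exception) -> str:
    """Hypothetical classifier: map an unexpected exception to a friendly message."""
    if isinstance(exc, TimeoutError):
        return "The model took too long to respond. Please try again."
    return "The transformation failed unexpectedly."


async def guarded(step):
    try:
        return await step()
    except OpenNotebookError:
        raise  # a bare raise keeps the original traceback
    except Exception as exc:
        # Chain the original exception so it remains available for logging.
        raise OpenNotebookError(classify(exc)) from exc


async def flaky():
    raise TimeoutError("upstream timeout")


try:
    asyncio.run(guarded(flaky))
except OpenNotebookError as err:
    message = str(err)
```

Listing the domain exception first matters: it is itself a subclass of `Exception`, so reversing the two `except` clauses would wrap errors that were already user-friendly.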

## Graph Construction and Compilation

The workflow is assembled using LangGraph's `StateGraph` class (lines 71-74). The builder pattern registers the `run_transformation` node under the name `"agent"`, then wires the execution flow from `START` to `"agent"` and finally to `END`. The compiled graph is exported as the `graph` constant (line 75), making it available for import across the application:

```python
# From open_notebook/graphs/transformation.py
builder = StateGraph(TransformationState)
builder.add_node("agent", run_transformation)
builder.add_edge(START, "agent")
builder.add_edge("agent", END)
graph = builder.compile()
```

## Integration Points and Usage Patterns

### Invoking the Graph Directly

You can execute transformations programmatically by constructing the state manually and calling `ainvoke()`:

```python
from open_notebook.graphs.transformation import graph as transformation_graph

# Assume source and transformation are records already loaded from SurrealDB.
state = {
    "input_text": None,  # the graph will read source.full_text
    "source": source,
    "transformation": transformation,
    "output": "",
}

config = {"configurable": {"model_id": "gpt-4o"}}

# ainvoke() is a coroutine, so call it from inside an async function.
result = await transformation_graph.ainvoke(state, config)
print(result["output"])
```

### CLI and Command Integration

The graph is also invoked through higher-level workflows such as `trigger_transformations` in [`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py) and the `run_transformation` command in [`commands/source_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/source_commands.py). Both interfaces build the same state dictionary and pass it to the graph, storing the resulting output as an insight on the source record:

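```python
# RunTransformationInput is assumed to be defined alongside the command.
from commands.source_commands import RunTransformationInput, run_transformation_command

# cli_context stands for the execution context supplied by the caller.
await run_transformation_command(
    input_data=RunTransformationInput(
        source_id="source:123",
        transformation_id="transformation:markdown_cleanup",
    ),
    ctx=cli_context,
)
```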

## Summary

- **Single-node architecture**: The entire transformation logic resides in the `run_transformation` node, keeping the graph topology simple while handling complex LLM interactions.
- **Typed state management**: `TransformationState` documents the four fields the node expects (`input_text`, `source`, `transformation`, `output`), so static type checkers can flag a malformed state before the graph runs.
- **Integrated persistence**: The graph automatically attaches transformation results to source records via `source.add_insight()`, eliminating manual persistence steps.
- **Unified error handling**: Domain errors propagate directly while generic exceptions are wrapped in `OpenNotebookError` for clean API responses.

## Frequently Asked Questions

### How does the transformation graph handle missing source records?

The `run_transformation` node asserts that either a source or raw text must be present (line 27). If neither is provided, the assertion fails immediately. However, if only `input_text` is provided without a source, the graph processes the content but skips the `source.add_insight()` persistence step (lines 59-60), returning only the transformed output.

### What model configuration options are available when invoking the graph?

The graph accepts a `config` dictionary with a `configurable` key containing `model_id`. This is passed to `provision_langchain_model` (line 46) in [`open_notebook/ai/provision.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/ai/provision.py), which instantiates the appropriate LangChain model based on the ID. If no model ID is specified, the system uses default configuration values defined in the AI provisioning module.
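The lookup described above can be sketched in a few lines. `get_model_id` is a hypothetical helper written for this example; how the real node reads the config, and what `provision_langchain_model` does with a missing ID, may differ.

```python
from typing import Any, Mapping, Optional


def get_model_id(config: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return configurable.model_id if present, else None (caller uses defaults)."""
    if not config:
        return None
    return config.get("configurable", {}).get("model_id")


explicit = get_model_id({"configurable": {"model_id": "gpt-4o"}})
fallback = get_model_id({})  # no model_id given, so the default model would apply
```

Using chained `.get()` calls keeps a missing `configurable` key from raising `KeyError`, which is what lets callers omit the config entirely.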

### How are transformation prompts composed with default instructions?

The node concatenates default transformation instructions with the user-defined prompt from the `Transformation` record, inserting a "# INPUT" delimiter (lines 34-38). This allows system-level instructions to guide the LLM's behavior while preserving the user's specific transformation intent, creating a hierarchical prompt structure without requiring template inheritance.

### Where is the compiled transformation graph used in the broader application?

The compiled `graph` is imported by [`open_notebook/graphs/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/graphs/source.py) where the `trigger_transformations` function invokes it as part of the source-processing pipeline. It is also accessible via [`commands/source_commands.py`](https://github.com/lfnovo/open-notebook/blob/main/commands/source_commands.py) for CLI and HTTP endpoints, enabling both automated processing during ingestion and on-demand transformations via API calls.