LangGraph Workflow in transformation.py: Content Processing Pipeline Explained

The Open Notebook project implements a single-node LangGraph workflow in open_notebook/graphs/transformation.py that orchestrates LLM-driven content transformations and automatically persists results as source insights.

This workflow wraps the content-processing logic in a compiled LangGraph state graph, so any part of Open Notebook can run a transformation by passing in a single state dictionary. The node covers prompt construction, model invocation, output cleanup, persistence, and error classification. It runs as an async coroutine, so callers can await it without blocking the event loop.

Understanding the TransformationState Schema

State Structure and TypedDict Definition

The workflow begins with a typed state container. TransformationState is defined as a TypedDict that carries all the context the transformation node needs (lines 16-21 in open_notebook/graphs/transformation.py):

  • input_text: The raw text content to process
  • source: The originating Source record from SurrealDB
  • transformation: The Transformation definition containing the user-defined prompt
  • output: A placeholder string that the node populates with the LLM result

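This schema documents the shape of the graph's state for static type checkers and gives the node access to both the content and the metadata it needs for persistence. TypedDict is purely a static-typing construct: Python does not validate keys or value types at runtime.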
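For readers less familiar with TypedDict, the schema can be sketched as follows. The Source and Transformation classes here are hypothetical minimal stand-ins for the project's SurrealDB-backed domain models. The field types follow the list above rather than being copied from the file.

```python
from dataclasses import dataclass
from typing import TypedDict


@dataclass
class Source:
    """Hypothetical stand-in for the SurrealDB-backed Source record."""
    id: str
    full_text: str


@dataclass
class Transformation:
    """Hypothetical stand-in for the Transformation definition."""
    title: str
    prompt: str


class TransformationState(TypedDict):
    input_text: str                 # raw text to process
    source: Source                  # originating Source record
    transformation: Transformation  # holds the user-defined prompt
    output: str                     # filled in by the run_transformation node


# TypedDict adds no runtime checks: a state value is just a plain dict.
state: TransformationState = {
    "input_text": "Some raw text",
    "source": Source(id="source:123", full_text="Some raw text"),
    "transformation": Transformation(title="Summary", prompt="Summarize the input."),
    "output": "",
}
```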

The run_transformation Node Implementation

The graph consists of a single node named run_transformation that handles the complete LLM interaction lifecycle.

Input Validation and Prompt Construction

The node first validates that either a source or raw text is present (assertion on line 27), then constructs the system prompt by merging default transformation instructions with the transformation's specific prompt template (lines 34-37). It appends a "# INPUT" marker (line 38) to clearly delimit the prompt from the content to be processed.


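```python
# From open_notebook/graphs/transformation.py (simplified)
# default_instructions: the shared transformation instructions
# transformation.prompt: the user-defined prompt on the Transformation record
system_prompt = f"{default_instructions}\n\n{transformation.prompt}\n\n# INPUT"
```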
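The following stdlib-only sketch combines the validation, prompt composition, and message payload into one function. It is an approximation, not the project's code:

- `build_payload` is a hypothetical helper.
- Plain `(role, content)` tuples stand in for LangChain's `SystemMessage` and `HumanMessage`.
- The fallback to `source.full_text` when no `input_text` is given follows the direct-invocation example later in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for the Source record."""
    full_text: str


@dataclass
class Transformation:
    """Hypothetical stand-in for the Transformation record."""
    prompt: str


def build_payload(
    default_instructions: str,
    transformation: Transformation,
    input_text: Optional[str],
    source: Optional[Source],
) -> list[tuple[str, str]]:
    # Mirrors the node's guard: there must be something to transform.
    assert source or input_text, "No content to transform"
    content = input_text or source.full_text  # fall back to the source's text
    # Default instructions, then the user's prompt, then the "# INPUT" delimiter.
    system_prompt = f"{default_instructions}\n\n{transformation.prompt}\n\n# INPUT"
    # (role, content) pairs stand in for SystemMessage / HumanMessage.
    return [("system", system_prompt), ("human", content)]


payload = build_payload(
    "Follow the instructions precisely.",
    Transformation(prompt="Summarize the text in three bullet points."),
    input_text=None,
    source=Source(full_text="LangGraph compiles state machines."),
)
```

Keeping the user content in a separate human message, rather than splicing it into the system prompt, keeps the instructions and the material to be processed clearly apart.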

LLM Invocation and Post-Processing

The node creates a LangChain payload combining SystemMessage and HumanMessage objects (lines 40-45), then provisions the model via provision_langchain_model (line 46). After invoking the model asynchronously (line 52), it applies a three-stage post-processing pipeline (lines 55-60):

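  1. Extract text content using extract_text_content, which normalizes the different shapes an LLM response's content can take (for example, a plain string or a list of content blocks)
  2. Clean thinking artifacts via clean_thinking_content, which strips the internal reasoning sections some models include before their final answer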
  3. Persist insights by calling source.add_insight() if a source record exists

The node returns a dictionary containing the cleaned output (lines 61-63).
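The three post-processing steps can be sketched as follows. This code makes several assumptions:

- The two helpers are hypothetical simplified re-implementations, not the project's `extract_text_content` and `clean_thinking_content`.
- The first helper assumes content arrives either as a string or as a list of blocks shaped like `{"type": "text", "text": ...}`.
- The second helper assumes reasoning is wrapped in `<think>...</think>` tags.
- `FakeSource.add_insight` and its `(title, content)` arguments are also assumptions.

```python
import asyncio
import re
from typing import Any, Optional


def extract_text_content(content: Any) -> str:
    """Simplified stand-in: flatten string or list-of-blocks content to text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                parts.append(block.get("text", ""))
        return "".join(parts)
    return str(content)


_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def clean_thinking_content(text: str) -> str:
    """Simplified stand-in: drop <think>...</think> sections and trim whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


class FakeSource:
    """Hypothetical source that records insights in memory."""

    def __init__(self) -> None:
        self.insights: list[tuple[str, str]] = []

    async def add_insight(self, title: str, content: str) -> None:
        self.insights.append((title, content))


async def post_process(raw: Any, title: str, source: Optional[FakeSource]) -> dict:
    text = extract_text_content(raw)        # 1. normalize the response format
    cleaned = clean_thinking_content(text)  # 2. strip reasoning markers
    if source is not None:                  # 3. persist only when a source exists
        await source.add_insight(title, cleaned)
    return {"output": cleaned}              # partial state update for the graph


src = FakeSource()
raw_response = [{"type": "text", "text": "<think>plan the answer</think>\n- Point one"}]
result = asyncio.run(post_process(raw_response, "Summary", src))
```

Here `result` is `{"output": "- Point one"}`, and the same cleaned text is recorded on the fake source.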

Error Handling Strategy

Domain-specific errors are re-raised immediately to preserve stack traces, while unexpected exceptions are captured and wrapped into user-friendly OpenNotebookError instances with appropriate classification (lines 64-68). This ensures callers receive actionable error messages without exposing internal implementation details.
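The re-raise-or-wrap pattern can be sketched as shown below. `OpenNotebookError` is redefined locally. The `ModelUnavailableError` subclass and the `classify_error` helper are hypothetical, and the project's actual error hierarchy and classification rules may differ.

```python
import asyncio
from typing import Any, Awaitable, Callable


class OpenNotebookError(Exception):
    """Local stand-in for the project's base error type."""


class ModelUnavailableError(OpenNotebookError):
    """Hypothetical subclass for provider or connectivity failures."""


def classify_error(exc: Exception) -> OpenNotebookError:
    # Hypothetical mapping from low-level exceptions to user-facing errors.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return ModelUnavailableError("The model provider could not be reached.")
    return OpenNotebookError("Transformation failed. Check the logs for details.")


async def guarded(step: Callable[[], Awaitable[Any]]) -> Any:
    try:
        return await step()
    except OpenNotebookError:
        raise  # domain errors pass through unchanged
    except Exception as exc:
        # Wrap anything unexpected; "from exc" keeps the original as __cause__.
        raise classify_error(exc) from exc


async def flaky_call() -> None:
    raise TimeoutError("read timed out")


try:
    asyncio.run(guarded(flaky_call))
except OpenNotebookError as err:
    caught = err
```

Chaining with `from exc` keeps the low-level traceback available in logs, while the caller only has to handle the user-facing error type.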

Graph Construction and Compilation

The workflow is assembled using LangGraph's StateGraph class (lines 71-74). The builder pattern registers the run_transformation node under the name "agent", then wires the execution flow from START to "agent" and finally to END. The compiled graph is exported as the graph constant (line 75), making it available for import across the application:


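```python
# From open_notebook/graphs/transformation.py
from langgraph.graph import END, START, StateGraph

builder = StateGraph(TransformationState)
builder.add_node("agent", run_transformation)  # single node that does all the work
builder.add_edge(START, "agent")               # entry point
builder.add_edge("agent", END)                 # finish after one step
graph = builder.compile()                      # module-level export
```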
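The graph is a straight line from START through "agent" to END, so its execution can be approximated in plain Python. This is a conceptual model only, not how LangGraph is implemented. It assumes each node returns a partial state update that overwrites matching keys, which is LangGraph's default behavior for state keys without a reducer. `run_linear` and the toy `agent` node are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

Node = Callable[[dict], Awaitable[dict]]


async def run_linear(state: dict, nodes: list[Node]) -> dict:
    """Run nodes in order (START -> ... -> END), merging each partial update."""
    current = dict(state)  # do not mutate the caller's dict
    for node in nodes:
        update = await node(current)
        current.update(update)  # last write wins, like keys without a reducer
    return current


async def agent(state: dict) -> dict:
    # Toy stand-in for run_transformation: returns only the key it changes.
    return {"output": state["input_text"].upper()}


final = asyncio.run(run_linear({"input_text": "hello", "output": ""}, [agent]))
```

This is also why run_transformation can return just `{"output": ...}`: the other state keys are carried through untouched.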

Integration Points and Usage Patterns

Invoking the Graph Directly

You can execute transformations programmatically by constructing the state manually and calling ainvoke():

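```python
from open_notebook.graphs.transformation import graph as transformation_graph


# Hypothetical wrapper; call it from async code, e.g. await transform_source(src, t)
async def transform_source(source, transformation) -> str:
    # source and transformation are assumed to be already-loaded
    # Source and Transformation records from SurrealDB
    state = {
        "input_text": None,  # no raw text: the node falls back to source.full_text
        "source": source,
        "transformation": transformation,
        "output": "",
    }
    # Illustrative value; pass the ID of a model configured in your instance
    config = {"configurable": {"model_id": "gpt-4o"}}
    result = await transformation_graph.ainvoke(state, config)
    return result["output"]
```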

CLI and Command Integration

The graph is also invoked through higher-level workflows, such as trigger_transformations in open_notebook/graphs/source.py and the run_transformation command in commands/source_commands.py. Both build the same state dictionary and pass it to the graph, whose node then stores the output as an insight on the source record:

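```python
# Must run inside an async function (or an async-capable REPL).
# RunTransformationInput is assumed to live in the same module as the command.
from commands.source_commands import RunTransformationInput, run_transformation_command

await run_transformation_command(
    input_data=RunTransformationInput(
        source_id="source:123",  # illustrative record IDs
        transformation_id="transformation:markdown_cleanup",
    ),
    ctx=cli_context,  # command context supplied by the caller
)
```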

Summary

  • Single-node architecture: The entire transformation logic resides in the run_transformation node, keeping the graph topology simple while handling complex LLM interactions.
  • Typed state management: TransformationState documents the shape of the state shared between callers and the node, so static type checkers can flag mismatched keys or values.
  • Integrated persistence: When a source record is provided, the node attaches the transformation result to it via source.add_insight(), so callers need no separate persistence step.
  • Unified error handling: Domain errors propagate directly while generic exceptions are wrapped in OpenNotebookError for clean API responses.

Frequently Asked Questions

How does the transformation graph handle missing source records?

The run_transformation node asserts that either a source or raw text is present (line 27). If neither is provided, the assertion raises an AssertionError before any model call is made. Assertions are stripped when Python runs with the -O flag. If only input_text is provided without a source, the graph processes the content but skips the source.add_insight() persistence step (lines 59-60), returning only the transformed output.

What model configuration options are available when invoking the graph?

The graph accepts a config dictionary with a configurable key containing model_id. This is passed to provision_langchain_model (line 46) in open_notebook/ai/provision.py, which instantiates the appropriate LangChain model based on the ID. If no model ID is specified, the system uses default configuration values defined in the AI provisioning module.
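The sketch below reads the optional model ID using the `configurable` structure described above. `resolve_model_id` and the `DEFAULT_TRANSFORMATION_MODEL` value are hypothetical, since the real default is resolved inside the AI provisioning module.

```python
from typing import Optional

# Hypothetical fallback; the real default is resolved by the provisioning module.
DEFAULT_TRANSFORMATION_MODEL = "default-transformation-model"


def resolve_model_id(config: Optional[dict]) -> str:
    """Return config["configurable"]["model_id"] if present, else a default."""
    model_id = (config or {}).get("configurable", {}).get("model_id")
    return model_id or DEFAULT_TRANSFORMATION_MODEL
```

A call such as `resolve_model_id({"configurable": {"model_id": "gpt-4o"}})` returns `"gpt-4o"`, while an empty or missing config falls back to the default.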

How are transformation prompts composed with default instructions?

The node concatenates the default transformation instructions with the user-defined prompt from the Transformation record and appends a "# INPUT" delimiter (lines 34-38). This places system-level guidance ahead of the user's transformation intent in a single system prompt. It relies on plain string concatenation rather than any template inheritance mechanism.

Where is the compiled transformation graph used in the broader application?

The compiled graph is imported by open_notebook/graphs/source.py where the trigger_transformations function invokes it as part of the source-processing pipeline. It is also accessible via commands/source_commands.py for CLI and HTTP endpoints, enabling both automated processing during ingestion and on-demand transformations via API calls.
