
How the Context Builder Constructs AI Conversation Context in Open Notebook

June 7, 2026 lfnovo/open-notebook

The ContextBuilder assembles a structured, token-aware context for AI interactions by aggregating sources, notes, and insights, then applying deduplication, priority-based sorting, and token-budget truncation so the payload sent to the LLM stays within the configured limit.

The context construction process in Open Notebook is handled by a generic, extensible ContextBuilder class located in open_notebook/utils/context_builder.py. This component serves as the central orchestrator for every AI-driven interaction, from chat to RAG to podcast generation, so that language models receive the relevant information without exceeding token limits.

Core Architecture and Configuration

The ContextBuilder operates through three primary data structures defined in open_notebook/utils/context_builder.py:

ContextBuilder: The main orchestrator class that accepts parameters like source_id, notebook_id, include_insights, and max_tokens
ContextConfig: A configuration object defining priority_weights (source = 100, note = 50, insight = 75) and inclusion flags
ContextItem: Individual units of context representing sources, notes, or insights with metadata like priority and token_count
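
To make the shape of these structures concrete, here is a minimal sketch using dataclasses. It is not the repository's definition: the field names beyond priority, token_count, include_insights, and priority_weights (for example id, type, content, and include_notes) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ContextItem:
    """One unit of context: a source, note, or insight (illustrative shape)."""
    id: str
    type: str                          # "source", "note", or "insight"
    content: Dict[str, Any]            # hypothetical payload field
    priority: int = 0
    token_count: Optional[int] = None


@dataclass
class ContextConfig:
    """Inclusion flags plus the priority weights quoted above."""
    include_insights: bool = True
    include_notes: bool = True         # hypothetical flag, for illustration only
    priority_weights: Dict[str, int] = field(
        # default_factory gives every config its own dict instead of a shared one
        default_factory=lambda: {"source": 100, "insight": 75, "note": 50}
    )


config = ContextConfig()
item = ContextItem(
    id="source:abcde",
    type="source",
    content={"title": "Example"},
    priority=config.priority_weights["source"],
)
```

The weights live on the config rather than on the item class, so a caller could reorder the hierarchy without touching the items themselves.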

During initialization (lines 65‑99), the builder stores supplied keyword arguments in self.params and creates a default ContextConfig if none is provided, automatically setting include_insights=True and establishing the priority hierarchy.
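
The initialization behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the real constructor: the class name ContextBuilderSketch and the context_config keyword are hypothetical, and only the self.params attribute and the default include_insights=True come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContextConfig:
    include_insights: bool = True
    priority_weights: Dict[str, int] = field(
        default_factory=lambda: {"source": 100, "insight": 75, "note": 50}
    )


class ContextBuilderSketch:
    """Hypothetical stand-in that mirrors the described initialization only."""

    def __init__(self, context_config: Optional[ContextConfig] = None, **kwargs: Any) -> None:
        # Keep every supplied keyword argument for later pipeline stages.
        self.params: Dict[str, Any] = kwargs
        # Fall back to a default config when the caller passes none.
        self.config = context_config if context_config is not None else ContextConfig()
        # Items are collected here while the context is being built.
        self.items: List[Dict[str, Any]] = []


builder = ContextBuilderSketch(notebook_id="notebook:12345", max_tokens=4096)
```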

The Context Construction Pipeline

Calling await builder.build() (lines 105‑138) runs a deterministic pipeline that transforms raw domain objects into a structured context payload.
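
As a rough picture of that pipeline, the sketch below records the order in which the stages named in this article would run. The stage bodies are placeholders, the class name PipelineSketch is hypothetical, and the exact position of custom-parameter processing is an assumption based on the order in which the article presents the steps.

```python
import asyncio
from typing import Any, Dict, List


class PipelineSketch:
    """Records stage order only; no real data is loaded."""

    def __init__(self, **params: Any) -> None:
        self.params = params
        self.calls: List[str] = []

    async def _add_source_context(self, source_id: str) -> None:
        self.calls.append("source")

    async def _add_notebook_context(self, notebook_id: str) -> None:
        self.calls.append("notebook")

    def _process_custom_params(self) -> None:
        self.calls.append("custom")

    def remove_duplicates(self) -> None:
        self.calls.append("dedupe")

    def prioritize(self) -> None:
        self.calls.append("prioritize")

    def truncate_to_fit(self, max_tokens: int) -> None:
        self.calls.append("truncate")

    def _format_response(self) -> Dict[str, Any]:
        self.calls.append("format")
        return {"stages": list(self.calls)}

    async def build(self) -> Dict[str, Any]:
        # Population: only the inputs that were actually supplied are expanded.
        if self.params.get("source_id"):
            await self._add_source_context(self.params["source_id"])
        if self.params.get("notebook_id"):
            await self._add_notebook_context(self.params["notebook_id"])
        self._process_custom_params()
        # Post-processing: truncation runs only when a budget was given.
        self.remove_duplicates()
        self.prioritize()
        if self.params.get("max_tokens") is not None:
            self.truncate_to_fit(self.params["max_tokens"])
        return self._format_response()


result = asyncio.run(PipelineSketch(source_id="source:abcde", max_tokens=4096).build())
```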

Source Context Aggregation

When a source_id is provided, _add_source_context (lines 142‑202) retrieves the Source record via Source.get and evaluates the inclusion_level parameter. The builder supports three inclusion modes:

insights: Pulls only AI-generated insights
full content: Retrieves the complete source material
not in: Excludes the source

The source itself is encapsulated as a ContextItem with type source. If include_insights is enabled, each associated Insight object is also converted to a ContextItem with type insight and assigned a priority weight of 75.
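
One plausible reading of this behaviour is sketched below with plain dictionaries. The function source_to_items, the full_text and insights keys, and the choice to attach the full text only in "full content" mode are all assumptions made for illustration; the actual method works with Source and Insight domain objects.

```python
from typing import Any, Dict, List

PRIORITY_WEIGHTS = {"source": 100, "insight": 75}


def source_to_items(
    source: Dict[str, Any],
    inclusion_level: str,
    include_insights: bool = True,
) -> List[Dict[str, Any]]:
    """Turn one source record into context items (illustrative only)."""
    if inclusion_level == "not in":
        return []  # an excluded source contributes nothing

    items = [{
        "id": source["id"],
        "type": "source",
        "priority": PRIORITY_WEIGHTS["source"],
        # In this sketch only "full content" carries the complete text.
        "content": source["full_text"] if inclusion_level == "full content" else None,
    }]
    if include_insights:
        for insight in source["insights"]:
            items.append({
                "id": insight["id"],
                "type": "insight",
                "priority": PRIORITY_WEIGHTS["insight"],
                "content": insight["content"],
            })
    return items


source = {
    "id": "source:abcde",
    "full_text": "Complete text of the source.",
    "insights": [{"id": "insight:1", "content": "Key takeaway"}],
}
full = source_to_items(source, "full content")
```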

Notebook Context Expansion

For notebook_id inputs, _add_notebook_context (lines 210‑248) loads the Notebook domain object and iterates over its relationships. The method checks ContextConfig.sources to determine which sources to include (defaulting to all if unconfigured), delegating each to _add_source_context. Similarly, it processes notes according to ContextConfig.notes specifications.
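
The "default to all if unconfigured" rule can be illustrated with a small helper. The shape of ContextConfig.sources is not shown in this article, so the sketch assumes a mapping from source ID to inclusion level; the function name select_sources and the default level are hypothetical.

```python
from typing import Dict, List, Optional


def select_sources(
    notebook_source_ids: List[str],
    configured: Optional[Dict[str, str]],
    default_level: str = "insights",  # hypothetical default, for illustration
) -> Dict[str, str]:
    """Return the sources to expand, mapped to their inclusion level."""
    if not configured:
        # Nothing configured: include every source attached to the notebook.
        return {source_id: default_level for source_id in notebook_source_ids}
    # Otherwise honour the configuration and skip explicitly excluded sources.
    return {
        source_id: level
        for source_id, level in configured.items()
        if level != "not in"
    }


everything = select_sources(["source:1", "source:2"], None)
chosen = select_sources(
    ["source:1", "source:2"],
    {"source:1": "full content", "source:2": "not in"},
)
```

Each selected entry would then be handed to the per-source step, which is what keeps the notebook method itself short.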

Note Context Integration

Individual notes are handled by _add_note_context (lines 254‑288), which retrieves Note objects via Note.get. Like sources, notes support short and long content variants based on configuration, and are tagged as ContextItem type note with a default priority of 50.

Custom Parameter Processing

The builder supports extensibility through _process_custom_params (lines 296‑304). Any keyword argument prefixed with custom_ is logged and otherwise left untouched; the hook reserves a place for future extensions, so developers can pass additional metadata without modifying core logic.
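
A hook of this kind can be sketched in a few lines. The function name collect_custom_params and the logger name are assumptions; the only behaviour taken from the description is that keys starting with custom_ are picked out and logged.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("context_builder_sketch")  # hypothetical logger name


def collect_custom_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out custom_-prefixed parameters and log them."""
    custom = {key: value for key, value in params.items() if key.startswith("custom_")}
    for key, value in custom.items():
        # Logged only; nothing in this sketch acts on the value yet.
        logger.debug("Custom parameter %s=%r", key, value)
    return custom


custom = collect_custom_params(
    {"notebook_id": "notebook:12345", "custom_tone": "formal", "max_tokens": 2048}
)
```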

Post-Processing and Token Management

After population, the builder executes a four-stage post-processing chain:

Deduplication (remove_duplicates, lines 351‑363): Scans self.items to eliminate duplicate IDs, ensuring no source, note, or insight appears twice in the final payload.

Prioritization (prioritize, lines 315‑318): Sorts items by their priority field in descending order, ensuring high-value content (sources at priority 100) appears before lower-priority items.

Token Budget Enforcement (truncate_to_fit, lines 320‑350): If max_tokens is specified, the builder calculates cumulative token counts using token_utils.token_count and removes lowest-priority items until the total falls within budget. Provided max_tokens is set at or below the model's context window, the assembled context stays within that window.

Response Formatting (_format_response, lines 367‑416): Groups items by type (sources, notes, insights), calculates aggregate statistics, and returns a dictionary containing the structured content, token totals, and metadata like notebook_id.
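
The four stages can be sketched end to end with plain dictionaries. This is an illustration under stated assumptions, not the repository's code: approx_token_count is a whitespace word count standing in for token_utils.token_count, and the item keys and response keys (text, total_tokens, total_items) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]


def approx_token_count(text: str) -> int:
    """Whitespace word count; a crude stand-in for token_utils.token_count."""
    return len(text.split())


def remove_duplicates(items: List[Item]) -> List[Item]:
    """Keep the first occurrence of each ID."""
    seen = set()
    unique = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            unique.append(item)
    return unique


def prioritize(items: List[Item]) -> List[Item]:
    """Highest priority first; sorted() is stable, so ties keep their order."""
    return sorted(items, key=lambda item: item["priority"], reverse=True)


def truncate_to_fit(
    items: List[Item],
    max_tokens: int,
    count: Callable[[str], int] = approx_token_count,
) -> List[Item]:
    """Drop items from the low-priority end until the total fits the budget."""
    kept = prioritize(items)
    total = sum(count(item["text"]) for item in kept)
    while kept and total > max_tokens:
        total -= count(kept.pop()["text"])
    return kept


def format_response(items: List[Item], notebook_id: Optional[str] = None) -> Dict[str, Any]:
    """Group items by type and attach aggregate statistics."""
    grouped: Dict[str, List[Item]] = {"sources": [], "notes": [], "insights": []}
    for item in items:
        grouped[item["type"] + "s"].append(item)
    return {
        **grouped,
        "total_tokens": sum(approx_token_count(item["text"]) for item in items),
        "total_items": len(items),
        "notebook_id": notebook_id,
    }


raw_items = [
    {"id": "note:5", "type": "note", "priority": 50, "text": "five words in this note"},
    {"id": "source:1", "type": "source", "priority": 100, "text": "four words of source"},
    {"id": "insight:9", "type": "insight", "priority": 75, "text": "three word insight"},
    {"id": "source:1", "type": "source", "priority": 100, "text": "four words of source"},
]

items = truncate_to_fit(prioritize(remove_duplicates(raw_items)), max_tokens=8)
payload = format_response(items, notebook_id="notebook:12345")
```

With a budget of 8 the note is the only item dropped: the three unique items total 12 words, and removing the lowest-priority one brings the total to 7.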

Usage Patterns and Code Examples

The repository provides three convenience functions that wrap the ContextBuilder for common scenarios:


# Each call below must run inside an async function.

# Build context for an entire notebook with a token limit
await build_notebook_context(
    notebook_id="notebook:12345",
    max_tokens=4096,
)

# Build context for a single source, including AI insights
await build_source_context(
    source_id="source:abcde",
    include_insights=True,
)

# Build mixed context from explicit lists
await build_mixed_context(
    source_ids=["source:1", "source:2"],
    note_ids=["note:5"],
    notebook_id=None,
    max_tokens=2048,
)

These helpers are implemented at lines 22‑41 (build_notebook_context), lines 44‑60 (build_source_context), and lines 64‑95 (build_mixed_context).
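
Because the helpers are coroutines, a synchronous script has to drive them through an event loop. The sketch below shows the pattern with asyncio.run; fake_build_notebook_context is a hypothetical stand-in with the same call shape as the helper, used so the example runs without the Open Notebook package or its database.

```python
import asyncio
from typing import Any, Dict, Optional


async def fake_build_notebook_context(
    notebook_id: str,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Stand-in coroutine; the real helper would load data and build the context."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"notebook_id": notebook_id, "max_tokens": max_tokens}


async def main() -> Dict[str, Any]:
    # Inside a coroutine the helper can be awaited directly.
    return await fake_build_notebook_context(
        notebook_id="notebook:12345",
        max_tokens=4096,
    )


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
context = asyncio.run(main())
```

Inside an existing async application, such as an API request handler, the helper would be awaited directly and asyncio.run would not be needed.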

Integration with the Open Notebook Ecosystem

The ContextBuilder sits at the center of the application's AI layer, consuming domain models from open_notebook/domain/notebook.py, open_notebook/domain/source.py, and open_notebook/domain/note.py. The API layer in api/context_service.py and api/routers/context.py exposes HTTP endpoints that instantiate the builder via these convenience functions, returning the formatted payload to frontend applications.

This architecture creates a clean separation between data retrieval (domain models), context optimization (the builder), and delivery (API layer), making it straightforward to extend context sources or modify budget constraints without touching downstream LLM implementations.

Summary

The ContextBuilder in open_notebook/utils/context_builder.py provides a generic, token-aware pipeline for assembling AI conversation context.
It aggregates sources (priority 100), insights (priority 75), and notes (priority 50) through domain model methods like Source.get and Notebook.get.
The pipeline ends with post-processing: deduplication, priority sorting, and, when max_tokens is set, token-budget truncation.
Convenience functions (build_notebook_context, build_source_context, build_mixed_context) simplify common usage patterns.
The builder supports extensibility via custom_ parameters and configurable inclusion levels for content granularity.

Frequently Asked Questions

How does the ContextBuilder enforce token limits?

The builder uses the truncate_to_fit method (lines 320‑350) to enforce token budgets. It calculates the cumulative token count of all collected items using token_utils.token_count, then iteratively removes the lowest-priority items until the total falls at or below the specified max_tokens threshold. This ensures the LLM never receives an oversized payload while preserving the highest-priority context.

What is the difference between short and long content in source context?

In _add_source_context (lines 142‑202), the builder examines the inclusion_level parameter to determine content granularity. Short content typically includes summaries or excerpts suitable for quick context, while long content provides the full source material. The specific implementation depends on the Source domain model's get_context method, which the builder calls with the appropriate size parameter.

How does the ContextBuilder handle duplicate items across different sources?

The remove_duplicates method (lines 351‑363) performs a uniqueness check on all collected ContextItem objects by their IDs. If the same source, note, or insight is referenced multiple times (for example, through both notebook and explicit source IDs), only the first occurrence is retained, ensuring the final context contains no redundant entries.

Can I extend the ContextBuilder with custom logic?

Yes, the builder includes an extension hook in _process_custom_params (lines 296‑304). Any keyword argument passed to the constructor with a custom_ prefix is captured and logged for future processing. This allows developers to inject additional metadata or specialized handling without modifying the core ContextBuilder logic or the convenience function signatures.
