How the Context Builder Constructs AI Conversation Context in Open Notebook
The ContextBuilder assembles a structured, token-aware context for AI interactions by aggregating sources, notebooks, and insights, then applying deduplication, priority-based sorting, and token-budget truncation so that the payload sent to the LLM stays within a configured limit.
The context construction process in Open Notebook is handled by a generic, extensible ContextBuilder class located in open_notebook/utils/context_builder.py. This component serves as the central orchestrator for AI-driven interactions, from chat to RAG to podcast generation, so that language models receive the highest-priority relevant information without exceeding token limits.
Core Architecture and Configuration
The ContextBuilder operates through three primary data structures defined in open_notebook/utils/context_builder.py:
- ContextBuilder: The main orchestrator class that accepts parameters like source_id, notebook_id, include_insights, and max_tokens
- ContextConfig: A configuration object defining priority_weights (source = 100, note = 50, insight = 75) and inclusion flags
- ContextItem: Individual units of context representing sources, notes, or insights with metadata like priority and token_count
During initialization (lines 65‑99), the builder stores supplied keyword arguments in self.params and creates a default ContextConfig if none is provided, automatically setting include_insights=True and establishing the priority hierarchy.
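To make the relationship between these three structures concrete, here is a minimal sketch using dataclasses. It is not the repository's code: the field layout, the include_notes flag, the context_config keyword, and the class name ContextBuilderSketch are assumptions for illustration. Only the priority weights and the include_insights=True default come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContextItem:
    """One unit of context: a source, note, or insight."""
    id: str
    type: str                      # "source", "note", or "insight"
    content: Dict[str, Any]
    priority: int = 0
    token_count: Optional[int] = None


@dataclass
class ContextConfig:
    """Inclusion flags plus the priority weights quoted above."""
    include_insights: bool = True
    include_notes: bool = True     # hypothetical flag, for illustration only
    priority_weights: Dict[str, int] = field(
        default_factory=lambda: {"source": 100, "insight": 75, "note": 50}
    )


class ContextBuilderSketch:
    """Stores keyword arguments and falls back to a default config."""

    def __init__(self, **kwargs: Any) -> None:
        self.params = kwargs
        # "context_config" is an assumed keyword name, not confirmed by the source.
        self.config: ContextConfig = kwargs.get("context_config") or ContextConfig()
        self.items: List[ContextItem] = []


builder = ContextBuilderSketch(notebook_id="notebook:12345", max_tokens=4096)
```

Keeping the weights in the config object rather than hard-coding them in the builder means a caller could, under these assumptions, pass a different hierarchy without subclassing.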
The Context Construction Pipeline
Calling await builder.build() (lines 105‑138) triggers a deterministic pipeline that transforms raw domain objects into a structured context payload.
Source Context Aggregation
When a source_id is provided, _add_source_context (lines 142‑202) retrieves the Source record via Source.get and evaluates the inclusion_level parameter. The builder supports three inclusion modes:
- insights: Pulls only AI-generated insights
- full content: Retrieves the complete source material
- not in: Excludes the source
The source itself is encapsulated as a ContextItem with type source. If include_insights is enabled, each associated Insight object is also converted to a ContextItem with type insight and assigned a priority weight of 75.
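The inclusion modes above can be sketched as a small function. This is an illustration under stated assumptions, not the actual _add_source_context: the source is a plain dictionary standing in for the Source record, items are dictionaries rather than ContextItem objects, and the sketch assumes the source item is always emitted (with metadata only in insights mode) unless the level is "not in".

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for a Source record and its insights.
FAKE_SOURCE = {
    "id": "source:abcde",
    "title": "Example source",
    "full_text": "The complete source text.",
    "insights": [
        {"id": "insight:1", "content": "First insight."},
        {"id": "insight:2", "content": "Second insight."},
    ],
}


def source_to_items(source: Dict[str, Any], inclusion_level: str,
                    include_insights: bool = True) -> List[Dict[str, Any]]:
    """Turn one source into context items according to its inclusion level."""
    if inclusion_level == "not in":
        return []  # the source is excluded entirely

    content = {"title": source["title"]}
    if inclusion_level == "full content":
        content["full_text"] = source["full_text"]

    items = [{"id": source["id"], "type": "source",
              "content": content, "priority": 100}]

    if include_insights:
        for insight in source["insights"]:
            items.append({"id": insight["id"], "type": "insight",
                          "content": insight["content"], "priority": 75})
    return items


insight_items = source_to_items(FAKE_SOURCE, "insights")
```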
Notebook Context Expansion
For notebook_id inputs, _add_notebook_context (lines 210‑248) loads the Notebook domain object and iterates over its relationships. The method checks ContextConfig.sources to determine which sources to include (defaulting to all if unconfigured), delegating each to _add_source_context. Similarly, it processes notes according to ContextConfig.notes specifications.
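The delegation described here might look roughly like the following. The sketch assumes that the source configuration maps source IDs to inclusion levels and that unconfigured notebooks default every source to the insights level; both are guesses for illustration, and add_source is a stand-in for _add_source_context rather than the real method.

```python
import asyncio
from typing import Dict, List, Optional

# Hypothetical notebook contents; the real list comes from the Notebook domain object.
NOTEBOOK_SOURCES = ["source:1", "source:2", "source:3"]


async def add_source(source_id: str, inclusion_level: str, out: List[str]) -> None:
    """Stand-in for _add_source_context: records what would be loaded."""
    await asyncio.sleep(0)  # placeholder for the database call
    out.append(f"{source_id} ({inclusion_level})")


async def expand_notebook(configured: Optional[Dict[str, str]] = None) -> List[str]:
    """Delegate each selected source; include every source when unconfigured."""
    out: List[str] = []
    if configured:
        selected = {sid: level for sid, level in configured.items()
                    if sid in NOTEBOOK_SOURCES}
    else:
        selected = {sid: "insights" for sid in NOTEBOOK_SOURCES}
    for sid, level in selected.items():
        if level != "not in":
            await add_source(sid, level, out)
    return out


all_sources = asyncio.run(expand_notebook())
subset = asyncio.run(expand_notebook({"source:2": "full content", "source:3": "not in"}))
```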
Note Context Integration
Individual notes are handled by _add_note_context (lines 254‑288), which retrieves Note objects via Note.get. Like sources, notes support short and long content variants based on configuration, and are tagged as ContextItem type note with a default priority of 50.
Custom Parameter Processing
The builder supports extensibility through _process_custom_params (lines 296‑304). Any keyword argument prefixed with custom_ is logged for future extension, allowing developers to inject additional metadata without modifying core logic.
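As a rough sketch of this hook, the prefix filter can be written in a few lines. The function name collect_custom_params and the logger name are hypothetical; the source only states that custom_ arguments are logged for future extension.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("context_builder_sketch")


def collect_custom_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out keyword arguments prefixed with custom_ and log each one."""
    custom = {key: value for key, value in params.items()
              if key.startswith("custom_")}
    for key, value in custom.items():
        logger.debug("Custom parameter %s=%r (not yet handled)", key, value)
    return custom


found = collect_custom_params(
    {"source_id": "source:1", "custom_audience": "beginner"}
)
```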
Post-Processing and Token Management
After population, the builder executes a four-stage post-processing chain:
Deduplication (remove_duplicates, lines 351‑363): Scans self.items to eliminate duplicate IDs, ensuring no source, note, or insight appears twice in the final payload.
Prioritization (prioritize, lines 315‑318): Sorts items by their priority field in descending order, ensuring high-value content (sources at priority 100) appears before lower-priority items.
Token Budget Enforcement (truncate_to_fit, lines 320‑350): If max_tokens is specified, the builder calculates cumulative token counts using token_utils.token_count and removes lowest-priority items until the total falls within budget. This keeps the payload within the configured budget, so a max_tokens value set below the model's context window prevents oversized requests.
Response Formatting (_format_response, lines 367‑416): Groups items by type (sources, notes, insights), calculates aggregate statistics, and returns a dictionary containing the structured content, token totals, and metadata like notebook_id.
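The four stages above can be sketched end to end on plain dictionaries. This is a simplified illustration, not the repository's implementation: estimate_tokens is a crude whitespace counter standing in for token_utils.token_count, the item shape is assumed, and the response keys total_tokens and total_items are hypothetical names.

```python
from typing import Any, Dict, List


def estimate_tokens(text: str) -> int:
    """Crude whitespace count; a stand-in for token_utils.token_count."""
    return len(text.split())


def remove_duplicates(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the first occurrence of each item ID."""
    seen = set()
    unique = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            unique.append(item)
    return unique


def prioritize(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort by priority, highest first (sorted() is stable for ties)."""
    return sorted(items, key=lambda item: item["priority"], reverse=True)


def truncate_to_fit(items: List[Dict[str, Any]], max_tokens: int) -> List[Dict[str, Any]]:
    """Drop items from the low-priority end until the total fits the budget.

    Expects items already sorted by descending priority.
    """
    kept = list(items)
    total = sum(estimate_tokens(item["content"]) for item in kept)
    while kept and total > max_tokens:
        dropped = kept.pop()
        total -= estimate_tokens(dropped["content"])
    return kept


def format_response(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Group items by type and attach aggregate statistics."""
    grouped: Dict[str, Any] = {"sources": [], "notes": [], "insights": []}
    for item in items:
        grouped[item["type"] + "s"].append(item)
    grouped["total_tokens"] = sum(estimate_tokens(i["content"]) for i in items)
    grouped["total_items"] = len(items)
    return grouped


raw_items = [
    {"id": "source:1", "type": "source", "content": "alpha beta gamma delta", "priority": 100},
    {"id": "note:5", "type": "note", "content": "one two three", "priority": 50},
    {"id": "insight:9", "type": "insight", "content": "key point", "priority": 75},
    {"id": "source:1", "type": "source", "content": "alpha beta gamma delta", "priority": 100},
]

result = format_response(
    truncate_to_fit(prioritize(remove_duplicates(raw_items)), max_tokens=6)
)
```

With a budget of 6 tokens, the duplicate source is removed first, then the note (the lowest-priority item) is dropped, leaving the source and the insight. Sorting before truncating is what lets the truncation step simply pop from the end of the list.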
Usage Patterns and Code Examples
The repository provides three convenience functions that wrap the ContextBuilder for common scenarios:
```python
# Build context for an entire notebook with token limit
await build_notebook_context(
    notebook_id="notebook:12345",
    max_tokens=4096,
)

# Build context for a single source including AI insights
await build_source_context(
    source_id="source:abcde",
    include_insights=True,
)

# Build mixed context from explicit lists
await build_mixed_context(
    source_ids=["source:1", "source:2"],
    note_ids=["note:5"],
    notebook_id=None,
    max_tokens=2048,
)
```
These helpers are implemented at lines 22‑41 (build_notebook_context), lines 44‑60 (build_source_context), and lines 64‑95 (build_mixed_context).
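Because the helpers are awaited, they need a running event loop when called from a plain script. The following sketch shows one way to do that with asyncio.run. The build_notebook_context defined here is a hypothetical stub with the same call shape, included only so the example runs on its own; its return value is made up and does not reflect the real payload.

```python
import asyncio
from typing import Any, Dict, Optional


async def build_notebook_context(notebook_id: str,
                                 max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical stub mirroring the helper's call shape."""
    await asyncio.sleep(0)  # placeholder for database and token work
    return {"notebook_id": notebook_id, "max_tokens": max_tokens,
            "sources": [], "notes": [], "insights": []}


async def main() -> Dict[str, Any]:
    # Inside a coroutine the helper can be awaited directly.
    return await build_notebook_context(notebook_id="notebook:12345",
                                        max_tokens=4096)


# From synchronous code, asyncio.run starts an event loop for the call.
context = asyncio.run(main())
```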
Integration with the Open Notebook Ecosystem
The ContextBuilder sits at the center of the application's AI layer, consuming domain models from open_notebook/domain/notebook.py, open_notebook/domain/source.py, and open_notebook/domain/note.py. The API layer in api/context_service.py and api/routers/context.py exposes HTTP endpoints that instantiate the builder via these convenience functions, returning the formatted payload to frontend applications.
This architecture creates a clean separation between data retrieval (domain models), context optimization (the builder), and delivery (API layer), making it straightforward to extend context sources or modify budget constraints without touching downstream LLM implementations.
Summary
- The ContextBuilder in open_notebook/utils/context_builder.py provides a generic, token-aware pipeline for assembling AI conversation context.
- It aggregates sources (priority 100), insights (priority 75), and notes (priority 50) through domain model methods like Source.get and Notebook.get.
- The pipeline includes mandatory post-processing steps: deduplication, priority sorting, and token-budget truncation to ensure LLM compatibility.
- Convenience functions (build_notebook_context, build_source_context, build_mixed_context) simplify common usage patterns.
- The builder supports extensibility via custom_ parameters and configurable inclusion levels for content granularity.
Frequently Asked Questions
How does the ContextBuilder enforce token limits?
The builder uses the truncate_to_fit method (lines 320‑350) to enforce token budgets. It calculates the cumulative token count of all collected items using token_utils.token_count, then iteratively removes the lowest-priority items until the total falls at or below the specified max_tokens threshold. This ensures the LLM never receives an oversized payload while preserving the highest-priority context.
What is the difference between short and long content in source context?
In _add_source_context (lines 142‑202), the builder examines the inclusion_level parameter to determine content granularity. Short content typically includes summaries or excerpts suitable for quick context, while long content provides the full source material. The specific implementation depends on the Source domain model's get_context method, which the builder calls with the appropriate size parameter.
How does the ContextBuilder handle duplicate items across different sources?
The remove_duplicates method (lines 351‑363) performs a uniqueness check on all collected ContextItem objects by their IDs. If the same source, note, or insight is referenced multiple times (for example, through both notebook and explicit source IDs), only the first occurrence is retained, ensuring the final context contains no redundant entries.
Can I extend the ContextBuilder with custom logic?
Yes, the builder includes an extension hook in _process_custom_params (lines 296‑304). Any keyword argument passed to the constructor with a custom_ prefix is captured and logged for future processing. This allows developers to inject additional metadata or specialized handling without modifying the core ContextBuilder logic or the convenience function signatures.