# How the Context Builder Constructs AI Conversation Context in Open Notebook

> Learn how the Context Builder constructs AI conversation context in Open Notebook. Discover its methods for aggregating sources, deduplication, and token-budget truncation for optimal LLM payloads.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: internals
- Published: 2026-06-07

---

**The ContextBuilder assembles a structured, token-aware context for AI interactions by aggregating sources, notebooks, and insights, then applying deduplication, priority-based sorting, and token-budget truncation so that LLM payloads stay within a configured limit.**

The context construction process in Open Notebook is handled by a generic, extensible `ContextBuilder` class located in [`open_notebook/utils/context_builder.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/context_builder.py). This component serves as the central orchestrator for every AI-driven interaction, from chat to RAG to podcast generation, so that language models receive the relevant information without exceeding token limits.

## Core Architecture and Configuration

The `ContextBuilder` operates through three primary data structures defined in [`open_notebook/utils/context_builder.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/context_builder.py):

- **`ContextBuilder`**: The main orchestrator class that accepts parameters like `source_id`, `notebook_id`, `include_insights`, and `max_tokens`
- **`ContextConfig`**: A configuration object defining `priority_weights` (source = 100, note = 50, insight = 75) and inclusion flags
- **`ContextItem`**: Individual units of context representing sources, notes, or insights with metadata like `priority` and `token_count`

During initialization (lines 65‑99), the builder stores supplied keyword arguments in `self.params` and creates a default `ContextConfig` if none is provided, automatically setting `include_insights=True` and establishing the priority hierarchy.
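
The relationship between these three structures can be sketched as follows. This is a minimal illustration, not the repository's code: the `id`, `type`, and `content` fields, the `include_notes` flag, the `context_config` keyword, and the `ContextBuilderSketch` name are all assumptions introduced here; only `priority_weights`, `include_insights`, `priority`, `token_count`, and `self.params` come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContextItem:
    # Hypothetical layout: only `priority` and `token_count` are named in the text.
    id: str
    type: str  # "source", "note", or "insight"
    content: Dict[str, Any]
    priority: int = 0
    token_count: Optional[int] = None


@dataclass
class ContextConfig:
    # Weights as quoted in the article.
    priority_weights: Dict[str, int] = field(
        default_factory=lambda: {"source": 100, "note": 50, "insight": 75}
    )
    include_insights: bool = True
    include_notes: bool = True  # hypothetical flag name


class ContextBuilderSketch:
    """Illustrative stand-in, not the real ContextBuilder."""

    def __init__(self, **kwargs: Any) -> None:
        # Keyword arguments are stored as-is.
        self.params = kwargs
        # Fall back to a default config when the caller supplies none.
        # The `context_config` keyword is an assumed name.
        self.config: ContextConfig = kwargs.get("context_config") or ContextConfig()
        self.items: List[ContextItem] = []
```

Keeping the weights in the config rather than on the item class means a caller could, under this sketch, reorder the hierarchy by passing a different `ContextConfig`.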

## The Context Construction Pipeline

The `await builder.build()` method (lines 105‑138) triggers a deterministic pipeline that transforms raw domain objects into a structured context payload.
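
To make the stage order concrete, here is a hedged skeleton of such a pipeline. Every stage body is a stub that only records its name, the `PipelineSketch` class and `_post_process` helper are hypothetical, and the ordering of the post-processing steps follows this article's description rather than a verified reading of the source.

```python
import asyncio
from typing import Any, Dict, List


class PipelineSketch:
    """Records the order of pipeline stages; every stage body is a stub."""

    def __init__(self, **params: Any) -> None:
        self.params = params
        self.trace: List[str] = []

    async def _add_source_context(self) -> None:
        self.trace.append("source")

    async def _add_notebook_context(self) -> None:
        self.trace.append("notebook")

    def _post_process(self) -> None:
        # Deduplicate, sort by priority, then truncate only if a budget is set.
        self.trace.append("remove_duplicates")
        self.trace.append("prioritize")
        if self.params.get("max_tokens") is not None:
            self.trace.append("truncate_to_fit")

    async def build(self) -> Dict[str, Any]:
        # Population stages run only for the identifiers that were supplied.
        if self.params.get("source_id"):
            await self._add_source_context()
        if self.params.get("notebook_id"):
            await self._add_notebook_context()
        self._post_process()
        self.trace.append("format_response")
        return {"stages": self.trace}


result = asyncio.run(PipelineSketch(notebook_id="notebook:1", max_tokens=100).build())
```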

### Source Context Aggregation

When a `source_id` is provided, `_add_source_context` (lines 142‑202) retrieves the `Source` record via `Source.get` and evaluates the `inclusion_level` parameter. The builder supports three inclusion modes:

- **insights**: Pulls only AI-generated insights
- **full content**: Retrieves the complete source material
- **not in**: Excludes the source

The source itself is encapsulated as a `ContextItem` with type `source`. If `include_insights` is enabled, each associated `Insight` object is also converted to a `ContextItem` with type `insight` and assigned a priority weight of 75.
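
The three inclusion modes can be illustrated with plain dictionaries instead of database records. This is a sketch under stated assumptions: the `source_items` function, the dictionary keys (`title`, `full_text`, `insights`), and the choice to emit a metadata-only source item in `insights` mode are hypothetical and may differ from what `_add_source_context` actually does.

```python
from typing import Any, Dict, List

PRIORITY = {"source": 100, "insight": 75}  # weights quoted in the article


def source_items(source: Dict[str, Any], inclusion_level: str,
                 include_insights: bool = True) -> List[Dict[str, Any]]:
    """Turn one in-memory source record into context items (illustrative only)."""
    if inclusion_level == "not in":
        return []  # the source is excluded entirely
    item: Dict[str, Any] = {
        "id": source["id"],
        "type": "source",
        "priority": PRIORITY["source"],
        "title": source["title"],
    }
    if inclusion_level == "full content":
        # Only this mode attaches the complete source material.
        item["content"] = source["full_text"]
    items = [item]
    if include_insights:
        for insight in source["insights"]:
            items.append({
                "id": insight["id"],
                "type": "insight",
                "priority": PRIORITY["insight"],
                "content": insight["content"],
            })
    return items
```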

### Notebook Context Expansion

For `notebook_id` inputs, `_add_notebook_context` (lines 210‑248) loads the `Notebook` domain object and iterates over its relationships. The method checks `ContextConfig.sources` to determine which sources to include (defaulting to all if unconfigured), delegating each to `_add_source_context`. Similarly, it processes notes according to `ContextConfig.notes` specifications.
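
The "default to all if unconfigured" rule might look like the selection step below. The `select_sources` helper, the dictionary shape standing in for `ContextConfig.sources`, and the `insights` default level are assumptions made for this sketch only.

```python
from typing import Dict, List, Optional


def select_sources(notebook_source_ids: List[str],
                   configured: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Map each source ID to an inclusion level.

    `configured` stands in for ContextConfig.sources; its shape is assumed.
    """
    if not configured:
        # Nothing configured: include every source attached to the notebook.
        return {sid: "insights" for sid in notebook_source_ids}
    # Otherwise keep only configured sources that belong to the notebook
    # and are not explicitly excluded.
    return {
        sid: level
        for sid, level in configured.items()
        if sid in notebook_source_ids and level != "not in"
    }
```

Each selected ID would then be handed to the per-source step, which is how a notebook-level request fans out into individual source lookups.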

### Note Context Integration

Individual notes are handled by `_add_note_context` (lines 254‑288), which retrieves `Note` objects via `Note.get`. Like sources, notes support **short** and **long** content variants based on configuration, and are tagged as `ContextItem` type `note` with a default priority of 50.

### Custom Parameter Processing

The builder supports extensibility through `_process_custom_params` (lines 296‑304). Any keyword argument prefixed with `custom_` is logged rather than acted on, so the method serves as a hook where future handling can be added without modifying the rest of the pipeline.
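
A hook of this kind can be as small as a prefix filter plus a log call. The sketch below is an assumption about the general shape, not the repository's implementation; the function name, logger name, and return value are chosen here for illustration.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("context_builder_sketch")  # hypothetical logger name


def process_custom_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Collect and log keyword arguments whose names start with `custom_`."""
    custom = {key: value for key, value in params.items() if key.startswith("custom_")}
    for key, value in custom.items():
        # The parameter is recorded but does not change the built context.
        logger.debug("Custom parameter %s=%r (not acted on)", key, value)
    return custom
```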

### Post-Processing and Token Management

After population, the builder executes a four-stage post-processing chain:

**Deduplication** (`remove_duplicates`, lines 351‑363): Scans `self.items` to eliminate duplicate IDs, ensuring no source, note, or insight appears twice in the final payload.
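
A first-occurrence-wins pass over item IDs is one straightforward way to implement this. The sketch uses plain dictionaries with an `id` key as a stand-in for `ContextItem`, which is an assumption about the item shape.

```python
from typing import Any, Dict, List


def remove_duplicates(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first item seen for each ID, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        if item["id"] in seen:
            continue  # a later repeat of an ID already kept
        seen.add(item["id"])
        unique.append(item)
    return unique
```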

**Prioritization** (`prioritize`, lines 315‑318): Sorts items by their `priority` field in descending order, ensuring high-value content (sources at priority 100) appears before lower-priority items.
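
In Python this step reduces to a single stable sort. The dictionary-based item shape below is assumed for illustration.

```python
from typing import Any, Dict, List


def prioritize(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return items sorted by priority, highest first."""
    # sorted() is stable, so items with equal priority keep their relative order.
    return sorted(items, key=lambda item: item["priority"], reverse=True)
```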

**Token Budget Enforcement** (`truncate_to_fit`, lines 320‑350): If `max_tokens` is specified, the builder calculates cumulative token counts using `token_utils.token_count` and removes lowest-priority items until the total falls within budget. This keeps the payload within the configured budget; staying inside a given model's context window depends on choosing `max_tokens` accordingly.
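
The drop-lowest-priority-first strategy can be sketched as below. The `count_tokens` function is a crude whitespace-based stand-in for `token_utils.token_count`, whose real signature is not shown here, and the item shape is assumed.

```python
from typing import Any, Dict, List, Optional


def count_tokens(text: str) -> int:
    # Stand-in for token_utils.token_count: a crude whitespace split.
    return len(text.split())


def truncate_to_fit(items: List[Dict[str, Any]],
                    max_tokens: Optional[int]) -> List[Dict[str, Any]]:
    """Drop lowest-priority items until the total token count fits the budget."""
    if max_tokens is None:
        return list(items)  # no budget configured, nothing to drop
    kept = sorted(items, key=lambda item: item["priority"], reverse=True)
    total = sum(count_tokens(str(item["content"])) for item in kept)
    while kept and total > max_tokens:
        dropped = kept.pop()  # the last element has the lowest priority
        total -= count_tokens(str(dropped["content"]))
    return kept
```

Because whole items are removed rather than clipped, a single item larger than the budget results in an empty context under this sketch.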

**Response Formatting** (`_format_response`, lines 367‑416): Groups items by type (`sources`, `notes`, `insights`), calculates aggregate statistics, and returns a dictionary containing the structured content, token totals, and metadata like `notebook_id`.
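
Grouping and aggregation of this kind might be written as follows. The output keys `total_tokens`, `total_items`, and `metadata` are names assumed for this sketch; the repository's actual response dictionary may use different ones.

```python
from typing import Any, Dict, List, Optional


def format_response(items: List[Dict[str, Any]],
                    notebook_id: Optional[str] = None) -> Dict[str, Any]:
    """Group items by type and attach simple aggregate statistics."""
    plural = {"source": "sources", "note": "notes", "insight": "insights"}
    response: Dict[str, Any] = {key: [] for key in plural.values()}
    for item in items:
        response[plural[item["type"]]].append(item)
    # Items without a recorded token count contribute zero to the total.
    response["total_tokens"] = sum(item.get("token_count", 0) for item in items)
    response["total_items"] = len(items)
    response["metadata"] = {"notebook_id": notebook_id}
    return response
```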

## Usage Patterns and Code Examples

The repository provides three convenience functions that wrap the `ContextBuilder` for common scenarios:

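```python
# Assumes the helper is importable from open_notebook/utils/context_builder.py.
from open_notebook.utils.context_builder import build_notebook_context

# Build context for an entire notebook with a token limit.
# `await` must run inside an async function (or an async REPL).
context = await build_notebook_context(
    notebook_id="notebook:12345",
    max_tokens=4096,
)
```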

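```python
# Assumes the helper is importable from open_notebook/utils/context_builder.py.
from open_notebook.utils.context_builder import build_source_context

# Build context for a single source, including its AI-generated insights.
# `await` must run inside an async function (or an async REPL).
context = await build_source_context(
    source_id="source:abcde",
    include_insights=True,
)
```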

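```python
# Assumes the helper is importable from open_notebook/utils/context_builder.py.
from open_notebook.utils.context_builder import build_mixed_context

# Build mixed context from explicit lists of source and note IDs.
# `await` must run inside an async function (or an async REPL).
context = await build_mixed_context(
    source_ids=["source:1", "source:2"],
    note_ids=["note:5"],
    notebook_id=None,
    max_tokens=2048,
)
```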

These helpers are implemented at lines 22‑41 (`build_notebook_context`), lines 44‑60 (`build_source_context`), and lines 64‑95 (`build_mixed_context`).

## Integration with the Open Notebook Ecosystem

The `ContextBuilder` sits at the center of the application's AI layer, consuming domain models from [`open_notebook/domain/notebook.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/notebook.py), [`open_notebook/domain/source.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/source.py), and [`open_notebook/domain/note.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/domain/note.py). The API layer in [`api/context_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/context_service.py) and [`api/routers/context.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/context.py) exposes HTTP endpoints that instantiate the builder via these convenience functions, returning the formatted payload to frontend applications.

This architecture creates a clean separation between data retrieval (domain models), context optimization (the builder), and delivery (API layer), making it straightforward to extend context sources or modify budget constraints without touching downstream LLM implementations.

## Summary

- The `ContextBuilder` in [`open_notebook/utils/context_builder.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/context_builder.py) provides a generic, token-aware pipeline for assembling AI conversation context.
- It aggregates **sources** (priority 100), **insights** (priority 75), and **notes** (priority 50) through domain model methods like `Source.get` and `Notebook.get`.
- The pipeline applies post-processing steps: **deduplication**, **priority sorting**, and, when `max_tokens` is set, **token-budget truncation** to keep payloads within budget.
- Convenience functions (`build_notebook_context`, `build_source_context`, `build_mixed_context`) simplify common usage patterns.
- The builder supports extensibility via `custom_` parameters and configurable inclusion levels for content granularity.

## Frequently Asked Questions

### How does the ContextBuilder enforce token limits?

The builder uses the `truncate_to_fit` method (lines 320‑350) to enforce token budgets. It calculates the cumulative token count of all collected items using `token_utils.token_count`, then iteratively removes the lowest-priority items until the total falls at or below the specified `max_tokens` threshold. This ensures the LLM never receives an oversized payload while preserving the highest-priority context.

### What is the difference between short and long content in source context?

In `_add_source_context` (lines 142‑202), the builder examines the `inclusion_level` parameter to determine content granularity. Short content typically includes summaries or excerpts suitable for quick context, while long content provides the full source material. The specific implementation depends on the `Source` domain model's `get_context` method, which the builder calls with the appropriate size parameter.

### How does the ContextBuilder handle duplicate items across different sources?

The `remove_duplicates` method (lines 351‑363) performs a uniqueness check on all collected `ContextItem` objects by their IDs. If the same source, note, or insight is referenced multiple times (for example, through both notebook and explicit source IDs), only the first occurrence is retained, ensuring the final context contains no redundant entries.

### Can I extend the ContextBuilder with custom logic?

Yes, the builder includes an extension hook in `_process_custom_params` (lines 296‑304). Any keyword argument passed to the constructor with a `custom_` prefix is captured and logged for future processing. The hook does not change the built context by itself, but it gives developers a single place to add metadata handling or specialized logic without altering the rest of the `ContextBuilder` pipeline or the convenience function signatures.