How ModelManager Selects AI Models with Large Context Windows (>105K Tokens) in Open Notebook

Open Notebook counts the tokens in each request inside provision_langchain_model. When the count exceeds 105,000, it asks ModelManager for the configured large-context default instead of using the requested or standard model.

In the lfnovo/open-notebook repository, oversized requests are handled by a token-aware provisioning pipeline. The 105,000-token threshold is fixed in the provisioning layer, but the model that serves those requests is not: ModelManager reads it from configuration rather than from hard-coded model names, so administrators can assign any compatible model without modifying application code.

Token Detection: The 105,000 Token Threshold

The detection logic begins in open_notebook/ai/provision.py, where the provision_langchain_model function measures incoming content before selecting an AI model. The system uses the token_count utility from open_notebook/utils/token_utils.py to quantify the request size.

When the token count exceeds 105,000, the function immediately switches to the large-context selection path:


# open_notebook/ai/provision.py (lines 19-30)
tokens = token_count(content)          # count tokens in the request

if tokens > 105_000:                   # > 105k tokens → large-context path
    selection_reason = f"large_context (content has {tokens} tokens)"
    model = await model_manager.get_default_model("large_context", **kwargs)
elif model_id:                         # explicit model requested by the user
    ...
else:                                  # fall back to the normal default for the type
    ...

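This threshold is fixed at 105,000 tokens and serves as the primary trigger for large-context model selection. Because the check runs before the model_id branch, content above the threshold takes the large-context path even when the caller names an explicit model.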
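The branch order can be illustrated with a small, self-contained sketch. The function name `selection_path` and the constant `LARGE_CONTEXT_THRESHOLD` are hypothetical names chosen for this example, not identifiers from the repository. Only the 105,000 figure and the order of the checks come from the excerpt above.

```python
from typing import Optional

# Assumption: this constant name is illustrative; the excerpt inlines 105_000.
LARGE_CONTEXT_THRESHOLD = 105_000


def selection_path(tokens: int, model_id: Optional[str]) -> str:
    """Return which branch of the provisioning logic would handle a request."""
    if tokens > LARGE_CONTEXT_THRESHOLD:
        return "large_context"   # checked first, so it wins over model_id
    if model_id:
        return "explicit"        # caller named a specific model
    return "default"             # normal default for the requested type
```

The comparison is strictly greater-than, so a request of exactly 105,000 tokens stays on the normal path in this sketch.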

ModelManager Selection Logic

When the token threshold is exceeded, ModelManager retrieves the appropriate model ID from the database configuration. The selection logic resides in open_notebook/ai/models.py, specifically within the get_default_model method.

The DefaultModels dataclass stores the large-context model identifier:


# open_notebook/ai/models.py (line 66)

large_context_model: Optional[str] = None

During runtime, ModelManager handles the "large_context" model type by reading this field and constructing the corresponding Esperanto language model:


# open_notebook/ai/models.py (lines 246-250)

elif model_type == "large_context":
    model_id = defaults.large_context_model
...
return await self.get_model(model_id, **kwargs)

The actual model instantiation is delegated to get_model, which creates a cached Esperanto LLM instance ready for LangChain integration.
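To make the lookup-then-delegate flow concrete, here is a minimal sketch with stand-in classes. `SketchDefaults`, `SketchModelManager`, the `default_chat_model` field, and the dict returned by `get_model` are all assumptions made for illustration. Only the `large_context_model` field and the `get_default_model` / `get_model` method names are taken from the excerpts above.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDefaults:
    # `large_context_model` mirrors the field shown above; the chat field is hypothetical.
    default_chat_model: Optional[str] = None
    large_context_model: Optional[str] = None


class SketchModelManager:
    def __init__(self, defaults: SketchDefaults) -> None:
        self.defaults = defaults

    async def get_model(self, model_id: str, **kwargs) -> dict:
        # Stand-in for constructing a real language model instance.
        return {"id": model_id, **kwargs}

    async def get_default_model(self, model_type: str, **kwargs) -> Optional[dict]:
        # Map the model type to the configured model ID.
        if model_type == "chat":
            model_id = self.defaults.default_chat_model
        elif model_type == "large_context":
            model_id = self.defaults.large_context_model
        else:
            model_id = None
        if not model_id:
            return None  # assumption: an unconfigured default yields None
        return await self.get_model(model_id, **kwargs)


manager = SketchModelManager(SketchDefaults(large_context_model="model-xyz"))
model = asyncio.run(manager.get_default_model("large_context", temperature=0.2))
```

The point of the design is that the model type is resolved to an ID through configuration, so swapping the large-context model is a data change rather than a code change.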

Error Handling and Fallback Behavior

If no large-context model has been configured in DefaultModels, the system logs a warning and ultimately raises a ConfigurationError. This prevents the application from attempting to process oversized content with models that lack sufficient context windows. The error originates in provision_langchain_model when it cannot retrieve a valid model for the large-context type.
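The guard described here might look roughly like the following sketch. The `ConfigurationError` class below is a local stand-in for the project's exception, and the helper name `require_model` and the message wording are invented for this example.

```python
from typing import Optional


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def require_model(model: Optional[object], model_type: str) -> object:
    """Return the model, or raise if none could be resolved for this type."""
    if model is None:
        # Hypothetical message; the real wording is not shown in this article.
        raise ConfigurationError(
            f"No model configured for type '{model_type}'. "
            "Set a default before sending content of this size."
        )
    return model
```

Failing loudly at provisioning time surfaces the missing configuration immediately, rather than letting a smaller model reject or truncate the content later.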

Configuring the Large-Context Model

Administrators define the large-context model through the database-backed DefaultModels configuration. Unlike hard-coded selections, this approach allows any model supporting extended context windows to serve as the handler without code changes.

You can configure the large-context default via the REST API exposed in api/routers/models.py:

curl -X PATCH http://localhost:5055/models/defaults \
  -H "Content-Type: application/json" \
  -d '{"large_context_model": "model-xyz"}'

After configuration, any request exceeding 105,000 tokens automatically routes to "model-xyz".
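The same request can be built from Python using only the standard library. This sketch mirrors the curl command above (same URL, header, and payload). The helper name `build_defaults_request` is hypothetical, and the request is constructed but not sent, so it runs without a live server.

```python
import json
import urllib.request


def build_defaults_request(base_url: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets the large-context default."""
    body = json.dumps({"large_context_model": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/models/defaults",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


req = build_defaults_request("http://localhost:5055", "model-xyz")
# To send it against a running instance: urllib.request.urlopen(req)
```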

Practical Implementation Examples

Example 1: Manual Provisioning Calls

To leverage the automatic selection in your own code, call provision_langchain_model with content and let the token detection handle model selection:

import asyncio

from open_notebook.ai.provision import provision_langchain_model


async def provision_for(text: str):
    # `text` holds a long article
    model = await provision_langchain_model(
        content=text,
        model_id=None,          # no explicit model requested
        default_type="chat",    # normal fallback would be the chat default
        temperature=0.2,
    )
    # `model` is a LangChain-compatible LLM ready for use
    return model

# From synchronous code: model = asyncio.run(provision_for(text))

Example 2: Direct Access to Large-Context Defaults

For cases where you need to verify or manually retrieve the large-context configuration:

from open_notebook.ai.models import model_manager

# Run inside an async function (or a notebook cell that supports top-level await)
large_ctx = await model_manager.get_default_model("large_context")

# `large_ctx` is a LanguageModel (or None if not configured)

Summary

  • Token Threshold: Open Notebook uses a fixed 105,000-token threshold in provision_langchain_model to trigger large-context routing.
  • Dynamic Selection: ModelManager selects AI models by reading the large_context_model field from DefaultModels, avoiding hard-coded model names.
  • Configurable: Administrators set the large-context model ID via the API, allowing any high-capacity model to handle oversized requests.
  • Safe Fallbacks: If content exceeds the threshold and no large-context model is configured, the system raises ConfigurationError rather than passing the content to a model with an insufficient context window.

Frequently Asked Questions

What happens if no large-context model is configured?

If DefaultModels.large_context_model is None when a request exceeds 105,000 tokens, the system logs a warning and raises a ConfigurationError. This prevents processing failures that would occur if a standard-context model attempted to handle the oversized content.

Can I change the 105,000 token threshold?

Not at runtime. The threshold is hard-coded as 105_000 in open_notebook/ai/provision.py, and no configuration parameter is exposed for it. Changing the limit means editing that comparison in the provisioning layer.

How does ModelManager cache the selected models?

ModelManager.get_model creates and caches Esperanto language model instances internally. Once retrieved from DefaultModels.large_context_model and instantiated, the model remains available for subsequent requests, reducing initialization overhead for repeated large-context operations.
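The article does not show the cache key or eviction policy that ModelManager uses, so the following is only a generic memoization sketch. It assumes a cache keyed by model ID plus keyword arguments. The class name `CachingFactory` and the dict standing in for a model instance are hypothetical.

```python
from typing import Any, Dict, Tuple


class CachingFactory:
    """Hypothetical illustration of instance caching; not the project's code."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[Tuple[str, Any], ...]], object] = {}
        self.builds = 0  # counts how many instances were actually constructed

    def get_model(self, model_id: str, **kwargs: Any) -> object:
        # Keyword values must be hashable for this key to work.
        key = (model_id, tuple(sorted(kwargs.items())))
        if key not in self._cache:
            self.builds += 1
            self._cache[key] = {"id": model_id, **kwargs}  # stand-in for a real model
        return self._cache[key]


factory = CachingFactory()
first = factory.get_model("model-xyz", temperature=0.2)
second = factory.get_model("model-xyz", temperature=0.2)
```

In this sketch the second call returns the same object as the first without constructing anything new, which is the overhead saving the answer above describes.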

Where is the token counting utility implemented?

The token_count function resides in open_notebook/utils/token_utils.py. This utility converts a raw content string into a token count. As called in provision_langchain_model, it receives only the content and runs before any model is selected, so the count feeding the 105,000-token comparison does not depend on which model eventually handles the request.
