# How the Open-Notebook Embedding Service Handles Vectorization and Embedding Storage

> Discover how the Open-Notebook embedding service vectorizes text and stores embeddings using a three-layer pipeline with utility functions, a service facade, and SurrealDB for efficient search.

- Repository: [Luis Novo/open-notebook](https://github.com/lfnovo/open-notebook)
- Tags: how-to-guide
- Published: 2026-06-10

---

**The Open-Notebook embedding service** converts raw text into searchable vectors through a three-layer pipeline: utility functions for chunking and mean-pooling, a service façade for API abstraction, and a SurrealDB-backed storage layer. The pipeline supports configurable batching, retries, and async processing.

The `lfnovo/open-notebook` repository implements a complete **embedding pipeline** that transforms content from sources, notes, and insights into high-dimensional vectors. This system handles everything from text segmentation to database storage, ensuring reliable vectorization even when processing large documents or handling API failures.

## Architecture Overview

The embedding service operates across three layers: the **utility layer** for core vectorization logic, the **service layer** for API abstraction, and the **storage layer** backed by SurrealDB. This separation lets the system handle preprocessing such as chunking and mean-pooling while keeping a clean interface for frontend clients.

## Text Vectorization Pipeline

The vectorization process begins in [`open_notebook/utils/embedding.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/embedding.py), where raw text undergoes segmentation, batch processing, and model inference before returning a unified vector representation.

### Chunking and Mean-Pooling Strategy

When content exceeds the configured `CHUNK_SIZE`, the utility splits the text with `chunk_text` and embeds each segment individually. It then applies **mean-pooling**: each chunk's vector is normalized before averaging, so no single chunk dominates the result. This keeps the pooled vector on a consistent scale regardless of document length.
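The normalized mean-pooling step can be illustrated with a minimal, dependency-free sketch. The helper names `l2_normalize` and `mean_pool` are hypothetical, and re-normalizing the final average is an assumption made here for illustration; the repository's own implementation may differ in detail.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def mean_pool(chunk_vectors: List[List[float]]) -> List[float]:
    """Normalize each chunk vector, average them, then normalize the average."""
    if not chunk_vectors:
        raise ValueError("need at least one chunk vector")
    normalized = [l2_normalize(v) for v in chunk_vectors]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    # Assumption: the pooled vector is re-normalized to unit length.
    return l2_normalize(mean)


# Two chunk vectors with very different magnitudes contribute equally.
pooled = mean_pool([[10.0, 0.0], [0.0, 0.1]])
print(pooled)
```

Because each chunk is scaled to unit length first, the large-magnitude vector and the small-magnitude vector carry the same weight in the average.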

### Batch Processing and Resilience

Embeddings are generated in batches controlled by `EMBEDDING_BATCH_SIZE`, which defaults to 50 and can be overridden via the `OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE` environment variable. A failed batch is retried up to `EMBEDDING_MAX_RETRIES` times, with an `EMBEDDING_RETRY_DELAY` back-off between attempts, so transient failures do not interrupt ingestion pipelines.
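A rough sketch of batching with per-batch retries follows. The function `embed_in_batches`, its `embed_fn` parameter, the defaults for `max_retries` and `retry_delay`, and the fixed (non-exponential) delay are all assumptions for illustration, not the repository's actual code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

EmbedFn = Callable[[List[str]], Awaitable[List[List[float]]]]


async def embed_in_batches(
    texts: List[str],
    embed_fn: EmbedFn,
    batch_size: int = 50,
    max_retries: int = 3,      # hypothetical default
    retry_delay: float = 1.0,  # hypothetical default, in seconds
) -> List[List[float]]:
    """Embed texts batch by batch, retrying each failed batch."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                vectors.extend(await embed_fn(batch))
                break
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the failure to the caller
                await asyncio.sleep(retry_delay)
    return vectors


calls: Dict[str, int] = {"n": 0}


async def flaky_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in model call that fails once, then returns one-dimensional vectors."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return [[float(len(text))] for text in batch]


result = asyncio.run(
    embed_in_batches(["a", "bb", "ccc"], flaky_embed, batch_size=2, retry_delay=0.0)
)
print(result)
```

Retrying per batch rather than per document means a transient error only repeats the work of the batch that failed.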

### Model Resolution

The `generate_embedding` and `generate_embeddings` functions fetch the active model via `model_manager.get_embedding_model()` from [`open_notebook/ai/models.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/ai/models.py). If no embedding model is configured, the system raises a clear `ValueError` to prevent silent failures.
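The fail-fast behavior described above can be sketched with a small guard. The `require_embedding_model` helper, the `EmbeddingModel` alias, and the error message are hypothetical stand-ins; only the use of `ValueError` comes from the description above.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an embedding model: any callable from text to vector.
EmbeddingModel = Callable[[str], List[float]]


def require_embedding_model(model: Optional[EmbeddingModel]) -> EmbeddingModel:
    """Return the model, or raise ValueError when none is configured."""
    if model is None:
        raise ValueError("No embedding model configured")
    return model


def fake_model(text: str) -> List[float]:
    """Toy model used only to exercise the guard."""
    return [float(len(text))]


model = require_embedding_model(fake_model)
print(model("hi"))
```

Raising at resolution time keeps a missing configuration from showing up later as empty or malformed vectors.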

## Embedding Storage Mechanism

Once vectors are generated, the system persists them through a service-oriented architecture that abstracts database operations behind HTTP API endpoints.

### Service Layer Abstraction

The `EmbeddingService` class in [`api/embedding_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/embedding_service.py) acts as a thin façade, forwarding requests to the API client while logging that it "uses API for embedding operations." This design decouples the vectorization logic from storage implementation details.

### API Client Transport

The `APIClient.embed_content` method in [`api/client.py`](https://github.com/lfnovo/open-notebook/blob/main/api/client.py) constructs a POST request to `/api/embed` with `item_id`, `item_type`, and optional `async_processing` parameters. All requests respect the global `API_CLIENT_TIMEOUT` setting, preventing hung connections from blocking the ingestion queue.
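To make the request shape concrete, the sketch below builds (but does not send) an equivalent POST request using only the standard library. The base URL, the timeout value, and the `build_embed_request` helper are assumptions for illustration; the actual client may use a different HTTP library and different settings.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:5055"  # hypothetical base URL
API_CLIENT_TIMEOUT = 300.0              # hypothetical timeout, in seconds


def build_embed_request(
    item_id: str, item_type: str, async_processing: bool = False
) -> urllib.request.Request:
    """Build a JSON POST request for the /api/embed endpoint."""
    payload = {
        "item_id": item_id,
        "item_type": item_type,
        "async_processing": async_processing,
    }
    return urllib.request.Request(
        url=f"{API_BASE_URL}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_embed_request("note-456", "note", async_processing=True)
print(request.get_method(), request.full_url)
# Sending it would look like:
# urllib.request.urlopen(request, timeout=API_CLIENT_TIMEOUT)
```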

### SurrealDB Persistence

The FastAPI router in [`api/routers/embedding.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/embedding.py) receives the embedding request and persists the resulting vector in SurrealDB. Vectors are stored in the `embedding` column of the corresponding entity record (source, note, or insight), and the column is indexed for vector search.
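As a conceptual illustration of what vector search over stored embeddings computes, the sketch below ranks in-memory records by cosine similarity. This is a brute-force stand-in written in plain Python, not SurrealDB's query interface; the record IDs and the `rank_records` helper are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_records(
    query: List[float], records: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (record_id, similarity) pairs, most similar first."""
    scored = [(rid, cosine_similarity(query, vec)) for rid, vec in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stored embeddings keyed by record ID.
records = {
    "source:1": [1.0, 0.0],
    "note:2": [0.6, 0.8],
    "insight:3": [0.0, 1.0],
}
ranking = rank_records([1.0, 0.1], records)
print([record_id for record_id, _ in ranking])
```

A vector index exists to avoid this kind of full scan once the number of stored records grows.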

## Configuration and Environment Variables

The embedding service behavior is controlled through several environment variables and constants:

- **`OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE`** – Controls batch size for API calls (default: 50).
- **`CHUNK_SIZE`** – Determines text segmentation thresholds for long documents.
- **`EMBEDDING_MAX_RETRIES`** – Maximum retry attempts for failed embedding requests.
- **`EMBEDDING_RETRY_DELAY`** – Back-off delay between retry attempts.
- **`API_CLIENT_TIMEOUT`** – Global timeout for HTTP requests to the embedding endpoint.
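The batch-size override listed above could be read along the following lines. The `read_batch_size` helper and its fallback to the default on missing, non-numeric, or non-positive values are assumptions for illustration; only the variable name and the default of 50 come from the description above.

```python
import os

ENV_VAR = "OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE"


def read_batch_size(default: int = 50) -> int:
    """Read the batch size from the environment, falling back to the default."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # assumption: ignore values that are not integers
    return value if value > 0 else default


os.environ[ENV_VAR] = "8"
print(read_batch_size())
```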

## Practical Implementation Examples

### Generating Embeddings Directly

For direct vectorization without database storage, use the utility functions:

```python
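import asyncio

from open_notebook.utils.embedding import generate_embedding


async def main() -> None:
    # generate_embedding is awaited, so it must run inside an event loop.
    vector = await generate_embedding(
        text="The quick brown fox jumps over the lazy dog",
        content_type="plain_text",
    )
    print(f"Generated vector with {len(vector)} dimensions")


asyncio.run(main())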
```

### Service-Based Embedding with Storage

To embed and persist notebook items through the service layer:

```python
from api.embedding_service import embedding_service

result = embedding_service.embed_content(
    item_id="src-123",
    item_type="source"
)
print(f"API response: {result}")
```

### Async Client Operations

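For frontend applications that want background processing, call the API client directly with `async_processing=True`:

```python
from api.client import api_client

# async_processing=True asks the API to run the embedding in the background.
# The client call itself is assumed to be a regular blocking call, consistent
# with the synchronous service example above.
result = api_client.embed_content(
    item_id="note-456",
    item_type="note",
    async_processing=True,
)
print(f"API response: {result}")
```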

## Summary

- **Open-Notebook** implements a three-layer embedding architecture: utility functions for vectorization, service façades for abstraction, and SurrealDB for persistence.
- The **vectorization pipeline** in [`open_notebook/utils/embedding.py`](https://github.com/lfnovo/open-notebook/blob/main/open_notebook/utils/embedding.py) handles text chunking, mean-pooled aggregation, batching (default 50 items), and configurable retry logic.
- **Environment variables** control batch sizing (`OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE`), chunking thresholds, and retry behavior.
- The **storage flow** routes through [`api/embedding_service.py`](https://github.com/lfnovo/open-notebook/blob/main/api/embedding_service.py) and [`api/client.py`](https://github.com/lfnovo/open-notebook/blob/main/api/client.py) before persisting vectors in SurrealDB's indexed `embedding` column for semantic search.
- The system supports both **synchronous and asynchronous** processing modes via the `async_processing` flag.

## Frequently Asked Questions

### What happens when text exceeds the configured chunk size?

When text exceeds `CHUNK_SIZE`, the `generate_embedding` function automatically segments content using `chunk_text`, embeds each segment separately, and applies mean-pooling with normalization to produce a single representative vector. This ensures consistent embedding quality regardless of document length.

### How does the embedding service handle API failures?

The utility layer implements resilient batch processing with `EMBEDDING_MAX_RETRIES` attempts and `EMBEDDING_RETRY_DELAY` back-off delays. If the embedding model is unavailable, the system raises a `ValueError` immediately, while transient network errors trigger automatic retries before surfacing failures to the caller.

### What database does Open-Notebook use for vector storage?

Open-Notebook uses **SurrealDB** for vector persistence. The FastAPI router in [`api/routers/embedding.py`](https://github.com/lfnovo/open-notebook/blob/main/api/routers/embedding.py) stores vectors in the `embedding` column of entity records (sources, notes, or insights), with the column indexed specifically for high-performance vector search operations.

### Can embedding operations be processed asynchronously?

Yes. The `api_client.embed_content` method accepts an `async_processing` parameter that delegates embedding generation to background workers. This prevents blocking the main thread when processing large documents or batch ingestion tasks.