# How to Use the OpenRAG Public API v1 for Semantic Search

> Learn to use the OpenRAG Public API v1 for semantic search. Explore the POST endpoint for hybrid search combining vector similarity and keyword matching.

- Repository: [langflow-ai/openrag](https://github.com/langflow-ai/openrag)
- Tags: how-to-guide
- Published: 2026-03-13

---

**The OpenRAG Public API v1 exposes a single POST endpoint at `/api/v1/search` that automatically detects the embedding models present in your OpenSearch index, generates a query embedding for each model in parallel, and executes a hybrid search combining KNN vector similarity with keyword matching.**

The OpenRAG Public API v1 streamlines semantic document retrieval by encapsulating complex vector search logic behind a simple HTTP interface. In the `langflow-ai/openrag` repository, the endpoint implemented in [`src/api/v1/search.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/search.py) orchestrates model detection, hybrid query construction, and result formatting through the `SearchService.search_tool` method. Clients authenticate via API key and receive structured JSON responses containing matching text chunks and aggregation facets for filterable UIs.

## API Endpoint Overview

The Public API v1 provides one primary endpoint for semantic search over ingested documents. The implementation uses FastAPI for request validation and delegates search logic to the `SearchService` class.

The endpoint accepts a JSON payload defining the search query, optional filters, result limits, and score thresholds. Internally, the flow routes through `SearchService.search_tool` in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py), which handles four distinct phases: embedding model detection, query vectorization, hybrid query construction, and OpenSearch execution. The service runs an aggregation on the `embedding_model` field to discover which models exist in the corpus, then generates a query embedding for each detected model using the patched LiteLLM client (`clients.patched_embedding_client.embeddings.create`).

## Request Structure and Authentication

Requests to the OpenRAG Public API v1 require Bearer token authentication using an API key. The `SearchV1Body` Pydantic model (defined at lines 19‑24 in [`src/api/v1/search.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/search.py)) validates the incoming payload structure.

The model enforces the following fields:
- **query** (string): The natural language search text (trimmed and validated for non‑emptiness at lines 31‑35)
- **filters** (optional dict): Constraints for `data_sources`, `document_types`, `owners`, or `connector_types`
- **limit** (integer): Maximum number of chunks to return
- **score_threshold** (float): Minimum relevance score for inclusion
- **embedding_model** (optional string): Override to force a specific embedding model
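
The fields above can be assembled on the client side before sending a request. The following is a minimal sketch using only the standard library; the helper name `build_search_payload` and the default values for `limit` and `score_threshold` are assumptions for illustration, not values taken from the OpenRAG source.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    filters: Optional[dict[str, list[str]]] = None,
    limit: int = 10,               # assumed default, not taken from the source
    score_threshold: float = 0.0,  # assumed default, not taken from the source
    embedding_model: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body with the fields listed above."""
    cleaned = query.strip()
    if not cleaned:
        # Mirrors the non-empty check the endpoint applies to `query`.
        raise ValueError("query must not be empty")

    payload: dict[str, Any] = {
        "query": cleaned,
        "limit": limit,
        "score_threshold": score_threshold,
    }
    # Optional fields are sent only when the caller sets them.
    if filters:
        payload["filters"] = filters
    if embedding_model:
        payload["embedding_model"] = embedding_model
    return payload


payload = build_search_payload(
    "  pricing model  ",
    filters={"document_types": ["application/pdf"]},
    limit=5,
)
print(payload)
```

Validating the query locally avoids a round trip for requests the server would reject anyway.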

Authentication resolves via the `get_api_key_user_async` dependency, which maps the API key to a `User` object containing `user_id`. If authentication fails, the endpoint returns 401 before reaching the search logic (line 85 in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py)).
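
To illustrate the Bearer scheme from the client side, here is a sketch that prepares an authenticated request with the standard library's `urllib`. The helper name `build_search_request`, the host, and the key are placeholders; the request is built but not sent.

```python
import json
import urllib.request


def build_search_request(host: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to /api/v1/search."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "https://openrag.example.com",  # placeholder host
    "my-api-key",                   # placeholder key
    {"query": "ingestion pipeline", "limit": 5},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; a 401 response surfaces there as `urllib.error.HTTPError`.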

## Core Search Architecture

The `SearchService.search_tool` method (starting at line 20 in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py)) implements the hybrid semantic search pipeline. The architecture automatically adapts to multi‑model indexes while optimizing for both vector similarity and keyword recall.

### Embedding Model Detection

Before generating embeddings, the service detects which models are actually present in the indexed documents. It runs an aggregation query (`agg_query` at lines 101‑112) on the `embedding_model` field to build the `available_models` list.

If the index contains no documents, the service falls back to the default model configured in `EMBED_MODEL` (lines 129‑132 in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py)). This detection mechanism ensures that queries utilize every relevant vector space available in the corpus without requiring manual model specification.
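
The detection step can be pictured as a terms aggregation followed by a fallback. This is a sketch under stated assumptions: the aggregation name `embedding_models`, the bucket size, and the `DEFAULT_EMBED_MODEL` value are illustrative, and the sample responses are hand-written in the shape OpenSearch returns for a terms aggregation.

```python
from typing import Any

DEFAULT_EMBED_MODEL = "text-embedding-3-small"  # placeholder for the EMBED_MODEL setting

# Terms aggregation over the `embedding_model` field; size=0 skips the hits.
AGG_QUERY: dict[str, Any] = {
    "size": 0,
    "aggs": {
        "embedding_models": {  # aggregation name is illustrative
            "terms": {"field": "embedding_model", "size": 10}
        }
    },
}


def detect_models(agg_response: dict[str, Any]) -> list[str]:
    """Return the models found in the index, or the default if none exist."""
    buckets = (
        agg_response.get("aggregations", {})
        .get("embedding_models", {})
        .get("buckets", [])
    )
    models = [b["key"] for b in buckets if b.get("key")]
    return models or [DEFAULT_EMBED_MODEL]


# Hand-written responses standing in for real OpenSearch output.
populated = {
    "aggregations": {
        "embedding_models": {
            "buckets": [
                {"key": "text-embedding-3-small", "doc_count": 120},
                {"key": "nomic-embed-text:latest", "doc_count": 45},
            ]
        }
    }
}
empty = {"aggregations": {"embedding_models": {"buckets": []}}}

print(detect_models(populated))
print(detect_models(empty))
```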

### Parallel Query Embedding

For each model in `available_models`, the service creates an async task via `embed_with_model` (lines 146‑205). The process normalizes model names into OpenSearch‑compatible field names using `get_embedding_field_name` from [`src/utils/embedding_fields.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/embedding_fields.py) (lines 49‑66), automatically handling provider prefixes like `ollama/` or `watsonx/`.

The embeddings are generated in parallel through `clients.patched_embedding_client.embeddings.create`, producing one query vector per model for that model's index field (`chunk_embedding_<model>`). Because every detected model gets its own query vector, the search can also match documents that were embedded with a model other than the default.
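
The fan-out pattern can be sketched with `asyncio.gather`. Everything here is a stand-in: `field_name_for` is a hypothetical normalizer (the real rules live in `get_embedding_field_name` and may differ), and `fake_embed` replaces the actual embedding call with a dummy vector so the example runs offline.

```python
import asyncio
import re


def field_name_for(model: str) -> str:
    """Hypothetical normalizer: drop a provider prefix, sanitize the rest."""
    base = model.split("/", 1)[-1]  # "ollama/nomic" -> "nomic"
    safe = re.sub(r"[^a-z0-9]+", "_", base.lower()).strip("_")
    return f"chunk_embedding_{safe}"


async def fake_embed(model: str, text: str) -> list[float]:
    """Stand-in for the real embedding call; returns a dummy vector."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return [float(len(model)), float(len(text))]


async def embed_query(text: str, models: list[str]) -> dict[str, list[float]]:
    """Embed one query with every model concurrently, keyed by index field."""
    vectors = await asyncio.gather(*(fake_embed(m, text) for m in models))
    return {field_name_for(m): v for m, v in zip(models, vectors)}


query_vectors = asyncio.run(
    embed_query(
        "vector store",
        ["ollama/nomic-embed-text:latest", "text-embedding-3-small"],
    )
)
print(sorted(query_vectors))
```

`asyncio.gather` preserves input order, which is what makes the `zip` back to model names safe.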

### Hybrid Query Construction

When the query is not a wildcard `"*"` (line 61), the service constructs a hybrid OpenSearch query combining semantic and lexical signals. The query structure (lines 68‑125) implements:

- **KNN clauses** (`knn_queries`) for each detected model’s embedding field, weighted at 70% through a `dis_max` query (line 111)
- **Keyword fallback** via `multi_match` on `text` and `filename` fields, weighted at 30% (line 119)
- **Existence validation** using `exists_any_embedding` to ensure documents have at least one embedding field (lines 85‑91)
- **Filter composition** combining user filters with the existence clause under a `bool/filter` structure (lines 94‑100)

This hybrid approach keeps recall high for semantically related passages while preserving precision for exact keyword matches such as specific terms or filenames.
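
A query body with these four parts might look roughly like the sketch below. It assumes the 70/30 weights are expressed as `boost` values and uses standard OpenSearch query DSL clauses (`knn`, `dis_max`, `multi_match`, `exists`, `bool`); the function name, the `k` value, and the sample filter field are illustrative, and the real `search_body` may be structured differently.

```python
from typing import Any, Optional


def build_hybrid_query(
    query: str,
    query_vectors: dict[str, list[float]],  # embedding field name -> query vector
    user_filters: Optional[list[dict[str, Any]]] = None,
    limit: int = 10,
    k: int = 50,  # assumed neighbor count per KNN clause
) -> dict[str, Any]:
    """Combine KNN clauses (boost 0.7) with a keyword clause (boost 0.3)."""
    knn_queries = [
        {"knn": {field: {"vector": vector, "k": k}}}
        for field, vector in query_vectors.items()
    ]
    # A document qualifies if at least one known embedding field is present.
    exists_any_embedding = {
        "bool": {
            "should": [{"exists": {"field": f}} for f in query_vectors],
            "minimum_should_match": 1,
        }
    }
    return {
        "size": limit,
        "query": {
            "bool": {
                "should": [
                    {"dis_max": {"queries": knn_queries, "boost": 0.7}},
                    {
                        "multi_match": {
                            "query": query,
                            "fields": ["text", "filename"],
                            "boost": 0.3,
                        }
                    },
                ],
                "minimum_should_match": 1,
                # User filters and the existence clause share one filter list.
                "filter": [*(user_filters or []), exists_any_embedding],
            }
        },
    }


body = build_hybrid_query(
    "vector store",
    {"chunk_embedding_demo": [0.1, 0.2, 0.3]},
    user_filters=[{"terms": {"document_types": ["text/markdown"]}}],  # illustrative field
    limit=5,
)
print(body["query"]["bool"]["should"][0]["dis_max"]["boost"])
```

`dis_max` scores a document by its best-matching KNN clause, so a chunk embedded with only one model is not penalized for missing the others.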

### Execution and Error Handling

The constructed `search_body` is sent to OpenSearch via `opensearch_client.search`. The implementation includes automatic fallback logic: if the cluster does not support the `num_candidates` parameter (common in older OpenSearch versions), the service retries without that parameter (lines 104‑124).
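
The retry logic can be sketched as a wrapper around any search callable. This is an illustration only: the error-message check, the helper names, and the `legacy_cluster` fake are assumptions, and the real service may detect the failure differently.

```python
import copy
from typing import Any, Callable


def strip_key(node: Any, key: str) -> Any:
    """Recursively remove `key` from nested dicts and lists (in place)."""
    if isinstance(node, dict):
        node.pop(key, None)
        for value in node.values():
            strip_key(value, key)
    elif isinstance(node, list):
        for item in node:
            strip_key(item, key)
    return node


def search_with_fallback(
    search: Callable[[dict[str, Any]], dict[str, Any]],
    body: dict[str, Any],
) -> dict[str, Any]:
    """Try the full query; on a `num_candidates` error, retry without it."""
    try:
        return search(body)
    except Exception as exc:  # real code would catch the client's error type
        if "num_candidates" not in str(exc):
            raise
        # Work on a copy so the caller's body is left untouched.
        return search(strip_key(copy.deepcopy(body), "num_candidates"))


def legacy_cluster(body: dict[str, Any]) -> dict[str, Any]:
    """Fake client that rejects any body mentioning num_candidates."""
    if "num_candidates" in str(body):
        raise RuntimeError("unknown field [num_candidates]")
    return {"hits": {"hits": []}}


body = {
    "query": {
        "knn": {
            "chunk_embedding_demo": {"vector": [0.1], "k": 5, "num_candidates": 100}
        }
    }
}
print(search_with_fallback(legacy_cluster, body))
```

Unrelated errors are re-raised unchanged, so the fallback does not mask genuine query or connection failures.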

Results are transformed into lightweight dictionaries containing `filename`, `text`, `score`, `embedding_model`, and other metadata (lines 40‑60). The response also includes aggregation facets for `data_sources`, `document_types`, and other fields to support faceted search UIs.
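
The transformation step amounts to flattening each OpenSearch hit. The sketch below assumes the standard response layout (`hits.hits[]` with `_source` and `_score`) and copies only the four fields named above; the function name is hypothetical and the real service returns additional metadata.

```python
from typing import Any


def format_hits(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten OpenSearch hits into lightweight result dictionaries."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        results.append(
            {
                "filename": source.get("filename"),
                "text": source.get("text"),
                "score": hit.get("_score"),
                "embedding_model": source.get("embedding_model"),
            }
        )
    return results


# Hand-written response standing in for real OpenSearch output.
sample = {
    "hits": {
        "hits": [
            {
                "_score": 0.82,
                "_source": {
                    "filename": "guide.md",
                    "text": "OpenRAG ingests documents in chunks.",
                    "embedding_model": "text-embedding-3-small",
                    "chunk_embedding_demo": [0.1, 0.2],  # dropped from the output
                },
            }
        ]
    }
}
formatted = format_hits(sample)
print(formatted)
```

Leaving the embedding vectors out of the output keeps responses small, since each vector can hold hundreds or thousands of floats.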

## Code Examples

### Raw HTTP Request with cURL

Execute semantic search directly against the OpenRAG Public API v1 using standard HTTP tools:

```bash
curl -X POST "https://<your-openrag-host>/api/v1/search" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What is the pricing model for LangFlow?",
        "filters": {
          "data_sources": ["knowledge-base.pdf"],
          "document_types": ["application/pdf"]
        },
        "limit": 5,
        "score_threshold": 0.2
      }'
```

The response returns a JSON object containing `results` (an array of matching chunks with `filename`, `text`, `score`, and `embedding_model`) and `aggregations` for populating filter interfaces.

### Python SDK Implementation

The Python SDK in [`sdks/python/openrag_sdk/search.py`](https://github.com/langflow-ai/openrag/blob/main/sdks/python/openrag_sdk/search.py) provides a typed wrapper around the HTTP API:

```python
import asyncio

from openrag_sdk import OpenRAGClient


async def main() -> None:
    client = OpenRAGClient(
        base_url="https://<your-openrag-host>",
        api_key="<YOUR_API_KEY>",
    )

    # Execute semantic search; the call is awaited, so it must run in a coroutine
    response = await client.search.query(
        query="Explain the OpenRAG ingestion pipeline",
        limit=8,
        score_threshold=0.1,
        filters={"document_types": ["text/markdown"]},
    )

    # Print each hit with its score, source file, and a text preview
    for hit in response.results:
        print(f"[{hit.score:.2f}] {hit.filename} → {hit.text[:120]}...")


asyncio.run(main())
```

The SDK serializes the request body, sends the authenticated POST request, and parses the response into `SearchResult` objects (lines 41‑65 in the SDK file), handling HTTP errors and JSON parsing automatically.

### Advanced: Overriding Embedding Models

Set `embedding_model` to make sure the search includes a specific embedding model when you know the index contains documents embedded with that model:

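```python
# Assumes `client` is an OpenRAGClient instance, as in the previous example.
async def search_with_model(client):
    return await client.search.query(
        query="How does the vector store work?",
        filters=None,
        limit=10,
        embedding_model="nomic-embed-text:latest",
    )
```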

When `embedding_model` is specified, the service guarantees inclusion of that model in the `available_models` list while still detecting and querying other models present in the index. This override is useful for testing specific embedding spaces or ensuring consistency with custom ingestion pipelines.
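
The merge behavior described above can be pictured as follows. This is a sketch, not the service's code: the function name `resolve_models` and the choice to place the requested model first are assumptions.

```python
from typing import Optional


def resolve_models(detected: list[str], override: Optional[str] = None) -> list[str]:
    """Return detected models, guaranteeing the requested one is included."""
    models = list(dict.fromkeys(detected))  # de-duplicate while keeping order
    if override and override not in models:
        models.insert(0, override)  # ordering is an assumption
    return models


print(resolve_models(["text-embedding-3-small"], "nomic-embed-text:latest"))
print(resolve_models(["text-embedding-3-small"], "text-embedding-3-small"))
```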

## Summary

The OpenRAG Public API v1 automates the main steps of semantic search:

- **Automatic model detection** scans the `embedding_model` field to discover which vector spaces exist in the index
- **Parallel embedding generation** creates query vectors for every detected model using the patched LiteLLM client
- **Hybrid search logic** combines KNN vector search (70% weight) with `multi_match` keyword search (30% weight)
- **Flexible filtering** supports constraints on data sources, document types, owners, and connector types
- **SDK support** includes a Python wrapper that handles authentication, request serialization, and response parsing

All search logic is implemented in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py), with FastAPI routing in [`src/api/v1/search.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/search.py) and field name normalization utilities in [`src/utils/embedding_fields.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/embedding_fields.py).

## Frequently Asked Questions

### How does the OpenRAG Public API v1 handle authentication?

The endpoint authenticates requests using an API key passed in the `Authorization` header as a Bearer token. The `get_api_key_user_async` dependency resolves this key to a `User` object containing `user_id`, which enables document‑level access control without requiring JWT tokens or OAuth flows.

### What embedding models does the semantic search support?

The API automatically supports any embedding model present in the OpenSearch index. The `SearchService` runs an aggregation query on the `embedding_model` field to detect available models, then generates query embeddings for each using the patched LiteLLM client. Setting the `embedding_model` request field guarantees that the named model is included in the search, alongside any other models detected in the index.

### Why does the search use a hybrid query instead of pure vector search?

The implementation in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py) constructs a hybrid query that weights KNN vector search at 70% and `multi_match` keyword search on `text` and `filename` fields at 30%. This hybrid approach improves recall for semantic concepts while maintaining precision for specific terminology, filenames, or exact phrases that might not align with vector embeddings.

### How does the API handle OpenSearch clusters without `num_candidates` support?

If the initial query execution fails because the OpenSearch cluster does not support the `num_candidates` parameter (used for approximate KNN tuning), the `SearchService` automatically catches the error and retries the query without that parameter (lines 104‑124 in [`src/services/search_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/search_service.py)). This fallback ensures compatibility across different OpenSearch versions without requiring client‑side configuration.