
How to Use the OpenRAG Public API v1 for Semantic Search

March 13, 2026 · langflow-ai/openrag

The OpenRAG Public API v1 exposes a single POST endpoint at /api/v1/search that automatically detects embedding models in your OpenSearch index, generates parallel query embeddings, and executes a hybrid semantic search combining KNN vector similarity with keyword matching.

The OpenRAG Public API v1 streamlines semantic document retrieval by encapsulating complex vector search logic behind a simple HTTP interface. In the langflow-ai/openrag repository, the endpoint implemented in src/api/v1/search.py orchestrates model detection, hybrid query construction, and result formatting through the SearchService.search_tool method. Clients authenticate via API key and receive structured JSON responses containing matching text chunks and aggregation facets for filterable UIs.

API Endpoint Overview

The Public API v1 provides one primary endpoint for semantic search over ingested documents. The implementation uses FastAPI for request validation and delegates search logic to the SearchService class.

The endpoint accepts a JSON payload defining the search query, optional filters, a result limit, and a score threshold. Internally, the request is routed to SearchService.search_tool in src/services/search_service.py, which handles four distinct phases: embedding model detection, query vectorization, hybrid query construction, and OpenSearch execution. The service runs an aggregation on the embedding_model field to discover which models exist in the corpus, then generates a query embedding for each detected model using the patched LiteLLM client (clients.patched_embedding_client.embeddings.create).

Request Structure and Authentication

Requests to the OpenRAG Public API v1 require Bearer token authentication using an API key. The SearchV1Body Pydantic model (defined at lines 19‑24 in src/api/v1/search.py) validates the incoming payload structure.

The model enforces the following fields:

query (string): The natural language search text (trimmed and validated for non‑emptiness at lines 31‑35)
filters (optional dict): Constraints for data_sources, document_types, owners, or connector_types
limit (integer): Maximum number of chunks to return
score_threshold (float): Minimum relevance score for inclusion
embedding_model (optional string): Override to force a specific embedding model
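
As a rough illustration of the validation described above, the sketch below mirrors the listed fields with a standard-library dataclass rather than the actual Pydantic model. The class name SearchRequest, the default values for limit and score_threshold, and the error message are assumptions made for this example; they are not taken from src/api/v1/search.py.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional


@dataclass
class SearchRequest:
    """Hypothetical stand-in for the SearchV1Body payload."""

    query: str
    filters: Optional[Dict[str, Any]] = None
    limit: int = 10               # assumed default, not from the source
    score_threshold: float = 0.0  # assumed default, not from the source
    embedding_model: Optional[str] = None

    def __post_init__(self) -> None:
        # Trim the query and reject empty input, as the endpoint is described to do.
        self.query = self.query.strip()
        if not self.query:
            raise ValueError("query must not be empty")

    def to_payload(self) -> Dict[str, Any]:
        # Drop unset optional fields so the JSON body stays minimal.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Calling SearchRequest(query="  pricing model  ", limit=5).to_payload() yields a dictionary with the trimmed query and no filters or embedding_model keys, ready to be serialized as the request body.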

Authentication resolves via the get_api_key_user_async dependency, which maps the API key to a User object containing user_id. If authentication fails, the endpoint returns 401 before reaching the search logic (line 85 in src/services/search_service.py).
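
For clients that do not use an SDK, the Bearer scheme can be sketched with the Python standard library alone. The helper name build_search_request and the host openrag.example.com are placeholders invented for this example; the function only prepares the request and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_search_request(base_url: str, api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to /api/v1/search."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and key; substitute real values before sending with
# urllib.request.urlopen(request). A 401 surfaces as urllib.error.HTTPError.
request = build_search_request("https://openrag.example.com", "YOUR_API_KEY", {"query": "pricing"})
```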

Core Search Architecture

The SearchService.search_tool method (starting at line 20 in src/services/search_service.py) implements the hybrid semantic search pipeline. The architecture automatically adapts to multi‑model indexes while optimizing for both vector similarity and keyword recall.

Embedding Model Detection

Before generating embeddings, the service detects which models are actually present in the indexed documents. It runs an aggregation query (agg_query at lines 101‑112) on the embedding_model field to build the available_models list.

If the index contains no documents, the service falls back to the default model configured in EMBED_MODEL (lines 129‑132 in src/services/search_service.py). This detection mechanism ensures that queries utilize every relevant vector space available in the corpus without requiring manual model specification.
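
The detection step can be approximated as follows. This is a minimal sketch, assuming a standard OpenSearch terms aggregation and its usual buckets response shape; the aggregation name embedding_models, the bucket size, and the function name detect_models are illustrative and may not match the agg_query used in the repository.

```python
from typing import Any, Dict, List

# Terms aggregation over the embedding_model field; size=0 skips document hits.
# The aggregation name and bucket size are illustrative choices.
AGG_QUERY: Dict[str, Any] = {
    "size": 0,
    "aggs": {"embedding_models": {"terms": {"field": "embedding_model", "size": 10}}},
}


def detect_models(agg_response: Dict[str, Any], default_model: str) -> List[str]:
    """Return model names found in the index, or the default if none exist."""
    buckets = (
        agg_response.get("aggregations", {})
        .get("embedding_models", {})
        .get("buckets", [])
    )
    models = [b["key"] for b in buckets if b.get("doc_count", 0) > 0]
    return models or [default_model]
```

With an empty index the bucket list is empty, so the function returns a single-element list containing the default model, which corresponds to the EMBED_MODEL fallback described above.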

Parallel Query Embedding

For each model in available_models, the service creates an async task via embed_with_model (lines 146‑205). The process normalizes model names into OpenSearch‑compatible field names using get_embedding_field_name from src/utils/embedding_fields.py (lines 49‑66), automatically handling provider prefixes like ollama/ or watsonx/.

The embeddings are generated in parallel through clients.patched_embedding_client.embeddings.create, producing one query vector per model, each targeting that model’s index field (chunk_embedding_<model>). Because every detected model gets its own query vector, the search can also match documents that were embedded with a model other than the configured default.
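
The fan-out pattern can be sketched with asyncio.gather. Everything here is a stand-in: fake_embed replaces the real embeddings call and returns a dummy vector, and field_name_for is a hypothetical normalization that may differ from get_embedding_field_name in the repository.

```python
import asyncio
import re
from typing import Dict, List


def field_name_for(model: str) -> str:
    """Hypothetical normalization; the real helper may differ."""
    bare = model.split("/", 1)[-1]  # drop a provider prefix such as "ollama/"
    safe = re.sub(r"[^a-z0-9]+", "_", bare.lower()).strip("_")
    return f"chunk_embedding_{safe}"


async def fake_embed(model: str, text: str) -> List[float]:
    """Stand-in for an embeddings API call; returns a dummy vector."""
    await asyncio.sleep(0)
    return [float(len(model)), float(len(text))]


async def embed_query_for_models(text: str, models: List[str]) -> Dict[str, List[float]]:
    # Launch one embedding task per model and wait for all of them together.
    vectors = await asyncio.gather(*(fake_embed(m, text) for m in models))
    return {field_name_for(m): v for m, v in zip(models, vectors)}


vectors_by_field = asyncio.run(
    embed_query_for_models("hello", ["ollama/nomic-embed-text:latest", "text-embedding-3-small"])
)
```

asyncio.gather returns results in the same order as its inputs, which is why zipping the model list with the resulting vectors is safe.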

Hybrid Query Construction

When the query is not a wildcard "*" (line 61), the service constructs a hybrid OpenSearch query combining semantic and lexical signals. The query structure (lines 68‑125) implements:

KNN clauses (knn_queries) for each detected model’s embedding field, weighted at 70% through a dis_max query (line 111)
Keyword fallback via multi_match on text and filename fields, weighted at 30% (line 119)
Existence validation using exists_any_embedding to ensure documents have at least one embedding field (lines 85‑91)
Filter composition combining user filters with the existence clause under a bool/filter structure (lines 94‑100)

This hybrid approach favors recall for semantic concepts while preserving precision for exact keyword matches.
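
One plausible shape for such a query body is sketched below using standard OpenSearch query DSL clauses (knn, dis_max, multi_match, exists, bool). The exact nesting, the use of boost values to express the 70/30 weighting, the default k, and the function name build_hybrid_query are assumptions for illustration; the query built in src/services/search_service.py may be structured differently.

```python
from typing import Any, Dict, List, Optional


def build_hybrid_query(
    query_text: str,
    vectors_by_field: Dict[str, List[float]],
    user_filters: Optional[List[Dict[str, Any]]] = None,
    k: int = 10,
) -> Dict[str, Any]:
    """Combine per-model KNN clauses with a keyword clause under one bool query."""
    knn_queries = [
        {"knn": {field: {"vector": vector, "k": k}}}
        for field, vector in vectors_by_field.items()
    ]
    # A document passes if at least one of the embedding fields is present.
    exists_any_embedding = {
        "bool": {
            "should": [{"exists": {"field": f}} for f in vectors_by_field],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                "should": [
                    {"dis_max": {"queries": knn_queries, "boost": 0.7}},
                    {"multi_match": {"query": query_text, "fields": ["text", "filename"], "boost": 0.3}},
                ],
                "minimum_should_match": 1,
                "filter": [*(user_filters or []), exists_any_embedding],
            }
        }
    }
```

Wrapping the KNN clauses in dis_max means a document is scored by its best-matching embedding field rather than by the sum over all fields, so documents embedded with only one model are not penalized.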

Execution and Error Handling

The constructed search_body is sent to OpenSearch via opensearch_client.search. The implementation includes automatic fallback logic: if the cluster does not support the num_candidates parameter (common in older OpenSearch versions), the service retries without that parameter (lines 104‑124).
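
The retry behavior can be sketched independently of any OpenSearch client. In this example the search callable is injected, the error is recognized by looking for the parameter name in the exception message, and the parameter is stripped wherever it appears in the body; all three choices, along with the names strip_key and search_with_fallback, are assumptions rather than a description of the actual implementation.

```python
from typing import Any, Callable, Dict


def strip_key(node: Any, key: str) -> Any:
    """Return a deep copy of a nested dict/list structure without the given key."""
    if isinstance(node, dict):
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    return node


def search_with_fallback(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    body: Dict[str, Any],
) -> Dict[str, Any]:
    """Try the full query; on a num_candidates complaint, retry without it."""
    try:
        return search(body)
    except Exception as exc:  # the concrete exception type is an assumption
        if "num_candidates" not in str(exc):
            raise
        return search(strip_key(body, "num_candidates"))
```

Unrelated errors are re-raised unchanged, so the fallback only masks the one incompatibility it is meant to handle.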

Results are transformed into lightweight dictionaries containing filename, text, score, embedding_model, and other metadata (lines 40‑60). The response also includes aggregation facets for data_sources, document_types, and other fields to support faceted search UIs.
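
A simplified version of that transformation is sketched below. It assumes the standard OpenSearch response layout (hits.hits with _score and _source) and applies the score threshold on the client side of the query; whether the service filters by score here or inside OpenSearch is not stated in this article, and format_hits is a name chosen for the example.

```python
from typing import Any, Dict, List


def format_hits(raw_response: Dict[str, Any], score_threshold: float = 0.0) -> List[Dict[str, Any]]:
    """Flatten OpenSearch hits into small dictionaries, dropping low scores."""
    results = []
    for hit in raw_response.get("hits", {}).get("hits", []):
        score = hit.get("_score") or 0.0
        if score < score_threshold:
            continue
        source = hit.get("_source", {})
        results.append({
            "filename": source.get("filename"),
            "text": source.get("text"),
            "score": score,
            "embedding_model": source.get("embedding_model"),
        })
    return results
```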

Code Examples

Raw HTTP Request with cURL

Execute semantic search directly against the OpenRAG Public API v1 using standard HTTP tools:

curl -X POST "https://<your-openrag-host>/api/v1/search" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What is the pricing model for LangFlow?",
        "filters": {
          "data_sources": ["knowledge-base.pdf"],
          "document_types": ["application/pdf"]
        },
        "limit": 5,
        "score_threshold": 0.2
      }'

The response returns a JSON object containing results (an array of matching chunks with filename, text, score, and embedding_model) and aggregations for populating filter interfaces.

Python SDK Implementation

The Python SDK in sdks/python/openrag_sdk/search.py provides a typed wrapper around the HTTP API:

import asyncio

from openrag_sdk import OpenRAGClient


async def main() -> None:
    client = OpenRAGClient(
        base_url="https://<your-openrag-host>",
        api_key="<YOUR_API_KEY>"
    )

    # Execute semantic search (await must run inside a coroutine)
    response = await client.search.query(
        query="Explain the OpenRAG ingestion pipeline",
        limit=8,
        score_threshold=0.1,
        filters={"document_types": ["text/markdown"]},
    )

    # Print each hit's score, source file, and a short preview of the chunk
    for hit in response.results:
        print(f"[{hit.score:.2f}] {hit.filename} → {hit.text[:120]}...")


asyncio.run(main())

The SDK serializes the request body, sends the authenticated POST request, and parses the response into SearchResult objects (lines 41‑65 in the SDK file), handling HTTP errors and JSON parsing automatically.

Advanced: Overriding Embedding Models

Force the search to use a specific embedding model when you know the index contains documents embedded with that model:

# Run inside an async function, with client created as in the SDK example
response = await client.search.query(
    query="How does the vector store work?",
    filters=None,
    limit=10,
    embedding_model="nomic-embed-text:latest"
)

When embedding_model is specified, the service guarantees inclusion of that model in the available_models list while still detecting and querying other models present in the index. This override is useful for testing specific embedding spaces or ensuring consistency with custom ingestion pipelines.
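
The merge of detected models with a requested override can be pictured as a small list operation. This is only a sketch of the behavior described above; the function name resolve_models and the choice to place the override first are assumptions.

```python
from typing import List, Optional


def resolve_models(detected: List[str], override: Optional[str]) -> List[str]:
    """Keep every detected model and make sure the override is included."""
    models = list(dict.fromkeys(detected))  # de-duplicate, preserving order
    if override and override not in models:
        models.insert(0, override)
    return models
```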

Summary

The OpenRAG Public API v1 automates the main steps of semantic search:

Automatic model detection scans the embedding_model field to discover which vector spaces exist in the index
Parallel embedding generation creates query vectors for every detected model using the patched LiteLLM client
Hybrid search logic combines KNN vector search (70% weight) with multi_match keyword search (30% weight)
Flexible filtering supports constraints on data sources, document types, owners, and connector types
SDK support includes a Python wrapper that handles authentication, request serialization, and response parsing

All search logic is implemented in src/services/search_service.py, with FastAPI routing in src/api/v1/search.py and field name normalization utilities in src/utils/embedding_fields.py.

Frequently Asked Questions

How does the OpenRAG Public API v1 handle authentication?

The endpoint authenticates requests using an API key passed in the Authorization header as a Bearer token. The get_api_key_user_async dependency resolves this key to a User object containing user_id, which enables document‑level access control without requiring JWT tokens or OAuth flows.

What embedding models does the semantic search support?

The API automatically supports any embedding model present in the OpenSearch index. The SearchService runs an aggregation query on the embedding_model field to detect available models, then generates query embeddings for each using the patched LiteLLM client. Setting the embedding_model request field guarantees that the named model is included in the search; other models detected in the index are still queried.

Why does the search use a hybrid query instead of pure vector search?

The implementation in src/services/search_service.py constructs a hybrid query that weights KNN vector search at 70% and multi_match keyword search on text and filename fields at 30%. This hybrid approach improves recall for semantic concepts while maintaining precision for specific terminology, filenames, or exact phrases that might not align with vector embeddings.

How does the API handle OpenSearch clusters without num_candidates support?

If the initial query execution fails because the OpenSearch cluster does not support the num_candidates parameter (used for approximate KNN tuning), the SearchService automatically catches the error and retries the query without that parameter (lines 104‑124 in src/services/search_service.py). This fallback ensures compatibility across different OpenSearch versions without requiring client‑side configuration.
