How to Set Up OpenSearch Connection Details for OpenRAG: Environment Variables and Client Configuration

OpenRAG configures OpenSearch connections exclusively through environment variables defined in src/config/settings.py, which initialize an AsyncOpenSearch client with HTTPS and basic authentication during the AppClients singleton startup sequence.

OpenRAG, the open-source RAG framework maintained at langflow-ai/openrag, uses OpenSearch as its primary vector store backend. Setting up OpenSearch connection details requires configuring specific environment variables that the application reads at startup to establish secure, asynchronous connections. This guide walks through the exact configuration files, required variables, and initialization flow based on the current source code implementation.

Required Environment Variables

OpenRAG centralizes all OpenSearch configuration in src/config/settings.py. The application expects six key environment variables, with four controlling basic connectivity:

  • OPENSEARCH_HOST: Hostname of the OpenSearch node. Defaults to localhost if not set (line 22).
  • OPENSEARCH_PORT: TCP port as an integer. Defaults to 9200 (line 23, parsed via get_env_int).
  • OPENSEARCH_USERNAME: Basic authentication username. Defaults to admin (line 24).
  • OPENSEARCH_PASSWORD: Basic authentication password. Must be explicitly set with no default for security reasons (line 25).
  • OPENSEARCH_INDEX_NAME: Name of the vector index OpenRAG creates and queries. Referenced in src/config/config_manager.py (lines 253-254) and defined in settings.py (lines 90-92).
  • OPENSEARCH_DATA_PATH: Filesystem path for local OpenSearch data storage when using the bundled Docker development setup (line 196).

The core connection parameters are loaded at module initialization in src/config/settings.py:


# src/config/settings.py (lines 22-27)

OPENSEARCH_HOST = os.getenv("OPENSEARCH_HOST", "localhost")
OPENSEARCH_PORT = get_env_int("OPENSEARCH_PORT", 9200)
OPENSEARCH_USERNAME = os.getenv("OPENSEARCH_USERNAME", "admin")
OPENSEARCH_PASSWORD = os.getenv("OPENSEARCH_PASSWORD")
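
The get_env_int helper itself is not shown in this guide. As a rough illustration of what such a helper typically does, here is a minimal sketch. The name get_env_int_sketch and the choice to fall back to the default on invalid input are assumptions, not the project's actual implementation:

```python
import os


def get_env_int_sketch(name: str, default: int) -> int:
    """Hypothetical stand-in for get_env_int: read an integer from the environment."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default  # unset or empty: use the default
    try:
        return int(raw.strip())
    except ValueError:
        return default  # assumption: invalid values fall back instead of raising
```

With this behavior, an unset OPENSEARCH_PORT resolves to 9200, and a value such as "9201" is parsed to the integer 9201.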

Client Initialization Flow

The AppClients singleton constructs the OpenSearch client during its initialize coroutine. This process consumes the environment variables defined above and configures an AsyncOpenSearch instance with HTTPS and compression enabled.

The client creation logic in src/config/settings.py implements the following pattern:


# src/config/settings.py – client initialization (lines 11-20 of initialize method)

self.opensearch = AsyncOpenSearch(
    hosts=[{"host": OPENSEARCH_HOST, "port": OPENSEARCH_PORT}],
    connection_class=AIOHttpConnection,
    scheme="https",
    use_ssl=True,
    verify_certs=False,
    ssl_assert_fingerprint=None,
    http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
    http_compress=True,
)

This configuration uses the AIOHttpConnection class for asynchronous I/O, forces HTTPS with certificate verification disabled (acceptable for development clusters with self-signed certificates, but the server's identity is not checked), and enables HTTP compression to reduce vector payload transfer sizes.
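
For scripts that talk to a cluster whose certificate is signed by a trusted CA, verification can be switched back on. The sketch below only builds keyword arguments. verify_certs and ca_certs are standard opensearch-py connection options, while the helper name tls_kwargs and the CA bundle path are hypothetical. OpenRAG's own initialization hardcodes verify_certs=False, so this applies to custom clients rather than to OpenRAG itself:

```python
from typing import Any, Dict, Optional


def tls_kwargs(verify: bool, ca_bundle: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: TLS-related keyword arguments for AsyncOpenSearch."""
    kwargs: Dict[str, Any] = {"scheme": "https", "use_ssl": True, "verify_certs": verify}
    if verify and ca_bundle:
        # Path to a PEM bundle containing the CA that signed the cluster certificate.
        kwargs["ca_certs"] = ca_bundle
    return kwargs


# Development (self-signed certificate): same TLS settings as shown above.
dev = tls_kwargs(verify=False)

# Stricter setup: verify against a CA bundle (the path is a placeholder).
prod = tls_kwargs(verify=True, ca_bundle="/etc/ssl/certs/opensearch-ca.pem")
```

The resulting dictionary can be unpacked into the constructor call, for example AsyncOpenSearch(hosts=[...], http_auth=(...), **prod).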

Health Verification and Retry Logic

Before the application accepts traffic, it verifies cluster health. The wait_for_opensearch function in src/utils/opensearch_utils.py polls the cluster with exponential backoff until it responds to a ping and reports a green or yellow health status.


# src/utils/opensearch_utils.py (lines 11-30)

async def wait_for_opensearch(opensearch_client, max_retries=15, base_delay=2.0, max_delay=30.0):
    ...
    if await opensearch_client.ping():
        health = await opensearch_client.cluster.health()
        if health.get("status") in ["green", "yellow"]:
            return

The function retries failed connections up to 15 times, starting with a 2-second delay and capping at 30 seconds. If the cluster fails to reach a healthy state within these constraints, the application startup sequence halts, preventing operations against an unready vector store.
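
The excerpt above elides the retry loop. One way such a loop could be written is sketched below. The doubling formula, the swallowed exceptions, and the RuntimeError raised at the end are assumptions made for illustration. Only the parameter defaults and the ping/health checks come from the excerpt:

```python
import asyncio


def backoff_delays(max_retries: int = 15, base_delay: float = 2.0, max_delay: float = 30.0):
    """Delay after each failed attempt: doubles from base_delay, capped at max_delay."""
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]


async def wait_until_healthy(client, max_retries=15, base_delay=2.0, max_delay=30.0,
                             sleep=asyncio.sleep):
    """Poll ping() and cluster.health() until the status is green or yellow."""
    for delay in backoff_delays(max_retries, base_delay, max_delay):
        try:
            if await client.ping():
                health = await client.cluster.health()
                if health.get("status") in ("green", "yellow"):
                    return
        except Exception:
            pass  # assumption: connection errors count as "not ready yet"
        await sleep(delay)
    raise RuntimeError("OpenSearch did not become healthy in time")
```

Under these assumptions the waits run 2, 4, 8, and 16 seconds, then stay at the 30-second cap for the remaining attempts.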

Configuration Examples

Minimal .env File Configuration

Place a .env file in the repository root or mount it into your container. The load_dotenv invocation in settings.py automatically loads these values at startup:


# .env

OPENSEARCH_HOST=opensearch.mycompany.com
OPENSEARCH_PORT=9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=SuperSecretPass123
OPENSEARCH_INDEX_NAME=openrag-index
OPENSEARCH_DATA_PATH=/var/lib/opensearch/data
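
Because a missing password or a malformed port only surfaces once the client starts up, it can help to check the values beforehand. The following is a minimal sketch of such a pre-flight check. The function check_opensearch_env is hypothetical and not part of OpenRAG:

```python
from typing import List, Mapping


def check_opensearch_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems: List[str] = []

    # The password has no default, so it must be present and non-empty.
    if not env.get("OPENSEARCH_PASSWORD"):
        problems.append("OPENSEARCH_PASSWORD is not set")

    # The port defaults to 9200 and must parse as a valid TCP port.
    raw_port = env.get("OPENSEARCH_PORT", "9200")
    try:
        port_ok = 1 <= int(raw_port) <= 65535
    except ValueError:
        port_ok = False
    if not port_ok:
        problems.append(f"OPENSEARCH_PORT is not a valid TCP port: {raw_port!r}")

    return problems
```

Calling check_opensearch_env(os.environ) after the .env file has been loaded returns an empty list when both checks pass.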

Programmatic Client Creation

For custom scripts or external utilities that need to interact with the same OpenSearch cluster, replicate the client's initialization logic:

import os
from opensearchpy import AsyncOpenSearch
from opensearchpy._async.http_aiohttp import AIOHttpConnection

async def create_opensearch_client():
    client = AsyncOpenSearch(
        hosts=[{
            "host": os.getenv("OPENSEARCH_HOST", "localhost"),
            "port": int(os.getenv("OPENSEARCH_PORT", "9200"))
        }],
        connection_class=AIOHttpConnection,
        scheme="https",
        use_ssl=True,
        verify_certs=False,
        http_auth=(
            os.getenv("OPENSEARCH_USERNAME", "admin"),
            os.getenv("OPENSEARCH_PASSWORD"),  # Must be set explicitly
        ),
        http_compress=True,
    )
    return client

Waiting for Cluster Readiness

Mirror the built-in health check when writing standalone data migration or maintenance scripts:

from utils.opensearch_utils import wait_for_opensearch

async def init_opensearch():
    client = await create_opensearch_client()
    await wait_for_opensearch(client)  # Retries with exponential back-off
    return client
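
Standalone scripts should also close the client when they finish so the underlying HTTP session is released. AsyncOpenSearch exposes an awaitable close() method. The wrapper below is a hypothetical sketch (run_with_client is not part of OpenRAG) that guarantees the close call even when the work raises:

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_with_client(
    factory: Callable[[], Awaitable[Any]],
    work: Callable[[Any], Awaitable[Any]],
) -> Any:
    """Create a client, run work(client), and always close the client afterwards."""
    client = await factory()
    try:
        return await work(client)
    finally:
        await client.close()  # releases the underlying HTTP session
```

A script would then call something like asyncio.run(run_with_client(init_opensearch, migrate)), where migrate is a placeholder for your own coroutine that accepts the client.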

Runtime Index Name Overrides

Override the vector index name for specific workflows without modifying the global configuration:

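import os

# Change the index name for a one-off run. Set it before importing
# config.settings: a name imported from that module keeps the value it
# had at import time.
os.environ["OPENSEARCH_INDEX_NAME"] = "my-custom-index"

from config.settings import OPENSEARCH_INDEX_NAME  # noqa: E402

# Subsequent calls to get_index_name() return the new value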
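
One caveat with environment overrides in Python: a value read into a module-level name is fixed at import time, while a function that calls os.getenv on each invocation sees later changes. The self-contained illustration below uses a made-up variable, DEMO_INDEX_NAME, so it does not depend on OpenRAG:

```python
import os

os.environ["DEMO_INDEX_NAME"] = "openrag-index"

# Read once, the way a module-level constant in a settings module is read.
INDEX_AT_IMPORT = os.getenv("DEMO_INDEX_NAME", "openrag-index")


def index_at_call_time() -> str:
    """Read the variable each time it is needed."""
    return os.getenv("DEMO_INDEX_NAME", "openrag-index")


# Override the variable after the constant has already been captured.
os.environ["DEMO_INDEX_NAME"] = "my-custom-index"

print(INDEX_AT_IMPORT)       # openrag-index (unchanged)
print(index_at_call_time())  # my-custom-index
```

In practice this means an override only takes effect for code paths that read the environment after the change.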

Deployment-Specific Configuration

Kubernetes and Helm Deployments

For production Kubernetes environments, the Helm chart at kubernetes/helm/openrag/templates/secrets/opensearch-secret.yaml manages connection secrets. Configure these values in your Helm values file or through sealed secrets rather than plain environment variables.

Text User Interface (TUI) Configuration

The interactive configuration surface in src/tui/config_fields.py maps each OpenSearch environment variable to a form field. This allows administrators to set connection details through the TUI rather than editing configuration files directly.

Key Integration Files

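Understanding these source files helps when debugging connection issues:

  • src/config/settings.py: Reads the OPENSEARCH_* environment variables and builds the AsyncOpenSearch client in AppClients.initialize.
  • src/config/config_manager.py: References OPENSEARCH_INDEX_NAME (lines 253-254).
  • src/utils/opensearch_utils.py: Defines wait_for_opensearch, the startup health check with exponential backoff.
  • src/tui/config_fields.py: Maps each OpenSearch environment variable to a TUI form field.
  • kubernetes/helm/openrag/templates/secrets/opensearch-secret.yaml: Helm template that manages connection secrets.
  • flows/components/opensearch_multimodel.py: Flow component that runs similarity searches against the index.
  • scripts/clear_opensearch_data.py: Wipes the local data directory at OPENSEARCH_DATA_PATH in development setups.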

Summary

  • OpenRAG reads OpenSearch connection details exclusively from environment variables defined in src/config/settings.py, with OPENSEARCH_PASSWORD being the only required variable that lacks a default value.
  • The AppClients singleton initializes an AsyncOpenSearch client using AIOHttpConnection with HTTPS and basic authentication during application startup.
  • Cluster health verification occurs through wait_for_opensearch in src/utils/opensearch_utils.py, implementing exponential backoff until the cluster reaches green or yellow status.
  • Configuration supports both file-based .env loading and container orchestration via Kubernetes secrets defined in the Helm templates.

Frequently Asked Questions

What happens if OPENSEARCH_PASSWORD is not set?

If OPENSEARCH_PASSWORD is unset, os.getenv returns None (src/config/settings.py, line 25; the variable deliberately has no default for security reasons), and that None is passed directly into the http_auth tuple. The client then has no usable credentials, so the AppClients.initialize sequence fails, either when the AsyncOpenSearch client is constructed or on the first authenticated request to the cluster.

Can I use HTTP instead of HTTPS for local development?

The current implementation in src/config/settings.py hardcodes scheme="https" and use_ssl=True during client initialization. To use HTTP, you would need to modify the source code where self.opensearch is instantiated (lines 11-20 of the initialize method), though this is not recommended as it requires maintaining a fork of the repository.

Where does OpenRAG store vector embeddings?

Vector embeddings are stored in the index specified by OPENSEARCH_INDEX_NAME, which defaults to openrag-index if not configured. This index is created automatically during initialization and is referenced by the flow components in flows/components/opensearch_multimodel.py when executing similarity searches.

How do I clear the OpenSearch data for a fresh start?

Use the utility script at scripts/clear_opensearch_data.py, which references OPENSEARCH_DATA_PATH to locate and wipe the local data directory. This script is designed specifically for development environments running the bundled Docker OpenSearch instance, not for production clusters.
