How to Set Up OpenSearch Connection Details for OpenRAG: Environment Variables and Client Configuration
OpenRAG configures OpenSearch connections exclusively through environment variables defined in src/config/settings.py, which initialize an AsyncOpenSearch client with HTTPS and basic authentication during the AppClients singleton startup sequence.
OpenRAG, the open-source RAG framework maintained at langflow-ai/openrag, uses OpenSearch as its primary vector store backend. Setting up OpenSearch connection details requires configuring specific environment variables that the application reads at startup to establish secure, asynchronous connections. This guide walks through the exact configuration files, required variables, and initialization flow based on the current source code implementation.
Required Environment Variables
OpenRAG centralizes all OpenSearch configuration in src/config/settings.py. The application expects six key environment variables, with four controlling basic connectivity:
- OPENSEARCH_HOST: Hostname of the OpenSearch node. Defaults to localhost if not set (line 22).
- OPENSEARCH_PORT: TCP port as an integer. Defaults to 9200 (line 23, parsed via get_env_int).
- OPENSEARCH_USERNAME: Basic authentication username. Defaults to admin (line 24).
- OPENSEARCH_PASSWORD: Basic authentication password. Must be set explicitly; it has no default for security reasons (line 25).
- OPENSEARCH_INDEX_NAME: Name of the vector index OpenRAG creates and queries. Referenced in src/config/config_manager.py (lines 253-254) and defined in settings.py (lines 90-92).
- OPENSEARCH_DATA_PATH: Filesystem path for local OpenSearch data storage when using the bundled Docker development setup (line 196).
The core connection parameters are loaded at module initialization in src/config/settings.py:
# src/config/settings.py (lines 22-27)
OPENSEARCH_HOST = os.getenv("OPENSEARCH_HOST", "localhost")
OPENSEARCH_PORT = get_env_int("OPENSEARCH_PORT", 9200)
OPENSEARCH_USERNAME = os.getenv("OPENSEARCH_USERNAME", "admin")
OPENSEARCH_PASSWORD = os.getenv("OPENSEARCH_PASSWORD")
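The snippet above calls get_env_int without showing its body. The following is a minimal sketch of what such a helper might look like, not the OpenRAG implementation; in particular, falling back to the default on a malformed value is an assumption.

```python
import os


def get_env_int(name: str, default: int) -> int:
    """Hypothetical stand-in for the get_env_int helper used in settings.py."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        # Variable is unset or empty: use the default.
        return default
    try:
        return int(raw)
    except ValueError:
        # Assumption: fall back to the default instead of crashing on bad input.
        return default
```

With this sketch, get_env_int("OPENSEARCH_PORT", 9200) yields 9200 when the variable is unset and the parsed integer when it holds a valid number.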
Client Initialization Flow
The AppClients singleton constructs the OpenSearch client during its initialize coroutine. This process consumes the environment variables defined above and configures an AsyncOpenSearch instance with HTTPS and compression enabled.
The client creation logic in src/config/settings.py implements the following pattern:
# src/config/settings.py – client initialization (lines 11-20 of initialize method)
self.opensearch = AsyncOpenSearch(
    hosts=[{"host": OPENSEARCH_HOST, "port": OPENSEARCH_PORT}],
    connection_class=AIOHttpConnection,
    scheme="https",
    use_ssl=True,
    verify_certs=False,
    ssl_assert_fingerprint=None,
    http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
    http_compress=True,
)
This configuration uses the AIOHttpConnection class for asynchronous I/O, forces HTTPS with SSL verification disabled (suitable for development clusters with self-signed certificates), and enables HTTP compression to reduce vector payload transfer sizes.
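For external scripts that talk to a cluster with a trusted certificate, certificate verification can be turned back on. The sketch below builds the keyword arguments for AsyncOpenSearch from the same environment variables; build_client_kwargs is a hypothetical helper, not part of OpenRAG, and it assumes the client library's ca_certs option is used to point at a CA bundle.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_client_kwargs(
    env: Mapping[str, str] = os.environ,
    ca_certs: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for AsyncOpenSearch (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "hosts": [{
            "host": env.get("OPENSEARCH_HOST", "localhost"),
            "port": int(env.get("OPENSEARCH_PORT", "9200")),
        }],
        "scheme": "https",
        "use_ssl": True,
        # Verify certificates only when a CA bundle is supplied; otherwise
        # keep the development behaviour described above.
        "verify_certs": ca_certs is not None,
        "http_auth": (
            env.get("OPENSEARCH_USERNAME", "admin"),
            env.get("OPENSEARCH_PASSWORD"),
        ),
        "http_compress": True,
    }
    if ca_certs is not None:
        kwargs["ca_certs"] = ca_certs
    return kwargs
```

A caller would pass the result on as AsyncOpenSearch(connection_class=AIOHttpConnection, **build_client_kwargs(ca_certs="/path/to/ca.pem")), where the CA path is a placeholder.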
Health Verification and Retry Logic
Before the application accepts traffic, it verifies cluster health using the exponential backoff strategy implemented in src/utils/opensearch_utils.py. The wait_for_opensearch function polls the cluster until it reports a green or yellow health status.
# src/utils/opensearch_utils.py (lines 11-30)
async def wait_for_opensearch(opensearch_client, max_retries=15, base_delay=2.0, max_delay=30.0):
    ...
    if await opensearch_client.ping():
        health = await opensearch_client.cluster.health()
        if health.get("status") in ["green", "yellow"]:
            return
The function retries failed connections up to 15 times, starting with a 2-second delay and capping at 30 seconds. If the cluster fails to reach a healthy state within these constraints, the application startup sequence halts, preventing operations against an unready vector store.
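The excerpt above elides the retry loop, so the exact delay formula is not shown. As an illustration only, the sketch below assumes the delay doubles on each attempt and is capped at max_delay; backoff_delays and wait_until_healthy are hypothetical names, and the health check and sleep function are injected so the loop can be exercised without a cluster.

```python
import asyncio
from typing import Awaitable, Callable, List


def backoff_delays(
    max_retries: int = 15, base_delay: float = 2.0, max_delay: float = 30.0
) -> List[float]:
    """Delay after each failed attempt: doubles from base_delay, capped at max_delay."""
    return [min(base_delay * (2 ** attempt), max_delay) for attempt in range(max_retries)]


async def wait_until_healthy(
    is_healthy: Callable[[], Awaitable[bool]],
    max_retries: int = 15,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Poll is_healthy until it returns True; return the number of failed attempts.

    Raises TimeoutError once max_retries attempts have failed.
    """
    for attempt, delay in enumerate(backoff_delays(max_retries, base_delay, max_delay)):
        try:
            if await is_healthy():
                return attempt
        except Exception:
            # Assumption: a connection error counts as "not healthy yet".
            pass
        await sleep(delay)
    raise TimeoutError(f"cluster not healthy after {max_retries} attempts")
```

Under this assumed schedule the first delays are 2, 4, 8, and 16 seconds, after which every wait is capped at 30 seconds.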
Configuration Examples
Minimal .env File Configuration
Place a .env file in the repository root or mount it into your container. The load_dotenv invocation in settings.py automatically loads these values at startup:
# .env
OPENSEARCH_HOST=opensearch.mycompany.com
OPENSEARCH_PORT=9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=SuperSecretPass123
OPENSEARCH_INDEX_NAME=openrag-index
OPENSEARCH_DATA_PATH=/var/lib/opensearch/data
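Because OPENSEARCH_PASSWORD has no default, a quick pre-flight check can catch an incomplete .env before the application starts. The following is a small sketch under that assumption; REQUIRED_VARS, missing_opensearch_vars, and require_opensearch_vars are hypothetical names, not part of OpenRAG.

```python
import os
from typing import List, Mapping

# Variables that have no default according to the list of settings above.
REQUIRED_VARS = ("OPENSEARCH_PASSWORD",)


def missing_opensearch_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_opensearch_vars(env: Mapping[str, str] = os.environ) -> None:
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_opensearch_vars(env)
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
```

Calling require_opensearch_vars() at the top of a maintenance script produces a clear error message instead of a failure later during client setup.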
Programmatic Client Creation
For custom scripts or external utilities that need to interact with the same OpenSearch cluster, replicate the client's initialization logic:
import os
from opensearchpy import AsyncOpenSearch
from opensearchpy._async.http_aiohttp import AIOHttpConnection

async def create_opensearch_client():
    client = AsyncOpenSearch(
        hosts=[{
            "host": os.getenv("OPENSEARCH_HOST", "localhost"),
            "port": int(os.getenv("OPENSEARCH_PORT", "9200")),
        }],
        connection_class=AIOHttpConnection,
        scheme="https",
        use_ssl=True,
        verify_certs=False,
        http_auth=(
            os.getenv("OPENSEARCH_USERNAME", "admin"),
            os.getenv("OPENSEARCH_PASSWORD"),  # Must be set explicitly
        ),
        http_compress=True,
    )
    return client
Waiting for Cluster Readiness
Mirror the built-in health check when writing standalone data migration or maintenance scripts:
from utils.opensearch_utils import wait_for_opensearch

async def init_opensearch():
    client = await create_opensearch_client()
    await wait_for_opensearch(client)  # Retries with exponential back-off
    return client
Runtime Index Name Overrides
Override the vector index name for specific workflows without modifying the global configuration:
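import os
# Change the index name for a one-off run. Set it before anything reads the
# variable: a constant imported from config.settings (such as
# OPENSEARCH_INDEX_NAME) keeps the value it had at import time.
os.environ["OPENSEARCH_INDEX_NAME"] = "my-custom-index"
# Subsequent calls to get_index_name() return the new value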
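The distinction between a value read once at import time and one read on every call is what makes this override work. The self-contained sketch below illustrates it with stand-ins: INDEX_NAME_AT_IMPORT and this get_index_name are hypothetical and only mimic the behaviour described above, using the openrag-index default.

```python
import os

# Start from a clean state so the default applies.
os.environ.pop("OPENSEARCH_INDEX_NAME", None)

# Read once, like a module-level constant evaluated at import time.
INDEX_NAME_AT_IMPORT = os.getenv("OPENSEARCH_INDEX_NAME", "openrag-index")


def get_index_name() -> str:
    """Hypothetical accessor that re-reads the environment on every call."""
    return os.getenv("OPENSEARCH_INDEX_NAME", "openrag-index")


# Override the index name after the constant has already been bound.
os.environ["OPENSEARCH_INDEX_NAME"] = "my-custom-index"

print(INDEX_NAME_AT_IMPORT)  # still the value captured earlier
print(get_index_name())      # reflects the override
```

Code that captured the name in a constant keeps using the old index, so a runtime override only reaches callers that look the name up at call time.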
Deployment-Specific Configuration
Kubernetes and Helm Deployments
For production Kubernetes environments, the Helm chart at kubernetes/helm/openrag/templates/secrets/opensearch-secret.yaml manages connection secrets. Configure these values in your Helm values file or through sealed secrets rather than plain environment variables.
Text User Interface (TUI) Configuration
The interactive configuration surface in src/tui/config_fields.py maps each OpenSearch environment variable to a form field. This allows administrators to set connection details through the TUI rather than editing configuration files directly.
Key Integration Files
Understanding these source files helps when debugging connection issues:
- src/config/settings.py: Central definition of all OpenSearch environment variables and the AsyncOpenSearch client factory.
- src/utils/opensearch_utils.py: Health check utilities and cluster readiness polling.
- src/tui/config_fields.py: Interactive configuration mapping for the TUI.
- flows/components/opensearch_multimodel.py: Flow component that consumes the initialized client to execute vector queries.
Summary
- OpenRAG reads OpenSearch connection details exclusively from environment variables defined in src/config/settings.py, with OPENSEARCH_PASSWORD being the only required variable that lacks a default value.
- The AppClients singleton initializes an AsyncOpenSearch client using AIOHttpConnection with HTTPS and basic authentication during application startup.
- Cluster health verification occurs through wait_for_opensearch in src/utils/opensearch_utils.py, implementing exponential backoff until the cluster reaches green or yellow status.
- Configuration supports both file-based .env loading and container orchestration via Kubernetes secrets defined in the Helm templates.
Frequently Asked Questions
What happens if OPENSEARCH_PASSWORD is not set?
If OPENSEARCH_PASSWORD is not set, os.getenv returns None (src/config/settings.py, line 25, defines no default for security reasons), and that None is passed directly into the http_auth tuple. The client then has no usable credentials. Depending on the client library version, the failure surfaces either when the AsyncOpenSearch client is constructed during the AppClients.initialize sequence or on the first authenticated request to the cluster.
Can I use HTTP instead of HTTPS for local development?
The current implementation in src/config/settings.py hardcodes scheme="https" and use_ssl=True during client initialization. To use HTTP, you would need to modify the source code where self.opensearch is instantiated (lines 11-20 of the initialize method), though this is not recommended as it requires maintaining a fork of the repository.
Where does OpenRAG store vector embeddings?
Vector embeddings are stored in the index specified by OPENSEARCH_INDEX_NAME, which defaults to openrag-index if not configured. This index is created automatically during initialization and is referenced by the flow components in flows/components/opensearch_multimodel.py when executing similarity searches.
How do I clear the OpenSearch data for a fresh start?
Use the utility script at scripts/clear_opensearch_data.py, which references OPENSEARCH_DATA_PATH to locate and wipe the local data directory. This script is designed specifically for development environments running the bundled Docker OpenSearch instance, not for production clusters.