How to Configure OpenRAG Using Environment Variables: A Complete Guide
OpenRAG reads all configuration from environment variables loaded via python-dotenv, allowing you to control vector stores, LLM providers, OAuth connectors, and timeouts through a single .env file placed in the project root.
The langflow-ai/openrag repository implements a strictly environment-driven configuration architecture. Every runtime parameter—from OpenSearch connection strings to LLM API keys—is resolved through environment variables parsed at startup, making the application portable across development, Docker, and production environments without code changes.
Configuration Architecture
Environment Loading Mechanism
In src/config/settings.py, the application initializes by calling load_dotenv(override=False) twice: first for the current directory and then for the repository root (lines 17-19). This ensures that a .env file placed either next to the source code or at the project root is automatically loaded into the process environment before any configuration constants are defined.
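The effect of loading two files without overriding can be illustrated with a small stand-in. This sketch deliberately does not call python-dotenv; `load_env_file` is a hypothetical helper that mimics the `override=False` behaviour (a variable that already exists is never overwritten), and the file names are invented for the demo. Under that assumption, a value already exported in the shell wins over both files, and the file loaded first wins over the second.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: Path, environ=os.environ) -> None:
    """Hypothetical stand-in for load_dotenv(path, override=False)."""
    if not path.is_file():
        return  # a missing file is silently skipped
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps any value that is already present
        environ.setdefault(key.strip(), value.strip())


def demo() -> dict:
    env = {"OPENSEARCH_PORT": "9999"}  # pretend this was exported in the shell
    with tempfile.TemporaryDirectory() as tmp:
        cwd_file = Path(tmp) / "cwd.env"
        root_file = Path(tmp) / "root.env"
        cwd_file.write_text("OPENSEARCH_HOST=from-cwd\nOPENSEARCH_PORT=9200\n")
        root_file.write_text("OPENSEARCH_HOST=from-root\nLOG_LEVEL=DEBUG\n")
        load_env_file(cwd_file, env)   # first pass: current directory
        load_env_file(root_file, env)  # second pass: repository root
    return env
```

Calling `demo()` returns a mapping in which the shell value for `OPENSEARCH_PORT` survives, `OPENSEARCH_HOST` comes from the first file, and `LOG_LEVEL` is filled in by the second.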
Type-Safe Variable Parsing
Rather than parsing raw strings manually, the codebase uses helper functions defined in src/utils/env_utils.py. The get_env_int() and get_env_float() functions safely cast environment values to numeric types while supplying defaults when variables are missing or malformed, preventing runtime type errors.
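A minimal sketch of what such helpers might look like follows. The function names come from the text above, but the signatures and the fallback behaviour shown here are assumptions, not the repository's actual implementation.

```python
import os


def get_env_int(name: str, default: int) -> int:
    """Return the variable as an int, or `default` if unset or malformed."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default  # e.g. "abc" or an empty string


def get_env_float(name: str, default: float) -> float:
    """Return the variable as a float, or `default` if unset or malformed."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return float(raw.strip())
    except ValueError:
        return default
```

With helpers like these, a typo such as `MAX_WORKERS=four` falls back to the default instead of raising at import time.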
Configuration Manager Integration
The src/config/config_manager.py module merges YAML-based configuration from openrag.yaml (if present) with environment values. Functions like get_openrag_config() (lines 42-48 in settings.py) expose the final parsed configuration to the rest of the application, providing a single source of truth for constants used across OpenSearch clients, Langflow HTTP clients, and Docling services.
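The merge step can be pictured roughly as follows. This is a sketch only: a plain dict stands in for the parsed openrag.yaml, the `merge_config` function and the key names are hypothetical, and it assumes environment values win over YAML values, as the FAQ in this guide states.

```python
import os


def merge_config(yaml_values: dict, env_map: dict, environ=os.environ) -> dict:
    """Overlay environment values on top of YAML values.

    yaml_values: settings parsed from the YAML file (hypothetical keys)
    env_map:     config key -> environment variable name
    """
    merged = dict(yaml_values)  # copy so the input is not mutated
    for config_key, env_name in env_map.items():
        value = environ.get(env_name)
        if value is not None:
            merged[config_key] = value  # environment overrides YAML
    return merged
```

Keys that have no matching environment variable keep their YAML value, so the file can carry defaults while the environment carries per-deployment overrides.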
Essential Environment Variables
OpenSearch Vector Store Configuration
Control your vector database connection using these variables:
- OPENSEARCH_HOST: Target hostname (default: localhost)
- OPENSEARCH_PORT: Service port (default: 9200)
- OPENSEARCH_USERNAME: Authentication user (default: admin)
- OPENSEARCH_PASSWORD: Credentials for basic auth
- OPENSEARCH_INDEX_NAME: Target index for document storage
- OPENSEARCH_DATA_PATH: Filesystem path for OpenSearch data
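As a rough illustration of how these variables might be resolved into connection settings, consider the sketch below. The `opensearch_settings` function and the shape of the returned dict are hypothetical, and treating a missing password as a hard error is an assumption made for the example.

```python
import os


def opensearch_settings(environ=os.environ) -> dict:
    """Resolve OpenSearch settings using the defaults listed above."""
    host = environ.get("OPENSEARCH_HOST", "localhost")
    port = int(environ.get("OPENSEARCH_PORT", "9200"))
    username = environ.get("OPENSEARCH_USERNAME", "admin")
    password = environ.get("OPENSEARCH_PASSWORD")  # no default for the secret
    if not password:
        # Assumption for this sketch: fail early rather than connect unauthenticated
        raise RuntimeError("OPENSEARCH_PASSWORD must be set")
    return {
        "host": host,
        "port": port,
        "http_auth": (username, password),
        "index": environ.get("OPENSEARCH_INDEX_NAME"),
    }
```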
Langflow Integration Settings
Configure the Langflow orchestration layer:
- LANGFLOW_URL: Base URL for the Langflow instance (e.g., http://localhost:7860)
- LANGFLOW_CHAT_FLOW_ID: UUID for the chat processing flow
- LANGFLOW_INGEST_FLOW_ID: UUID for document ingestion flows
- LANGFLOW_AUTO_LOGIN: Boolean flag (default: False) enabling automatic authentication with default credentials
- LANGFLOW_SUPERUSER and LANGFLOW_SUPERUSER_PASSWORD: Credentials for automatic API key generation when LANGFLOW_KEY is not explicitly provided
LLM and Embedding Provider Keys
The application refuses to start a provider without valid authentication:
- OPENAI_API_KEY, ANTHROPIC_API_KEY, WATSONX_API_KEY: Provider-specific API tokens
- OLLAMA_ENDPOINT: Local Ollama server URL
- WATSONX_ENDPOINT and WATSONX_PROJECT_ID: IBM Watsonx configuration
- LLM_PROVIDER and EMBEDDING_PROVIDER: Selection keys (e.g., openai, anthropic, ollama)
- LLM_MODEL and EMBEDDING_MODEL: Specific model identifiers (e.g., gpt-4o-mini, text-embedding-3-small)
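A pre-flight check along these lines can catch a missing key before the application starts. This is a sketch under stated assumptions: the `REQUIRED_BY_PROVIDER` table and `missing_provider_vars` are hypothetical, the `watsonx` selection key is a guess (the text lists only openai, anthropic, and ollama as examples), and the exact set of variables each provider requires is inferred from the list above rather than from the code.

```python
import os

# Assumed mapping from provider selection key to the variables it needs.
REQUIRED_BY_PROVIDER = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "watsonx": ["WATSONX_API_KEY", "WATSONX_ENDPOINT", "WATSONX_PROJECT_ID"],
    "ollama": ["OLLAMA_ENDPOINT"],
}


def missing_provider_vars(provider: str, environ=os.environ) -> list:
    """Return the required variables that are unset or empty for a provider."""
    try:
        required = REQUIRED_BY_PROVIDER[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return [name for name in required if not environ.get(name)]
```

An empty list means the selected provider has everything this sketch expects; anything else names the variables still to be set.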
OAuth Connectors for Cloud Storage
Enable Google Drive and Microsoft SharePoint connectors:
- GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET: Google Drive integration
- MICROSOFT_GRAPH_OAUTH_CLIENT_ID and MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET: OneDrive/SharePoint access
Absence of these variables disables the respective connector entirely.
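The presence-based toggle can be sketched as below. The connector labels and the `enabled_connectors` function are hypothetical, and the sketch assumes a connector needs both its client ID and its client secret to be considered enabled.

```python
import os

# Hypothetical connector labels mapped to the variables named above.
CONNECTOR_VARS = {
    "google_drive": ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET"),
    "microsoft_graph": (
        "MICROSOFT_GRAPH_OAUTH_CLIENT_ID",
        "MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET",
    ),
}


def enabled_connectors(environ=os.environ) -> list:
    """List connectors whose ID and secret are both set and non-empty."""
    return [
        name
        for name, needed in CONNECTOR_VARS.items()
        if all(environ.get(var) for var in needed)
    ]
```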
Timeouts and Performance Tuning
Adjust processing limits for large documents:
- LANGFLOW_TIMEOUT: Total HTTP timeout in seconds (default: 2400, i.e., 40 minutes)
- LANGFLOW_CONNECT_TIMEOUT: Initial connection timeout (default: 30)
- INGESTION_TIMEOUT: Per-file processing limit (default: 3600, i.e., 1 hour)
- UPLOAD_BATCH_SIZE: Bulk upload chunk size
- MAX_WORKERS: Concurrency level for parallel processing
- DOCLING_WORKERS: Parallel workers for PDF OCR processing
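Grouping the three timeouts into one immutable object makes the defaults easy to see in one place. The `Timeouts` dataclass and `load_timeouts` function below are hypothetical names for illustration; only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    langflow_total: float    # seconds, whole HTTP request
    langflow_connect: float  # seconds, initial connection only
    ingestion: float         # seconds, per file


def load_timeouts(environ=os.environ) -> Timeouts:
    """Read the timeout variables, falling back to the documented defaults."""
    return Timeouts(
        langflow_total=float(environ.get("LANGFLOW_TIMEOUT", "2400")),
        langflow_connect=float(environ.get("LANGFLOW_CONNECT_TIMEOUT", "30")),
        ingestion=float(environ.get("INGESTION_TIMEOUT", "3600")),
    )
```

With nothing set, `load_timeouts().langflow_total / 60` evaluates to 40.0 minutes, matching the default noted above.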
Feature Flags and Debug Options
Toggle functionality without code changes:
- DISABLE_INGEST_WITH_LANGFLOW: Set to true to bypass Langflow for ingestion (default: false)
- INGEST_SAMPLE_DATA: Seed the database with sample documents on startup (default: true)
- WEBHOOK_BASE_URL: Enable continuous ingestion callbacks (disabled if unset)
- LOG_LEVEL: Verbosity for application logging (e.g., INFO, DEBUG)
- SERVICE_NAME: Application identifier in logs (default: openrag)
- NO_COLOR: Disable colored terminal output
- ACCESS_LOG: Toggle HTTP request logging
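Boolean flags arrive as strings, so they need an explicit interpretation. The helper below is a hypothetical `get_env_bool` (not named in the source), and the set of accepted truthy spellings is an assumption; the repository may accept a different set.

```python
import os

# Assumed truthy spellings, compared case-insensitively.
TRUE_VALUES = {"1", "true", "yes", "on"}


def get_env_bool(name: str, default: bool, environ=os.environ) -> bool:
    """Interpret a flag variable; unset or blank falls back to `default`."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in TRUE_VALUES
```

Under these assumptions both `LANGFLOW_AUTO_LOGIN=True` and `DISABLE_INGEST_WITH_LANGFLOW=true` parse as enabled, while any unrecognized value such as `maybe` parses as disabled.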
Practical Configuration Example
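Create a .env file in the project root. Below is a sample template covering all major configuration categories; replace the placeholder secrets (the sample OpenSearch password, the admin/admin Langflow superuser, and the YOUR_... OAuth values) before using it in production: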
# Core services ---------------------------------------------------------
OPENSEARCH_HOST=opensearch
OPENSEARCH_PORT=9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=MyStrong!Passw0rd
OPENSEARCH_INDEX_NAME=documents
# Langflow -------------------------------------------------------------
LANGFLOW_URL=http://localhost:7860
LANGFLOW_CHAT_FLOW_ID=1098eea1-6649-4e1d-aed1-b77249fb8dd0
LANGFLOW_INGEST_FLOW_ID=5488df7c-b93f-4f87-a446-b67028bc0813
LANGFLOW_AUTO_LOGIN=True
LANGFLOW_SUPERUSER=admin
LANGFLOW_SUPERUSER_PASSWORD=admin
# OAuth connectors -------------------------------------------------------
GOOGLE_OAUTH_CLIENT_ID=YOUR_GOOGLE_CLIENT_ID
GOOGLE_OAUTH_CLIENT_SECRET=YOUR_GOOGLE_CLIENT_SECRET
MICROSOFT_GRAPH_OAUTH_CLIENT_ID=YOUR_MS_CLIENT_ID
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=YOUR_MS_CLIENT_SECRET
# Provider configuration -------------------------------------------------
OPENAI_API_KEY=sk-...
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
# Timeouts --------------------------------------------------------------
LANGFLOW_TIMEOUT=2400
LANGFLOW_CONNECT_TIMEOUT=30
INGESTION_TIMEOUT=3600
MAX_WORKERS=4
# Optional features ------------------------------------------------------
DISABLE_INGEST_WITH_LANGFLOW=false
INGEST_SAMPLE_DATA=true
WEBHOOK_BASE_URL=https://my-ngrok.io/webhook
LOG_LEVEL=INFO
When you run docker compose up or make run, OpenRAG picks up these values automatically through the settings.py initialization sequence.
Verifying Your Configuration at Runtime
Inspect active environment variables through the Terminal User Interface (TUI). The src/tui/managers/env_manager.py module provides a runtime view of parsed configuration, confirming that variables from your .env file were correctly loaded and applied according to the logic in src/config/settings.py.
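Outside the TUI, a quick script can serve the same purpose. The sketch below is independent of env_manager.py and does not reflect how that module works; `mask`, `describe`, and the list of secret markers are hypothetical. It reports each variable as set, unset, or masked so that secrets never reach the terminal.

```python
import os

# Variable names containing any of these substrings are treated as secrets.
SECRET_MARKERS = ("PASSWORD", "SECRET", "API_KEY")


def mask(name: str, value: str) -> str:
    """Hide the value entirely if the name looks like a secret."""
    if any(marker in name for marker in SECRET_MARKERS):
        return "********"
    return value


def describe(names, environ=os.environ) -> list:
    """Return one NAME=value line per variable, masking secrets."""
    lines = []
    for name in names:
        value = environ.get(name)
        if value is None:
            lines.append(f"{name}=<unset>")
        else:
            lines.append(f"{name}={mask(name, value)}")
    return lines
```

Printing `"\n".join(describe(["OPENSEARCH_HOST", "OPENAI_API_KEY", "LANGFLOW_URL"]))` inside the container gives a safe summary of what the process actually sees.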
Summary
- OpenRAG configuration is strictly environment-driven via variables defined in src/config/settings.py (lines 22-78)
- The application loads .env files automatically using python-dotenv with fallback to repository root (lines 17-19)
- Type-safe parsing occurs through src/utils/env_utils.py helpers (get_env_int, get_env_float)
- src/config/config_manager.py merges YAML files with environment values for hybrid configuration
- All sensitive credentials, LLM providers, and vector store connections are controlled through environment variables with no hardcoded defaults for security-critical settings
- Reference .env.example in the repository root for the canonical list of supported variables
Frequently Asked Questions
Does OpenRAG support file-based configuration instead of environment variables?
Yes. While environment variables are the primary mechanism, src/config/config_manager.py loads an optional openrag.yaml file and merges it with environment values. Environment variables take precedence over YAML settings, so you can keep shared defaults in the YAML file and use environment variables as the override layer.
What happens if I omit required API keys like OPENAI_API_KEY?
The application will refuse to initialize the respective provider. The constants defined in src/config/settings.py are imported directly by client factories in src/main.py, so a missing required key causes provider instantiation to fail with a clear error message rather than start with invalid credentials.
How do I change configuration without restarting the container?
You cannot change configuration without a restart. OpenRAG reads all environment variables at startup in settings.py (lines 17-19) and stores them as module-level constants. Changes to the .env file require a container restart or process reload to take effect, as the values are not re-parsed at runtime.
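The reason is ordinary Python behaviour rather than anything specific to OpenRAG: a module-level constant captures the value once, at import time. The snippet below is a generic illustration, with `current_log_level` as a hypothetical accessor.

```python
import os

os.environ["LOG_LEVEL"] = "INFO"

# Read once, the way a module-level constant is read at import time.
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")

# A later change to the environment does not touch the captured value.
os.environ["LOG_LEVEL"] = "DEBUG"


def current_log_level() -> str:
    """Return the value captured at import, not the live environment."""
    return LOG_LEVEL
```

After this runs, `current_log_level()` still returns `"INFO"` even though `os.environ["LOG_LEVEL"]` is now `"DEBUG"`, which is why a restart is needed for new values to take effect.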
Where can I find the complete list of supported environment variables?
The .env.example file in the repository root contains the canonical documentation of every supported variable, its purpose, and suggested defaults. This file serves as the authoritative reference for the entire configuration surface area implemented in src/config/settings.py.