How to Configure OpenRAG Using Environment Variables: A Complete Guide

OpenRAG reads its configuration from environment variables loaded via python-dotenv, so you can control vector stores, LLM providers, OAuth connectors, and timeouts through a single .env file in the project root.

The langflow-ai/openrag repository implements an environment-driven configuration architecture. Runtime parameters, from OpenSearch connection strings to LLM API keys, are resolved from environment variables parsed at startup. This makes the application portable across development, Docker, and production environments without code changes.

Configuration Architecture

Environment Loading Mechanism

In src/config/settings.py, the application initializes by calling load_dotenv(override=False) twice: first for the current directory and then for the repository root (lines 17-19). A .env file in either location is therefore loaded into the process environment before any configuration constants are defined. Because override=False never replaces a variable that is already set, values exported in the shell or container take precedence over both files, and for a key defined in both files the first file loaded wins.
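The precedence that override=False produces can be illustrated without python-dotenv installed. The sketch below is a simplified, stdlib-only stand-in: parse_dotenv, apply_env, and load_env_file are hypothetical names, not OpenRAG or python-dotenv functions, and the parser skips features the real library supports (export prefixes, interpolation, inline comments, multiline values). In OpenRAG itself, the corresponding call is python-dotenv's load_dotenv(..., override=False), as described above.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping, Optional


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments.

    Hypothetical, simplified parser: no 'export' prefix, interpolation,
    inline comments, or multiline values (python-dotenv handles those).
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Strip one pair of matching surrounding quotes, e.g. "openrag" -> openrag
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key] = value
    return values


def apply_env(values: Dict[str, str], environ: MutableMapping[str, str],
              override: bool = False) -> None:
    """Copy values into environ; with override=False, existing keys are kept."""
    for key, value in values.items():
        if override or key not in environ:
            environ[key] = value


def load_env_file(path: Path, environ: Optional[MutableMapping[str, str]] = None,
                  override: bool = False) -> bool:
    """Hypothetical stand-in for load_dotenv(path, override=...); False if no file."""
    if not path.is_file():
        return False
    target = os.environ if environ is None else environ
    apply_env(parse_dotenv(path.read_text(encoding="utf-8")), target, override)
    return True
```

Calling load_env_file twice, first for one location and then for the repository root, reproduces the precedence described above: a key set by the first file, or already present in the process environment, is not replaced by the second.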

Type-Safe Variable Parsing

Rather than parsing raw strings manually, the codebase uses helper functions defined in src/utils/env_utils.py. The get_env_int() and get_env_float() functions safely cast environment values to numeric types while supplying defaults when variables are missing or malformed, preventing runtime type errors.
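A minimal sketch of what helpers like these commonly look like, assuming a get_env_int(name, default) signature plus an optional mapping argument added here for testability. The real parameters and logging behavior in src/utils/env_utils.py may differ.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


def _raw(name: str, environ: Optional[Mapping[str, str]]) -> Optional[str]:
    """Return the stripped value, or None when the variable is unset or empty."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value is None or not value.strip():
        return None
    return value.strip()


def get_env_int(name: str, default: int,
                environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical signature: parse NAME as int, falling back to default."""
    raw = _raw(name, environ)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        logger.warning("Invalid int for %s=%r; using default %r", name, raw, default)
        return default


def get_env_float(name: str, default: float,
                  environ: Optional[Mapping[str, str]] = None) -> float:
    """Hypothetical signature: parse NAME as float, falling back to default."""
    raw = _raw(name, environ)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        logger.warning("Invalid float for %s=%r; using default %r", name, raw, default)
        return default
```

Falling back to the default (with a warning) instead of raising keeps a typo such as LANGFLOW_TIMEOUT=ten from crashing startup, at the cost of silently using a value the operator did not choose.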

Configuration Manager Integration

The src/config/config_manager.py module merges YAML-based configuration from openrag.yaml (if present) with environment values, with environment variables taking precedence. src/config/settings.py then exposes the merged result through functions such as get_openrag_config() (lines 42-48), which act as the single source of truth for the constants used by the OpenSearch clients, Langflow HTTP clients, and Docling services.
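The precedence rule (environment over YAML) can be sketched as a dictionary merge. This assumes openrag.yaml has already been parsed into a dict, for example with PyYAML's yaml.safe_load. merge_config and the ENV_OVERRIDES mapping from variable names to YAML sections are hypothetical and do not reflect OpenRAG's actual schema.

```python
import copy
import os
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical env-var -> (section, key) mapping; OpenRAG's real schema may differ.
ENV_OVERRIDES: Dict[str, Tuple[str, str]] = {
    "OPENSEARCH_HOST": ("opensearch", "host"),
    "OPENSEARCH_PORT": ("opensearch", "port"),
    "LLM_PROVIDER": ("llm", "provider"),
    "LLM_MODEL": ("llm", "model"),
}


def merge_config(yaml_config: Mapping[str, Any],
                 environ: Optional[Mapping[str, str]] = None,
                 overrides: Mapping[str, Tuple[str, str]] = ENV_OVERRIDES) -> Dict[str, Any]:
    """Return a copy of yaml_config with any set environment variables layered on top."""
    env = os.environ if environ is None else environ
    merged: Dict[str, Any] = copy.deepcopy(dict(yaml_config))  # never mutate the input
    for env_name, (section, key) in overrides.items():
        value = env.get(env_name)
        if value:  # unset or empty variables leave the YAML value in place
            merged.setdefault(section, {})[key] = value  # note: env values stay strings
    return merged
```

One consequence of this design is visible in the last comment: values from the environment arrive as strings, so a real merge layer still needs typed parsing (such as the get_env_int-style helpers) for numeric settings.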

Essential Environment Variables

OpenSearch Vector Store Configuration

Control your vector database connection using these variables:

  • OPENSEARCH_HOST: Target hostname (default: localhost)
  • OPENSEARCH_PORT: Service port (default: 9200)
  • OPENSEARCH_USERNAME: Authentication user (default: admin)
  • OPENSEARCH_PASSWORD: Credentials for basic auth
  • OPENSEARCH_INDEX_NAME: Target index for document storage
  • OPENSEARCH_DATA_PATH: Filesystem path for OpenSearch data
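As an illustration, the connection variables can be turned into client settings. This sketch assumes the opensearch-py client, whose OpenSearch constructor accepts hosts and http_auth keyword arguments. opensearch_settings is a hypothetical helper, and failing fast on a missing password is a choice made here, not documented OpenRAG behavior.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def opensearch_settings(
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Build (client_kwargs, index_name) from the OPENSEARCH_* variables.

    client_kwargs follows the shape accepted by opensearch-py's OpenSearch(...)
    constructor: hosts=[{"host": ..., "port": ...}], http_auth=(user, password).
    """
    env = os.environ if environ is None else environ
    password = env.get("OPENSEARCH_PASSWORD")
    if not password:
        # Assumption: fail fast, since the password has no default
        raise RuntimeError("OPENSEARCH_PASSWORD is not set")
    client_kwargs = {
        "hosts": [{
            "host": env.get("OPENSEARCH_HOST", "localhost"),
            "port": int(env.get("OPENSEARCH_PORT", "9200")),
        }],
        "http_auth": (env.get("OPENSEARCH_USERNAME", "admin"), password),
    }
    # No default index name is documented, so None means "use the app's default".
    return client_kwargs, env.get("OPENSEARCH_INDEX_NAME")
```

With opensearch-py installed, OpenSearch(**client_kwargs) would build the client. TLS options such as use_ssl and verify_certs are left out because none of the variables listed above control them.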

Langflow Integration Settings

Configure the Langflow orchestration layer:

  • LANGFLOW_URL: Base URL for the Langflow instance (e.g., http://localhost:7860)
  • LANGFLOW_CHAT_FLOW_ID: UUID for the chat processing flow
  • LANGFLOW_INGEST_FLOW_ID: UUID for document ingestion flows
  • LANGFLOW_AUTO_LOGIN: Boolean flag (default: False) enabling automatic authentication with default credentials
  • LANGFLOW_SUPERUSER and LANGFLOW_SUPERUSER_PASSWORD: Credentials for automatic API key generation when LANGFLOW_KEY is not explicitly provided
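The interplay between LANGFLOW_KEY and the superuser credentials can be summarized as a small decision function. This is a sketch under assumptions: resolve_langflow_auth and LangflowAuth are hypothetical, and the set of strings accepted as true for LANGFLOW_AUTO_LOGIN is a guess, since the accepted spellings are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is not documented for OpenRAG.
_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret strings like 'True' or '1' as booleans; unset/empty -> default."""
    if value is None or not value.strip():
        return default
    return value.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class LangflowAuth:
    """Hypothetical summary of how the backend would authenticate to Langflow."""
    api_key: Optional[str]   # explicit LANGFLOW_KEY, if provided
    generate_key: bool       # True -> create a key with the superuser credentials
    auto_login: bool         # LANGFLOW_AUTO_LOGIN (default False)


def resolve_langflow_auth(environ: Optional[Mapping[str, str]] = None) -> LangflowAuth:
    env = os.environ if environ is None else environ
    api_key = env.get("LANGFLOW_KEY") or None
    has_superuser = bool(env.get("LANGFLOW_SUPERUSER")
                         and env.get("LANGFLOW_SUPERUSER_PASSWORD"))
    return LangflowAuth(
        api_key=api_key,
        # Superuser credentials only matter when no explicit key was given
        generate_key=api_key is None and has_superuser,
        auto_login=env_bool(env.get("LANGFLOW_AUTO_LOGIN"), default=False),
    )
```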

LLM and Embedding Provider Keys

The application will not initialize a provider whose required credentials or endpoints are missing:

  • OPENAI_API_KEY, ANTHROPIC_API_KEY, WATSONX_API_KEY: Provider-specific API tokens
  • OLLAMA_ENDPOINT: Local Ollama server URL
  • WATSONX_ENDPOINT and WATSONX_PROJECT_ID: IBM Watsonx configuration
  • LLM_PROVIDER and EMBEDDING_PROVIDER: Selection keys (e.g., openai, anthropic, ollama)
  • LLM_MODEL and EMBEDDING_MODEL: Specific model identifiers (e.g., gpt-4o-mini, text-embedding-3-small)
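A pre-flight check for these variables might look like the sketch below. The provider-to-variable table is assembled from the list above and is an assumption, as are the helper names. Treating an unset LLM_PROVIDER or EMBEDDING_PROVIDER as an error reflects the lack of a documented default, not confirmed OpenRAG behavior.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical provider -> required variables table, built from the list above.
REQUIRED_VARS: Dict[str, Tuple[str, ...]] = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "watsonx": ("WATSONX_API_KEY", "WATSONX_ENDPOINT", "WATSONX_PROJECT_ID"),
    "ollama": ("OLLAMA_ENDPOINT",),
}


def missing_credentials(provider: str, environ: Mapping[str, str]) -> List[str]:
    """Return the required variables for `provider` that are unset or empty."""
    try:
        required = REQUIRED_VARS[provider.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return [name for name in required if not environ.get(name)]


def validate_providers(environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError listing every missing variable for the selected providers."""
    env = os.environ if environ is None else environ
    problems: List[str] = []
    for selector in ("LLM_PROVIDER", "EMBEDDING_PROVIDER"):
        provider = env.get(selector)
        if not provider:
            problems.append(f"{selector} is not set")
            continue
        for name in missing_credentials(provider, env):
            problems.append(f"{selector}={provider} requires {name}")
    if problems:
        raise RuntimeError("; ".join(problems))
```

Collecting every problem before raising, rather than stopping at the first, lets one failed start report all missing keys at once.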

OAuth Connectors for Cloud Storage

Enable Google Drive and Microsoft SharePoint connectors:

  • GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET: Google Drive integration
  • MICROSOFT_GRAPH_OAUTH_CLIENT_ID and MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET: OneDrive/SharePoint access

Absence of these variables disables the respective connector entirely.
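Assuming a connector needs both its client ID and its secret, the enable/disable rule reduces to a single comprehension. The connector names and the enabled_connectors helper are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# Hypothetical connector names; the variable pairs come from the list above.
CONNECTOR_VARS: Dict[str, Tuple[str, str]] = {
    "google_drive": ("GOOGLE_OAUTH_CLIENT_ID", "GOOGLE_OAUTH_CLIENT_SECRET"),
    "microsoft_graph": ("MICROSOFT_GRAPH_OAUTH_CLIENT_ID",
                        "MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET"),
}


def enabled_connectors(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return connectors whose client ID *and* secret are both set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name, keys in CONNECTOR_VARS.items()
            if all(env.get(key) for key in keys)]
```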

Timeouts and Performance Tuning

Adjust processing limits for large documents:

  • LANGFLOW_TIMEOUT: Total HTTP timeout in seconds (default: 2400, i.e., 40 minutes)
  • LANGFLOW_CONNECT_TIMEOUT: Initial connection timeout in seconds (default: 30)
  • INGESTION_TIMEOUT: Per-file processing limit (default: 3600, i.e., 1 hour)
  • UPLOAD_BATCH_SIZE: Bulk upload chunk size
  • MAX_WORKERS: Concurrency level for parallel processing
  • DOCLING_WORKERS: Parallel workers for PDF OCR processing
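The sketch below reads these limits with the documented defaults (2400, 30, and 3600 seconds). ProcessingLimits and load_limits are hypothetical. MAX_WORKERS, DOCLING_WORKERS, and UPLOAD_BATCH_SIZE come back as None when unset because their defaults are not stated above, and the connect-versus-total sanity check is an added assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _get_int(env: Mapping[str, str], name: str, default: Optional[int]) -> Optional[int]:
    """Parse NAME as int; unset, empty, or malformed values yield `default`."""
    raw = (env.get(name) or "").strip()
    if not raw:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


@dataclass(frozen=True)
class ProcessingLimits:
    langflow_timeout: int           # seconds, total HTTP timeout
    langflow_connect_timeout: int   # seconds, connection setup only
    ingestion_timeout: int          # seconds, per file
    max_workers: Optional[int]      # None: default not documented here
    docling_workers: Optional[int]
    upload_batch_size: Optional[int]


def load_limits(environ: Optional[Mapping[str, str]] = None) -> ProcessingLimits:
    env = os.environ if environ is None else environ
    limits = ProcessingLimits(
        langflow_timeout=_get_int(env, "LANGFLOW_TIMEOUT", 2400),       # 40 minutes
        langflow_connect_timeout=_get_int(env, "LANGFLOW_CONNECT_TIMEOUT", 30),
        ingestion_timeout=_get_int(env, "INGESTION_TIMEOUT", 3600),     # 1 hour
        max_workers=_get_int(env, "MAX_WORKERS", None),
        docling_workers=_get_int(env, "DOCLING_WORKERS", None),
        upload_batch_size=_get_int(env, "UPLOAD_BATCH_SIZE", None),
    )
    # Assumption: a connect timeout longer than the total timeout is a config mistake.
    if limits.langflow_connect_timeout > limits.langflow_timeout:
        raise ValueError("LANGFLOW_CONNECT_TIMEOUT exceeds LANGFLOW_TIMEOUT")
    return limits
```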

Feature Flags and Debug Options

Toggle functionality without code changes:

  • DISABLE_INGEST_WITH_LANGFLOW: Set to true to bypass Langflow for ingestion (default: false)
  • INGEST_SAMPLE_DATA: Seed the database with sample documents on startup (default: true)
  • WEBHOOK_BASE_URL: Enable continuous ingestion callbacks (disabled if unset)
  • LOG_LEVEL: Verbosity for application logging (e.g., INFO, DEBUG)
  • SERVICE_NAME: Application identifier in logs (default: openrag)
  • NO_COLOR: Disable colored terminal output
  • ACCESS_LOG: Toggle HTTP request logging
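A sketch of how these flags could be read with their stated defaults. FeatureFlags and load_feature_flags are hypothetical; the INFO fallback for LOG_LEVEL and the accepted truthy spellings are assumptions. NO_COLOR follows the common no-color.org convention (any non-empty value disables color), and ACCESS_LOG is omitted because its default is not given.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings are undocumented


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Unset or empty -> default; otherwise true only for values like 'true' or '1'."""
    raw = (env.get(name) or "").strip().lower()
    return default if not raw else raw in _TRUTHY


@dataclass(frozen=True)
class FeatureFlags:
    disable_ingest_with_langflow: bool
    ingest_sample_data: bool
    webhook_base_url: Optional[str]
    log_level: str
    service_name: str
    no_color: bool

    @property
    def continuous_ingestion(self) -> bool:
        # Webhook callbacks are only possible when a base URL is configured
        return self.webhook_base_url is not None


def load_feature_flags(environ: Optional[Mapping[str, str]] = None) -> FeatureFlags:
    env = os.environ if environ is None else environ
    return FeatureFlags(
        disable_ingest_with_langflow=env_flag(env, "DISABLE_INGEST_WITH_LANGFLOW", False),
        ingest_sample_data=env_flag(env, "INGEST_SAMPLE_DATA", True),
        webhook_base_url=env.get("WEBHOOK_BASE_URL") or None,
        log_level=(env.get("LOG_LEVEL") or "INFO").upper(),  # "INFO" default is assumed
        service_name=env.get("SERVICE_NAME") or "openrag",
        # no-color.org convention: any non-empty NO_COLOR value disables color
        no_color=bool(env.get("NO_COLOR")),
    )
```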

Practical Configuration Example

Create a .env file in the repository root (the directory that contains src/). The template below covers every major configuration category; before deploying it, replace the placeholder values (YOUR_..., sk-...) and the default Langflow superuser credentials (admin/admin).

# Core services ---------------------------------------------------------
OPENSEARCH_HOST=opensearch
OPENSEARCH_PORT=9200
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD=MyStrong!Passw0rd
OPENSEARCH_INDEX_NAME=documents

# Langflow -------------------------------------------------------------
LANGFLOW_URL=http://localhost:7860
LANGFLOW_CHAT_FLOW_ID=1098eea1-6649-4e1d-aed1-b77249fb8dd0
LANGFLOW_INGEST_FLOW_ID=5488df7c-b93f-4f87-a446-b67028bc0813
LANGFLOW_AUTO_LOGIN=True
LANGFLOW_SUPERUSER=admin
LANGFLOW_SUPERUSER_PASSWORD=admin

# OAuth connectors -------------------------------------------------------
GOOGLE_OAUTH_CLIENT_ID=YOUR_GOOGLE_CLIENT_ID
GOOGLE_OAUTH_CLIENT_SECRET=YOUR_GOOGLE_CLIENT_SECRET
MICROSOFT_GRAPH_OAUTH_CLIENT_ID=YOUR_MS_CLIENT_ID
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=YOUR_MS_CLIENT_SECRET

# Provider configuration -------------------------------------------------
OPENAI_API_KEY=sk-...
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small

# Timeouts and concurrency -----------------------------------------------
LANGFLOW_TIMEOUT=2400
LANGFLOW_CONNECT_TIMEOUT=30
INGESTION_TIMEOUT=3600
MAX_WORKERS=4

# Optional features ------------------------------------------------------
DISABLE_INGEST_WITH_LANGFLOW=false
INGEST_SAMPLE_DATA=true
WEBHOOK_BASE_URL=https://my-ngrok.io/webhook
LOG_LEVEL=INFO

When you run docker compose up or make run, OpenRAG loads these values through the settings.py initialization sequence.
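Because the template mixes real-looking values with placeholders (YOUR_..., sk-...) and default credentials, a quick check before startup can catch leftovers. unfinished_keys is a hypothetical helper, not part of OpenRAG, and it only recognizes the placeholder patterns used in the template above.

```python
import re
from typing import Dict, List

# Hypothetical checks: placeholder patterns and known default passwords from the template.
_PLACEHOLDER = re.compile(r"^(YOUR_[A-Z0-9_]+|sk-\.\.\.)$")
_DEFAULT_SECRETS: Dict[str, str] = {"LANGFLOW_SUPERUSER_PASSWORD": "admin"}


def unfinished_keys(env_text: str) -> List[str]:
    """Return keys in a .env text whose values are placeholders or default passwords."""
    flagged: List[str] = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if _PLACEHOLDER.match(value) or _DEFAULT_SECRETS.get(key) == value:
            flagged.append(key)
    return flagged
```

For example, running unfinished_keys(Path(".env").read_text()) in CI (with pathlib.Path imported) and failing on a non-empty result keeps placeholder credentials out of deployments.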

Verifying Your Configuration at Runtime

Inspect active environment variables through the Terminal User Interface (TUI). The src/tui/managers/env_manager.py module provides a runtime view of parsed configuration, confirming that variables from your .env file were correctly loaded and applied according to the logic in src/config/settings.py.
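Outside the TUI, a short script can print a similar view of what the process actually sees. This is a hypothetical helper, not the env_manager API, and the rule for which names count as secrets (anything containing PASSWORD, SECRET, or _KEY) is an assumption.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Assumption: variable names containing these markers hold secrets.
_SECRET_MARKERS = ("PASSWORD", "SECRET", "_KEY")


def _mask(value: str) -> str:
    # Reveal only the last 4 characters of longer secrets, nothing of short ones
    return "****" + value[-4:] if len(value) > 8 else "****"


def masked_config(names: Iterable[str],
                  environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return name -> display value, hiding secrets and marking unset variables."""
    env = os.environ if environ is None else environ
    view: Dict[str, str] = {}
    for name in names:
        value = env.get(name)
        if value is None:
            view[name] = "<unset>"
        elif any(marker in name for marker in _SECRET_MARKERS):
            view[name] = _mask(value)
        else:
            view[name] = value
    return view
```

Running print(masked_config(["OPENSEARCH_HOST", "OPENSEARCH_PASSWORD", "LANGFLOW_URL"])) in the same environment as the application shows whether each value reached the process without exposing full secrets.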

Summary

  • OpenRAG configuration is primarily environment-driven, via variables defined in src/config/settings.py (lines 22-78)
  • The application loads .env files automatically using python-dotenv with fallback to repository root (lines 17-19)
  • Type-safe parsing occurs through src/utils/env_utils.py helpers (get_env_int, get_env_float)
  • src/config/config_manager.py merges YAML files with environment values for hybrid configuration
  • LLM providers, vector store connections, and sensitive credentials are all controlled through environment variables; secrets such as passwords and API keys have no hardcoded defaults
  • Reference .env.example in the repository root for the canonical list of supported variables

Frequently Asked Questions

Does OpenRAG support file-based configuration instead of environment variables?

Yes. Environment variables are the primary mechanism, but src/config/config_manager.py also loads an optional openrag.yaml file and merges it with environment values. Environment variables take precedence over YAML settings, so you can keep baseline settings in YAML and use environment variables as the override layer.

What happens if I omit required API keys like OPENAI_API_KEY?

The application will refuse to initialize the affected provider. The constants defined in src/config/settings.py are imported directly by the client factories in src/main.py, so a missing required key makes provider instantiation fail with an explicit error message instead of starting with invalid credentials.

How do I change configuration without restarting the container?

You cannot; a restart is required. OpenRAG loads the .env files at startup (settings.py, lines 17-19) and stores the parsed values as module-level constants (lines 22-78). Because these values are not re-read at runtime, changes to the .env file take effect only after a container restart or process reload.
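The reason a restart is needed comes down to how Python evaluates module-level code: it runs once, at import time. The snippet below is a self-contained illustration using LOG_LEVEL; it does not import OpenRAG's settings module.

```python
import os

# Stand-in for a module such as settings.py: constants are computed once, at import time.
os.environ["LOG_LEVEL"] = "INFO"
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")

# Later, the environment changes (for example, after editing .env and re-reading it).
os.environ["LOG_LEVEL"] = "DEBUG"

# The constant still holds the startup value; only re-running the module code,
# as a process restart does, picks up the new value.
print(LOG_LEVEL)  # prints INFO
```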

Where can I find the complete list of supported environment variables?

The .env.example file in the repository root contains the canonical documentation of every supported variable, its purpose, and suggested defaults. This file serves as the authoritative reference for the entire configuration surface area implemented in src/config/settings.py.
