How to Configure OAuth for OpenRAG: Complete Setup for Google Drive and Microsoft Graph
OpenRAG configures OAuth through declarative environment variables managed by the EnvManager, which validates credentials and injects them into connector classes for Google Drive and Microsoft Graph authentication.
Configuring OAuth for the langflow-ai/openrag repository enables secure authentication with external document sources like Google Drive, OneDrive, and SharePoint. The system uses a centralized EnvManager to load and validate OAuth credentials from environment variables, ensuring that client secrets and tokens flow securely to the appropriate connector classes without hardcoded values.
OAuth Configuration Architecture in OpenRAG
The OpenRAG codebase implements a layered validation system that separates credential storage from connector implementation. At startup, the EnvManager class (located in src/tui/managers/env_manager.py) reads a .env file from the default path ~/.openrag/.env and populates an EnvConfig dataclass.
The environment variable mapping occurs in EnvManager._env_attr_map() (lines 81‑86), which defines the relationship between external OAuth providers and internal configuration attributes. For Google authentication, the system validates the client ID using validate_google_oauth_client_id() (lines 87‑92 of src/tui/utils/validation.py), ensuring the value ends with .apps.googleusercontent.com before accepting it.
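The loading step described above can be sketched roughly as follows. This is a minimal illustration, not OpenRAG's actual code: EnvConfigSketch, ENV_ATTR_MAP, and load_env_file are hypothetical names, and the real EnvConfig dataclass and _env_attr_map() may carry more fields and different parsing rules.

```python
from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-in for OpenRAG's EnvConfig; only the OAuth fields are shown.
@dataclass
class EnvConfigSketch:
    google_oauth_client_id: str = ""
    google_oauth_client_secret: str = ""
    microsoft_graph_oauth_client_id: str = ""
    microsoft_graph_oauth_client_secret: str = ""
    webhook_base_url: str = ""


# Assumed shape of the mapping: environment variable name -> dataclass attribute.
ENV_ATTR_MAP = {
    "GOOGLE_OAUTH_CLIENT_ID": "google_oauth_client_id",
    "GOOGLE_OAUTH_CLIENT_SECRET": "google_oauth_client_secret",
    "MICROSOFT_GRAPH_OAUTH_CLIENT_ID": "microsoft_graph_oauth_client_id",
    "MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET": "microsoft_graph_oauth_client_secret",
    "WEBHOOK_BASE_URL": "webhook_base_url",
}


def load_env_file(path: Path) -> EnvConfigSketch:
    """Read KEY=VALUE lines from a .env file into the config dataclass."""
    config = EnvConfigSketch()
    if not path.exists():
        return config
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        attr = ENV_ATTR_MAP.get(key.strip())
        if attr is not None:
            # Drop surrounding whitespace and optional quotes around the value.
            setattr(config, attr, value.strip().strip("'\""))
    return config
```

Calling load_env_file(Path.home() / ".openrag" / ".env") would return a populated config object; variables that are not in the map are ignored.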
Connector classes consume these values through class-level constants. In src/connectors/google_drive/connector.py (lines 69‑72), the GoogleDriveConnector defines CLIENT_ID_ENV_VAR and CLIENT_SECRET_ENV_VAR, which the AuthService queries during the OAuth initialization flow.
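As an illustration of this indirection, the sketch below shows how a service might resolve credentials from class-level constants. The class names ending in Sketch and the resolve_credentials helper are hypothetical, and the constant values are assumed to be the variable names listed in this article; the real connector classes may differ.

```python
import os


# Hypothetical connector stubs: only the class-level constants matter here.
class GoogleDriveConnectorSketch:
    CLIENT_ID_ENV_VAR = "GOOGLE_OAUTH_CLIENT_ID"
    CLIENT_SECRET_ENV_VAR = "GOOGLE_OAUTH_CLIENT_SECRET"


class OneDriveConnectorSketch:
    CLIENT_ID_ENV_VAR = "MICROSOFT_GRAPH_OAUTH_CLIENT_ID"
    CLIENT_SECRET_ENV_VAR = "MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET"


def resolve_credentials(connector_cls) -> tuple:
    """Look up the client ID and secret named by a connector class."""
    names = (connector_cls.CLIENT_ID_ENV_VAR, connector_cls.CLIENT_SECRET_ENV_VAR)
    values = [os.getenv(name) for name in names]
    # Fail early with the names of any variables that are unset or empty.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError(f"Missing OAuth environment variables: {', '.join(missing)}")
    return values[0], values[1]
```

Keeping the variable names on the connector class means the service never needs provider-specific branches: it reads whichever names the selected connector declares.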
Required Environment Variables for OpenRAG OAuth
OpenRAG requires distinct credential pairs for each cloud provider. The EnvManager validates these immediately upon loading, throwing a RuntimeError if any required value is missing or malformed.
- Google Drive and UI Authentication: GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET
- Microsoft Graph (OneDrive/SharePoint): MICROSOFT_GRAPH_OAUTH_CLIENT_ID and MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET
- Webhook Configuration (Optional): WEBHOOK_BASE_URL, required only when exposing a public callback endpoint for OAuth redirects
The Google client ID undergoes strict format validation in src/tui/utils/validation.py, while Microsoft credentials use generic non-empty validation. Missing values trigger early failure in AuthService.init_oauth() (lines 31‑38) with descriptive error messages.
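The two validation levels can be approximated as below. This is a sketch under stated assumptions: the function names are hypothetical, and the real validate_google_oauth_client_id() may apply additional checks beyond the suffix test described in this article.

```python
GOOGLE_SUFFIX = ".apps.googleusercontent.com"


def is_valid_google_client_id(value: str) -> bool:
    """Strict check: the ID must end with the Google suffix and have a prefix."""
    value = value.strip()
    return value.endswith(GOOGLE_SUFFIX) and len(value) > len(GOOGLE_SUFFIX)


def is_non_empty(value: str) -> bool:
    """Generic check used here for secrets and Microsoft credentials."""
    return bool(value.strip())


def find_oauth_problems(env: dict) -> list:
    """Return human-readable problems; an empty list means the config looks valid."""
    problems = []
    if not is_valid_google_client_id(env.get("GOOGLE_OAUTH_CLIENT_ID", "")):
        problems.append(f"GOOGLE_OAUTH_CLIENT_ID must end with {GOOGLE_SUFFIX}")
    for name in (
        "GOOGLE_OAUTH_CLIENT_SECRET",
        "MICROSOFT_GRAPH_OAUTH_CLIENT_ID",
        "MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET",
    ):
        if not is_non_empty(env.get(name, "")):
            problems.append(f"{name} must not be empty")
    return problems
```

Running find_oauth_problems(dict(os.environ)) before starting the server gives a quick pre-flight report without contacting either provider.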
Step-by-Step Google Drive OAuth Setup
1. Create Google Cloud Credentials
Navigate to the Google Cloud Console → APIs & Services → Credentials → Create OAuth client ID (Web application type). Configure the Authorized redirect URIs to match your OpenRAG instance, typically http://localhost:3000/api/oauth/google/callback for local development or https://your-domain.com/api/oauth/google/callback for production.
Copy the generated Client ID and Client secret.
2. Configure the Environment File
Create or edit ~/.openrag/.env with the following values:
# OpenRAG Google OAuth configuration
GOOGLE_OAUTH_CLIENT_ID=1234567890-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com
GOOGLE_OAUTH_CLIENT_SECRET=YOUR_GOOGLE_CLIENT_SECRET
Restart the OpenRAG server to trigger EnvManager reloading. The system will warn you if the client ID fails the .apps.googleusercontent.com suffix check.
3. Initiate the Connection Flow
When users click Add Google Drive in the TUI, AuthService.init_oauth() extracts the environment variable names from GoogleDriveConnector.CLIENT_ID_ENV_VAR and reads the actual values using os.getenv(). The service builds an OAuth configuration containing the client ID, scopes, and redirect URI, then generates an authorization URL via GoogleDriveOAuth.create_authorization_url().
After the user grants consent, Google redirects to your callback endpoint with an authorization code. AuthService.handle_oauth_callback() validates the code against reuse and delegates token exchange to the connector's OAuth wrapper. Tokens persist automatically to data/google_drive_<uuid>.json for subsequent API calls.
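To make the persistence step concrete, here is a minimal sketch of writing and reading a token file. The helper names (token_path, save_tokens, load_tokens) and the token dictionary layout are assumptions for illustration; OpenRAG's OAuth wrappers handle this internally and may store different fields.

```python
import json
import os
from pathlib import Path
from typing import Optional


def token_path(data_dir: Path, connector_type: str, connection_id: str) -> Path:
    """Build a path such as data/google_drive_<uuid>.json."""
    return data_dir / f"{connector_type}_{connection_id}.json"


def save_tokens(path: Path, tokens: dict) -> None:
    """Write tokens to a temporary file, then move it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(tokens, indent=2), encoding="utf-8")
    # Restrict access to the owner where the platform supports it.
    os.chmod(tmp, 0o600)
    os.replace(tmp, path)


def load_tokens(path: Path) -> Optional[dict]:
    """Return the stored tokens, or None if the flow has not completed yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Writing to a temporary file first means a crash mid-write cannot leave a truncated token file behind, which would otherwise force the user through consent again.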
Configuring Microsoft Graph OAuth for OneDrive and SharePoint
Microsoft Graph authentication follows an identical pattern but uses Azure AD app registrations.
Register a new application in Azure AD → App registrations → New registration. Add a Redirect URI matching your OpenRAG deployment (e.g., http://localhost:3000/api/oauth/microsoft/callback). Grant delegated permissions for Files.Read and Files.Read.All under API permissions, then create a client secret under Certificates & secrets.
Add these credentials to your .env file:
# Microsoft Graph OAuth configuration
MICROSOFT_GRAPH_OAUTH_CLIENT_ID=YOUR_AZURE_APP_ID
MICROSOFT_GRAPH_OAUTH_CLIENT_SECRET=YOUR_AZURE_CLIENT_SECRET
The OneDriveOAuth and SharePointOAuth classes (located in src/connectors/onedrive/oauth.py and src/connectors/sharepoint/oauth.py) read these variables when AuthService initializes a connection for those connector types.
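For orientation, the sketch below builds a Microsoft identity platform authorization URL from these values. It is an assumption-laden illustration: build_microsoft_auth_url is a hypothetical helper, the "common" tenant and the offline_access scope are choices made for this example, and the OneDriveOAuth and SharePointOAuth classes may construct the URL differently.

```python
from urllib.parse import urlencode

# Multi-tenant authorization endpoint of the Microsoft identity platform (v2.0).
MS_AUTHORIZE_URL = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"

# Delegated permissions named in this article, plus offline_access so that
# a refresh token is issued (an assumption for this sketch).
MS_SCOPES = ["Files.Read", "Files.Read.All", "offline_access"]


def build_microsoft_auth_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL the user visits to grant consent."""
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "response_mode": "query",
            "scope": " ".join(MS_SCOPES),
            "state": state,
        }
    )
    return f"{MS_AUTHORIZE_URL}?{query}"
```

The redirect_uri passed here must match the one registered in the app registration exactly, including scheme, port, and path.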
How the OAuth Token Flow Works in OpenRAG
Understanding the internal flow helps debug connection failures. The system implements a seven-stage pipeline:
- Environment Loading: EnvManager builds EnvConfig from ~/.openrag/.env and validates formats.
- Flow Initialization: AuthService.init_oauth() receives a connector type (e.g., "google_drive"), retrieves environment variable names from the connector class, and fetches values via os.getenv() (lines 31‑34).
- Config Construction: The service assembles an OAuth config dict containing endpoints, scopes, and the redirect_uri (optionally prefixed with WEBHOOK_BASE_URL).
- Authorization URL Generation: The concrete OAuth wrapper (e.g., GoogleDriveOAuth) generates the URL for the user to visit.
- User Consent: The external provider authenticates the user and redirects to the callback with code and state parameters.
- Callback Handling: AuthService.handle_oauth_callback() receives the code, validates state to prevent replay attacks, and calls handle_authorization_callback() on the OAuth wrapper to exchange the code for tokens.
- Persistence: The wrapper stores access and refresh tokens in JSON format under the data/ directory, enabling automatic token refresh during subsequent sync operations.
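Two of these stages, redirect URI construction and state validation, can be sketched in isolation. The names build_redirect_uri and PendingFlows are hypothetical, as is the default base URL; this shows one common way to implement single-use state values, not necessarily how AuthService does it.

```python
import hmac
import secrets
from typing import Optional


def build_redirect_uri(
    path: str,
    webhook_base_url: Optional[str] = None,
    default_base: str = "http://localhost:3000",
) -> str:
    """Prefix the callback path with WEBHOOK_BASE_URL when one is configured."""
    base = (webhook_base_url or default_base).rstrip("/")
    return f"{base}/{path.lstrip('/')}"


class PendingFlows:
    """Track one random state value per in-flight connection."""

    def __init__(self) -> None:
        self._states = {}

    def start(self, connection_id: str) -> str:
        # Generate an unguessable value to send along with the authorization URL.
        state = secrets.token_urlsafe(32)
        self._states[connection_id] = state
        return state

    def finish(self, connection_id: str, returned_state: Optional[str]) -> bool:
        # pop() makes each state single-use: a second callback is rejected.
        expected = self._states.pop(connection_id, None)
        if expected is None or returned_state is None:
            return False
        # Constant-time comparison avoids leaking the expected value via timing.
        return hmac.compare_digest(expected.encode(), returned_state.encode())
```

A callback handler would call finish() before exchanging the code and abort with an error response when it returns False.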
Programmatic OAuth Implementation
For custom integrations or headless deployments, you can trigger OAuth flows programmatically using the AuthService directly.
Initialize Google Drive OAuth via Python
import asyncio
from urllib.parse import urlencode

from src.api.session_manager import SessionManager
from src.services.auth_service import AuthService


async def start_google_drive_oauth():
    session_mgr = SessionManager()
    auth_service = AuthService(session_mgr)

    # Must match the URI registered in Google Cloud Console
    redirect_uri = "http://localhost:3000/api/oauth/google/callback"

    result = await auth_service.init_oauth(
        connector_type="google_drive",
        purpose="data_source",
        connection_name="Production Drive",
        redirect_uri=redirect_uri,
        user_id="admin",
    )

    oauth_config = result["oauth_config"]
    # urlencode percent-encodes the redirect URI and the space-separated scopes
    query = urlencode({
        "client_id": oauth_config["client_id"],
        "redirect_uri": oauth_config["redirect_uri"],
        "response_type": "code",
        "scope": " ".join(oauth_config["scopes"]),
        "access_type": "offline",
        "prompt": "consent",
    })
    auth_url = f"{oauth_config['authorization_endpoint']}?{query}"
    print(f"Visit this URL to authorize: {auth_url}")
    # Store result["connection_id"] to correlate with the callback


if __name__ == "__main__":
    asyncio.run(start_google_drive_oauth())
Handle OAuth Callback with FastAPI
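from fastapi import FastAPI, HTTPException, Request
from starlette.middleware.sessions import SessionMiddleware

from src.services.auth_service import AuthService

app = FastAPI()
# request.session is only available once SessionMiddleware is installed;
# load the secret key from your own configuration instead of hardcoding it
app.add_middleware(SessionMiddleware, secret_key="CHANGE_ME")
auth_service = AuthService(...)  # construct as in the previous example


@app.get("/api/oauth/google/callback")
async def google_callback(request: Request):
    code = request.query_params.get("code")
    state = request.query_params.get("state")
    connection_id = request.session.get("pending_connection_id")
    # Reject callbacks that carry no code or belong to no pending connection
    if not code or not connection_id:
        raise HTTPException(status_code=400, detail="Missing code or pending connection")

    await auth_service.handle_oauth_callback(
        connection_id=connection_id,
        authorization_code=code,
        state=state,
    )
    return {"status": "authenticated", "connection_id": connection_id}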
Authenticated Connector Usage
Once tokens are persisted, the connector loads them automatically:
import asyncio

from src.connectors.google_drive.connector import GoogleDriveConnector


async def list_drive_files():
    cfg = {
        "token_file": "data/google_drive_data_source_abcd.json",
        "recursive": True,
    }
    gdrive = GoogleDriveConnector(cfg)
    # Load the persisted tokens before making any API call
    await gdrive.oauth.load_credentials()
    if not await gdrive.oauth.is_authenticated():
        raise RuntimeError("OAuth flow incomplete")
    return await gdrive.list_files()


files = asyncio.run(list_drive_files())
Summary
- Declarative Configuration: OpenRAG uses ~/.openrag/.env and EnvManager to load OAuth credentials without code changes.
- Validation Layer: Google client IDs must end with .apps.googleusercontent.com; Microsoft IDs require non-empty validation.
- AuthService Orchestration: The init_oauth() and handle_oauth_callback() methods in src/services/auth_service.py manage the full handshake flow.
- Token Persistence: Connectors store refresh tokens in data/ JSON files, enabling long-lived connections without repeated user consent.
- Multi-Provider Support: Identical patterns support Google Drive, OneDrive, and SharePoint through environment-specific variable prefixes.
Frequently Asked Questions
Where does OpenRAG store OAuth tokens after authentication?
OpenRAG persists tokens in JSON files under the data/ directory (e.g., data/google_drive_<uuid>.json). The connector-specific OAuth wrappers (such as GoogleDriveOAuth) handle automatic token refresh using these files, ensuring continuous synchronization without requiring users to re-authenticate when access tokens expire.
Why does my Google OAuth configuration fail validation?
The validate_google_oauth_client_id() function in src/tui/utils/validation.py (lines 87‑92) checks that your GOOGLE_OAUTH_CLIENT_ID ends with .apps.googleusercontent.com. If this suffix is missing, the EnvManager flags the configuration as invalid during startup, preventing connection attempts that would fail at Google's authorization endpoint.
Can I use the same Google OAuth credentials for UI login and Google Drive access?
Yes. OpenRAG uses the same GOOGLE_OAUTH_CLIENT_ID and GOOGLE_OAUTH_CLIENT_SECRET environment variables for both protecting the web UI with Google Sign-In and authenticating Google Drive data sources. The AuthService distinguishes the purpose based on the purpose parameter passed to init_oauth(), but both flows consume identical credential pairs from the environment.