How to Ingest Documents Using the OpenRAG Public API v1

The OpenRAG Public API v1 exposes a single POST /v1/documents/ingest endpoint that accepts multipart file uploads. Depending on the DISABLE_INGEST_WITH_LANGFLOW configuration flag, it routes them through either an asynchronous Langflow-powered pipeline or a synchronous traditional upload.

The langflow-ai/openrag repository implements a dual-path ingestion system that processes documents through three distinct architectural layers. When you call the OpenRAG Public API v1 ingest endpoint, the system automatically handles file storage, chunking, and vector indexing while providing immediate feedback through asynchronous task IDs or synchronous confirmation responses.

OpenRAG Public API v1 Endpoint Overview

The public ingestion interface consists of one multipart endpoint:


POST /v1/documents/ingest

This endpoint accepts one or more files along with metadata parameters that control processing behavior. In src/api/v1/documents.py, the ingest_endpoint function parses incoming multipart requests and forwards all parameters (files, session IDs, and processing flags) to the internal routing layer.

The system supports two operational modes:

  • Langflow-powered upload-ingest (default): Creates background tasks for asynchronous processing
  • Traditional upload: Synchronous direct storage into the OpenRAG knowledge base

Architecture and Routing Logic

The ingestion flow passes through three distinct layers that determine how documents are processed and stored.

FastAPI Route Layer

In src/api/v1/documents.py (lines 31-66), the ingest_endpoint function acts as the public interface. It accepts multipart form data containing files and optional configuration parameters, then immediately delegates to the internal router without performing business logic. This separation ensures the public API remains stable while internal implementations evolve.

Internal Router Decision Point

The upload_ingest_router function in src/api/router.py (lines 25-74) implements the routing logic. This component checks the DISABLE_INGEST_WITH_LANGFLOW setting from src/config/settings.py to determine the processing path:

  • When false (default): Routes to TaskService.create_langflow_upload_task for asynchronous Langflow processing
  • When true: Routes to the legacy api.upload.upload function for immediate synchronous storage

The router also handles boolean string parsing (converting "true"/"false" strings to Python booleans) and constructs the appropriate JSON response: either a 202 Accepted with a task ID or a 200 OK success confirmation.
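The decision described above can be sketched in a few lines. This is a minimal illustration, not the code in src/api/router.py: the helper names parse_bool_field and choose_ingest_path are hypothetical, and the exact set of strings the real router treats as true is an assumption.

```python
from typing import Optional, Tuple


def parse_bool_field(value: Optional[str], default: bool = False) -> bool:
    """Convert a multipart form string such as "true" or "false" to a bool.

    Hypothetical helper: only the literal "true" (any case) counts as True here.
    """
    if value is None:
        return default
    return value.strip().lower() == "true"


def choose_ingest_path(disable_langflow: bool) -> Tuple[str, int]:
    """Return (path name, HTTP status) for the configured ingestion mode."""
    if disable_langflow:
        # Traditional upload: synchronous, answers with 200 OK.
        return ("traditional", 200)
    # Default: Langflow task is created, answers with 202 Accepted.
    return ("langflow", 202)
```

With the default configuration, choose_ingest_path(False) selects the Langflow path and the 202 status that accompanies a task ID.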

Task Service Layer

For Langflow-based ingestion, src/services/task_service.py (lines 28-48) manages asynchronous execution. The create_langflow_upload_task function:

  1. Writes uploaded files to temporary storage
  2. Spawns a background job that calls src/services/langflow_file_service.py (lines 233-380)
  3. Triggers the Langflow ingestion flow via the Langflow API
  4. Optionally deletes the source file from Langflow after successful processing
  5. Indexes extracted chunks into OpenSearch

The service returns a unique task_id that clients poll via GET /v1/tasks/{task_id} to track progress.
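The general pattern of writing uploads to temporary storage, scheduling a background job, and returning a task ID right away can be sketched as follows. Everything here is an assumption made for illustration: the TASKS registry, the "pending" state, and both function names are hypothetical, and the Langflow call and OpenSearch indexing are replaced by a placeholder.

```python
import asyncio
import tempfile
import uuid
from pathlib import Path
from typing import Dict, List

# Hypothetical in-memory registry; the real TaskService is not shown in this article.
TASKS: Dict[str, dict] = {}


async def run_ingest_job(task_id: str, paths: List[str]) -> None:
    """Stand-in for the background job (Langflow call and indexing omitted)."""
    TASKS[task_id]["status"] = "in_progress"
    await asyncio.sleep(0)  # placeholder for the real pipeline work
    for path in paths:
        Path(path).unlink(missing_ok=True)  # remove the temporary copies
    TASKS[task_id]["status"] = "completed"


async def create_upload_task(files: Dict[str, bytes]) -> str:
    """Write uploads to temp storage, schedule the job, and return a task_id."""
    task_id = str(uuid.uuid4())
    paths: List[str] = []
    for name, content in files.items():
        suffix = Path(name).suffix  # keep the extension for type detection
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(content)
            paths.append(tmp.name)
    TASKS[task_id] = {"status": "pending", "file_count": len(paths)}
    # Keep a reference to the job so it is not garbage-collected while running.
    TASKS[task_id]["job"] = asyncio.create_task(run_ingest_job(task_id, paths))
    return task_id
```

The caller receives the task ID before the job has run, which is what allows the HTTP layer to answer immediately and leave progress tracking to the polling endpoint.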

Request Parameters for Document Ingestion

The OpenRAG Public API v1 accepts the following multipart form fields:

Field                 Type                       Description
file                  Multipart (one or many)    The document(s) to ingest (PDF, TXT, etc.)
session_id            Optional string            Langflow session identifier for flow context
settings              Optional JSON string       Langflow flow settings (e.g., {"chunk_size": 500})
tweaks                Optional JSON string       Langflow flow tweaks (e.g., {"model": "gpt-4"})
delete_after_ingest   Optional "true"/"false"    Removes file from Langflow after successful ingestion
replace_duplicates    Optional "true"/"false"    Overwrites existing chunks with matching hashes
create_filter         Optional "true"/"false"    Automatically creates a knowledge-filter for the document

All optional parameters are forwarded unchanged to the underlying processing pipeline.
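Because every non-file field travels as a string, a small client-side helper can keep the encoding consistent. The sketch below is illustrative only: build_ingest_form is a hypothetical name, and it assumes the server expects JSON-encoded strings for settings and tweaks and the literals "true"/"false" for flags, as the field list above describes.

```python
import json
from typing import Any, Dict, Optional


def build_ingest_form(
    session_id: Optional[str] = None,
    settings: Optional[Dict[str, Any]] = None,
    tweaks: Optional[Dict[str, Any]] = None,
    delete_after_ingest: Optional[bool] = None,
    replace_duplicates: Optional[bool] = None,
    create_filter: Optional[bool] = None,
) -> Dict[str, str]:
    """Build the non-file multipart fields, omitting anything left as None."""
    form: Dict[str, str] = {}
    if session_id is not None:
        form["session_id"] = session_id
    # Dict-valued fields are sent as JSON strings.
    for key, value in (("settings", settings), ("tweaks", tweaks)):
        if value is not None:
            form[key] = json.dumps(value)
    # Boolean flags are sent as the strings "true" / "false".
    flags = {
        "delete_after_ingest": delete_after_ingest,
        "replace_duplicates": replace_duplicates,
        "create_filter": create_filter,
    }
    for key, flag in flags.items():
        if flag is not None:
            form[key] = "true" if flag else "false"
    return form
```

The returned dictionary can be passed as the data argument of requests.post alongside the files list.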

Code Examples

Ingest a Single File with cURL

Submit a PDF to the Langflow-powered pipeline (default configuration):

curl -X POST "http://localhost:8000/v1/documents/ingest" \
  -H "Authorization: Bearer <API_KEY>" \
  -F "file=@/path/to/report.pdf" \
  -F "delete_after_ingest=true" \
  -F "replace_duplicates=true" \
  -F "create_filter=false"

Expected Response (202 Accepted):

{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "message": "Langflow upload task created for 1 file(s)",
  "file_count": 1,
  "create_filter": false,
  "filename": "report.pdf"
}

Poll Asynchronous Task Status

When using the Langflow path, query the task endpoint to monitor ingestion progress:

curl -X GET "http://localhost:8000/v1/tasks/c4f7e3b2-9d5a-4c1a-bcde-1234567890ab" \
  -H "Authorization: Bearer <API_KEY>"

Running Status:

{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "in_progress",
  "progress": 45,
  "message": "Uploading files to Langflow..."
}

Completed Status:

{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "completed",
  "message": "Document ingestion finished and indexed."
}

Failed Status:

{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "failed",
  "error": "AuthenticationException: Invalid Langflow token"
}
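The three status payloads above suggest a simple polling loop. The following sketch assumes only the status and error fields shown in those examples; wait_for_task is a hypothetical helper, and the caller supplies fetch_status, so the loop itself makes no assumptions about the HTTP client.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reports "completed" or "failed", or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":
            raise RuntimeError(body.get("error", "ingestion failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

In practice fetch_status could be a lambda that calls requests.get on the task URL with the Authorization header and returns the parsed JSON body.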

Ingest Multiple Files with Python

Use the requests library to upload mixed document types with custom Langflow settings:

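import requests
from contextlib import ExitStack

api_key = "<API_KEY>"
url = "http://localhost:8000/v1/documents/ingest"

# (path, MIME type) pairs to upload in one request
documents = [
    ("doc1.pdf", "application/pdf"),
    ("doc2.txt", "text/plain"),
]

# All non-file fields are sent as strings
data = {
    "delete_after_ingest": "true",
    "replace_duplicates": "true",
    "create_filter": "false",
    "settings": '{"chunk_size": 500}',
    "tweaks": '{"model": "gpt-4"}',
}

headers = {"Authorization": f"Bearer {api_key}"}

# ExitStack closes every file handle once the request has been sent
with ExitStack() as stack:
    files = [
        ("file", (path, stack.enter_context(open(path, "rb")), mime))
        for path, mime in documents
    ]
    resp = requests.post(url, files=files, data=data, headers=headers, timeout=60)

resp.raise_for_status()  # surface HTTP errors instead of printing an error body
print(resp.json())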

Disable Langflow-Based Ingestion

To use the traditional synchronous upload path instead of the default Langflow pipeline, set the environment variable before starting the server:

export DISABLE_INGEST_WITH_LANGFLOW=true
uvicorn src.main:app --host 0.0.0.0 --port 8000

With this configuration, the same POST /v1/documents/ingest request returns a 200 OK response immediately after storing the file directly in the OpenRAG knowledge base, without creating a background task.
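Since the same request can yield either response, a client that must work against both configurations can branch on the status code. This is a minimal sketch: extract_task_id is a hypothetical helper, and it relies only on the 202 body carrying a task_id, because the shape of the 200 body is not described here.

```python
from typing import Any, Dict, Optional


def extract_task_id(status_code: int, body: Dict[str, Any]) -> Optional[str]:
    """Return the task_id for a 202 response, or None for a synchronous 200."""
    if status_code == 202:
        # Langflow path: a background task was created and should be polled.
        return body["task_id"]
    if status_code == 200:
        # Traditional path: the file is already stored, nothing to poll.
        return None
    raise RuntimeError(f"unexpected ingest response: HTTP {status_code}")
```

A caller would poll GET /v1/tasks/{task_id} only when the helper returns a value.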

Source Code Reference

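The following files in langflow-ai/openrag implement the complete ingestion pipeline:

  • src/api/v1/documents.py: the public ingest_endpoint route
  • src/api/router.py: the upload_ingest_router decision logic
  • src/config/settings.py: the DISABLE_INGEST_WITH_LANGFLOW flag
  • src/services/task_service.py: create_langflow_upload_task and background task management
  • src/services/langflow_file_service.py: the Langflow upload and ingestion calls made by the background job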

Summary

  • The OpenRAG Public API v1 provides a single POST /v1/documents/ingest endpoint for document ingestion
  • Two processing paths exist: Langflow-powered asynchronous (default) and traditional synchronous (configured via DISABLE_INGEST_WITH_LANGFLOW)
  • Asynchronous tasks return a 202 Accepted response with a task_id for polling via GET /v1/tasks/{task_id}
  • Multipart form data supports multiple files and optional parameters including delete_after_ingest, replace_duplicates, and create_filter
  • Source files in src/api/v1/documents.py, src/api/router.py, and src/services/task_service.py handle the complete request lifecycle from HTTP reception to vector indexing

Frequently Asked Questions

What is the difference between the Langflow and traditional upload paths?

The Langflow path (default) processes documents asynchronously through Langflow flows for advanced chunking and extraction, returning a task ID for polling. The traditional path stores files directly and synchronously into the OpenRAG knowledge base, returning immediate success without background processing. The DISABLE_INGEST_WITH_LANGFLOW environment variable in src/config/settings.py toggles between these modes.

How do I check if document ingestion succeeded?

When using the Langflow-based path, poll the task status endpoint with GET /v1/tasks/{task_id}. The response includes a status field with values in_progress, completed, or failed, along with progress percentages or error messages. Successful completion indicates the document has been chunked and indexed in OpenSearch.

Can I ingest multiple file types in a single request?

Yes. The POST /v1/documents/ingest endpoint accepts multiple file parameters in a single multipart request. You can mix PDFs, text files, and other supported formats. The file_count field in the response confirms how many files were queued for processing.

Where is the DISABLE_INGEST_WITH_LANGFLOW setting defined?

The DISABLE_INGEST_WITH_LANGFLOW boolean flag is defined in src/config/settings.py and exposed through the application configuration object. The router in src/api/router.py checks this value at runtime to determine whether to call TaskService.create_langflow_upload_task or the legacy upload function.
