How to Ingest Documents Using the OpenRAG Public API v1
The OpenRAG Public API v1 exposes a single endpoint, POST /v1/documents/ingest, that accepts multipart file uploads. Depending on the DISABLE_INGEST_WITH_LANGFLOW configuration flag, it routes them through either an asynchronous Langflow-powered pipeline or a synchronous traditional upload.
The langflow-ai/openrag repository implements a dual-path ingestion system that processes documents through three distinct architectural layers. When you call the OpenRAG Public API v1 ingest endpoint, the system automatically handles file storage, chunking, and vector indexing while providing immediate feedback through asynchronous task IDs or synchronous confirmation responses.
OpenRAG Public API v1 Endpoint Overview
The public ingestion interface consists of one multipart endpoint:
POST /v1/documents/ingest
This endpoint accepts one or more files along with metadata parameters that control processing behavior. In src/api/v1/documents.py, the ingest_endpoint function parses incoming multipart requests and forwards all parameters—including files, session IDs, and processing flags—to the internal routing layer.
The system supports two operational modes:
- Langflow-powered upload-ingest (default): Creates background tasks for asynchronous processing
- Traditional upload: Synchronous direct storage into the OpenRAG knowledge base
Architecture and Routing Logic
The ingestion flow passes through three distinct layers that determine how documents are processed and stored.
FastAPI Route Layer
In src/api/v1/documents.py (lines 31-66), the ingest_endpoint function acts as the public interface. It accepts multipart form data containing files and optional configuration parameters, then immediately delegates to the internal router without performing business logic. This separation ensures the public API remains stable while internal implementations evolve.
Internal Router Decision Point
The upload_ingest_router function in src/api/router.py (lines 25-74) implements the routing logic. This component checks the DISABLE_INGEST_WITH_LANGFLOW setting from src/config/settings.py to determine the processing path:
- When false (default): Routes to TaskService.create_langflow_upload_task for asynchronous Langflow processing
- When true: Routes to the legacy api.upload.upload function for immediate synchronous storage
The router also handles boolean string parsing (converting "true"/"false" strings to Python booleans) and constructs the appropriate JSON response—either a 202 Accepted with a task ID or a 200 OK success confirmation.
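The decision described above can be sketched roughly as follows. This is an illustrative approximation, not the code in src/api/router.py: parse_bool_field and choose_ingest_path are hypothetical names, and treating only the string "true" (case-insensitively) as true is an assumption. Only the flag semantics, the two paths, and the 202/200 status codes come from the description above.

```python
def parse_bool_field(value, default=False):
    """Convert a form value such as "true"/"false" to a bool (hypothetical helper).

    Assumption: only the string "true", in any letter case, counts as True.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def choose_ingest_path(disable_ingest_with_langflow):
    """Return (path_name, http_status) for the given flag value (hypothetical)."""
    if disable_ingest_with_langflow:
        # Traditional upload: stored synchronously, confirmed with 200 OK.
        return ("traditional", 200)
    # Default: a background Langflow task, acknowledged with 202 Accepted.
    return ("langflow", 202)
```

Keeping the string parsing separate from the path decision makes each piece easy to check on its own, whatever the real router does internally.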
Task Service Layer
For Langflow-based ingestion, src/services/task_service.py (lines 28-48) manages asynchronous execution. The create_langflow_upload_task function:
- Writes uploaded files to temporary storage
- Spawns a background job that calls src/services/langflow_file_service.py (lines 233-380)
- Triggers the Langflow ingestion flow via the Langflow API
- Optionally deletes the source file from Langflow after successful processing
- Indexes extracted chunks into OpenSearch
The service returns a unique task_id that clients poll via GET /v1/tasks/{task_id} to track progress.
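A client-side polling loop for that task endpoint might look like the sketch below. The function name wait_for_task, its parameters, and the default interval and timeout are all hypothetical. The status values "completed" and "failed" are taken from the example responses later in this article, and the HTTP call is injected as a callable so the loop does not depend on any particular client library.

```python
import time


def wait_for_task(fetch_status, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the task reports "completed" or "failed", or the timeout passes.

    fetch_status: callable that takes a task_id and returns the decoded JSON dict.
    """
    waited = 0.0
    while True:
        payload = fetch_status(task_id)
        # Both terminal states end the loop; the caller inspects the payload.
        if payload.get("status") in ("completed", "failed"):
            return payload
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still running after {timeout} seconds")
        sleep(interval)
        waited += interval
```

With the requests library, fetch_status could be a small function that calls requests.get on the /v1/tasks/{task_id} URL with the Authorization header and returns the result of .json().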
Request Parameters for Document Ingestion
The OpenRAG Public API v1 accepts the following multipart form fields:
| Field | Type | Description |
|---|---|---|
| file | Multipart (one or many) | The document(s) to ingest (PDF, TXT, etc.) |
| session_id | Optional string | Langflow session identifier for flow context |
| settings | Optional JSON string | Langflow flow settings (e.g., {"chunk_size": 500}) |
| tweaks | Optional JSON string | Langflow flow tweaks (e.g., {"model": "gpt-4"}) |
| delete_after_ingest | Optional "true"/"false" | Removes file from Langflow after successful ingestion |
| replace_duplicates | Optional "true"/"false" | Overwrites existing chunks with matching hashes |
| create_filter | Optional "true"/"false" | Automatically creates a knowledge-filter for the document |
All optional parameters are forwarded unchanged to the underlying processing pipeline.
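Because every non-file field travels as a string, a small client-side helper can keep the encoding consistent. The sketch below is an assumption about how a caller might prepare these fields; build_ingest_form is a hypothetical name and is not part of OpenRAG. It JSON-encodes settings and tweaks, renders booleans as "true" or "false", and omits anything left unset.

```python
import json


def build_ingest_form(session_id=None, settings=None, tweaks=None,
                      delete_after_ingest=None, replace_duplicates=None,
                      create_filter=None):
    """Return a dict of string form fields, omitting anything left as None."""
    form = {}
    if session_id is not None:
        form["session_id"] = session_id
    # settings and tweaks are sent as JSON strings.
    for name, value in (("settings", settings), ("tweaks", tweaks)):
        if value is not None:
            form[name] = json.dumps(value)
    # Boolean flags are sent as the literal strings "true" / "false".
    flags = (
        ("delete_after_ingest", delete_after_ingest),
        ("replace_duplicates", replace_duplicates),
        ("create_filter", create_filter),
    )
    for name, value in flags:
        if value is not None:
            form[name] = "true" if value else "false"
    return form
```

The returned dict can be passed as the data argument of requests.post alongside the files list.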
Code Examples
Ingest a Single File with cURL
Submit a PDF to the Langflow-powered pipeline (default configuration):
curl -X POST "http://localhost:8000/v1/documents/ingest" \
-H "Authorization: Bearer <API_KEY>" \
-F "file=@/path/to/report.pdf" \
-F "delete_after_ingest=true" \
-F "replace_duplicates=true" \
-F "create_filter=false"
Expected Response (202 Accepted):
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "message": "Langflow upload task created for 1 file(s)",
  "file_count": 1,
  "create_filter": false,
  "filename": "report.pdf"
}
Poll Asynchronous Task Status
When using the Langflow path, query the task endpoint to monitor ingestion progress:
curl -X GET "http://localhost:8000/v1/tasks/c4f7e3b2-9d5a-4c1a-bcde-1234567890ab" \
-H "Authorization: Bearer <API_KEY>"
Running Status:
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "in_progress",
  "progress": 45,
  "message": "Uploading files to Langflow..."
}
Completed Status:
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "completed",
  "message": "Document ingestion finished and indexed."
}
Failed Status:
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "failed",
  "error": "AuthenticationException: Invalid Langflow token"
}
Ingest Multiple Files with Python
Use the requests library to upload mixed document types with custom Langflow settings:
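import requests
api_key = "<API_KEY>"
url = "http://localhost:8000/v1/documents/ingest"
# Form fields are sent as strings; settings and tweaks are JSON-encoded.
data = {
    "delete_after_ingest": "true",
    "replace_duplicates": "true",
    "create_filter": "false",
    "settings": '{"chunk_size": 500}',
    "tweaks": '{"model": "gpt-4"}',
}
headers = {"Authorization": f"Bearer {api_key}"}
# Open the files in a with-block so the handles are closed after the upload.
with open("doc1.pdf", "rb") as pdf, open("doc2.txt", "rb") as txt:
    # Repeating the "file" field name sends several files in one request.
    files = [
        ("file", ("doc1.pdf", pdf, "application/pdf")),
        ("file", ("doc2.txt", txt, "text/plain")),
    ]
    resp = requests.post(url, files=files, data=data, headers=headers)
# Raise on 4xx/5xx responses before reading the JSON body.
resp.raise_for_status()
print(resp.json())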
Disable Langflow-Based Ingestion
To use the traditional synchronous upload path instead of the default Langflow pipeline, set the environment variable before starting the server:
export DISABLE_INGEST_WITH_LANGFLOW=true
uvicorn src.main:app --host 0.0.0.0 --port 8000
With this configuration, the same POST /v1/documents/ingest request returns a 200 OK response immediately after storing the file directly in the OpenRAG knowledge base, without creating a background task.
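Since the same request can come back as either 202 or 200 depending on server configuration, client code may want to branch on the status code. The following is a minimal sketch under that assumption; interpret_ingest_response and the shape of the dict it returns are hypothetical, and the article does not document the body of the synchronous 200 response, so the sketch does not read it.

```python
def interpret_ingest_response(status_code, body):
    """Classify an ingest response as asynchronous or synchronous (hypothetical)."""
    if status_code == 202:
        # Langflow path: a background task was created; keep its ID for polling.
        return {"mode": "async", "task_id": body.get("task_id")}
    if status_code == 200:
        # Traditional path: the file was stored before the response was sent.
        return {"mode": "sync", "task_id": None}
    raise RuntimeError(f"unexpected ingest response: HTTP {status_code}")
```

With requests, this could be called as interpret_ingest_response(resp.status_code, resp.json()), polling the task endpoint only when the mode is "async".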
Source Code Reference
The following files in langflow-ai/openrag implement the complete ingestion pipeline:
- src/api/v1/documents.py: Public API v1 endpoint that receives multipart requests and forwards parameters to the router
- src/api/router.py: Core routing logic that selects between Langflow and traditional upload paths based on configuration
- src/services/task_service.py: Asynchronous task management and background job creation
- src/services/langflow_file_service.py: Langflow API client for file uploads and flow execution
- src/services/flows_service.py: Mapping of logical flow names (e.g., ingest, url_ingest) to Langflow flow IDs
- src/config/settings.py: Configuration management including the DISABLE_INGEST_WITH_LANGFLOW boolean flag
Summary
- The OpenRAG Public API v1 provides a single POST /v1/documents/ingest endpoint for document ingestion
- Two processing paths exist: Langflow-powered asynchronous (default) and traditional synchronous (configured via DISABLE_INGEST_WITH_LANGFLOW)
- Asynchronous tasks return a 202 Accepted response with a task_id for polling via GET /v1/tasks/{task_id}
- Multipart form data supports multiple files and optional parameters including delete_after_ingest, replace_duplicates, and create_filter
- Source files in src/api/v1/documents.py, src/api/router.py, and src/services/task_service.py handle the complete request lifecycle from HTTP reception to vector indexing
Frequently Asked Questions
What is the difference between the Langflow and traditional upload paths?
The Langflow path (default) processes documents asynchronously through Langflow flows for advanced chunking and extraction, returning a task ID for polling. The traditional path stores files directly and synchronously into the OpenRAG knowledge base, returning immediate success without background processing. The DISABLE_INGEST_WITH_LANGFLOW environment variable in src/config/settings.py toggles between these modes.
How do I check if document ingestion succeeded?
When using the Langflow-based path, poll the task status endpoint with GET /v1/tasks/{task_id}. The response includes a status field with values in_progress, completed, or failed, along with progress percentages or error messages. Successful completion indicates the document has been chunked and indexed in OpenSearch.
Can I ingest multiple file types in a single request?
Yes. The POST /v1/documents/ingest endpoint accepts multiple file parameters in a single multipart request. You can mix PDFs, text files, and other supported formats. The file_count field in the response confirms how many files were queued for processing.
Where is the DISABLE_INGEST_WITH_LANGFLOW setting defined?
The DISABLE_INGEST_WITH_LANGFLOW boolean flag is defined in src/config/settings.py and exposed through the application configuration object. The router in src/api/router.py checks this value at runtime to determine whether to call TaskService.create_langflow_upload_task or the legacy upload function.