# How to Ingest Documents Using the OpenRAG Public API v1

> Ingest documents with the OpenRAG Public API v1 through its `POST /v1/documents/ingest` endpoint, using either the Langflow pipeline or the traditional upload path.

- Repository: [langflow-ai/openrag](https://github.com/langflow-ai/openrag)
- Tags: how-to-guide
- Published: 2026-03-13

---

**The OpenRAG Public API v1 exposes a single `POST /v1/documents/ingest` endpoint that accepts multipart file uploads and routes them through either an asynchronous Langflow-powered pipeline or a synchronous traditional upload based on the `DISABLE_INGEST_WITH_LANGFLOW` configuration flag.**

The `langflow-ai/openrag` repository implements a dual-path ingestion system that processes documents through three distinct architectural layers. When you call the OpenRAG Public API v1 ingest endpoint, the system automatically handles file storage, chunking, and vector indexing while providing immediate feedback through asynchronous task IDs or synchronous confirmation responses.

## OpenRAG Public API v1 Endpoint Overview

The public ingestion interface consists of one multipart endpoint:

```
POST /v1/documents/ingest
```

This endpoint accepts one or more files along with metadata parameters that control processing behavior. In [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py), the `ingest_endpoint` function parses incoming multipart requests and forwards all parameters—including files, session IDs, and processing flags—to the internal routing layer.

The system supports two operational modes:

- **Langflow-powered upload-ingest** (default): Creates background tasks for asynchronous processing
- **Traditional upload**: Synchronous direct storage into the OpenRAG knowledge base

## Architecture and Routing Logic

The ingestion flow passes through three distinct layers that determine how documents are processed and stored.

### FastAPI Route Layer

In [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py) (lines 31-66), the `ingest_endpoint` function acts as the public interface. It accepts multipart form data containing files and optional configuration parameters, then immediately delegates to the internal router without performing business logic. This separation ensures the public API remains stable while internal implementations evolve.

### Internal Router Decision Point

The `upload_ingest_router` function in [`src/api/router.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/router.py) (lines 25-74) implements the routing logic. This component checks the `DISABLE_INGEST_WITH_LANGFLOW` setting from [`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py) to determine the processing path:

- **When `false` (default)**: Routes to `TaskService.create_langflow_upload_task` for asynchronous Langflow processing
- **When `true`**: Routes to the legacy `api.upload.upload` function for immediate synchronous storage

The router also handles boolean string parsing (converting `"true"`/`"false"` strings to Python booleans) and constructs the appropriate JSON response—either a 202 Accepted with a task ID or a 200 OK success confirmation.
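
The two responsibilities described above (boolean string parsing and path selection) can be illustrated with a minimal sketch. The function names `parse_bool` and `choose_ingest_path` are hypothetical and are not taken from `src/api/router.py`; the real router may accept other spellings of truthy values and builds a full JSON response rather than a tuple.

```python
def parse_bool(value, default=False):
    """Convert a multipart form string such as "true" or "false" to a bool.

    Assumption: only the case-insensitive string "true" counts as True.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def choose_ingest_path(disable_ingest_with_langflow):
    """Return (path name, HTTP status) for the configured flag value.

    Hypothetical helper mirroring the routing rule described in this guide:
    flag off -> asynchronous Langflow task (202), flag on -> traditional upload (200).
    """
    if disable_ingest_with_langflow:
        return ("traditional", 200)
    return ("langflow", 202)


print(parse_bool("true"), choose_ingest_path(False))
```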

### Task Service Layer

For Langflow-based ingestion, [`src/services/task_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/task_service.py) (lines 28-48) manages asynchronous execution. The `create_langflow_upload_task` function:

1. Writes uploaded files to temporary storage
2. Spawns a background job that calls [`src/services/langflow_file_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/langflow_file_service.py) (lines 233-380)
3. Triggers the Langflow ingestion flow via the Langflow API
4. Optionally deletes the source file from Langflow after successful processing
5. Indexes extracted chunks into OpenSearch

The service returns a unique `task_id` that clients poll via `GET /v1/tasks/{task_id}` to track progress.

## Request Parameters for Document Ingestion

The OpenRAG Public API v1 accepts the following multipart form fields:

| Field | Type | Description |
|-------|------|-------------|
| `file` | Multipart (one or many) | The document(s) to ingest (PDF, TXT, etc.) |
| `session_id` | Optional string | Langflow session identifier for flow context |
| `settings` | Optional JSON string | Langflow flow settings (e.g., `{"chunk_size": 500}`) |
| `tweaks` | Optional JSON string | Langflow flow tweaks (e.g., `{"model": "gpt-4"}`) |
| `delete_after_ingest` | Optional `"true"`/`"false"` | Removes file from Langflow after successful ingestion |
| `replace_duplicates` | Optional `"true"`/`"false"` | Overwrites existing chunks with matching hashes |
| `create_filter` | Optional `"true"`/`"false"` | Automatically creates a knowledge-filter for the document |

All optional parameters are forwarded unchanged to the underlying processing pipeline.
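
Because every optional field travels as a string, a client typically has to serialize dictionaries to JSON and booleans to `"true"`/`"false"` before sending. The helper below is a sketch of one way to do that; `build_form_fields` is a hypothetical name, not part of OpenRAG or any SDK, and it simply omits fields that were not supplied.

```python
import json


def build_form_fields(session_id=None, settings=None, tweaks=None,
                      delete_after_ingest=None, replace_duplicates=None,
                      create_filter=None):
    """Encode optional ingest parameters as multipart form strings."""
    fields = {}
    if session_id is not None:
        fields["session_id"] = session_id
    # settings and tweaks are sent as JSON strings
    for name, value in (("settings", settings), ("tweaks", tweaks)):
        if value is not None:
            fields[name] = json.dumps(value)
    # boolean flags are sent as the literal strings "true" / "false"
    flags = {
        "delete_after_ingest": delete_after_ingest,
        "replace_duplicates": replace_duplicates,
        "create_filter": create_filter,
    }
    for name, value in flags.items():
        if value is not None:
            fields[name] = "true" if value else "false"
    return fields


print(build_form_fields(settings={"chunk_size": 500}, create_filter=False))
```

The returned dictionary can be passed as the `data` argument of `requests.post` alongside the `files` list.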

## Code Examples

### Ingest a Single File with cURL

Submit a PDF to the Langflow-powered pipeline (default configuration):

```bash
curl -X POST "http://localhost:8000/v1/documents/ingest" \
  -H "Authorization: Bearer <API_KEY>" \
  -F "file=@/path/to/report.pdf" \
  -F "delete_after_ingest=true" \
  -F "replace_duplicates=true" \
  -F "create_filter=false"
```

**Expected Response (202 Accepted):**

```json
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "message": "Langflow upload task created for 1 file(s)",
  "file_count": 1,
  "create_filter": false,
  "filename": "report.pdf"
}
```

### Poll Asynchronous Task Status

When using the Langflow path, query the task endpoint to monitor ingestion progress:

```bash
curl -X GET "http://localhost:8000/v1/tasks/c4f7e3b2-9d5a-4c1a-bcde-1234567890ab" \
  -H "Authorization: Bearer <API_KEY>"
```

**Running Status:**

```json
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "in_progress",
  "progress": 45,
  "message": "Uploading files to Langflow..."
}
```

**Completed Status:**

```json
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "completed",
  "message": "Document ingestion finished and indexed."
}
```

**Failed Status:**

```json
{
  "task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab",
  "status": "failed",
  "error": "AuthenticationException: Invalid Langflow token"
}
```
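
The polling pattern can be wrapped in a small loop. The sketch below assumes the `status` values shown in the examples above (`in_progress`, `completed`, `failed`); `wait_for_task` and its `fetch_status` parameter are hypothetical names, and the HTTP call is injected as a callable so the loop itself stays independent of any client library.

```python
import time


def wait_for_task(fetch_status, task_id, interval=2.0, timeout=300.0,
                  sleep=time.sleep):
    """Poll until the task reports "completed" or "failed", or time runs out.

    fetch_status is any callable that takes a task_id and returns the decoded
    JSON body of GET /v1/tasks/{task_id} as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(task_id)
        if body.get("status") in ("completed", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout}s")
        sleep(interval)  # wait before asking again


# Demonstration with a stand-in for the HTTP call
_fake = iter([{"status": "in_progress", "progress": 45}, {"status": "completed"}])
print(wait_for_task(lambda tid: next(_fake), "demo-task", interval=0))
```

With `requests`, `fetch_status` could be something like `lambda tid: requests.get(f"http://localhost:8000/v1/tasks/{tid}", headers=headers).json()`. The interval and timeout defaults are arbitrary choices, not values recommended by OpenRAG.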

### Ingest Multiple Files with Python

Use the `requests` library to upload mixed document types with custom Langflow settings:

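```python
from contextlib import ExitStack

import requests

api_key = "<API_KEY>"
url = "http://localhost:8000/v1/documents/ingest"

# (filename, MIME type) pairs; each is sent under the repeated "file" field
documents = [
    ("doc1.pdf", "application/pdf"),
    ("doc2.txt", "text/plain"),
]

data = {
    "delete_after_ingest": "true",
    "replace_duplicates": "true",
    "create_filter": "false",
    "settings": '{"chunk_size": 500}',
    "tweaks": '{"model": "gpt-4"}',
}

headers = {"Authorization": f"Bearer {api_key}"}

# ExitStack closes every opened file once the request has been sent
with ExitStack() as stack:
    files = [
        ("file", (name, stack.enter_context(open(name, "rb")), mime))
        for name, mime in documents
    ]
    resp = requests.post(url, files=files, data=data, headers=headers)

resp.raise_for_status()  # surface HTTP errors before reading the body
print(resp.json())
```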

### Disable Langflow-Based Ingestion

To use the traditional synchronous upload path instead of the default Langflow pipeline, set the environment variable before starting the server:

```bash
export DISABLE_INGEST_WITH_LANGFLOW=true
uvicorn src.main:app --host 0.0.0.0 --port 8000
```

With this configuration, the same `POST /v1/documents/ingest` request returns a 200 OK response immediately after storing the file directly in the OpenRAG knowledge base, without creating a background task.
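
Since the same request can yield either status code depending on server configuration, client code may want to branch on it. The following is a sketch only: `describe_ingest_response` is a hypothetical helper, and because this guide does not document the body of the 200 OK response, the sketch reads nothing from it.

```python
def describe_ingest_response(status_code, body):
    """Classify an ingest response by status code.

    Returns ("queued", task_id) for the Langflow path (202),
    ("stored", None) for the traditional path (200),
    and ("error", None) for anything else.
    """
    if status_code == 202:
        return ("queued", body.get("task_id"))
    if status_code == 200:
        return ("stored", None)
    return ("error", None)


print(describe_ingest_response(202, {"task_id": "c4f7e3b2-9d5a-4c1a-bcde-1234567890ab"}))
```

A caller would continue to poll `GET /v1/tasks/{task_id}` only in the `"queued"` case.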

## Source Code Reference

The following files in `langflow-ai/openrag` implement the complete ingestion pipeline:

- **[`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py)**: Public API v1 endpoint that receives multipart requests and forwards parameters to the router
- **[`src/api/router.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/router.py)**: Core routing logic that selects between Langflow and traditional upload paths based on configuration
- **[`src/services/task_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/task_service.py)**: Asynchronous task management and background job creation
- **[`src/services/langflow_file_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/langflow_file_service.py)**: Langflow API client for file uploads and flow execution
- **[`src/services/flows_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/flows_service.py)**: Mapping of logical flow names (e.g., `ingest`, `url_ingest`) to Langflow flow IDs
- **[`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py)**: Configuration management including the `DISABLE_INGEST_WITH_LANGFLOW` boolean flag

## Summary

- The **OpenRAG Public API v1** provides a single `POST /v1/documents/ingest` endpoint for document ingestion
- **Two processing paths** exist: Langflow-powered asynchronous (default) and traditional synchronous (configured via `DISABLE_INGEST_WITH_LANGFLOW`)
- **Asynchronous tasks** return a 202 Accepted response with a `task_id` for polling via `GET /v1/tasks/{task_id}`
- **Multipart form data** supports multiple files and optional parameters including `delete_after_ingest`, `replace_duplicates`, and `create_filter`
- **Source files** in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py), [`src/api/router.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/router.py), and [`src/services/task_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/task_service.py) handle the complete request lifecycle from HTTP reception to vector indexing

## Frequently Asked Questions

### What is the difference between the Langflow and traditional upload paths?

The **Langflow path** (default) processes documents asynchronously through Langflow flows for advanced chunking and extraction, returning a task ID for polling. The **traditional path** stores files directly and synchronously into the OpenRAG knowledge base, returning immediate success without background processing. The `DISABLE_INGEST_WITH_LANGFLOW` environment variable in [`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py) toggles between these modes.

### How do I check if document ingestion succeeded?

When using the Langflow-based path, poll the task status endpoint with `GET /v1/tasks/{task_id}`. The response includes a `status` field with values `in_progress`, `completed`, or `failed`, along with progress percentages or error messages. Successful completion indicates the document has been chunked and indexed in OpenSearch.

### Can I ingest multiple file types in a single request?

Yes. The `POST /v1/documents/ingest` endpoint accepts multiple `file` parameters in a single multipart request. You can mix PDFs, text files, and other supported formats. The `file_count` field in the response confirms how many files were queued for processing.

### Where is the DISABLE_INGEST_WITH_LANGFLOW setting defined?

The `DISABLE_INGEST_WITH_LANGFLOW` boolean flag is defined in [`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py) and exposed through the application configuration object. The router in [`src/api/router.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/router.py) checks this value at runtime to determine whether to call `TaskService.create_langflow_upload_task` or the legacy upload function.