# How to Upload and Manage Documents Using the OpenRAG API

> Upload, manage, and delete documents using the OpenRAG API. Learn how to ingest files, monitor asynchronous processing, and remove documents efficiently.

- Repository: [langflow-ai/openrag](https://github.com/langflow-ai/openrag)
- Tags: how-to-guide
- Published: 2026-03-13

---

**Upload documents to OpenRAG via `POST /api/v1/documents/ingest`, monitor async processing via `GET /api/v1/tasks/{task_id}`, and delete documents via `DELETE /api/v1/documents` using filename-based OpenSearch queries.**

The `langflow-ai/openrag` repository exposes a public **v1 REST API** for document lifecycle management. With it, you can programmatically ingest files, track embedding generation, and maintain your knowledge base through standard HTTP endpoints or the official Python SDK.

## Document Ingestion Workflow

The ingestion pipeline in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py) handles multipart file uploads and delegates processing to specialized routers based on your deployment configuration.

### The Ingestion Endpoint

The primary entry point is the `ingest_endpoint` function in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py). This endpoint accepts **multipart/form-data** requests with a `file` field and immediately delegates to the `upload_ingest_router` function.

```python
# src/api/v1/documents.py (abridged; arguments elided)
async def ingest_endpoint(...):
    return await upload_ingest_router(...)
```

The endpoint supports form values including `delete_after_ingest`, `replace_duplicates`, and `create_filter`, which control post-upload behavior.
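As a rough illustration of calling this endpoint without the SDK, the sketch below builds a multipart request using only the Python standard library. The `file` field, the flag names, and the endpoint path come from the description above; the helper name `build_ingest_request`, the `Bearer` authorization header, and the lowercase string encoding of boolean flags are assumptions, so check them against your deployment.

```python
import uuid
import urllib.request


def build_ingest_request(base_url, api_key, filename, content, **flags):
    """Build (but do not send) a multipart POST for the ingest endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Optional form fields such as replace_duplicates or delete_after_ingest.
    # Encoding booleans as "true"/"false" is an assumption.
    for name, value in flags.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{str(value).lower()}\r\n'.encode()
        )
    # The uploaded document goes in the `file` field.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + content + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return urllib.request.Request(
        url=f"{base_url}/api/v1/documents/ingest",
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the upload; building the request separately keeps the encoding step easy to inspect.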

### Routing Logic: Classic vs. Langflow Pipelines

The `upload_ingest_router` in [`src/api/router.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/router.py) automatically selects between two processing modes based on the `DISABLE_INGEST_WITH_LANGFLOW` flag defined in [`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py):

- **Classic Mode** (`DISABLE_INGEST_WITH_LANGFLOW=true`): Directly stores files via `api.upload.upload`, processing only the first uploaded file immediately.
- **Langflow Mode** (`DISABLE_INGEST_WITH_LANGFLOW=false`): Writes files to temporary OS storage and creates asynchronous **Langflow upload tasks** via `task_service.create_langflow_upload_task`.

In Langflow mode, the router normalizes boolean flags and returns a task identifier immediately while chunking and embedding occur asynchronously:

```json
{
  "task_id": "c6f3e2d7-8a4b-4f1a-9c3b-2e5f6a7b8c9d",
  "message": "Langflow upload task created for 1 file(s)",
  "file_count": 1,
  "filename": "report.pdf"
}
```
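The mode selection and boolean-flag normalization described in this section can be sketched as follows. This is not the code in `src/api/router.py`: the function names and the set of strings treated as true are hypothetical, chosen only to show the shape of the decision.

```python
def normalize_bool(value, default=False):
    """Interpret a form or environment value such as "true" or "1" as a bool.

    The accepted truthy strings are an assumption for this sketch.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes", "on"}


def choose_ingest_mode(disable_ingest_with_langflow):
    """Return which pipeline handles an upload for a given flag value."""
    if normalize_bool(disable_ingest_with_langflow):
        return "classic"   # direct upload, first file processed immediately
    return "langflow"      # temp files plus an asynchronous upload task
```

Normalizing once at the boundary means the rest of the pipeline can work with real booleans instead of form strings.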

### Task Creation and Execution

Within `_langflow_upload_ingest_task`, the system:

1. Writes uploaded files to the OS temp directory.
2. Invokes `task_service.create_langflow_upload_task` with `user_id`, `file_paths`, original filenames, Langflow file service, session manager, JWT token, and ingestion flags.
3. Executes asynchronous chunking, embedding generation, and optional knowledge filter creation in the background.
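The three steps above follow a common pattern: persist the upload, schedule the work, and return before it finishes. The sketch below shows that pattern with `tempfile` and `asyncio` only. All function names are hypothetical, and the background coroutine merely records file sizes as a stand-in for chunking and embedding; it does not reproduce `task_service.create_langflow_upload_task`.

```python
import asyncio
import os
import tempfile


def save_uploads_to_temp(uploads):
    """Write (filename, bytes) pairs to the OS temp directory.

    Returns the temp paths and the original filenames in matching order.
    """
    paths, names = [], []
    for filename, content in uploads:
        suffix = os.path.splitext(filename)[1]
        # delete=False keeps the file on disk for the background task.
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(content)
            paths.append(tmp.name)
        names.append(filename)
    return paths, names


async def process_in_background(paths, names, results):
    """Stand-in for chunking and embedding: record sizes, then clean up."""
    for path, name in zip(paths, names):
        await asyncio.sleep(0)  # yield control, as real I/O would
        results[name] = os.path.getsize(path)
        os.remove(path)


async def ingest(uploads):
    """Save uploads, schedule background work, and return immediately."""
    paths, names = save_uploads_to_temp(uploads)
    results = {}
    task = asyncio.create_task(process_in_background(paths, names, results))
    return task, results
```

Keeping the original filenames alongside the temp paths matters because the temp names are random, while later operations such as deletion refer to the name the user uploaded.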

## Monitoring Ingestion Status

Poll the task status endpoint to track document processing completion.

### Checking Task Status

Query `GET /api/v1/tasks/{task_id}` to retrieve the current state. The `task_status_endpoint` in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py) looks up the task for the authenticated user through the `TaskService`:

```python
# src/api/v1/documents.py (task_status_endpoint)
task_status = task_service.get_task_status(user.user_id, task_id)
```

The response includes `status` values (`pending`, `running`, `completed`, `failed`) and error messages if processing fails.
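A polling loop over these status values might look like the sketch below. It is transport-agnostic: `fetch_status` is a hypothetical callable that you supply, which takes a task ID and returns the decoded JSON body of the task status endpoint. The interval and timeout defaults are arbitrary choices, not values taken from the repository.

```python
import time


def wait_for_task(fetch_status, task_id, interval=2.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal state or the timeout elapses.

    Returns the final status body; raises TimeoutError if the task is
    still pending or running when the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(task_id)
        if body.get("status") in ("completed", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {body.get('status')!r}")
        sleep(interval)
```

Treating `failed` as a terminal state alongside `completed` prevents the loop from spinning until the timeout when processing has already stopped; the caller can then inspect the returned body for the error message.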

## Deleting Documents

Remove documents from the knowledge base using filename-based deletion.

### Filename-Based Deletion

The `delete_document_endpoint` in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py) constructs an OpenSearch **delete-by-query** request using `build_filename_delete_body` from [`src/utils/opensearch_queries.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/opensearch_queries.py):

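```python
# src/api/v1/documents.py (abridged)
from utils.opensearch_queries import build_filename_delete_body

# Build the delete-by-query body for the target filename.
# The exact arguments are an assumption; see opensearch_queries.py.
delete_query = build_filename_delete_body(filename)

result = await opensearch_client.delete_by_query(
    index=get_index_name(),
    body=delete_query,
    conflicts="proceed",  # continue past version conflicts instead of aborting
)
```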

This deletes all chunks where the `filename` field matches the supplied name, returning the count of deleted chunks.
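To make the query shape concrete, here is a minimal sketch of what a filename-matching delete body could look like. It is an illustrative stand-in: the actual `build_filename_delete_body` may build a different query, and the helper names below are hypothetical. The `term` query and the `deleted` counter in the delete-by-query response are standard OpenSearch features.

```python
def filename_delete_body(filename):
    """Build a delete-by-query body matching every chunk of one file.

    Illustrative only; the real helper in opensearch_queries.py may
    use a different query shape.
    """
    if not filename:
        raise ValueError("filename must be a non-empty string")
    # A term query matches the stored value exactly, without analysis.
    return {"query": {"term": {"filename": filename}}}


def deleted_chunk_count(delete_by_query_result):
    """Read the number of removed chunks from a delete-by-query response."""
    return delete_by_query_result.get("deleted", 0)
```

Rejecting an empty filename up front is a cheap guard against sending a query that matches nothing or, with a looser query type, far more than intended.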

## Working with the Python SDK

The `openrag_sdk` package in [`sdks/python/openrag_sdk/documents.py`](https://github.com/langflow-ai/openrag/blob/main/sdks/python/openrag_sdk/documents.py) wraps the REST API with async convenience methods.

### SDK Implementation Examples

**Ingest with automatic polling:**

```python
from openrag_sdk.client import OpenRAGClient

client = OpenRAGClient(base_url="http://localhost:8000", api_key="YOUR_API_KEY")

# Wait for completion automatically. Run this inside an async function or an
# async-aware REPL, since `await` is not valid at module top level.
result = await client.documents.ingest(
    file_path="reports/annual.pdf",
    wait=True
)
```

**Manual task polling:**

```python
import asyncio

# Start ingestion without waiting (reuses `client` from the previous example)
task = await client.documents.ingest(
    file_path="reports/annual.pdf",
    wait=False
)

# Poll manually until the task reaches a terminal state
while True:
    status = await client.documents.get_task_status(task.task_id)
    if status.status in ("completed", "failed"):
        break
    await asyncio.sleep(2)
```

**Delete documents:**

```python
delete_resp = await client.documents.delete(filename="annual.pdf")
print(f"Deleted {delete_resp.deleted_chunks} chunks")

```

The SDK handles multipart encoding, JSON serialization, and automatic retry logic for the three core endpoints.

## Summary

- **Upload documents** via `POST /api/v1/documents/ingest` to create an asynchronous Langflow task that handles chunking and embedding.
- **Track progress** by polling `GET /api/v1/tasks/{task_id}` until the status is `completed` or `failed`.
- **Remove documents** via `DELETE /api/v1/documents` using the filename to target all associated chunks in OpenSearch.
- **Configure behavior** through the `DISABLE_INGEST_WITH_LANGFLOW` flag in [`src/config/settings.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/settings.py) to toggle between direct upload and task-based processing.
- **Use the Python SDK** ([`sdks/python/openrag_sdk/documents.py`](https://github.com/langflow-ai/openrag/blob/main/sdks/python/openrag_sdk/documents.py)) to simplify async operations with built-in polling and error handling.

## Frequently Asked Questions

### What file formats does the OpenRAG API support for ingestion?

The OpenRAG API supports standard document formats including PDF, TXT, and Markdown through the multipart upload endpoint. According to the source code in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py), the system processes files via the Langflow pipeline or classic uploader, extracting text content for chunking regardless of the original format.

### How long does document ingestion typically take?

Ingestion duration depends on file size and the processing mode defined by `DISABLE_INGEST_WITH_LANGFLOW`. When using the Langflow task pipeline (default), the API returns a `task_id` immediately while chunking and embedding occur asynchronously. Large documents may take several minutes to reach `completed` status, which you can monitor via the task status endpoint.

### Can I prevent duplicate documents in the knowledge base?

Yes. The ingestion endpoint accepts a `replace_duplicates` form field, which the router passes to the task service. When it is enabled, duplicate handling for existing chunks that share the same filename takes place during the Langflow upload task.

### Is there an official SDK for languages other than Python?

Currently, the repository only includes an official Python SDK located in `sdks/python/openrag_sdk/`. The SDK provides async wrappers for document management, but the REST API follows standard HTTP conventions, allowing you to implement clients in other languages using the endpoint specifications in [`src/api/v1/documents.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/documents.py).