# How to Set Up the GraphRAG Agent Project: Complete Installation Guide

> Learn how to set up the GraphRAG Agent project with our complete installation guide. Follow simple steps to clone, configure, and launch your knowledge graph application.

- Repository: [GLK/graph-rag-agent](https://github.com/1517005260/graph-rag-agent)
- Tags: getting-started
- Published: 2026-02-22

---

**To set up the GraphRAG Agent project, clone the repository, install Python dependencies and system libraries, configure your Neo4j and LLM credentials in a `.env` file, build the initial knowledge graph from your documents, and launch both the FastAPI backend and Streamlit frontend services.**

The GraphRAG Agent is a modular Python monorepo that combines retrieval-augmented generation (RAG) pipelines with Neo4j graph databases. This guide walks through deploying the full stack, from ingesting documents into the knowledge graph to running the conversational web interface, using the file structure and commands found in the `1517005260/graph-rag-agent` repository.

## Prerequisites

Before installation, ensure you have **Python 3.8+**, a running **Neo4j 5.x** instance with the APOC plugin, and the OS-level dependencies for document processing. On Ubuntu, install the required system packages:

```bash
sudo apt-get install python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils
```

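These packages support the document parsing performed by the `FileReader` class in [`graphrag_agent/pipelines/ingestion/document_processor.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/pipelines/ingestion/document_processor.py): `poppler-utils` provides PDF text extraction, `antiword` reads legacy `.doc` files, `unrtf` converts RTF, and the `libxml2`/`libxslt` headers are needed to build XML-based parsers such as `lxml`.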
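
To confirm the prerequisites before going further, a small check along these lines can help. It is a minimal sketch, not part of the repository: the function names are made up for illustration, and it only looks for the executables that the apt packages above install (`pdftotext` ships with `poppler-utils`).

```python
import shutil
import sys

# Executables installed by the apt packages above:
# antiword (antiword), unrtf (unrtf), pdftotext (poppler-utils).
REQUIRED_TOOLS = ("antiword", "unrtf", "pdftotext")
MIN_PYTHON = (3, 8)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum version."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

An empty "missing tools" list only shows that the binaries are on `PATH`; it does not prove that every document format will parse correctly.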

## Step-by-Step Installation

### 1. Clone the Repository and Create a Virtual Environment

```bash
git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
```

### 2. Install Python Dependencies

Install the packages listed in [`requirements.txt`](https://github.com/1517005260/graph-rag-agent/blob/main/requirements.txt), which includes FastAPI, Streamlit, the Neo4j Python driver, and LLM SDKs:

```bash
pip install -r requirements.txt
```

**Windows users:** If you encounter Torch DLL errors, downgrade PyTorch as specified in the documentation:

```bash
pip install torch==2.8.0
```

### 3. Configure Environment Variables

Copy the example environment file and edit it with your Neo4j connection details and LLM API keys:

```bash
cp .env.example .env
```

Open `.env` and set these critical variables:

- **`NEO4J_URI`**: Bolt URL (e.g., `bolt://localhost:7687`)
- **`NEO4J_USER`** and **`NEO4J_PASSWORD`**: Database authentication
- **`OPENAI_API_KEY`**: Your LLM provider token
- **`CACHE_ROOT`**: Directory for vector stores and intermediate caches

The application loads these settings through [`server/server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/database.py) and [`graphrag_agent/config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/config/settings.py) at runtime.
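
A missing or empty variable is an easy mistake to make at this step. The following sketch parses a `.env` file by hand and reports which of the variables listed above are unset. It assumes plain `KEY=VALUE` lines and treats all five variables as mandatory, which may be stricter than the project itself; the helper names are illustrative and do not come from the repository.

```python
import os

# Variable names come from the list above; treating all of them as
# mandatory is an assumption of this sketch.
REQUIRED_KEYS = (
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "CACHE_ROOT",
)


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as handle:
            parsed = parse_env_file(handle.read())
        print("Missing:", missing_keys(parsed) or "none")
    else:
        print(".env not found; run `cp .env.example .env` first")
```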

## Build the Knowledge Graph

The GraphRAG Agent requires an initial ingestion of documents to create the Neo4j knowledge graph. The build pipeline orchestrates four stages defined in [`graphrag_agent/integrations/build/main.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/main.py):

1. **File reading** via `FileReader` (handles PDFs, DOCX, TXT)
2. **Chunking** via `ChineseTextChunker` or `TextChunker`
3. **Embedding** using the configured LLM embedder
4. **Neo4j import** via `KnowledgeGraphBuilder` (batched Cypher in [`neo4j_batch.py`](https://github.com/1517005260/graph-rag-agent/blob/main/neo4j_batch.py))

Run the full pipeline:

```bash
python -m graphrag_agent.integrations.build.main --data-dir ./files
```

Replace `./files` with the path to your source documents. This command uses the `KnowledgeGraphProcessor` class to populate your graph database with entities and relationships.
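
To make the chunking and batched-import stages more concrete, here is a simplified, standalone sketch. It is not the repository's `TextChunker` or `KnowledgeGraphBuilder`: the fixed-size character windows, the `Entity` label, the property names, and the batch size are all assumptions chosen for illustration, and the embedding stage is left out entirely.

```python
from itertools import islice

# Illustrative Cypher; the Entity label and property names are assumptions.
MERGE_ENTITIES = (
    "UNWIND $rows AS row "
    "MERGE (e:Entity {id: row.id}) "
    "SET e.description = row.description"
)


def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    stop = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, stop, step)]


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def import_entities(session, rows, batch_size=1000):
    """Send rows to Neo4j in batches and return the number of batches.

    `session` is expected to behave like neo4j.Session, whose run()
    accepts query parameters as keyword arguments.
    """
    count = 0
    for batch in batched(rows, batch_size):
        session.run(MERGE_ENTITIES, rows=batch)
        count += 1
    return count
```

Sending one `UNWIND` query per batch, rather than one query per entity, keeps the number of round trips to Neo4j small, which is the usual motivation for batched Cypher imports.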

## Start the Application Services

### Launch the FastAPI Backend

The backend service exposes REST endpoints for chat, source retrieval, and feedback under [`server/routers/chat.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/routers/chat.py) and [`server/routers/source.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/routers/source.py). Start the Uvicorn server defined in [`server/main.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/main.py):

```bash
uvicorn server.main:app --host 0.0.0.0 --port 8000
```

All configurable options (host, port, logging) are centralized in [`server/server_config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/settings.py).
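
Once Uvicorn reports that it is running, a quick way to confirm that the port is accepting connections is a plain TCP check like the one below. This is a generic sketch rather than project code; it assumes the default port `8000` from the command above and says nothing about whether individual endpoints work.

```python
import socket


def port_open(host="127.0.0.1", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("backend reachable:", port_open())
```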

### Launch the Streamlit Frontend

In a new terminal window (with the virtual environment activated), start the UI:

```bash
streamlit run frontend/app.py
```

The frontend communicates with the backend via helper functions in [`frontend/utils/api.py`](https://github.com/1517005260/graph-rag-agent/blob/main/frontend/utils/api.py). Open the displayed URL (typically `http://localhost:8501`) to interact with the RAG agent.
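
For readers who want to call the backend without the UI, the sketch below shows the general shape of such a helper using only the standard library. The route `/chat` and the payload fields `message` and `session_id` are placeholders, not the project's actual API; the real paths and request schema are defined in `server/routers/chat.py` and should be checked there.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above
CHAT_PATH = "/chat"  # hypothetical route; check server/routers/chat.py


def build_chat_request(message, session_id="demo", base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a chat message.

    The payload field names are placeholders, not the project's schema.
    """
    payload = json.dumps({"message": message, "session_id": session_id})
    return urllib.request.Request(
        base_url.rstrip("/") + CHAT_PATH,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the URL and body before any network call is made.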

## Verify the Installation

Run the unit test suite to validate the cache system, search tools, and multi-agent flows:

```bash
python -m unittest discover test -v
```

For end-to-end validation, execute the Hotpot evaluation script at [`test/hotpot_multi_agent_eval.py`](https://github.com/1517005260/graph-rag-agent/blob/main/test/hotpot_multi_agent_eval.py), which spins up the complete stack.
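
If tests fail with connection errors, it can help to rule out Neo4j credentials first. The following sketch reads the connection variables described earlier and asks the official Neo4j Python driver to verify connectivity. It assumes the variables are present in the process environment (for example, exported in the shell), and the fallback defaults are illustrative rather than taken from the project.

```python
import os


def neo4j_settings(env=os.environ):
    """Read the connection settings described in the .env section."""
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    auth = (env.get("NEO4J_USER", "neo4j"), env.get("NEO4J_PASSWORD", ""))
    return uri, auth


def check_neo4j(env=os.environ):
    """Raise if Neo4j cannot be reached with the configured credentials."""
    # Imported lazily so the settings helper works without the driver.
    from neo4j import GraphDatabase

    uri, auth = neo4j_settings(env)
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.verify_connectivity()
```

Calling `check_neo4j()` from a Python shell inside the virtual environment either returns silently or raises a driver exception that names the problem, such as an authentication failure or an unreachable host.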

## Incremental Document Updates

After initial setup, add new documents without rebuilding the entire graph using the incremental updater:

```bash
python -m graphrag_agent.integrations.build.incremental_update --watch ./files
```

This command spawns the `IncrementalUpdateScheduler` background thread, which monitors the directory for changes and triggers `IncrementalGraphUpdater` to merge new entities into Neo4j.
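
The core idea behind any incremental updater is detecting which files changed since the last run. One common approach, sketched here under the assumption that content hashes are compared between runs, is to snapshot the directory and diff two snapshots. This is not the repository's `IncrementalUpdateScheduler` logic, which may rely on timestamps or a different manifest format.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, modified, removed) relative paths, each sorted."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, modified, removed
```

Only the files in the `added` and `modified` lists would need to pass through chunking and embedding again, which is what makes an incremental update cheaper than a full rebuild.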

## Summary

- **GraphRAG Agent** is organized into four layers: Core runtime (`graphrag_agent/`), Backend API (`server/`), Frontend UI (`frontend/`), and Build scripts (`graphrag_agent/integrations/build/`).
- **System dependencies** (Poppler, XML libraries) are required for document parsing on Linux systems.
- **Configuration** is managed through environment variables read by [`server/server_config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/settings.py) and database modules.
- **Initial ingestion** uses `graphrag_agent.integrations.build.main` to process files through chunking, embedding, and graph construction.
- **Services** must run simultaneously: FastAPI backend (`uvicorn server.main:app`) and Streamlit frontend (`streamlit run frontend/app.py`).

## Frequently Asked Questions

### What are the system requirements for GraphRAG Agent?

You need Python 3.8 or higher, a Neo4j 5.x database instance with the APOC plugin, and approximately 4 GB of RAM for the embedding models. On Linux, `poppler-utils` is needed for PDF processing and `libxml2-dev` for XML-based document formats. GPU acceleration is optional but recommended for faster embedding generation when processing large document collections.

### How do I fix the Torch DLL error on Windows?

If you encounter DLL load failures when importing PyTorch modules, downgrade to the stable 2.8.0 release by running `pip install torch==2.8.0`. This resolves compatibility issues with specific Windows Visual C++ redistributable versions that conflict with the default torch installation in [`requirements.txt`](https://github.com/1517005260/graph-rag-agent/blob/main/requirements.txt).

### Can I run GraphRAG Agent without Neo4j?

No. The `KnowledgeGraphBuilder` class in [`graphrag_agent/integrations/build/build_graph.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/build_graph.py) and the search tools in [`graphrag_agent/search/tool/base.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/base.py) depend on Neo4j as the primary graph store. The application uses Cypher queries to traverse entity relationships during the retrieval phase of the RAG pipeline.

### How do I add new documents after the initial setup?

Use the incremental update command `python -m graphrag_agent.integrations.build.incremental_update --watch ./files` instead of rebuilding from scratch. The `IncrementalGraphUpdater` class processes only new or modified files, updating the Neo4j graph and vector embeddings without reprocessing your entire document corpus.