# How to Configure the GraphRAG Agent System: Complete Setup Guide

> Set up the GraphRAG Agent system from source: clone the repository, install dependencies, configure Neo4j and LLM credentials, build the knowledge graph, and run the application.

- Repository: [GLK/graph-rag-agent](https://github.com/1517005260/graph-rag-agent)
- Tags: how-to-guide
- Published: 2026-02-23

---

**To configure the GraphRAG Agent system, clone the repository, install dependencies via [`requirements.txt`](https://github.com/1517005260/graph-rag-agent/blob/main/requirements.txt), configure Neo4j and LLM credentials in `.env`, build the knowledge graph using [`graphrag_agent/integrations/build/main.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/main.py), and launch the FastAPI backend and Streamlit frontend.**

The GraphRAG Agent is a modular Python monorepo that combines document ingestion, Neo4j knowledge graphs, and LLM-powered retrieval-augmented generation. This guide walks you through how to configure the GraphRAG Agent system from source, covering environment setup, dependency installation, and service orchestration.

## Repository Architecture

The system is organized into four logical layers that handle distinct responsibilities:

| Layer | Purpose | Key Files |
|-------|---------|-----------|
| **Core runtime** | RAG pipelines, search tools, and graph-building logic | [`graphrag_agent/search/tool/base.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/base.py), [`graphrag_agent/pipelines/ingestion/document_processor.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/pipelines/ingestion/document_processor.py), [`graphrag_agent/graph/structure/struct_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/graph/structure/struct_builder.py) |
| **Backend API** | FastAPI service exposing chat, source retrieval, and feedback endpoints | [`server/main.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/main.py), [`server/routers/chat.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/routers/chat.py), [`server/server_config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/settings.py) |
| **Frontend UI** | Streamlit application communicating with the FastAPI backend | [`frontend/app.py`](https://github.com/1517005260/graph-rag-agent/blob/main/frontend/app.py), [`frontend/utils/api.py`](https://github.com/1517005260/graph-rag-agent/blob/main/frontend/utils/api.py) |
| **Build & Index** | Scripts for document ingestion, chunking, embedding, and Neo4j graph creation | [`graphrag_agent/integrations/build/main.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/main.py), [`graphrag_agent/integrations/build/build_graph.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/build_graph.py) |

## Step 1: Environment Setup

Begin by cloning the repository and creating an isolated Python environment.

```bash
git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
```
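Before installing anything, it can help to confirm that the virtual environment is actually active. The check below is a minimal sketch using only the standard library; the function name `in_virtualenv` is illustrative and not part of the repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
```

If this reports `False`, re-run the activation command for your shell before continuing.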

### System Dependencies

For PDF and document processing, install OS-level packages. On Ubuntu:

```bash
sudo apt-get update
sudo apt-get install -y python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils
```

On Windows, if you encounter PyTorch DLL loading errors, downgrade to PyTorch 2.8.0:

```bash
pip install torch==2.8.0
```
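To confirm that the document-processing tools from the `apt-get` step are reachable, a quick lookup on `PATH` is usually enough. This sketch assumes the executables are named `antiword`, `unrtf`, and `pdftotext` (the latter ships with `poppler-utils`); the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executable names assumed to come from the apt packages:
# antiword -> antiword, unrtf -> unrtf, poppler-utils -> pdftotext.
REQUIRED_TOOLS = ("antiword", "unrtf", "pdftotext")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing tools:", missing if missing else "none")
```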

## Step 2: Install Python Dependencies

Install the required packages listed in [`requirements.txt`](https://github.com/1517005260/graph-rag-agent/blob/main/requirements.txt):

```bash
pip install -r requirements.txt
```

This installs FastAPI, Streamlit, the Neo4j driver, LLM SDKs, and document processing libraries.
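A quick way to sanity-check the installation is to ask Python whether the main packages can be located without importing them. The sketch below assumes the import names `fastapi`, `streamlit`, and `neo4j`; extend the tuple with whatever else your setup relies on.

```python
from importlib.util import find_spec

# Top-level import names assumed for the packages mentioned above.
EXPECTED_MODULES = ("fastapi", "streamlit", "neo4j")


def missing_modules(modules=EXPECTED_MODULES):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", missing if missing else "none")
```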

## Step 3: Configure Environment Variables

Copy the example environment file and populate it with your credentials:

```bash
cp .env.example .env
```

Edit `.env` to include:

| Variable | Description |
|----------|-------------|
| `NEO4J_URI` | Bolt URL (e.g., `bolt://localhost:7687`) |
| `NEO4J_USER` / `NEO4J_PASSWORD` | Neo4j authentication credentials |
| `OPENAI_API_KEY` | API key for your LLM provider (OpenAI, Azure, etc.) |
| `CACHE_ROOT` | Directory for vector stores and intermediate caches |

The application loads these values via [`server/server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/database.py) for the backend and [`graphrag_agent/config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/config/settings.py) for the core runtime.
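Missing or empty variables are a common cause of startup failures, so it may be worth checking `.env` before moving on. The following is a minimal sketch with a deliberately simple `KEY=VALUE` parser; it is not how the repository loads its settings, and the helper names `parse_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Variables taken from the table above.
REQUIRED_KEYS = (
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "CACHE_ROOT",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("missing or empty:", missing_keys(parse_env(text)))
```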

## Step 4: Build the Knowledge Graph

Run the ingestion pipeline to process documents and populate Neo4j:

```bash
python -m graphrag_agent.integrations.build.main --data-dir ./files
```

This command runs `KnowledgeGraphProcessor`, which orchestrates four stages:

1. **File reading** via `FileReader` ([`file_reader.py`](https://github.com/1517005260/graph-rag-agent/blob/main/file_reader.py)), which handles PDF, DOCX, and TXT files
2. **Chunking** via `ChineseTextChunker` or `TextChunker` to split documents into chunks
3. **Embedding** using the LLM embedder defined in [`settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/settings.py)
4. **Graph construction** via `KnowledgeGraphBuilder` ([`build_graph.py`](https://github.com/1517005260/graph-rag-agent/blob/main/build_graph.py)), with batched Cypher execution in [`neo4j_batch.py`](https://github.com/1517005260/graph-rag-agent/blob/main/neo4j_batch.py)
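The chunking and batched-write stages can be sketched in a few lines. This is an illustration of the general technique only, not the repository's `TextChunker` or `neo4j_batch.py`: the character-window chunker, the `Chunk` label, the property names, and every function name below are assumptions, and the embedding step is omitted.

```python
from typing import Iterator


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
        start += size - overlap
    return chunks


def chunk_rows(doc_id: str, text: str) -> list[dict]:
    """Turn one document into parameter rows for a batched write."""
    return [
        {"id": f"{doc_id}-{n}", "text": chunk}
        for n, chunk in enumerate(chunk_text(text))
    ]


def batched(rows: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


# Illustrative Cypher: one round trip writes a whole batch via UNWIND.
MERGE_CHUNKS = """
UNWIND $rows AS row
MERGE (c:Chunk {id: row.id})
SET c.text = row.text
"""

if __name__ == "__main__":
    rows = chunk_rows("doc1", "x" * 1200)
    for batch in batched(rows, batch_size=2):
        # With the Neo4j driver this would be roughly:
        #   session.run(MERGE_CHUNKS, rows=batch)
        print(len(batch), "rows in batch")
```

Sending rows through `UNWIND` keeps the number of database round trips proportional to the number of batches rather than the number of chunks, which is the usual motivation for batching writes.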

### Incremental Updates

To add new documents without rebuilding the entire graph, run the incremental updater:

```bash
python -m graphrag_agent.integrations.build.incremental_update --watch ./files
```

The `IncrementalUpdateScheduler` monitors the directory and triggers `IncrementalGraphUpdater` automatically.
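To illustrate what incremental change detection involves, the sketch below compares content hashes between two scans of a directory. It is not the repository's `IncrementalUpdateScheduler`; the function names `scan` and `diff` and the hash-manifest approach are assumptions chosen for clarity.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(directory: Path) -> dict[str, str]:
    """Map each file's relative path to its content digest."""
    return {
        p.relative_to(directory).as_posix(): file_digest(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def diff(previous: dict, current: dict) -> dict[str, list[str]]:
    """Classify paths as added, changed, or removed between two scans."""
    return {
        "added": sorted(k for k in current if k not in previous),
        "changed": sorted(
            k for k in current if k in previous and previous[k] != current[k]
        ),
        "removed": sorted(k for k in previous if k not in current),
    }


if __name__ == "__main__":
    watched = Path("./files")
    current = scan(watched) if watched.is_dir() else {}
    # An empty "previous" manifest means every file counts as added.
    print(diff({}, current))
```

Only the paths listed under `added` and `changed` would need to be re-processed, which is what makes an incremental update cheaper than a full rebuild.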

## Step 5: Launch the Services

Start the FastAPI backend:

```bash
uvicorn server.main:app --host 0.0.0.0 --port 8000
```

This exposes the endpoints defined in [`server/routers/chat.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/routers/chat.py) (chat) and [`server/routers/source.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/routers/source.py) (source retrieval), along with the feedback endpoints.

In a separate terminal, launch the Streamlit frontend:

```bash
streamlit run frontend/app.py
```

The UI communicates with the backend via [`frontend/utils/api.py`](https://github.com/1517005260/graph-rag-agent/blob/main/frontend/utils/api.py). Access the application at `http://localhost:8501`.
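If the page does not load, a useful first check is whether both services are accepting TCP connections on the ports used above (8000 for the backend, 8501 for the frontend). The sketch below uses only the standard library; `port_open` and `wait_for` are hypothetical helper names, and a successful connection only shows that something is listening, not that the application is healthy.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    for name, port in (("FastAPI backend", 8000), ("Streamlit frontend", 8501)):
        state = "up" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```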

## Testing and Validation

Verify the installation by running the unit test suite:

```bash
python -m unittest discover test -v
```

Tests cover the cache system, search tools in [`graphrag_agent/search/tool/base.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/base.py), and end-to-end multi-agent flows in [`test/hotpot_multi_agent_eval.py`](https://github.com/1517005260/graph-rag-agent/blob/main/test/hotpot_multi_agent_eval.py).

### Minimal API Client Example

```python
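import requests

BASE = "http://localhost:8000"  # FastAPI backend started in Step 5


def ask(question: str) -> dict:
    """Send a question to the chat endpoint and return the decoded JSON reply."""
    resp = requests.post(f"{BASE}/chat/", json={"question": question}, timeout=60)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()


if __name__ == "__main__":
    print(ask("What is the policy for student scholarships?"))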
```

## Summary

- The GraphRAG Agent system is a four-layer monorepo consisting of core runtime, FastAPI backend, Streamlit frontend, and build/index scripts.
- Configuration centers on the `.env` file, which supplies Neo4j credentials and LLM API keys read by [`server/server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/database.py) and [`graphrag_agent/config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/config/settings.py).
- First-time setup requires running `python -m graphrag_agent.integrations.build.main` to ingest documents into Neo4j via the `KnowledgeGraphBuilder`.
- Services are started with `uvicorn server.main:app` for the backend and `streamlit run frontend/app.py` for the UI.

## Frequently Asked Questions

### What are the minimum hardware requirements for running the GraphRAG Agent?

The system needs enough RAM to hold document embeddings during the build phase, plus a running Neo4j instance (local or remote). For small datasets, 8 GB of RAM is typically enough; production deployments with large document corpora benefit from 16 GB or more to accommodate the `KnowledgeGraphProcessor` and the vector caches defined in [`graphrag_agent/config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/config/settings.py).
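For a rough sense of scale, the raw size of an embedding matrix is the number of chunks times the vector dimension times the bytes per value. The sketch below assumes 1536-dimensional `float32` vectors, which is an assumption about the embedding model rather than something this guide specifies, and it ignores Python object overhead, Neo4j's own memory, and any caches.

```python
def embedding_memory_mib(num_chunks: int, dim: int = 1536, bytes_per_value: int = 4) -> float:
    """Estimate the raw size of num_chunks embedding vectors in MiB."""
    return num_chunks * dim * bytes_per_value / (1024 ** 2)


if __name__ == "__main__":
    for chunks in (10_000, 100_000, 1_000_000):
        print(f"{chunks:>9} chunks: {embedding_memory_mib(chunks):8.1f} MiB")
```

Under these assumptions, 100,000 chunks occupy about 586 MiB of raw vector data, so the embeddings alone are rarely the limiting factor for small corpora.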

### Can I use a different database instead of Neo4j?

The current implementation in [`graphrag_agent/graph/structure/struct_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/graph/structure/struct_builder.py) and [`graphrag_agent/integrations/build/build_graph.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/build_graph.py) is built around Neo4j as the primary graph store. Although [`graphrag_agent/search/tool/base.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/base.py) abstracts search functionality, replacing Neo4j would require modifying the Cypher generation logic in [`neo4j_batch.py`](https://github.com/1517005260/graph-rag-agent/blob/main/neo4j_batch.py) and the connection handling in [`server/server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/database.py).
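One way to picture the work involved is to imagine a storage interface that the build and search code would call instead of issuing Cypher directly. Everything in the sketch below is hypothetical: the repository does not define a `GraphStore` protocol, and the method names are illustrative only.

```python
from typing import Protocol


class GraphStore(Protocol):
    """Hypothetical storage interface; not part of the repository."""

    def upsert_entity(self, entity_id: str, properties: dict) -> None: ...
    def add_relation(self, source: str, relation: str, target: str) -> None: ...
    def neighbors(self, entity_id: str) -> list[str]: ...


class InMemoryGraphStore:
    """Toy backend that satisfies GraphStore without any database."""

    def __init__(self) -> None:
        self.entities: dict[str, dict] = {}
        self.edges: set[tuple[str, str, str]] = set()

    def upsert_entity(self, entity_id: str, properties: dict) -> None:
        # Merge properties so repeated upserts update rather than duplicate.
        self.entities.setdefault(entity_id, {}).update(properties)

    def add_relation(self, source: str, relation: str, target: str) -> None:
        self.edges.add((source, relation, target))

    def neighbors(self, entity_id: str) -> list[str]:
        return sorted(t for s, _, t in self.edges if s == entity_id)


def load_demo(store: GraphStore) -> None:
    """Write a tiny graph through the interface only."""
    store.upsert_entity("alice", {"type": "Student"})
    store.upsert_entity("grant", {"type": "Scholarship"})
    store.add_relation("alice", "APPLIED_FOR", "grant")


if __name__ == "__main__":
    store = InMemoryGraphStore()
    load_demo(store)
    print(store.neighbors("alice"))
```

A Neo4j-backed class and an alternative backend would each implement the same three methods, which shows why the Cypher-specific modules are the parts that would have to change.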

### How do I update the knowledge graph when new documents arrive?

Instead of rebuilding the entire graph, use the incremental update scheduler. Run `python -m graphrag_agent.integrations.build.incremental_update --watch ./files` to spawn the `IncrementalUpdateScheduler`, which monitors the directory and triggers `IncrementalGraphUpdater` to process new files without disrupting existing graph data.

### Where are the configuration files for the FastAPI server located?

Server-specific configuration is centralized in [`server/server_config/settings.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/settings.py) for Uvicorn options and logging, while database connection strings are managed in [`server/server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/server_config/database.py). Both modules read from the root `.env` file, which you create by copying `.env.example` during the initial setup phase.