How to Configure the GraphRAG Agent System: Complete Setup Guide

To configure the GraphRAG Agent system, clone the repository, install dependencies via requirements.txt, configure Neo4j and LLM credentials in .env, build the knowledge graph using graphrag_agent/integrations/build/main.py, and launch the FastAPI backend and Streamlit frontend.

The GraphRAG Agent is a modular Python monorepo that combines document ingestion, Neo4j knowledge graphs, and LLM-powered retrieval-augmented generation. This guide walks you through how to configure the GraphRAG Agent system from source, covering environment setup, dependency installation, and service orchestration.

Repository Architecture

The system is organized into four logical layers that handle distinct responsibilities:

| Layer | Purpose | Key Files |
| --- | --- | --- |
| Core runtime | RAG pipelines, search tools, and graph-building logic | graphrag_agent/search/tool/base.py, graphrag_agent/pipelines/ingestion/document_processor.py, graphrag_agent/graph/structure/struct_builder.py |
| Backend API | FastAPI service exposing chat, source retrieval, and feedback endpoints | server/main.py, server/routers/chat.py, server/server_config/settings.py |
| Frontend UI | Streamlit application communicating with the FastAPI backend | frontend/app.py, frontend/utils/api.py |
| Build & Index | Scripts for document ingestion, chunking, embedding, and Neo4j graph creation | graphrag_agent/integrations/build/main.py, graphrag_agent/integrations/build/build_graph.py |

Step 1: Environment Setup

Begin by cloning the repository and creating an isolated Python environment.

git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

System Dependencies

For PDF and document processing, install OS-level packages. On Ubuntu:

sudo apt-get update
sudo apt-get install -y python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils

On Windows, if you encounter Torch DLL errors, downgrade PyTorch:

pip install torch==2.8.0

Step 2: Install Python Dependencies

Install the required packages listed in requirements.txt:

pip install -r requirements.txt

This installs FastAPI, Streamlit, the Neo4j driver, LLM SDKs, and document processing libraries.

Step 3: Configure Environment Variables

Copy the example environment file and populate it with your credentials:

cp .env.example .env

Edit .env to include:

| Variable | Description |
| --- | --- |
| NEO4J_URI | Bolt URL (e.g., bolt://localhost:7687) |
| NEO4J_USER / NEO4J_PASSWORD | Neo4j authentication credentials |
| OPENAI_API_KEY | API key for your LLM provider (OpenAI, Azure, etc.) |
| CACHE_ROOT | Directory for vector stores and intermediate caches |

The application loads these values via server/server_config/database.py for the backend and graphrag_agent/config/settings.py for the core runtime.
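Before building the graph, it can help to confirm that .env actually contains every variable listed above. The following is a minimal sketch, not part of the repository: the helper names parse_env_text and missing_vars are hypothetical, and treating all five variables as mandatory is an assumption made for illustration.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Variable names come from the table above; treating every one of them as
# mandatory is an assumption made for this sketch.
REQUIRED_VARS = (
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "CACHE_ROOT",
)


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: Dict[str, str], required: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_vars(parse_env_text(text))
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the repository root after editing .env. Any name it reports is worth filling in before moving on to the build step.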

Step 4: Build the Knowledge Graph

Run the ingestion pipeline to process documents and populate Neo4j:

python -m graphrag_agent.integrations.build.main --data-dir ./files

This command executes the KnowledgeGraphProcessor class, which orchestrates:

  1. File reading via FileReader (file_reader.py) handling PDFs, DOCX, and TXT
  2. Chunking via ChineseTextChunker or TextChunker to split documents
  3. Embedding using the LLM embedder defined in settings.py
  4. Graph construction via KnowledgeGraphBuilder (build_graph.py), with batched Cypher execution in neo4j_batch.py
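To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the repository's TextChunker or ChineseTextChunker, whose actual splitting rules may differ. The function name chunk_text and the chunk_size and overlap parameters are assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Overlap keeps sentences that straddle a boundary visible in both neighboring chunks, at the cost of embedding some text twice.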

Incremental Updates

For adding new documents without rebuilding the entire graph:

python -m graphrag_agent.integrations.build.incremental_update --watch ./files

The IncrementalUpdateScheduler monitors the directory and triggers IncrementalGraphUpdater automatically.
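The idea behind incremental updates can be illustrated with a content-hash comparison: only files that are new or whose bytes changed need reprocessing. This is a sketch under stated assumptions, not the logic of IncrementalUpdateScheduler or IncrementalGraphUpdater. The snapshot and changed_files helpers are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(directory: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    result: Dict[str, str] = {}
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            key = path.relative_to(directory).as_posix()
            result[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def changed_files(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose content hash differs."""
    return sorted(p for p, digest in current.items() if previous.get(p) != digest)


if __name__ == "__main__":
    data_dir = Path("./files")
    if data_dir.is_dir():
        # With an empty previous snapshot, every file counts as new.
        print(changed_files({}, snapshot(data_dir)))
```

Persisting the previous snapshot between runs (for example as JSON) would let a scheduler skip unchanged documents.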

Step 5: Launch the Services

Start the FastAPI backend:

uvicorn server.main:app --host 0.0.0.0 --port 8000

This exposes the endpoints defined in server/routers/chat.py (chat) and server/routers/source.py (source retrieval), along with the feedback endpoint.

In a separate terminal, launch the Streamlit frontend:

streamlit run frontend/app.py

The UI communicates with the backend via frontend/utils/api.py. Access the application at http://localhost:8501.
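A quick way to confirm that both services came up is to check whether their ports accept TCP connections. The sketch below uses only the standard library. The helper names are hypothetical, and it assumes the default ports from the commands above (8000 for the backend, 8501 for Streamlit).

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed defaults: 8000 for the FastAPI backend, 8501 for Streamlit.
    for name, port in (("backend", 8000), ("frontend", 8501)):
        print(name, "up" if is_port_open("localhost", port) else "down")
```

An open port only shows that a process is listening. It does not prove that the backend can reach Neo4j or the LLM provider.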

Testing and Validation

Verify the installation by running the unit test suite:

python -m unittest discover test -v

Tests cover the cache system, search tools in graphrag_agent/search/tool/base.py, and end-to-end multi-agent flows in test/hotpot_multi_agent_eval.py.

Minimal API Client Example

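import requests

BASE = "http://localhost:8000"  # FastAPI backend started in Step 5

def ask(question: str):
    """Send a question to the chat endpoint and return the decoded JSON reply."""
    # A timeout keeps the client from hanging if the backend is unreachable.
    resp = requests.post(f"{BASE}/chat/", json={"question": question}, timeout=60)
    resp.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return resp.json()

print(ask("What is the policy for student scholarships?"))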

Summary

  • The GraphRAG Agent system is a four-layer monorepo consisting of core runtime, FastAPI backend, Streamlit frontend, and build/index scripts.
  • Configuration centers on the .env file, which supplies Neo4j credentials and LLM API keys read by server/server_config/database.py and graphrag_agent/config/settings.py.
  • First-time setup requires running python -m graphrag_agent.integrations.build.main to ingest documents into Neo4j via the KnowledgeGraphBuilder.
  • Services are started with uvicorn server.main:app for the backend and streamlit run frontend/app.py for the UI.

Frequently Asked Questions

What are the minimum hardware requirements for running the GraphRAG Agent?

The system requires enough RAM to hold document embeddings during the build phase and a running Neo4j instance (local or remote). For small datasets, 8 GB of RAM is typically sufficient, but production deployments with large document corpora benefit from 16 GB or more to accommodate the KnowledgeGraphProcessor and the vector caches configured in graphrag_agent/config/settings.py.
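As a rough back-of-the-envelope check on the embedding portion of that memory, the raw size of a dense embedding matrix is the number of chunks times the vector dimension times the bytes per value. The figures below (100,000 chunks, 1,536 dimensions, float32 storage) are illustrative assumptions, not measurements from this project. The estimate ignores Neo4j, Python object overhead, and the LLM client.

```python
def embedding_memory_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense embedding matrix (float32 by default)."""
    return num_chunks * dimensions * bytes_per_value


# Hypothetical corpus: 100,000 chunks with 1,536-dimensional float32 vectors.
size = embedding_memory_bytes(100_000, 1536)
print(f"{size / 1024**3:.2f} GiB")  # prints "0.57 GiB"
```

Under these assumptions the vectors themselves stay well under 1 GiB, which suggests that for small corpora most of the memory budget goes to other components.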

Can I use a different graph database instead of Neo4j?

The current implementation in graphrag_agent/graph/structure/struct_builder.py and graphrag_agent/integrations/build/build_graph.py is built around Neo4j as the primary graph store. Although graphrag_agent/search/tool/base.py abstracts search functionality, replacing Neo4j would require modifying the Cypher generation logic in neo4j_batch.py and the connection handling in server/server_config/database.py.

How do I update the knowledge graph when new documents arrive?

Instead of rebuilding the entire graph, use the incremental update scheduler. Run python -m graphrag_agent.integrations.build.incremental_update --watch ./files to spawn the IncrementalUpdateScheduler, which monitors the directory and triggers IncrementalGraphUpdater to process new files without disrupting existing graph data.

Where are the configuration files for the FastAPI server located?

Server-specific configuration is centralized in server/server_config/settings.py for Uvicorn options and logging, while database connection strings are managed in server/server_config/database.py. Both modules read from the root .env file, which you create by copying .env.example during the initial setup phase.
