How to Configure the GraphRAG Agent System: Complete Setup Guide
To configure the GraphRAG Agent system, clone the repository, install dependencies via requirements.txt, configure Neo4j and LLM credentials in .env, build the knowledge graph using graphrag_agent/integrations/build/main.py, and launch the FastAPI backend and Streamlit frontend.
The GraphRAG Agent is a modular Python monorepo that combines document ingestion, Neo4j knowledge graphs, and LLM-powered retrieval-augmented generation. This guide walks you through how to configure the GraphRAG Agent system from source, covering environment setup, dependency installation, and service orchestration.
Repository Architecture
The system is organized into four logical layers that handle distinct responsibilities:
| Layer | Purpose | Key Files |
|---|---|---|
| Core runtime | RAG pipelines, search tools, and graph-building logic | graphrag_agent/search/tool/base.py, graphrag_agent/pipelines/ingestion/document_processor.py, graphrag_agent/graph/structure/struct_builder.py |
| Backend API | FastAPI service exposing chat, source retrieval, and feedback endpoints | server/main.py, server/routers/chat.py, server/server_config/settings.py |
| Frontend UI | Streamlit application communicating with the FastAPI backend | frontend/app.py, frontend/utils/api.py |
| Build & Index | Scripts for document ingestion, chunking, embedding, and Neo4j graph creation | graphrag_agent/integrations/build/main.py, graphrag_agent/integrations/build/build_graph.py |
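Before going further, it can help to confirm that your checkout actually contains the files named in the table. The following is a minimal sketch, not part of the repository: the helper name `missing_key_files` is hypothetical, and the path list is simply copied from the "Key Files" column above.

```python
from pathlib import Path

# Paths copied from the "Key Files" column of the table above.
KEY_FILES = [
    "graphrag_agent/search/tool/base.py",
    "graphrag_agent/pipelines/ingestion/document_processor.py",
    "graphrag_agent/graph/structure/struct_builder.py",
    "server/main.py",
    "server/routers/chat.py",
    "server/server_config/settings.py",
    "frontend/app.py",
    "frontend/utils/api.py",
    "graphrag_agent/integrations/build/main.py",
    "graphrag_agent/integrations/build/build_graph.py",
]


def missing_key_files(repo_root: str) -> list[str]:
    """Return the key files that are absent under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [rel for rel in KEY_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_key_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All key files found.")
```

Run it from the repository root. A non-empty list usually means the clone is incomplete or the layout has changed since this guide was written.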
Step 1: Environment Setup
Begin by cloning the repository and creating an isolated Python environment.
git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
System Dependencies
For PDF and document processing, install OS-level packages. On Ubuntu:
sudo apt-get update
sudo apt-get install -y python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils
On Windows, if you encounter Torch DLL errors, downgrade PyTorch:
pip install torch==2.8.0
Step 2: Install Python Dependencies
Install the required packages listed in requirements.txt:
pip install -r requirements.txt
This installs FastAPI, Streamlit, the Neo4j driver, LLM SDKs, and document processing libraries.
Step 3: Configure Environment Variables
Copy the example environment file and populate it with your credentials:
cp .env.example .env
Edit .env to include:
| Variable | Description |
|---|---|
| NEO4J_URI | Bolt URL (e.g., bolt://localhost:7687) |
| NEO4J_USER / NEO4J_PASSWORD | Neo4j authentication credentials |
| OPENAI_API_KEY | API key for your LLM provider (OpenAI, Azure, etc.) |
| CACHE_ROOT | Directory for vector stores and intermediate caches |
The application loads these values via server/server_config/database.py for the backend and graphrag_agent/config/settings.py for the core runtime.
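A missing or empty variable tends to surface later as a confusing connection or authentication error, so a quick pre-flight check can save time. The sketch below uses only the standard library and a deliberately simple KEY=VALUE parser. It is not how the repository loads its settings, and treating every variable in the table as required is an assumption of this example.

```python
from pathlib import Path

# Variable names taken from the table above; treating all of them as
# required is an assumption of this sketch.
REQUIRED = ["NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "OPENAI_API_KEY", "CACHE_ROOT"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required variable names that are unset or empty."""
    return [name for name in REQUIRED if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.is_file():
        print("No .env found; copy .env.example first.")
    else:
        missing = missing_vars(parse_env(env_path.read_text(encoding="utf-8")))
        print("Missing or empty:", ", ".join(missing) if missing else "none")
```

The parser ignores features such as multi-line values and `export` prefixes, which is usually fine for a credentials file of this shape.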
Step 4: Build the Knowledge Graph
Run the ingestion pipeline to process documents and populate Neo4j:
python -m graphrag_agent.integrations.build.main --data-dir ./files
This command executes the KnowledgeGraphProcessor class, which orchestrates:
- File reading via FileReader (file_reader.py), handling PDFs, DOCX, and TXT
- Chunking via ChineseTextChunker or TextChunker to split documents
- Embedding using the LLM embedder defined in settings.py
- Graph construction via KnowledgeGraphBuilder (build_graph.py), with batched Cypher execution in neo4j_batch.py
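To make the chunking stage in the list above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It illustrates the general technique only: the function `chunk_text` and its `size` and `overlap` parameters are hypothetical, and the repository's ChineseTextChunker and TextChunker may split text quite differently (for example, on sentence boundaries).

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", size=4, overlap=1):
        print(piece)
```

The overlap means that a sentence falling on a window boundary still appears whole in at least one chunk, at the cost of embedding some text twice.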
Incremental Updates
For adding new documents without rebuilding the entire graph:
python -m graphrag_agent.integrations.build.incremental_update --watch ./files
The IncrementalUpdateScheduler monitors the directory and triggers IncrementalGraphUpdater automatically.
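The core idea behind any incremental update is deciding which files are new or changed since the last run. The sketch below shows one common approach, comparing SHA-256 content hashes against a saved manifest. It is an illustration under stated assumptions, not the logic of IncrementalGraphUpdater: the function names and the JSON manifest format are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(data_dir: str, manifest_path: str) -> list[str]:
    """Return files that are new or modified since the last call, then update the manifest.

    Keep the manifest outside data_dir, otherwise it is scanned as a document.
    """
    manifest_file = Path(manifest_path)
    previous = json.loads(manifest_file.read_text()) if manifest_file.is_file() else {}
    current = {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(Path(data_dir).rglob("*"))
        if p.is_file()
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    manifest_file.write_text(json.dumps(current, indent=2))
    return changed
```

Hashing contents rather than comparing modification times avoids reprocessing files that were merely touched or copied. Note that this sketch does not report deleted files.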
Step 5: Launch the Services
Start the FastAPI backend:
uvicorn server.main:app --host 0.0.0.0 --port 8000
This exposes the chat endpoints defined in server/routers/chat.py and the source-retrieval endpoints defined in server/routers/source.py, along with feedback collection.
In a separate terminal, launch the Streamlit frontend:
streamlit run frontend/app.py
The UI communicates with the backend via frontend/utils/api.py. Access the application at http://localhost:8501.
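When scripting the startup, it is useful to wait until both services accept connections before opening the UI or sending requests. The following sketch polls a TCP port using only the standard library. The helper `wait_for_port` is hypothetical; the ports are the ones used in this guide (8000 for FastAPI, 8501 for Streamlit).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet; retry shortly
    return False


# Example usage (uncomment once both services have been started):
# for name, port in [("backend", 8000), ("frontend", 8501)]:
#     status = "up" if wait_for_port("localhost", port) else "not reachable"
#     print(f"{name} on port {port}: {status}")
```

A successful TCP connection only shows that the process is listening, not that the application is healthy, so treat this as a first-pass check.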
Testing and Validation
Verify the installation by running the unit test suite:
python -m unittest discover test -v
Tests cover the cache system, search tools in graphrag_agent/search/tool/base.py, and end-to-end multi-agent flows in test/hotpot_multi_agent_eval.py.
Minimal API Client Example
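import requests

BASE = "http://localhost:8000"  # FastAPI backend started in Step 5

def ask(question: str):
    """POST a question to the chat endpoint and return the decoded JSON reply."""
    resp = requests.post(f"{BASE}/chat/", json={"question": question}, timeout=60)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()

print(ask("What is the policy for student scholarships?"))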
Summary
- The GraphRAG Agent system is a four-layer monorepo consisting of core runtime, FastAPI backend, Streamlit frontend, and build/index scripts.
- Configuration centers on the .env file, which supplies Neo4j credentials and LLM API keys read by server/server_config/database.py and graphrag_agent/config/settings.py.
- First-time setup requires running python -m graphrag_agent.integrations.build.main to ingest documents into Neo4j via the KnowledgeGraphBuilder.
- Services are started with uvicorn server.main:app for the backend and streamlit run frontend/app.py for the UI.
Frequently Asked Questions
What are the minimum hardware requirements for running the GraphRAG Agent?
The system needs enough RAM to hold document embeddings during the build phase, plus a running Neo4j instance (local or remote). For small datasets, 8 GB of RAM is sufficient. Production deployments with large document corpora benefit from 16 GB or more to accommodate the KnowledgeGraphProcessor and the vector caches configured in graphrag_agent/config/settings.py.
Can I use a different database instead of Neo4j?
The current implementation in graphrag_agent/graph/structure/struct_builder.py and graphrag_agent/integrations/build/build_graph.py is built around Neo4j as the primary graph store. Although graphrag_agent/search/tool/base.py abstracts search functionality, replacing Neo4j would require modifying the Cypher generation logic in neo4j_batch.py and the connection handling in server/server_config/database.py.
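To give a sense of what such a change would involve, the sketch below shows one way a storage backend could be hidden behind a small interface so that build code does not depend on Cypher directly. Everything here is hypothetical: the `GraphStore` protocol, its method names, and the in-memory backend do not exist in the repository and only illustrate the design idea.

```python
from typing import Protocol


class GraphStore(Protocol):
    """Hypothetical interface; not part of the repository."""

    def upsert_nodes(self, label: str, rows: list[dict]) -> int: ...

    def upsert_edges(self, rel_type: str, rows: list[dict]) -> int: ...


class InMemoryGraphStore:
    """Toy backend keyed by node id, standing in for a non-Neo4j store."""

    def __init__(self) -> None:
        self.nodes: dict[tuple[str, str], dict] = {}
        self.edges: set[tuple[str, str, str]] = set()

    def upsert_nodes(self, label: str, rows: list[dict]) -> int:
        for row in rows:
            self.nodes[(label, row["id"])] = row  # overwrite on repeated id
        return len(rows)

    def upsert_edges(self, rel_type: str, rows: list[dict]) -> int:
        for row in rows:
            self.edges.add((row["source"], rel_type, row["target"]))
        return len(rows)


def load_graph(store: GraphStore, entities: list[dict], relations: list[dict]) -> None:
    """Build code written against the interface works with any backend."""
    store.upsert_nodes("Entity", entities)
    store.upsert_edges("RELATED_TO", relations)
```

A Neo4j-backed class implementing the same two methods would contain the Cypher, leaving callers such as `load_graph` unchanged when the backend is swapped.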
How do I update the knowledge graph when new documents arrive?
Instead of rebuilding the entire graph, use the incremental update scheduler. Run python -m graphrag_agent.integrations.build.incremental_update --watch ./files to spawn the IncrementalUpdateScheduler, which monitors the directory and triggers IncrementalGraphUpdater to process new files without disrupting existing graph data.
Where are the configuration files for the FastAPI server located?
Server-specific configuration is centralized in server/server_config/settings.py for Uvicorn options and logging, while database connection strings are managed in server/server_config/database.py. Both modules read from the root .env file, which you create by copying .env.example during the initial setup phase.