How to Set Up the GraphRAG Agent Project: Complete Installation Guide
To set up the GraphRAG Agent project, clone the repository, install Python dependencies and system libraries, configure your Neo4j and LLM credentials in a .env file, build the initial knowledge graph from your documents, and launch both the FastAPI backend and Streamlit frontend services.
The GraphRAG Agent is a modular Python monorepo that combines retrieval-augmented generation (RAG) pipelines with a Neo4j graph database. This guide walks through deploying the full stack, from ingesting documents into the knowledge graph to running the conversational web interface, using the file structure and commands found in the 1517005260/graph-rag-agent repository.
Prerequisites
Before installation, ensure you have Python 3.8+, a running Neo4j 5.x instance, and OS-level dependencies for document processing. On Ubuntu, install the required system packages:
sudo apt-get install python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils
These libraries enable PDF and DOCX parsing used by the FileReader class in graphrag_agent/pipelines/ingestion/document_processor.py.
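Before moving on, it can help to confirm that the interpreter and the command-line tools are actually visible from your shell. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the tool list assumes that the antiword, unrtf, and poppler-utils packages provide the antiword, unrtf, and pdftotext executables.

```python
import shutil
import sys

# Assumption: executables provided by the apt packages listed above
# (pdftotext ships with poppler-utils).
REQUIRED_TOOLS = ["antiword", "unrtf", "pdftotext"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True when the interpreter is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing tools:", missing_tools(REQUIRED_TOOLS) or "none")
```

An empty "missing" list only shows that the binaries are on PATH; it does not check the development headers (libxml2-dev, libxslt1-dev), which matter when pip compiles packages.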
Step-by-Step Installation
1. Clone the Repository and Create a Virtual Environment
git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
2. Install Python Dependencies
Install the packages listed in requirements.txt, which includes FastAPI, Streamlit, the Neo4j Python driver, and LLM SDKs:
pip install -r requirements.txt
Windows users: If you encounter Torch DLL errors, downgrade PyTorch as specified in the documentation:
pip install torch==2.8.0
3. Configure Environment Variables
Copy the example environment file and edit it with your Neo4j connection details and LLM API keys:
cp .env.example .env
Open .env and set these critical variables:
- NEO4J_URI: Bolt URL (e.g., bolt://localhost:7687)
- NEO4J_USER and NEO4J_PASSWORD: Database authentication
- OPENAI_API_KEY: Your LLM provider token
- CACHE_ROOT: Directory for vector stores and intermediate caches
The application loads these settings through server/server_config/database.py and graphrag_agent/config/settings.py at runtime.
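A quick way to catch a half-edited .env before starting any service is to parse it and list the variables that are still empty. The sketch below uses only the standard library and is an illustration, not the project's own loader: parse_env and missing_keys are hypothetical helpers, and the parser handles only plain KEY=VALUE lines.

```python
from pathlib import Path

# Variable names taken from the list above.
REQUIRED_KEYS = [
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "CACHE_ROOT",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("Missing or empty:", missing_keys(parse_env(text)) or "none")
```

Run it from the repository root after editing .env; any name it prints still needs a value.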
Build the Knowledge Graph
The GraphRAG Agent requires an initial ingestion of documents to create the Neo4j knowledge graph. The build pipeline orchestrates four stages defined in graphrag_agent/integrations/build/main.py:
- File reading via FileReader (handles PDFs, DOCX, TXT)
- Chunking via ChineseTextChunker or TextChunker
- Embedding using the configured LLM embedder
- Neo4j import via KnowledgeGraphBuilder (batched Cypher in neo4j_batch.py)
Run the full pipeline:
python -m graphrag_agent.integrations.build.main --data-dir ./files
Replace ./files with the path to your source documents. This command uses the KnowledgeGraphProcessor class to populate your graph database with entities and relationships.
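To make the chunking stage more concrete, here is a minimal sketch of one common approach: fixed-size character windows with overlap. It is an assumption for illustration only; the repository's TextChunker and ChineseTextChunker may split text differently, and chunk_text with its parameters is a hypothetical name.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap means that a sentence cut at a window boundary still appears partly in the next chunk, which tends to help retrieval at the cost of some duplicated text.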
Start the Application Services
Launch the FastAPI Backend
The backend service exposes REST endpoints for chat, source retrieval, and feedback under server/routers/chat.py and server/routers/source.py. Start the Uvicorn server defined in server/main.py:
uvicorn server.main:app --host 0.0.0.0 --port 8000
All configurable options (host, port, logging) are centralized in server/server_config/settings.py.
Launch the Streamlit Frontend
In a new terminal window (with the virtual environment activated), start the UI:
streamlit run frontend/app.py
The frontend communicates with the backend via helper functions in frontend/utils/api.py. Open the displayed URL (typically http://localhost:8501) to interact with the RAG agent.
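If the page does not load, a first check is whether both services are listening at all. The sketch below only tests TCP reachability with the standard library; port_open is a hypothetical helper, and the ports assume the defaults shown in the commands above (8000 for FastAPI, 8501 for Streamlit).

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in [("backend", 8000), ("frontend", 8501)]:
        print(name, "listening:", port_open("127.0.0.1", port))
```

A True result shows only that something accepted the connection; it does not confirm that the backend can reach Neo4j or the LLM provider.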
Verify the Installation
Run the unit test suite to validate the cache system, search tools, and multi-agent flows:
python -m unittest discover test -v
For end-to-end validation, execute the Hotpot evaluation script at test/hotpot_multi_agent_eval.py, which spins up the complete stack.
Incremental Document Updates
After initial setup, add new documents without rebuilding the entire graph using the incremental updater:
python -m graphrag_agent.integrations.build.incremental_update --watch ./files
This command spawns the IncrementalUpdateScheduler background thread, which monitors the directory for changes and triggers IncrementalGraphUpdater to merge new entities into Neo4j.
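The general idea behind this kind of incremental update can be sketched as comparing two snapshots of the watched directory and reprocessing only the files whose contents changed. The code below is an illustration under that assumption, not the logic of IncrementalGraphUpdater; snapshot and changed_files are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to a SHA-256 digest of its contents."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(previous, current):
    """Return paths that are new or whose digest differs from `previous`."""
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )


if __name__ == "__main__":
    before = snapshot("./files")
    input("Add or edit documents in ./files, then press Enter...")
    print("Changed:", changed_files(before, snapshot("./files")) or "none")
```

Hashing contents instead of comparing modification times avoids reprocessing files that were merely touched, though it reads every file on each scan.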
Summary
- GraphRAG Agent is organized into four layers: Core runtime (graphrag_agent/), Backend API (server/), Frontend UI (frontend/), and Build scripts (graphrag_agent/integrations/build/).
- System dependencies (Poppler, XML libraries) are required for document parsing on Linux systems.
- Configuration is managed through environment variables read by server/server_config/settings.py and database modules.
- Initial ingestion uses graphrag_agent.integrations.build.main to process files through chunking, embedding, and graph construction.
- Services must run simultaneously: FastAPI backend (uvicorn server.main:app) and Streamlit frontend (streamlit run frontend/app.py).
Frequently Asked Questions
What are the system requirements for GraphRAG Agent?
You need Python 3.8 or higher, a Neo4j 5.x database instance with the APOC plugin, and approximately 4GB of RAM for the embedding models. On Linux, poppler-utils and libxml2-dev must be installed for document parsing. GPU acceleration is optional but recommended for faster embedding generation when processing large document collections.
How do I fix the Torch DLL error on Windows?
If you encounter DLL load failures when importing PyTorch modules, downgrade to the stable 2.8.0 release by running pip install torch==2.8.0. This resolves compatibility issues with specific Windows Visual C++ redistributable versions that conflict with the default torch installation in requirements.txt.
Can I run GraphRAG Agent without Neo4j?
No. The KnowledgeGraphBuilder class in graphrag_agent/integrations/build/build_graph.py and the search tools in graphrag_agent/search/tool/base.py depend on Neo4j as the primary graph store. The application uses Cypher queries to traverse entity relationships during the retrieval phase of the RAG pipeline.
How do I add new documents after the initial setup?
Use the incremental update command python -m graphrag_agent.integrations.build.incremental_update --watch ./files instead of rebuilding from scratch. The IncrementalGraphUpdater class processes only new or modified files, updating the Neo4j graph and vector embeddings without reprocessing your entire document corpus.