How to Set Up the GraphRAG Agent Project: Complete Installation Guide

February 22, 2026 | Repository: 1517005260/graph-rag-agent

To set up the GraphRAG Agent project, clone the repository, install Python dependencies and system libraries, configure your Neo4j and LLM credentials in a .env file, build the initial knowledge graph from your documents, and launch both the FastAPI backend and Streamlit frontend services.

The GraphRAG Agent is a modular Python monorepo that combines retrieval-augmented generation (RAG) pipelines with a Neo4j graph database. This guide walks through deploying the full stack, from ingesting documents into the knowledge graph to running the conversational web interface, using the file structure and commands found in the 1517005260/graph-rag-agent repository.

Prerequisites

Before installation, ensure you have Python 3.8+, a running Neo4j 5.x instance, and OS-level dependencies for document processing. On Ubuntu, install the required system packages:

sudo apt-get install python-dev-is-python3 libxml2-dev libxslt1-dev antiword unrtf poppler-utils

These libraries enable PDF and DOCX parsing used by the FileReader class in graphrag_agent/pipelines/ingestion/document_processor.py.
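Before moving on, it can help to confirm that the prerequisites are actually visible from Python. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the apt packages above expose the antiword, unrtf, and pdftotext executables (pdftotext ships with poppler-utils).

```python
import shutil
import sys

# Executables assumed to come from the apt packages listed above.
REQUIRED_TOOLS = ("antiword", "unrtf", "pdftotext")
# Minimum interpreter version stated in this guide.
MIN_PYTHON = (3, 8)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum version."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


print("Python OK:", python_is_supported())
print("Missing tools:", missing_tools() or "none")
```

An empty "Missing tools" result only shows that the binaries are on PATH; it does not check the development headers (libxml2-dev, libxslt1-dev), which matter when pip compiles packages.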

Step-by-Step Installation

1. Clone the Repository and Create a Virtual Environment

git clone https://github.com/1517005260/graph-rag-agent.git
cd graph-rag-agent
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

2. Install Python Dependencies

Install the packages listed in requirements.txt, which includes FastAPI, Streamlit, the Neo4j Python driver, and LLM SDKs:

pip install -r requirements.txt

Windows users: If you encounter Torch DLL errors, install the PyTorch version specified in the project documentation:

pip install torch==2.8.0

3. Configure Environment Variables

Copy the example environment file and edit it with your Neo4j connection details and LLM API keys:

cp .env.example .env

Open .env and set these critical variables:

NEO4J_URI: Bolt URL (e.g., bolt://localhost:7687)
NEO4J_USER and NEO4J_PASSWORD: Database authentication
OPENAI_API_KEY: Your LLM provider token
CACHE_ROOT: Directory for vector stores and intermediate caches

The application loads these settings through server/server_config/database.py and graphrag_agent/config/settings.py at runtime.
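A quick way to catch a half-filled .env before starting any service is to check it for the variables listed above. This is a standard-library sketch under stated assumptions: it handles only plain KEY=VALUE lines, the function names are hypothetical, and it does not reproduce how the repository's own settings modules load configuration.

```python
from pathlib import Path

# Variable names taken from the list in this section.
REQUIRED_KEYS = (
    "NEO4J_URI",
    "NEO4J_USER",
    "NEO4J_PASSWORD",
    "OPENAI_API_KEY",
    "CACHE_ROOT",
)


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Return the required keys missing from the file at path."""
    env_path = Path(path)
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(parse_env_text(env_path.read_text(encoding="utf-8")))
```

Calling check_env_file() from the repository root returns an empty list when every variable has a value, and otherwise names the ones still to fill in.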

Build the Knowledge Graph

The GraphRAG Agent requires an initial ingestion of documents to create the Neo4j knowledge graph. The build pipeline orchestrates four stages defined in graphrag_agent/integrations/build/main.py:

File reading via FileReader (handles PDFs, DOCX, TXT)
Chunking via ChineseTextChunker or TextChunker
Embedding using the configured LLM embedder
Neo4j import via KnowledgeGraphBuilder (batched Cypher in neo4j_batch.py)
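To make the chunking stage concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the repository's ChineseTextChunker or TextChunker, whose splitting rules may differ; the function name chunk_text and the chunk_size and overlap parameters are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means that a sentence cut at a window boundary still appears whole in at least one neighboring chunk, which tends to help retrieval at the cost of some duplicated text.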

Run the full pipeline:

python -m graphrag_agent.integrations.build.main --data-dir ./files

Replace ./files with the path to your source documents. This command uses the KnowledgeGraphProcessor class to populate your graph database with entities and relationships.
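The batched import mentioned above can be pictured with the sketch below. It is an illustration only: the Entity label, the id and name properties, and both function names are hypothetical, and the repository's neo4j_batch.py may be organized quite differently. The driver argument is expected to be a Neo4j Python driver instance.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical label and property names, for illustration only.
MERGE_ENTITIES = """
UNWIND $rows AS row
MERGE (e:Entity {id: row.id})
SET e.name = row.name
"""


def import_entities(driver, rows, batch_size=500):
    """Send rows to Neo4j in batches and return the number of batches sent."""
    sent = 0
    with driver.session() as session:
        for batch in batched(rows, batch_size):
            # One round trip per batch instead of one per row.
            session.run(MERGE_ENTITIES, rows=batch).consume()
            sent += 1
    return sent
```

Passing a list of rows through UNWIND keeps the number of round trips proportional to the number of batches, and MERGE makes a repeated run update existing nodes rather than duplicate them.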

Start the Application Services

Launch the FastAPI Backend

The backend service exposes REST endpoints for chat, source retrieval, and feedback under server/routers/chat.py and server/routers/source.py. Start the Uvicorn server defined in server/main.py:

uvicorn server.main:app --host 0.0.0.0 --port 8000

All configurable options (host, port, logging) are centralized in server/server_config/settings.py.

Launch the Streamlit Frontend

In a new terminal window (with the virtual environment activated), start the UI:

streamlit run frontend/app.py

The frontend communicates with the backend via helper functions in frontend/utils/api.py. Open the displayed URL (typically http://localhost:8501) to interact with the RAG agent.
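If the page does not load, a first step is to check whether both services are accepting connections. The sketch below uses only the standard library and assumes the ports from the commands above (8000 for Uvicorn, 8501 as Streamlit's default); the function names are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(services, host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


# Ports assumed from the launch commands in this section.
print(service_status({"backend": 8000, "frontend": 8501}))
```

An open port shows only that something is listening there; it does not confirm that the process is the GraphRAG Agent or that its endpoints respond correctly.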

Verify the Installation

Run the unit test suite to validate the cache system, search tools, and multi-agent flows:

python -m unittest discover test -v

For end-to-end validation, execute the Hotpot evaluation script at test/hotpot_multi_agent_eval.py, which spins up the complete stack.

Incremental Document Updates

After initial setup, add new documents without rebuilding the entire graph using the incremental updater:

python -m graphrag_agent.integrations.build.incremental_update --watch ./files

This command spawns the IncrementalUpdateScheduler background thread, which monitors the directory for changes and triggers IncrementalGraphUpdater to merge new entities into Neo4j.
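The core idea behind incremental updates, detecting which files are new or modified, can be sketched with content hashes. This is an assumption-laden illustration, not the logic of IncrementalUpdateScheduler or IncrementalGraphUpdater; the snapshot and changed_files helpers are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to the SHA-256 digest of its contents."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(previous, current):
    """Return paths that are new or whose digest differs from the previous snapshot."""
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
```

Comparing a stored snapshot with a fresh one yields the files that need reprocessing. Deleted files would be the paths present in the previous snapshot but absent from the current one, which this sketch does not report.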

Summary

GraphRAG Agent is organized into four layers: Core runtime (graphrag_agent/), Backend API (server/), Frontend UI (frontend/), and Build scripts (graphrag_agent/integrations/build/).
System dependencies (Poppler, XML libraries) are required for document parsing on Linux systems.
Configuration is managed through environment variables in .env, which are read by server/server_config/database.py, server/server_config/settings.py, and graphrag_agent/config/settings.py.
Initial ingestion uses graphrag_agent.integrations.build.main to process files through chunking, embedding, and graph construction.
Services must run simultaneously: FastAPI backend (uvicorn server.main:app) and Streamlit frontend (streamlit run frontend/app.py).

Frequently Asked Questions

What are the system requirements for GraphRAG Agent?

You need Python 3.8 or higher, a Neo4j 5.x database instance with the APOC plugin, and approximately 4GB of RAM for the embedding models. On Linux, the system packages listed under Prerequisites (including poppler-utils and libxml2-dev) must be installed for document parsing. GPU acceleration is optional but recommended for faster embedding generation when processing large document collections.

How do I fix the Torch DLL error on Windows?

If you encounter DLL load failures when importing PyTorch modules, downgrade to the stable 2.8.0 release by running pip install torch==2.8.0. This resolves compatibility issues with specific Windows Visual C++ redistributable versions that conflict with the default torch installation in requirements.txt.

Can I run GraphRAG Agent without Neo4j?

No. The KnowledgeGraphBuilder class in graphrag_agent/integrations/build/build_graph.py and the search tools in graphrag_agent/search/tool/base.py depend on Neo4j as the primary graph store. The application uses Cypher queries to traverse entity relationships during the retrieval phase of the RAG pipeline.
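Because everything depends on Neo4j, it is worth confirming that the database is reachable with the credentials from .env before building the graph. The sketch below assumes the official neo4j Python driver (version 5.x) is installed and that the three NEO4J_* variables are present in the environment; the helper names are hypothetical and not taken from the repository.

```python
import os


def neo4j_settings(env=os.environ):
    """Read the connection settings named in this guide from a mapping."""
    return env["NEO4J_URI"], (env["NEO4J_USER"], env["NEO4J_PASSWORD"])


def neo4j_is_reachable(env=os.environ):
    """Return True if the driver can connect and authenticate."""
    # Imported lazily so neo4j_settings works without the driver installed.
    from neo4j import GraphDatabase
    from neo4j.exceptions import AuthError, ServiceUnavailable

    uri, auth = neo4j_settings(env)
    try:
        with GraphDatabase.driver(uri, auth=auth) as driver:
            driver.verify_connectivity()
        return True
    except (ServiceUnavailable, AuthError):
        # Server not running, wrong address, or rejected credentials.
        return False
```

If neo4j_is_reachable() returns False, check that the Neo4j service is running and that the Bolt URL and password in .env match the instance.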

How do I add new documents after the initial setup?

Use the incremental update command python -m graphrag_agent.integrations.build.incremental_update --watch ./files instead of rebuilding from scratch. The IncrementalGraphUpdater class processes only new or modified files, updating the Neo4j graph and vector embeddings without reprocessing your entire document corpus.
