
How Knowledge Graphs Are Built in the Graph‑RAG‑Agent Project

February 22, 2026 · 1517005260/graph-rag-agent

The graph‑rag‑agent constructs knowledge graphs by parsing user messages to extract entity IDs, querying Neo4j for connected subgraphs, and materializing dynamic NetworkX graphs that feed directly into LLM reasoning chains.

The graph‑rag‑agent repository implements a hybrid architecture that combines persistent Neo4j storage with on‑the‑fly graph assembly. By transforming raw text into structured nodes and edges, the system enables retrieval‑augmented generation (RAG) workflows where language models reason over explicit relational data rather than flat text chunks.

Extracting Entity IDs from User Input

The build process begins when the system receives a user message containing explicit references to entities, relationships, or document chunks.

Parsing Messages with `extract_kg_from_message`

Located in server/services/kg_service.py, the extract_kg_from_message function uses regular‑expression patterns to locate structured sections such as Entities: [...], Relationships: [...], and Chunks: [...]. After stripping markdown code blocks, it compiles three distinct ID lists and forwards them to the graph assembler.

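from server.services.kg_service import extract_kg_from_message

# A message with the three structured sections the parser looks for.
msg = """
Entities: [123, "abc"]
Relationships: ["rel1"]
Chunks: ["chunk_01"]
"""
# The result uses the node-link format: a dict with "nodes" and "links".
graph = extract_kg_from_message(msg)
print(graph["nodes"], graph["links"])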

When the message also carries a reference payload, the function augments the extracted IDs with that additional context before starting the graph retrieval pipeline.
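The parsing step described above can be sketched as follows. This is a minimal illustration, not the code in kg_service.py: the names `SECTION_PATTERNS`, `strip_code_fences`, and `extract_ids` are hypothetical, the regular expressions are guesses at what such patterns might look like, and "stripping markdown code blocks" is interpreted here as dropping the fence lines while keeping their contents.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker

# Hypothetical patterns: one per labelled section, e.g. Entities: [1, "a"].
SECTION_PATTERNS = {
    "entities": re.compile(r"Entities:\s*(\[[^\]]*\])", re.IGNORECASE),
    "relationships": re.compile(r"Relationships:\s*(\[[^\]]*\])", re.IGNORECASE),
    "chunks": re.compile(r"Chunks:\s*(\[[^\]]*\])", re.IGNORECASE),
}


def strip_code_fences(text: str) -> str:
    """Drop markdown fence lines but keep the text they enclose."""
    return "\n".join(
        line for line in text.splitlines() if not line.strip().startswith(FENCE)
    )


def extract_ids(message: str) -> dict[str, list]:
    """Return the ID list found in each section, or [] when a section is absent."""
    cleaned = strip_code_fences(message)
    found: dict[str, list] = {}
    for name, pattern in SECTION_PATTERNS.items():
        match = pattern.search(cleaned)
        if match is None:
            found[name] = []
            continue
        try:
            found[name] = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Unquoted or malformed items: fall back to a plain comma split.
            inner = match.group(1)[1:-1]
            found[name] = [
                part.strip().strip("'\"") for part in inner.split(",") if part.strip()
            ]
    return found


ids = extract_ids('Entities: [123, "abc"]\nRelationships: ["rel1"]\nChunks: ["chunk_01"]')
print(ids)
```

Parsing each bracketed list as JSON keeps integer and string IDs distinct, which matters for the type-cast checks performed later in the pipeline.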

Retrieving and Assembling the Graph from Neo4j

Once IDs are collected, the system validates their existence and executes a single‑pass Cypher query to assemble the neighborhood graph.

Validating IDs with `check_entity_existence`

Before constructing the graph, check_entity_existence runs a defensive Neo4j query that attempts multiple type casts (int, string) to confirm that the provided entity IDs actually exist in the database. This prevents hallucinated nodes from entering the reasoning chain.
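The idea of trying several type casts per ID can be illustrated without a database. In the sketch below, `candidate_forms` and `filter_existing` are hypothetical helpers, and a plain Python set stands in for the Neo4j lookup; the real function issues a Cypher query instead.

```python
from typing import Callable, Iterable


def candidate_forms(raw_id) -> list:
    """Return the string and int forms an ID might be stored under."""
    forms = [str(raw_id)]
    try:
        forms.append(int(raw_id))
    except (TypeError, ValueError):
        pass  # not numeric: only the string form applies
    return forms


def filter_existing(ids: Iterable, exists: Callable[[object], bool]) -> list:
    """Keep each input ID that matches a stored ID under any candidate form."""
    kept = []
    for raw_id in ids:
        match = next((form for form in candidate_forms(raw_id) if exists(form)), None)
        if match is not None:
            kept.append(match)
    return kept


# Stand-in for the database: a set of stored IDs with mixed types.
stored = {42, "abc", "7"}
valid = filter_existing(["42", "abc", 7, "ghost"], stored.__contains__)
print(valid)
```

Returning the form that actually matched, rather than the raw input, means later queries can use the ID exactly as the store holds it.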

Building the Cypher Query for 1‑Hop Neighbors

The get_knowledge_graph_for_ids function in server/services/kg_service.py constructs a Cypher query that retrieves:

The seed entities specified by the user
All direct relationships between any pair of seed entities
One‑hop neighbors outside the seed set
Edges deduplicated by a source_target_type key

The query returns a JSON‑style structure containing nodes (with id, label, description, and group) and links (with source, target, label, and weight).
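To make the output shape and the edge deduplication concrete, here is a small sketch that assembles the node-link structure from flat rows. The function name `assemble_graph`, the row format, and the default values for `group` and `weight` are assumptions made for illustration; only the field names and the source_target_type key come from the description above.

```python
def assemble_graph(node_rows: list[dict], edge_rows: list[dict]) -> dict:
    """Build a {"nodes": [...], "links": [...]} structure from flat rows."""
    nodes = {}
    for row in node_rows:
        # First occurrence wins; later rows with the same id are ignored.
        nodes.setdefault(row["id"], {
            "id": row["id"],
            "label": row.get("label", ""),
            "description": row.get("description", ""),
            "group": row.get("group", 0),  # assumed default
        })

    links = {}
    for row in edge_rows:
        # Deduplication key in the source_target_type form.
        key = f'{row["source"]}_{row["target"]}_{row["label"]}'
        links.setdefault(key, {
            "source": row["source"],
            "target": row["target"],
            "label": row["label"],
            "weight": row.get("weight", 1),  # assumed default
        })
    return {"nodes": list(nodes.values()), "links": list(links.values())}


graph = assemble_graph(
    [
        {"id": "a", "label": "Policy"},
        {"id": "b", "label": "Scholarship"},
        {"id": "a", "label": "Policy"},
    ],
    [
        {"source": "a", "target": "b", "label": "AFFECTS"},
        {"source": "a", "target": "b", "label": "AFFECTS"},
        {"source": "a", "target": "b", "label": "MENTIONS"},
    ],
)
print(len(graph["nodes"]), len(graph["links"]))
```

Two edges between the same pair survive only when their types differ, which is the behaviour a source_target_type key implies.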

Handling Chunk‑Only Inputs

If the input contains only chunk IDs without explicit entity references, the system falls back to get_graph_from_chunks. This helper extracts linked entities from the chunks first, then invokes the standard graph pipeline to ensure the final output maintains the same node‑link format.
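The fallback logic amounts to choosing where the seed entities come from. The sketch below shows that decision in isolation; `resolve_seed_entities`, `entities_for_chunk`, and the `chunk_index` dictionary are hypothetical stand-ins for the Neo4j lookup that get_graph_from_chunks performs.

```python
from typing import Callable


def resolve_seed_entities(
    entity_ids: list,
    chunk_ids: list,
    entities_for_chunk: Callable[[str], list],
) -> list:
    """Return seed entity IDs, deriving them from chunks when none are given."""
    if entity_ids:
        return list(entity_ids)
    seeds = []
    for chunk_id in chunk_ids:
        for entity_id in entities_for_chunk(chunk_id):
            if entity_id not in seeds:  # keep order, drop repeats
                seeds.append(entity_id)
    return seeds


# Hypothetical chunk-to-entity index standing in for the database.
chunk_index = {"chunk_01": ["e1", "e2"], "chunk_02": ["e2", "e3"]}


def lookup(chunk_id: str) -> list:
    return chunk_index.get(chunk_id, [])


from_chunks = resolve_seed_entities([], ["chunk_01", "chunk_02"], lookup)
explicit = resolve_seed_entities(["e9"], ["chunk_01"], lookup)
print(from_chunks, explicit)
```

Because both paths end in a list of entity IDs, the downstream graph query does not need to know which kind of input it started from.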

Constructing Dynamic In‑Memory Graphs with NetworkX

For scenarios requiring recursive exploration beyond the initial 1‑hop limit, the repository provides a dynamic builder that materializes subgraphs as NetworkX DiGraph objects.

Initializing the `DynamicKnowledgeGraphBuilder`

Defined in graphrag_agent/search/tool/reasoning/kg_builder.py, the DynamicKnowledgeGraphBuilder class takes a Neo4j driver instance and exposes the build_query_graph method, which accepts a query string, a list of seed entities, and a configurable recursion depth.

from graphrag_agent.search.tool.reasoning.kg_builder import DynamicKnowledgeGraphBuilder
from server_config.database import get_db_manager

# Reuse the shared Neo4j driver for the builder.
db = get_db_manager().driver
builder = DynamicKnowledgeGraphBuilder(graph=db)

# Expand two hops outward from the seed entities.
subgraph = builder.build_query_graph(
    query="Why did the 2023 policy change affect student scholarships?",
    entities=["entity_42", "entity_87"],
    depth=2,
)
print(subgraph.number_of_nodes(), subgraph.number_of_edges())

Recursive Expansion via `_explore_graph`

The builder initializes a fresh nx.DiGraph, seeds it with the starting entities, then calls _explore_graph recursively. Each call executes a Cypher MATCH (e1)-[r]->(e2) query for the current entity frontier, adds the discovered nodes and edges to the NetworkX graph, and collects new entity IDs for the next depth level. Expansion stops when the specified depth is reached or when a level discovers no new entities.

Metadata such as build_time, entity_count, and relation_count is attached to subgraph.graph for downstream inspection.
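The expansion strategy can be sketched in plain Python. This is not the project's _explore_graph: it is written as a loop rather than recursion, an in-memory adjacency dictionary replaces the Cypher query, and plain sets replace the nx.DiGraph so the example runs without extra dependencies. The `explore` function and the reading of build_time as elapsed seconds are assumptions.

```python
import time


def explore(adjacency: dict, seeds: list, depth: int) -> dict:
    """Expand outward from the seeds, one frontier per depth level."""
    started = time.perf_counter()
    nodes = set(seeds)
    edges = set()  # (source, relation, target) triples
    frontier = set(seeds)

    for _ in range(depth):
        next_frontier = set()
        for source in frontier:
            # Stand-in for MATCH (e1)-[r]->(e2) on the current frontier.
            for relation, target in adjacency.get(source, []):
                edges.add((source, relation, target))
                if target not in nodes:
                    nodes.add(target)
                    next_frontier.add(target)
        if not next_frontier:
            break  # nothing new was discovered at this level
        frontier = next_frontier

    return {
        "nodes": nodes,
        "edges": edges,
        "meta": {
            "build_time": time.perf_counter() - started,
            "entity_count": len(nodes),
            "relation_count": len(edges),
        },
    }


toy = {
    "a": [("AFFECTS", "b")],
    "b": [("FUNDS", "c")],
    "c": [("CITES", "a"), ("CITES", "d")],
}
result = explore(toy, ["a"], depth=2)
print(sorted(result["nodes"]), result["meta"]["relation_count"])
```

Tracking already-seen nodes keeps cycles such as c to a from re-entering the frontier, so the loop terminates even when the depth limit is generous.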

Feeding Graphs to the LLM Agent

The final stage integrates the constructed graph into the agent’s reasoning workflow.

Orchestrating Retrieval with `GraphAgent`

The GraphAgent class in graphrag_agent/agents/graph_agent.py orchestrates local and global search tools to retrieve either the static JSON graph or the dynamic NetworkX subgraph. It then injects this structured data into prompt templates (defined in graphrag_agent/config/prompts.py), allowing the LLM to reference specific nodes, edges, and properties when generating answers.

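from graphrag_agent.agents.graph_agent import GraphAgent

agent = GraphAgent()
# Pass the conversation as a list of role/content messages.
result = agent.run(
    {"messages": [{"role": "user", "content": "Explain the relationship between X and Y"}]}
)
# The last message in the result holds the generated answer.
print(result["messages"][-1].content)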

By passing the graph structure directly into the prompt, the system grounds the model’s responses in the verified relational data extracted from Neo4j.
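One way to picture the injection step is to render the node-link structure as text and place it in a template. The sketch below is illustrative only: `PROMPT_TEMPLATE` and `format_graph_prompt` are hypothetical, and the project's actual templates in graphrag_agent/config/prompts.py may look quite different.

```python
# Hypothetical template; the project's real prompts live in its own config.
PROMPT_TEMPLATE = """Answer using only the graph below.

Nodes:
{nodes}

Edges:
{edges}

Question: {question}"""


def format_graph_prompt(graph: dict, question: str) -> str:
    """Render a node-link graph as plain text inside a prompt."""
    node_lines = [
        f'- {node["id"]} ({node["label"]}): {node.get("description", "")}'
        for node in graph["nodes"]
    ]
    edge_lines = [
        f'- {link["source"]} -[{link["label"]}]-> {link["target"]}'
        for link in graph["links"]
    ]
    return PROMPT_TEMPLATE.format(
        nodes="\n".join(node_lines),
        edges="\n".join(edge_lines),
        question=question,
    )


sample_graph = {
    "nodes": [
        {"id": "a", "label": "Policy", "description": "2023 policy change"},
        {"id": "b", "label": "Scholarship", "description": "Student funding"},
    ],
    "links": [{"source": "a", "target": "b", "label": "AFFECTS", "weight": 1}],
}
prompt = format_graph_prompt(sample_graph, "How are a and b related?")
print(prompt)
```

Listing edges in an arrow notation gives the model explicit identifiers it can cite, which is the grounding effect the paragraph above describes.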

Summary

ID Extraction: The extract_kg_from_message function in server/services/kg_service.py parses user messages to collect entity, relationship, and chunk IDs using regex patterns.
Neo4j Assembly: After check_entity_existence validates the IDs, the get_knowledge_graph_for_ids function executes a single‑pass Cypher query to fetch 1‑hop neighborhoods, returning a JSON graph structure.
Dynamic Construction: The DynamicKnowledgeGraphBuilder in graphrag_agent/search/tool/reasoning/kg_builder.py creates recursive NetworkX subgraphs for deep exploration.
Agent Integration: The GraphAgent consumes these graphs to provide structured context for LLM reasoning, ensuring answers are grounded in the knowledge base.

Frequently Asked Questions

How does the system handle invalid entity IDs?

Before executing the main graph query, the check_entity_existence function looks up each ID in Neo4j under multiple type casts (int and string) to verify that it exists in the database. IDs that match nothing are filtered out, preventing the construction of spurious graph nodes.

What is the difference between the static JSON graph and the dynamic NetworkX graph?

The static JSON graph returned by get_knowledge_graph_for_ids provides a 1‑hop neighborhood snapshot suitable for immediate LLM consumption. The dynamic NetworkX graph built by DynamicKnowledgeGraphBuilder supports recursive expansion up to a configurable depth, enabling multi‑hop reasoning for complex queries.

Can the graph builder work with only document chunks instead of explicit entity IDs?

Yes. When only chunk IDs are provided, the system invokes get_graph_from_chunks to first extract linked entities from those chunks, then proceeds with the standard graph assembly pipeline. This ensures the knowledge graph can be built even from unstructured document references.

Which file configures the Neo4j connection used by the knowledge graph services?

The server_config/database.py file provides the get_db_manager function, which supplies the Neo4j driver instance used by both kg_service.py and kg_builder.py for all database operations.
