How Knowledge Graphs Are Built in the Graph‑RAG‑Agent Project
The graph‑rag‑agent constructs knowledge graphs by parsing user messages to extract entity IDs, querying Neo4j for connected subgraphs, and materializing dynamic NetworkX graphs that feed directly into LLM reasoning chains.
The graph‑rag‑agent repository implements a hybrid architecture that combines persistent Neo4j storage with on‑the‑fly graph assembly. By transforming raw text into structured nodes and edges, the system enables retrieval‑augmented generation (RAG) workflows where language models reason over explicit relational data rather than flat text chunks.
Extracting Entity IDs from User Input
The build process begins when the system receives a user message containing explicit references to entities, relationships, or document chunks.
Parsing Messages with extract_kg_from_message
Located in server/services/kg_service.py, the extract_kg_from_message function uses regular‑expression patterns to locate structured sections such as Entities: [...], Relationships: [...], and Chunks: [...]. After stripping markdown code blocks, it compiles three distinct ID lists and forwards them to the graph assembler.
from server.services.kg_service import extract_kg_from_message
msg = """
Entities: [123, "abc"]
Relationships: ["rel1"]
Chunks: ["chunk_01"]
"""
graph = extract_kg_from_message(msg)
print(graph["nodes"], graph["links"])
When the message also carries a reference payload, the function augments the extracted IDs with the additional context from that payload before starting the graph retrieval pipeline.
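The section parsing described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper name `extract_id_lists`, the exact patterns, and the comma-splitting fallback are all assumptions made for the example.

```python
import json
import re

# Hypothetical patterns: one per section label named in the article.
SECTION_PATTERNS = {
    name: re.compile(rf"{name}\s*:\s*(\[[^\]]*\])")
    for name in ("Entities", "Relationships", "Chunks")
}

# Matches a markdown code-fence marker (three backticks plus an optional language tag).
FENCE = re.compile(r"`{3}\w*")


def extract_id_lists(message: str) -> dict:
    """Return the ID list found for each section, or [] when the section is absent."""
    text = FENCE.sub("", message)  # strip fence markers, keep the fenced content
    ids = {}
    for name, pattern in SECTION_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            ids[name] = []
            continue
        try:
            ids[name] = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Assumed fallback for unquoted lists such as [a, b]: split on commas.
            body = match.group(1)[1:-1]
            ids[name] = [p.strip().strip("'\"") for p in body.split(",") if p.strip()]
    return ids


message = 'Entities: [123, "abc"]\nRelationships: ["rel1"]\nChunks: ["chunk_01"]'
parsed = extract_id_lists(message)
print(parsed)
```

Keeping the three lists separate at this stage lets later steps decide whether to query by entity or to fall back to chunks.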
Retrieving and Assembling the Graph from Neo4j
Once IDs are collected, the system validates their existence and executes a single‑pass Cypher query to assemble the neighborhood graph.
Validating IDs with check_entity_existence
Before constructing the graph, check_entity_existence runs a defensive Neo4j query that attempts multiple type casts (int, string) to confirm that the provided entity IDs actually exist in the database. This prevents hallucinated nodes from entering the reasoning chain.
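One way to picture the multi-cast check is shown below. It is a sketch under stated assumptions: the `__Entity__` label, the `id` property, the query text, and the helpers `id_variants` and `filter_existing` are hypothetical, and an in-memory set stands in for Neo4j so the example runs on its own.

```python
# Hypothetical Cypher: the `__Entity__` label and `id` property are assumptions.
EXISTENCE_QUERY = """
MATCH (e:__Entity__)
WHERE e.id IN $candidates
RETURN e.id AS id
"""


def id_variants(raw_id):
    """Return the raw ID plus its string and integer forms, without duplicates."""
    variants = [raw_id]
    text = str(raw_id)
    if text not in variants:
        variants.append(text)
    try:
        as_int = int(text)
    except ValueError:
        as_int = None  # non-numeric IDs have no integer form
    if as_int is not None and as_int not in variants:
        variants.append(as_int)
    return variants


def filter_existing(raw_ids, run_query):
    """Keep one stored ID for every raw ID that matches under some cast.

    `run_query(query, candidates=...)` is a stand-in for a Neo4j call and
    must return the stored IDs that matched.
    """
    existing = []
    for raw_id in raw_ids:
        found = run_query(EXISTENCE_QUERY, candidates=id_variants(raw_id))
        if found:
            existing.append(found[0])
    return existing


# In-memory stand-in for the database: 123 is stored as an int, "abc" as a string.
stored = {123, "abc"}


def fake_run_query(query, candidates):
    return [c for c in candidates if c in stored]


valid = filter_existing(["123", "abc", "ghost"], fake_run_query)
print(valid)
```

Here the string `"123"` resolves to the stored integer `123`, while `"ghost"` is dropped because no cast matches.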
Building the Cypher Query for 1‑Hop Neighbors
The get_knowledge_graph_for_ids function in server/services/kg_service.py constructs a Cypher query that retrieves:
- The seed entities specified by the user
- All direct relationships between any pair of seed entities
- One‑hop neighbors outside the seed set
- Deduplicated edges based on source_target_type
The query returns a JSON‑style structure containing nodes (with id, label, description, and group) and links (with source, target, label, and weight).
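The deduplication step and the returned shape can be illustrated with a small sketch. It assumes that source_target_type means a composite key built from an edge's source, target, and relationship type (stored here in the link's `label` field). The helper `dedupe_links` and the sample data are hypothetical.

```python
def dedupe_links(links):
    """Keep the first link seen for each source/target/type combination."""
    seen = set()
    unique = []
    for link in links:
        # Assumed composite key: "<source>_<target>_<type>".
        key = f"{link['source']}_{link['target']}_{link['label']}"
        if key in seen:
            continue
        seen.add(key)
        unique.append(link)
    return unique


raw_links = [
    {"source": "x", "target": "y", "label": "FUNDS", "weight": 1.0},
    {"source": "x", "target": "y", "label": "FUNDS", "weight": 0.5},  # duplicate key
    {"source": "y", "target": "x", "label": "FUNDS", "weight": 1.0},  # reverse direction
    {"source": "x", "target": "y", "label": "OWNS", "weight": 1.0},   # different type
]

graph = {
    "nodes": [
        {"id": "x", "label": "Policy", "description": "2023 policy change", "group": 1},
        {"id": "y", "label": "Scholarship", "description": "Student scholarship", "group": 2},
    ],
    "links": dedupe_links(raw_links),
}
print(len(graph["links"]))
```

Because direction and type are part of the key, a reversed edge or a second relationship type between the same pair survives deduplication.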
Handling Chunk‑Only Inputs
If the input contains only chunk IDs without explicit entity references, the system falls back to get_graph_from_chunks. This helper extracts linked entities from the chunks first, then invokes the standard graph pipeline to ensure the final output maintains the same node‑link format.
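The fallback can be pictured as a seed-resolution step that runs before the standard pipeline. The sketch below is illustrative only: `resolve_seed_entities` and the `entities_for_chunk` lookup are hypothetical names, and a dictionary stands in for the chunk-to-entity links stored in Neo4j.

```python
def resolve_seed_entities(entity_ids, chunk_ids, entities_for_chunk):
    """Return explicit entity IDs, or derive them from chunks when none are given."""
    if entity_ids:
        return list(entity_ids)
    seeds = []
    for chunk_id in chunk_ids:
        for entity_id in entities_for_chunk(chunk_id):
            if entity_id not in seeds:  # preserve order, drop repeats
                seeds.append(entity_id)
    return seeds


# In-memory stand-in for the chunk-to-entity links.
chunk_links = {"chunk_01": ["e1", "e2"], "chunk_02": ["e2", "e3"]}


def lookup(chunk_id):
    return chunk_links.get(chunk_id, [])


from_chunks = resolve_seed_entities([], ["chunk_01", "chunk_02"], lookup)
explicit = resolve_seed_entities(["e9"], ["chunk_01"], lookup)
print(from_chunks, explicit)
```

Whichever branch runs, the result is a plain list of entity IDs, so the downstream graph query does not need to know where the seeds came from.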
Constructing Dynamic In‑Memory Graphs with NetworkX
For scenarios requiring recursive exploration beyond the initial 1‑hop limit, the repository provides a dynamic builder that materializes subgraphs as NetworkX DiGraph objects.
Initializing the DynamicKnowledgeGraphBuilder
Defined in graphrag_agent/search/tool/reasoning/kg_builder.py, the DynamicKnowledgeGraphBuilder class accepts a Neo4j driver instance and exposes the build_query_graph method. This method accepts a query string, a list of seed entities, and a configurable recursion depth.
from graphrag_agent.search.tool.reasoning.kg_builder import DynamicKnowledgeGraphBuilder
from server_config.database import get_db_manager
db = get_db_manager().driver
builder = DynamicKnowledgeGraphBuilder(graph=db)
subgraph = builder.build_query_graph(
    query="Why did the 2023 policy change affect student scholarships?",
    entities=["entity_42", "entity_87"],
    depth=2,
)
print(subgraph.number_of_nodes(), subgraph.number_of_edges())
Recursive Expansion via _explore_graph
The builder initializes a fresh nx.DiGraph, seeds it with the starting entities, then calls _explore_graph recursively. Each iteration executes a Cypher MATCH (e1)-[r]->(e2) query for the current entity frontier, adds the discovered nodes and edges to the NetworkX graph, and collects new entity IDs for the next depth level. The process continues until the specified depth is reached or no new entities are found.
Metadata such as build_time, entity_count, and relation_count is attached to subgraph.graph for downstream inspection.
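The depth-limited expansion described above can be sketched without a database. In this illustration a plain set and list stand in for the nx.DiGraph, and an in-memory edge list stands in for the Cypher query. The names `explore`, `build_subgraph`, and `fetch_out_edges` are hypothetical and do not come from the repository.

```python
import time


def explore(frontier, remaining_depth, nodes, edges, fetch_out_edges):
    """Recursively add outgoing edges of the frontier until depth runs out."""
    if remaining_depth == 0 or not frontier:
        return
    next_frontier = []
    for source, rel_type, target in fetch_out_edges(frontier):
        edges.append((source, target, rel_type))
        if target not in nodes:  # only unseen entities are expanded further
            nodes.add(target)
            next_frontier.append(target)
    explore(next_frontier, remaining_depth - 1, nodes, edges, fetch_out_edges)


def build_subgraph(seeds, depth, fetch_out_edges):
    """Return (nodes, edges, metadata) for the neighborhood around the seeds."""
    started = time.perf_counter()
    nodes, edges = set(seeds), []
    explore(list(seeds), depth, nodes, edges, fetch_out_edges)
    metadata = {
        "build_time": time.perf_counter() - started,
        "entity_count": len(nodes),
        "relation_count": len(edges),
    }
    return nodes, edges, metadata


# In-memory stand-in for MATCH (e1)-[r]->(e2) restricted to the frontier.
ALL_EDGES = [("a", "KNOWS", "b"), ("b", "KNOWS", "c"), ("c", "KNOWS", "d"), ("b", "KNOWS", "a")]


def fetch_out_edges(frontier):
    return [edge for edge in ALL_EDGES if edge[0] in frontier]


nodes, edges, metadata = build_subgraph(["a"], depth=2, fetch_out_edges=fetch_out_edges)
print(sorted(nodes), metadata["entity_count"], metadata["relation_count"])
```

With a depth of 2 starting from `a`, the walk reaches `b` and `c` but stops before `d`. The back edge from `b` to `a` is recorded without re-expanding `a`, which keeps cycles from looping.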
Feeding Graphs to the LLM Agent
The final stage integrates the constructed graph into the agent’s reasoning workflow.
Orchestrating Retrieval with GraphAgent
The GraphAgent class in graphrag_agent/agents/graph_agent.py orchestrates local and global search tools to retrieve either the static JSON graph or the dynamic NetworkX subgraph. It then injects this structured data into prompt templates (defined in graphrag_agent/config/prompts.py), allowing the LLM to reference specific nodes, edges, and properties when generating answers.
from graphrag_agent.agents.graph_agent import GraphAgent
agent = GraphAgent()
result = agent.run(
    {"messages": [{"role": "user", "content": "Explain the relationship between X and Y"}]}
)
print(result["messages"][-1].content)
By passing the graph structure directly into the prompt, the system grounds the model’s responses in the verified relational data extracted from Neo4j.
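As a rough illustration of this grounding step, a node-link graph can be flattened into text and placed in a prompt. The template wording and the helper `graph_to_context` below are hypothetical. The actual templates live in graphrag_agent/config/prompts.py and may look quite different.

```python
# Hypothetical template, not the one defined in the repository.
PROMPT_TEMPLATE = (
    "Answer the question using only the knowledge graph below.\n\n"
    "{context}\n\n"
    "Question: {question}"
)


def graph_to_context(graph):
    """Render nodes and links as one line each so the model can cite them."""
    lines = ["Nodes:"]
    for node in graph["nodes"]:
        lines.append(f"- [{node['id']}] {node['label']}: {node['description']}")
    lines.append("Edges:")
    for link in graph["links"]:
        lines.append(
            f"- {link['source']} -[{link['label']}]-> {link['target']} "
            f"(weight {link['weight']})"
        )
    return "\n".join(lines)


sample_graph = {
    "nodes": [
        {"id": "x", "label": "Policy", "description": "2023 policy change", "group": 1},
        {"id": "y", "label": "Scholarship", "description": "Student scholarship", "group": 2},
    ],
    "links": [{"source": "x", "target": "y", "label": "AFFECTS", "weight": 1.0}],
}

prompt = PROMPT_TEMPLATE.format(
    context=graph_to_context(sample_graph),
    question="Explain the relationship between X and Y",
)
print(prompt)
```

Writing each edge with explicit source, type, and target gives the model concrete identifiers to reference in its answer.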
Summary
- ID Extraction: The extract_kg_from_message function in server/services/kg_service.py parses user messages to collect entity, relationship, and chunk IDs using regex patterns.
- Neo4j Assembly: The get_knowledge_graph_for_ids function validates IDs and executes a single‑pass Cypher query to fetch 1‑hop neighborhoods, returning a JSON graph structure.
- Dynamic Construction: The DynamicKnowledgeGraphBuilder in graphrag_agent/search/tool/reasoning/kg_builder.py creates recursive NetworkX subgraphs for deep exploration.
- Agent Integration: The GraphAgent consumes these graphs to provide structured context for LLM reasoning, ensuring answers are grounded in the knowledge base.
Frequently Asked Questions
How does the system handle invalid entity IDs?
Before executing the main graph query, the check_entity_existence function attempts multiple type casts against Neo4j to verify that each ID exists in the database. Invalid IDs are filtered out, preventing the construction of spurious graph nodes.
What is the difference between the static JSON graph and the dynamic NetworkX graph?
The static JSON graph returned by get_knowledge_graph_for_ids provides a 1‑hop neighborhood snapshot suitable for immediate LLM consumption. The dynamic NetworkX graph built by DynamicKnowledgeGraphBuilder supports recursive expansion up to a configurable depth, enabling multi‑hop reasoning for complex queries.
Can the graph builder work with only document chunks instead of explicit entity IDs?
Yes. When only chunk IDs are provided, the system invokes get_graph_from_chunks to first extract linked entities from those chunks, then proceeds with the standard graph assembly pipeline. This ensures the knowledge graph can be built even from unstructured document references.
Which file configures the Neo4j connection used by the knowledge graph services?
The server_config/database.py file provides the get_db_manager function, which supplies the Neo4j driver instance used by both kg_service.py and kg_builder.py for all database operations.