# How Knowledge Graphs Are Built in the Graph‑RAG‑Agent Project

> Discover how graph-rag-agent builds knowledge graphs by parsing messages, querying Neo4j, and creating dynamic graphs for LLM reasoning. Learn about entity ID extraction and subgraph materialization.

- Repository: [GLK/graph-rag-agent](https://github.com/1517005260/graph-rag-agent)
- Tags: how-to-guide
- Published: 2026-02-22

---

**The graph‑rag‑agent constructs knowledge graphs by parsing user messages to extract entity IDs, querying Neo4j for connected subgraphs, and materializing dynamic NetworkX graphs that feed directly into LLM reasoning chains.**

The **graph‑rag‑agent** repository implements a hybrid architecture that combines persistent Neo4j storage with on‑the‑fly graph assembly. By transforming raw text into structured nodes and edges, the system enables retrieval‑augmented generation (RAG) workflows where language models reason over explicit relational data rather than flat text chunks.

## Extracting Entity IDs from User Input

The build process begins when the system receives a user message containing explicit references to entities, relationships, or document chunks.

### Parsing Messages with `extract_kg_from_message`

Located in [`server/services/kg_service.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/services/kg_service.py), the `extract_kg_from_message` function uses regular‑expression patterns to locate structured sections such as `Entities: [...]`, `Relationships: [...]`, and `Chunks: [...]`. After stripping markdown code blocks, it compiles three distinct ID lists and forwards them to the graph assembler.

```python
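from server.services.kg_service import extract_kg_from_message

# A message carrying the three structured sections the parser looks for.
msg = """
Entities: [123, "abc"]
Relationships: ["rel1"]
Chunks: ["chunk_01"]
"""

# Parse the IDs and assemble the graph; the result exposes nodes and links.
graph = extract_kg_from_message(msg)
print(graph["nodes"], graph["links"])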
```

When a `reference` payload accompanies the message, the function augments the extracted IDs with the additional context that payload carries before starting the graph retrieval pipeline.
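The parsing step can be pictured with a small standalone sketch. This is not the code from `kg_service.py`: the helper name `extract_id_lists`, the exact patterns, and the choice to drop only the fence marker lines (keeping the text inside them) are all assumptions made for illustration.

```python
import json
import re

# Hypothetical patterns; the real ones in kg_service.py may differ.
# Matches a fence marker line: three backticks plus an optional language tag.
FENCE_LINE_RE = re.compile(r"^[ \t]*`{3}[\w-]*[ \t]*$", re.MULTILINE)
SECTION_RE = {
    name.lower(): re.compile(rf"{name}\s*:\s*(\[[^\]]*\])", re.IGNORECASE)
    for name in ("Entities", "Relationships", "Chunks")
}


def extract_id_lists(message: str) -> dict[str, list]:
    """Return the ID lists found under Entities, Relationships, and Chunks."""
    # Assumption: only the fence marker lines are removed, not their contents.
    text = FENCE_LINE_RE.sub("", message)
    result = {}
    for key, pattern in SECTION_RE.items():
        match = pattern.search(text)
        if match is None:
            result[key] = []
            continue
        raw_list = match.group(1)
        try:
            # Works for well-formed lists such as [123, "abc"].
            result[key] = json.loads(raw_list)
        except json.JSONDecodeError:
            # Fallback for unquoted IDs such as [a1, b2]: split on commas.
            parts = raw_list[1:-1].split(",")
            result[key] = [p.strip().strip("'\"") for p in parts if p.strip()]
    return result


if __name__ == "__main__":
    print(extract_id_lists('Entities: [123, "abc"]\nChunks: ["chunk_01"]'))
```

Keeping integer and string IDs in their parsed types matters for the next stage, where the existence check tries both forms against the database.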

## Retrieving and Assembling the Graph from Neo4j

Once IDs are collected, the system validates their existence and executes a single‑pass Cypher query to assemble the neighborhood graph.

### Validating IDs with `check_entity_existence`

Before constructing the graph, `check_entity_existence` runs a defensive Neo4j query that attempts multiple type casts (`int`, `string`) to confirm that the provided entity IDs actually exist in the database. This prevents hallucinated nodes from entering the reasoning chain.
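A minimal sketch of the multi-cast idea follows. It replaces the Neo4j round trip with an injected `exists` callable so that it runs on its own; `candidate_forms`, `filter_existing`, and the callable are hypothetical names, and the real function issues Cypher queries instead.

```python
from typing import Callable, Iterable


def candidate_forms(raw_id) -> list:
    """Return the original ID plus its string and int forms, without duplicates."""
    forms = [raw_id]
    text = str(raw_id).strip()
    if text not in forms:
        forms.append(text)
    try:
        as_int = int(text)
    except ValueError:
        return forms  # not numeric, so there is no int form to try
    if as_int not in forms:
        forms.append(as_int)
    return forms


def filter_existing(ids: Iterable, exists: Callable[[object], bool]) -> list:
    """Keep each ID in the first form that the lookup confirms; drop the rest."""
    kept = []
    for raw_id in ids:
        for form in candidate_forms(raw_id):
            if exists(form):
                kept.append(form)
                break
    return kept


if __name__ == "__main__":
    # Stand-in for the database: one integer ID and one string ID are stored.
    stored = {42, "abc"}
    print(filter_existing(["42", "abc", "ghost"], stored.__contains__))
```

In this sketch an ID that arrives as the string `"42"` is kept as the integer `42` when only the integer form is stored, and an ID with no stored form is dropped.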

### Building the Cypher Query for 1‑Hop Neighbors

The `get_knowledge_graph_for_ids` function in [`server/services/kg_service.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/services/kg_service.py) constructs a Cypher query that retrieves:
- The seed entities specified by the user
- All direct relationships between any pair of seed entities
- One‑hop neighbors outside the seed set
- Deduplicated edges based on `source_target_type`

The query returns a JSON‑style structure containing `nodes` (with `id`, `label`, `description`, and `group`) and `links` (with `source`, `target`, `label`, and `weight`).
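To make the output shape concrete, the sketch below builds a small payload with the fields listed above and removes duplicate edges in Python. It assumes the deduplication key is the source, target, and relationship type joined by underscores, and that the type is carried in each link's `label`; in the project the deduplication may happen inside the Cypher query instead. The sample values are invented.

```python
def dedupe_links(links: list[dict]) -> list[dict]:
    """Keep the first link seen for each source_target_type key."""
    seen = set()
    unique = []
    for link in links:
        # Assumed key format: "<source>_<target>_<type>", with type in "label".
        key = f'{link["source"]}_{link["target"]}_{link["label"]}'
        if key in seen:
            continue
        seen.add(key)
        unique.append(link)
    return unique


# Illustrative payload in the node-link format; all values are made up.
graph = {
    "nodes": [
        {"id": "e1", "label": "Policy", "description": "Seed entity", "group": 1},
        {"id": "e2", "label": "Scholarship", "description": "Neighbor", "group": 2},
    ],
    "links": dedupe_links([
        {"source": "e1", "target": "e2", "label": "AFFECTS", "weight": 1},
        {"source": "e1", "target": "e2", "label": "AFFECTS", "weight": 1},
        {"source": "e1", "target": "e2", "label": "MENTIONS", "weight": 1},
    ]),
}

if __name__ == "__main__":
    print(len(graph["links"]))
```

Two edges between the same pair of nodes survive when their types differ, which is why the type is part of the key.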

### Handling Chunk‑Only Inputs

If the input contains only chunk IDs without explicit entity references, the system falls back to `get_graph_from_chunks`. This helper extracts linked entities from the chunks first, then invokes the standard graph pipeline to ensure the final output maintains the same node‑link format.
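The routing described here amounts to a small dispatch step. The sketch below is an assumption about its shape, not the project's code: `build_graph` is a hypothetical wrapper, and the two callables stand in for the chunk-to-entity lookup and for `get_knowledge_graph_for_ids`.

```python
from typing import Callable

EMPTY_GRAPH = {"nodes": [], "links": []}


def build_graph(
    entity_ids: list,
    chunk_ids: list,
    graph_for_ids: Callable[[list], dict],
    entities_for_chunks: Callable[[list], list],
) -> dict:
    """Route chunk-only input through an entity lookup, then build as usual."""
    if not entity_ids and chunk_ids:
        # Chunk-only input: resolve the entities linked to those chunks first.
        entity_ids = entities_for_chunks(chunk_ids)
    if not entity_ids:
        # Nothing to build from; return an empty graph in the same format.
        return dict(EMPTY_GRAPH)
    return graph_for_ids(entity_ids)


if __name__ == "__main__":
    # Stand-ins for the database-backed helpers.
    chunk_index = {"chunk_01": ["e1", "e2"]}
    lookup = lambda chunks: [e for c in chunks for e in chunk_index.get(c, [])]
    assemble = lambda ids: {"nodes": [{"id": i} for i in ids], "links": []}
    print(build_graph([], ["chunk_01"], assemble, lookup))
```

Because both paths end in the same assembly call, callers receive the node-link format regardless of which kind of ID they supplied.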

## Constructing Dynamic In‑Memory Graphs with NetworkX

For scenarios requiring recursive exploration beyond the initial 1‑hop limit, the repository provides a dynamic builder that materializes subgraphs as **NetworkX** `DiGraph` objects.

### Initializing the `DynamicKnowledgeGraphBuilder`

Defined in [`graphrag_agent/search/tool/reasoning/kg_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/reasoning/kg_builder.py), the `DynamicKnowledgeGraphBuilder` class accepts a Neo4j driver instance and exposes the `build_query_graph` method. This method accepts a query string, a list of seed entities, and a configurable recursion depth.

```python
from graphrag_agent.search.tool.reasoning.kg_builder import DynamicKnowledgeGraphBuilder
from server_config.database import get_db_manager

# Obtain the Neo4j driver from the project's database manager.
db = get_db_manager().driver
builder = DynamicKnowledgeGraphBuilder(graph=db)

# Expand two hops outward from the seed entities.
subgraph = builder.build_query_graph(
    query="Why did the 2023 policy change affect student scholarships?",
    entities=["entity_42", "entity_87"],
    depth=2,
)

# The result is a NetworkX DiGraph, so the usual size helpers apply.
print(subgraph.number_of_nodes(), subgraph.number_of_edges())
```

### Recursive Expansion via `_explore_graph`

The builder initializes a fresh `nx.DiGraph`, seeds it with the starting entities, then calls `_explore_graph` recursively. Each call executes a Cypher `MATCH (e1)-[r]->(e2)` query for the current entity frontier, adds the discovered nodes and edges to the NetworkX graph, and collects the newly discovered entity IDs as the frontier for the next depth level. The process stops when the specified depth is reached or when a level yields no new entities.

Metadata such as `build_time`, `entity_count`, and `relation_count` is attached to `subgraph.graph` for downstream inspection.
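The expansion logic can be sketched without a database or NetworkX. The version below is an illustration under stated assumptions: it is iterative rather than recursive, the `neighbors` callable stands in for the Cypher query, and plain sets stand in for the `DiGraph` (where the builder would call `add_node` and `add_edge`). The function name `explore` and the returned tuple are hypothetical.

```python
import time
from typing import Callable, Iterable


def explore(
    neighbors: Callable[[str], Iterable[tuple[str, str]]],
    seeds: list[str],
    depth: int,
) -> tuple[set, set, dict]:
    """Depth-limited frontier expansion starting from the seed entities.

    `neighbors(entity_id)` yields (relation_type, target_id) pairs and plays
    the role of the outgoing-edge query against Neo4j.
    """
    started = time.perf_counter()
    nodes = set(seeds)
    edges = set()
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for source in frontier:
            for rel_type, target in neighbors(source):
                edges.add((source, rel_type, target))
                if target not in nodes:
                    # Only unseen entities are expanded at the next level.
                    nodes.add(target)
                    next_frontier.append(target)
        if not next_frontier:
            break  # no new entities, so deeper levels would add nothing
        frontier = next_frontier
    meta = {
        "build_time": time.perf_counter() - started,
        "entity_count": len(nodes),
        "relation_count": len(edges),
    }
    return nodes, edges, meta


if __name__ == "__main__":
    # Toy adjacency list with a cycle (d points back to a).
    adjacency = {"a": [("R", "b")], "b": [("R", "c")], "c": [("R", "d")], "d": [("R", "a")]}
    nodes, edges, meta = explore(lambda e: adjacency.get(e, []), ["a"], depth=2)
    print(sorted(nodes), meta["entity_count"], meta["relation_count"])
```

Tracking visited entities keeps cycles from being re-expanded, so the loop terminates even when the requested depth exceeds the diameter of the reachable graph.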

## Feeding Graphs to the LLM Agent

The final stage integrates the constructed graph into the agent’s reasoning workflow.

### Orchestrating Retrieval with `GraphAgent`

The `GraphAgent` class in [`graphrag_agent/agents/graph_agent.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/agents/graph_agent.py) orchestrates local and global search tools to retrieve either the static JSON graph or the dynamic NetworkX subgraph. It then injects this structured data into prompt templates (defined in [`graphrag_agent/config/prompts.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/config/prompts.py)), allowing the LLM to reference specific nodes, edges, and properties when generating answers.

```python
from graphrag_agent.agents.graph_agent import GraphAgent

agent = GraphAgent()
result = agent.run(
    {"messages": [{"role": "user", "content": "Explain the relationship between X and Y"}]}
)
print(result["messages"][-1].content)
```

By passing the graph structure directly into the prompt, the system grounds the model’s responses in the verified relational data extracted from Neo4j.
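One way to picture the injection step is a helper that flattens the node-link structure into prompt text. The layout below is purely illustrative: the project's actual templates live in `graphrag_agent/config/prompts.py`, and `graph_to_prompt` is a hypothetical name.

```python
def graph_to_prompt(graph: dict, question: str) -> str:
    """Render a node-link graph as plain text that an LLM prompt can embed."""
    node_lines = [
        f'- {node["id"]} [{node.get("label", "")}]: {node.get("description", "")}'
        for node in graph["nodes"]
    ]
    link_lines = [
        f'- {link["source"]} -[{link["label"]}]-> {link["target"]}'
        for link in graph["links"]
    ]
    sections = [
        "Entities:",
        *node_lines,
        "Relationships:",
        *link_lines,
        f"Question: {question}",
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    # Invented sample graph in the format returned by the static pipeline.
    sample = {
        "nodes": [
            {"id": "e1", "label": "Policy", "description": "2023 policy change"},
            {"id": "e2", "label": "Scholarship", "description": "Student funding"},
        ],
        "links": [{"source": "e1", "target": "e2", "label": "AFFECTS", "weight": 1}],
    }
    print(graph_to_prompt(sample, "How are e1 and e2 related?"))
```

Listing entities and relationships by ID gives the model stable handles to cite, which makes it easier to check an answer against the graph afterward.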

## Summary

- **ID Extraction**: The `extract_kg_from_message` function in [`server/services/kg_service.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/services/kg_service.py) parses user messages to collect entity, relationship, and chunk IDs using regex patterns.
- **Neo4j Assembly**: After `check_entity_existence` validates the IDs, the `get_knowledge_graph_for_ids` function executes a single‑pass Cypher query to fetch 1‑hop neighborhoods, returning a JSON graph structure of `nodes` and `links`.
- **Dynamic Construction**: The `DynamicKnowledgeGraphBuilder` in [`graphrag_agent/search/tool/reasoning/kg_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/reasoning/kg_builder.py) creates recursive NetworkX subgraphs for deep exploration.
- **Agent Integration**: The `GraphAgent` consumes these graphs to provide structured context for LLM reasoning, ensuring answers are grounded in the knowledge base.

## Frequently Asked Questions

### How does the system handle invalid entity IDs?

Before executing the main graph query, the `check_entity_existence` function attempts multiple type casts against Neo4j to verify that each ID exists in the database. Invalid IDs are filtered out, preventing the construction of spurious graph nodes.

### What is the difference between the static JSON graph and the dynamic NetworkX graph?

The static JSON graph returned by `get_knowledge_graph_for_ids` provides a 1‑hop neighborhood snapshot suitable for immediate LLM consumption. The dynamic NetworkX graph built by `DynamicKnowledgeGraphBuilder` supports recursive expansion up to a configurable depth, enabling multi‑hop reasoning for complex queries.

### Can the graph builder work with only document chunks instead of explicit entity IDs?

Yes. When only chunk IDs are provided, the system invokes `get_graph_from_chunks` to first extract linked entities from those chunks, then proceeds with the standard graph assembly pipeline. This ensures the knowledge graph can be built even from unstructured document references.

### Which file configures the Neo4j connection used by the knowledge graph services?

The [`server_config/database.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server_config/database.py) file provides the `get_db_manager` function, which supplies the Neo4j driver instance used by both [`kg_service.py`](https://github.com/1517005260/graph-rag-agent/blob/main/server/services/kg_service.py) and [`kg_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/search/tool/reasoning/kg_builder.py) for all database operations.