# How the Incremental Update Mechanism for Knowledge Graphs Works in graph-rag-agent

> Learn how the incremental update mechanism for knowledge graphs in graph-rag-agent efficiently detects and applies changes to your Neo4j graph without full rebuilds.

- Repository: [GLK/graph-rag-agent](https://github.com/1517005260/graph-rag-agent)
- Tags: internals
- Published: 2026-02-23

---

**The incremental update mechanism in graph-rag-agent uses a change-detection pipeline that identifies additions, modifications, and deletions in source documents, then translates those changes into targeted Neo4j graph mutations without rebuilding the entire index.**

The `1517005260/graph-rag-agent` repository implements an **incremental update mechanism for knowledge graphs** that synchronizes source document deltas directly into Neo4j. Because only changed files pass through a modular pipeline of specialized coordinator classes, the system avoids the computational cost of full re-indexing.

## Architecture of the Incremental Update System

The system is organized around three primary classes that separate concerns between orchestration, execution, and scheduling:

- **IncrementalUpdateManager** – The high-level controller defined in [`graphrag_agent/integrations/build/incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py) (line 19) that wires all sub-services, exposes the public API, and manages configuration.
- **IncrementalGraphUpdater** – The core mutation engine in [`graphrag_agent/integrations/build/incremental_graph_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_graph_builder.py) (line 23) that translates file-system diffs into Cypher queries and embedding updates.
- **IncrementalUpdateScheduler** – The background timer in [`graphrag_agent/integrations/build/incremental/incremental_update_scheduler.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental/incremental_update_scheduler.py) (line 12) that enables continuous monitoring mode.

## Detecting File System Changes

Change detection begins with the `IncrementalGraphUpdater` delegating to a `FileChangeManager` instance. The `detect_changes` method returns a categorized dictionary of file operations:

```python
def detect_changes(self) -> Dict[str, List[str]]:
    """Return a dict with keys 'added', 'modified', 'deleted'."""
    return self.file_manager.detect_changes()
```

*Source:* [`graphrag_agent/integrations/build/incremental_graph_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_graph_builder.py), line 91

The `IncrementalUpdateManager.detect_file_changes()` method (lines 69-80 in [`incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py)) forwards this call and logs the detected delta for observability. This separation keeps the core updater agnostic of the specific file-system implementation, while the manager handles operational concerns.
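
The article does not show how `FileChangeManager` decides that a file has changed. A common approach, and the one assumed in this minimal sketch, is to hash each file's contents and compare the result against the snapshot saved on the previous run. `snapshot` and `diff_snapshots` are hypothetical helpers, not repository code; only the output keys mirror `detect_changes`.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(directory: str) -> Dict[str, str]:
    """Map each file path (relative to `directory`) to the SHA-256 of its contents."""
    hashes: Dict[str, str] = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            hashes[os.path.relpath(path, directory)] = digest
    return hashes


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, modified or deleted (same keys as detect_changes)."""
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }
```

Persisting the snapshot between runs (for example as JSON) supplies the next run's `old` baseline. Comparing content hashes rather than modification times avoids reprocessing files that were touched but not edited.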

## The Incremental Processing Pipeline

All graph mutations flow through `IncrementalGraphUpdater.process_incremental_update()` (line 886), which orchestrates eight distinct sub-operations. Each step has a dedicated helper method to maintain modularity:

### 1. Ingesting New Documents

The `process_new_files` method (line 100) extracts entities and text chunks from newly added source files, computes vector embeddings, and creates the initial node and edge structures in memory before committing to Neo4j.
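
The article does not say how `process_new_files` splits documents into chunks. Purely as an illustration of that step, here is a fixed-window character chunker with overlap. The function name and the default `size`/`overlap` values are assumptions, not the repository's settings.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end of the text
            break
    return chunks
```

The overlap lowers the chance that an entity mention split across a window boundary is missed during extraction.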

### 2. Entity Deduplication and Integration

`integrate_new_entities` (line 315) creates entity nodes in batches and deduplicates them against existing graph data by matching on unique identifying properties. This prevents duplicate nodes when modified files contain entities already present in the knowledge graph.
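
One way to implement batch deduplication is to collapse duplicates client-side and then issue a single `UNWIND ... MERGE` keyed on an identifier. The sketch below assumes an `__Entity__` label, an `id` key and lower-cased normalisation. These are hypothetical choices, not the repository's schema.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical label and key; the repository's actual schema may differ.
ENTITY_MERGE_QUERY = """
UNWIND $rows AS row
MERGE (e:__Entity__ {id: row.id})
SET e += row.props
"""


def prepare_entity_batch(entities: List[Dict[str, Any]]) -> Tuple[str, Dict[str, List[Dict[str, Any]]]]:
    """Collapse duplicates within a batch (by normalised id) before one MERGE round trip."""
    merged: Dict[str, Dict[str, Any]] = {}
    for ent in entities:
        key = ent["id"].strip().lower()
        props = merged.setdefault(key, {})
        # Later occurrences fill in or overwrite properties of earlier ones.
        props.update({k: v for k, v in ent.items() if k != "id"})
    rows = [{"id": key, "props": props} for key, props in merged.items()]
    return ENTITY_MERGE_QUERY, {"rows": rows}
```

With the official `neo4j` Python driver, the pair could be executed as `session.run(query, params)`. Because `MERGE` matches on `id`, re-running a batch updates existing nodes instead of duplicating them. A uniqueness constraint on that key is the usual safeguard when several writers run concurrently.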

### 3. Relationship Construction

`integrate_new_relationships` (line 365) builds edges between newly created entities based on semantic relations extracted during document processing, linking the fresh sub-graph into the existing topology.
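
Relationship types cannot be passed as Cypher parameters, so a batched edge write typically fixes a single type and stores the semantic relation as a property. In the sketch below, the `RELATED` type, the `type`/`description` properties and the `__Entity__` label are assumptions. It also separates out relations whose endpoints are unknown so that they can be logged instead of silently dropped by `MATCH`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema: one relationship type, with the semantic relation kept as a property.
RELATIONSHIP_MERGE_QUERY = """
UNWIND $rows AS row
MATCH (s:__Entity__ {id: row.source})
MATCH (t:__Entity__ {id: row.target})
MERGE (s)-[r:RELATED {type: row.type}]->(t)
SET r.description = row.description
"""

Rel = Dict[str, str]


def prepare_relationship_batch(
    relations: List[Rel], known_ids: Set[str]
) -> Tuple[str, Dict[str, List[Rel]], List[Rel]]:
    """Return the query, rows whose endpoints exist, and rows that would dangle."""
    rows: List[Rel] = []
    skipped: List[Rel] = []
    for rel in relations:
        if rel["source"] in known_ids and rel["target"] in known_ids:
            rows.append(rel)
        else:
            skipped.append(rel)
    return RELATIONSHIP_MERGE_QUERY, {"rows": rows}, skipped
```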

### 4. Graph Structure Merging

The `merge_graph_structures` method (line 455) resolves conflicts between the newly generated sub-graph and the persistent Neo4j store, handling property updates and node reconciliation through atomic Cypher merge operations.
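
In Cypher, node reconciliation is usually expressed with `MERGE` plus `ON CREATE SET` and `ON MATCH SET`. The sketch below pairs such a query with a client-side preview of the resulting properties. The query, label and timestamp property names are illustrative assumptions. Nulls are stripped before sending because `SET e += $props` removes any property whose incoming value is null.

```python
from typing import Any, Dict

# Hypothetical upsert: create the node if it is absent, otherwise update it in place.
UPSERT_QUERY = """
MERGE (e:__Entity__ {id: $id})
ON CREATE SET e += $props, e.created_at = timestamp()
ON MATCH SET e += $props, e.updated_at = timestamp()
"""


def strip_nulls(props: Dict[str, Any]) -> Dict[str, Any]:
    """Drop null values so `+=` never deletes an existing property by accident."""
    return {k: v for k, v in props.items() if v is not None}


def reconcile_properties(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Preview `SET e += $props` with nulls stripped: incoming keys overwrite, others are kept."""
    return {**existing, **strip_nulls(incoming)}
```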

### 5. Embedding Synchronization for Modifications

When documents are modified, `update_changed_file_embeddings` (line 526) identifies affected chunks, recomputes their vector representations, and replaces the obsolete embeddings in the vector store to maintain search accuracy.
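
The key saving is re-embedding only chunks whose text actually changed. A minimal sketch, assuming a content hash is stored alongside each chunk's embedding and the embedding model accepts a batch of texts. `reembed_changed_chunks` and the `embed` callable are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Embedder = Callable[[Sequence[str]], List[List[float]]]


def chunk_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reembed_changed_chunks(
    stored_hashes: Dict[str, str], chunks: Dict[str, str], embed: Embedder
) -> Dict[str, List[float]]:
    """Return fresh vectors only for chunks that are new or whose text changed.

    stored_hashes: chunk id -> hash of the text that was last embedded
    chunks: chunk id -> current chunk text
    """
    stale = [cid for cid, text in chunks.items() if stored_hashes.get(cid) != chunk_hash(text)]
    if not stale:
        return {}
    vectors = embed([chunks[cid] for cid in stale])  # a single batched embedding call
    return dict(zip(stale, vectors))
```

The caller would then write the returned vectors to the vector store and update `stored_hashes` for the same ids.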

### 6. Handling Deletions

`process_deleted_files` (line 572) identifies all entities, chunks, and relationships associated with removed source files and executes deletions to prevent stale data from persisting in the graph.
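
The subtle case is an entity mentioned by both a deleted file and a surviving one: the entity must stay. A minimal in-memory sketch of that bookkeeping, assuming each chunk belongs to exactly one file and entities are linked to chunks by mentions. The dictionaries stand in for the real Neo4j schema and are hypothetical.

```python
from typing import Dict, Set


def delete_file(
    chunks_by_file: Dict[str, Set[str]],
    mentions: Dict[str, Set[str]],
    file_name: str,
) -> Set[str]:
    """Drop a file's chunks (mutating both maps) and return entities no chunk still mentions.

    chunks_by_file: file name -> ids of chunks extracted from it
    mentions: chunk id -> ids of entities mentioned in that chunk
    """
    removed_chunks = chunks_by_file.pop(file_name, set())
    candidates: Set[str] = set()
    for cid in removed_chunks:
        candidates |= mentions.pop(cid, set())
    # Entities still referenced by a surviving chunk (e.g. from another file) must stay.
    still_mentioned = set().union(*mentions.values())
    return candidates - still_mentioned
```

Only the returned entities are safe to delete. Everything else touched by the removed file is still supported by other documents.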

### 7. Backup and Export Operations

`export_graph_structure` and `import_graph_structure` (lines 677-733) provide serialization capabilities for creating point-in-time backups or migrating the updated graph to downstream systems.
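
The export format used by the repository is not described here. As a sketch of the idea, the functions below serialize nodes and edges into one versioned JSON document and parse it back. The `version` field and the node/edge dictionary shapes are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple

FORMAT_VERSION = 1  # hypothetical schema version for the export document


def serialize_graph(nodes: List[Dict[str, Any]], edges: List[Dict[str, Any]]) -> str:
    """Dump nodes and edges as one JSON document (write it to a file for a backup)."""
    return json.dumps(
        {"version": FORMAT_VERSION, "nodes": nodes, "edges": edges},
        ensure_ascii=False,  # keep non-ASCII entity names readable
        indent=2,
    )


def deserialize_graph(text: str) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Parse a document produced by serialize_graph, rejecting unknown versions."""
    data = json.loads(text)
    if data.get("version") != FORMAT_VERSION:
        raise ValueError(f"unsupported export version: {data.get('version')!r}")
    return data["nodes"], data["edges"]
```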

### 8. Statistics and Reporting

Finally, `display_graph_statistics` (line 859) computes aggregate counts of nodes and edges, generating a concise summary dictionary that reports the scope of the incremental changes applied.
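
To report the scope of a run, a summary can compare counts taken before and after the update. In this sketch, the two Cypher count queries are standard, but `summarize_update` and the metric names are hypothetical.

```python
from typing import Dict

# Standard Cypher count queries that could feed the `before` and `after` dictionaries.
NODE_COUNT_QUERY = "MATCH (n) RETURN count(n) AS nodes"
REL_COUNT_QUERY = "MATCH ()-[r]->() RETURN count(r) AS relationships"


def summarize_update(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Report before/after counts and the signed change for each metric."""
    summary: Dict[str, Dict[str, int]] = {}
    for key in sorted(set(before) | set(after)):
        b, a = before.get(key, 0), after.get(key, 0)
        summary[key] = {"before": b, "after": a, "delta": a - b}
    return summary
```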

## Graph Consistency Validation

After processing deletions, the system validates structural integrity to prevent orphaned relationships. The `IncrementalUpdateManager` instantiates a `GraphConsistencyValidator` from [`graphrag_agent/graph/graph_consistency_validator.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/graph/graph_consistency_validator.py) and exposes it through `verify_graph_consistency()` (lines 173-186).

When the deletion count exceeds zero, the manager automatically triggers validation:

```python
if deleted_count > 0:
    self.verify_graph_consistency()
```

*Source:* [`graphrag_agent/integrations/build/incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py), lines 92-99
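
The checks that `GraphConsistencyValidator` performs are not listed in the article. As a hypothetical illustration of the kind of validation involved, this in-memory sketch reports edges that point at missing nodes and nodes left with no edges. When `repair=True`, it removes them. The `repair` flag echoes the manual-repair example later in the article, but the function itself is not repository code.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def check_consistency(nodes: Set[str], edges: List[Edge], repair: bool = False) -> Dict[str, list]:
    """Report dangling edges and isolated nodes; with repair=True, remove them in place."""
    valid = [e for e in edges if e[0] in nodes and e[1] in nodes]
    dangling = [e for e in edges if e[0] not in nodes or e[1] not in nodes]
    connected = {n for e in valid for n in e}
    isolated = sorted(nodes - connected)
    if repair:
        edges[:] = valid  # keep only edges whose endpoints both exist
        nodes.difference_update(isolated)
    return {"dangling_edges": dangling, "isolated_nodes": isolated}
```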

## Scheduling and Daemon Mode

For production deployments requiring continuous synchronization, `IncrementalUpdateManager` supports background execution via `IncrementalUpdateScheduler`. The scheduler is created at line 421 of `incremental_update.py`:

```python
self.scheduler = IncrementalUpdateScheduler(self.config)
```

Individual pipeline components register for periodic execution:

```python
self.scheduler.schedule_component("graph_consistency", self.verify_graph_consistency)
```

When the CLI entry point receives the `--daemon` flag (lines 560-571), the scheduler starts a background thread that executes the incremental update loop at configurable intervals rather than running a single pass.
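
The scheduler's internals are not shown. A minimal sketch of an interval-based scheduler with the same `schedule_component` shape is given below. It assumes one background thread and a fixed interval. `PeriodicScheduler` is a hypothetical stand-in, not `IncrementalUpdateScheduler` itself.

```python
import threading
from typing import Callable, Dict, Optional


class PeriodicScheduler:
    """Run registered callables every `interval` seconds on one background thread."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._components: Dict[str, Callable[[], object]] = {}
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def schedule_component(self, name: str, func: Callable[[], object]) -> None:
        self._components[name] = func

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, so the loop exits promptly.
        # The first run happens after one full interval, not immediately.
        while not self._stop.wait(self.interval):
            for name, func in list(self._components.items()):
                try:
                    func()
                except Exception as exc:  # keep the loop alive if one component fails
                    print(f"component {name!r} failed: {exc}")

    def start(self) -> None:
        if self._thread is None or not self._thread.is_alive():
            self._stop.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Waiting on a `threading.Event` instead of calling `time.sleep` lets `stop()` interrupt the wait immediately rather than after the current interval has elapsed.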

## Implementation Examples

### One-Off Incremental Update

Execute a single synchronization cycle from a Python script:

```python
from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

# Initialise the manager (uses the default FILES_DIR from settings)
manager = IncrementalUpdateManager()

# Assumes detect_file_changes() returns the {'added', 'modified', 'deleted'} dict.
# A dict with three keys is always truthy, so check whether any list is non-empty.
changes = manager.detect_file_changes()
if any(changes.values()):
    summary = manager.updater.process_incremental_update()
    print("Update summary:", summary)
```

*Key lines:* Construction at [`incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py) lines 32-40; pipeline execution at [`incremental_graph_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_graph_builder.py) line 886.

### Continuous Daemon Execution

Run the updater as a persistent background service:

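```python
import argparse
import time

from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

parser = argparse.ArgumentParser()
parser.add_argument("--daemon", action="store_true", help="Run continuously")
args = parser.parse_args()

manager = IncrementalUpdateManager()
if args.daemon:
    manager.scheduler.start()  # starts the periodic scheduler in a background thread
    # Keep the main thread alive in case the scheduler runs on a daemon thread
    # (an assumption about its threading model); stop with Ctrl+C.
    try:
        while True:
            time.sleep(60)
    except KeyboardInterrupt:
        pass
else:
    manager.detect_file_changes()
    manager.updater.process_incremental_update()
```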

*Key lines:* CLI handling and daemon start at [`incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py) lines 560-571.

### Manual Consistency Repair

Force a validation and repair operation on demand:

```python
from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

mgr = IncrementalUpdateManager()

# Force a consistency check (repairs if `repair=True`)
report = mgr.verify_graph_consistency(repair=True)
print(report)
```

*Key lines:* `verify_graph_consistency` method definition at [`incremental_update.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_update.py) lines 173-186.

## Summary

- The **incremental update mechanism** processes only delta changes—added, modified, or deleted files—rather than rebuilding the entire Neo4j knowledge graph.
- **IncrementalUpdateManager** serves as the orchestration layer while **IncrementalGraphUpdater** executes the eight-step mutation pipeline defined in [`incremental_graph_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_graph_builder.py).
- **FileChangeManager** detects filesystem diffs that trigger targeted updates to entities, relationships, and vector embeddings.
- Automatic **graph consistency validation** runs after deletions to remove orphaned relationships using `GraphConsistencyValidator`.
- The architecture supports both ad-hoc execution and **daemon mode** via `IncrementalUpdateScheduler` for continuous, interval-based synchronization.

## Frequently Asked Questions

### How does the system handle deleted source files?

When files are removed from the monitored directory, the `process_deleted_files` method (line 572 in [`incremental_graph_builder.py`](https://github.com/1517005260/graph-rag-agent/blob/main/graphrag_agent/integrations/build/incremental_graph_builder.py)) identifies all associated entities and chunks and removes them from Neo4j. The `IncrementalUpdateManager` then automatically invokes `verify_graph_consistency()` to eliminate any dangling relationships that reference the deleted nodes, ensuring referential integrity.

### What is the difference between IncrementalUpdateManager and IncrementalGraphUpdater?

`IncrementalUpdateManager` acts as the high-level orchestrator that handles configuration, scheduling, and exposes the public API methods such as `detect_file_changes()` and `verify_graph_consistency()`. `IncrementalGraphUpdater` is the lower-level execution engine that implements the actual graph mutations, embedding computations, and Cypher query generation against the Neo4j database.

### Can the incremental update run automatically on a schedule?

Yes. The system includes `IncrementalUpdateScheduler`, which can trigger the update pipeline at configurable intervals. When the CLI entry point is run with the `--daemon` flag, `IncrementalUpdateManager` starts the scheduler in a background thread, which periodically checks the source directory and applies changes without manual intervention or external cron jobs.

### How does the system prevent duplicate entities during updates?

During the `integrate_new_entities` step (line 315), the system deduplicates entities via unique properties before batch insertion into Neo4j. This ensures that modified documents containing existing entities update the current nodes rather than creating duplicates, maintaining a canonical representation of each unique concept in the knowledge graph.