How the Incremental Update Mechanism for Knowledge Graphs Works in graph-rag-agent

The incremental update mechanism in graph-rag-agent uses a file-watching pipeline that detects additions, modifications, and deletions in source documents, then translates those changes into targeted Neo4j graph mutations without rebuilding the entire index.

The 1517005260/graph-rag-agent repository implements an incremental update mechanism for knowledge graphs that synchronizes source document deltas directly into Neo4j. Because only changed files pass through the pipeline, the system avoids the computational cost of a full re-index.

Architecture of the Incremental Update System

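The system is organized around three primary classes that separate orchestration, execution, and scheduling:

  • IncrementalUpdateManager (incremental_update.py): the orchestration layer. It handles configuration and exposes public methods such as detect_file_changes() and verify_graph_consistency().
  • IncrementalGraphUpdater (incremental_graph_builder.py): the execution engine. It applies the graph mutations and recomputes embeddings.
  • IncrementalUpdateScheduler: the scheduling layer. It runs registered components at configurable intervals in daemon mode.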

Detecting File System Changes

Change detection begins with the IncrementalGraphUpdater delegating to a FileChangeManager instance. The detect_changes method returns a categorized dictionary of file operations:

def detect_changes(self) -> Dict[str, List[str]]:
    """Return a dict with keys 'added', 'modified', 'deleted'."""
    return self.file_manager.detect_changes()

Source: graphrag_agent/integrations/build/incremental_graph_builder.py, line 91

The IncrementalUpdateManager.detect_file_changes() method (lines 69-80 in incremental_update.py) forwards this call and logs the detected delta for observability. This separation allows the core updater to remain agnostic of the specific file-system implementation while the manager handles operational concerns.
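To make the shape of that return value concrete, here is a minimal sketch of one common way to compute such a delta: hash every file, then compare the result with the previous snapshot. This is an illustration, not the repository's FileChangeManager; the snapshot and diff_snapshots helpers are hypothetical names, and the use of SHA-256 content hashes is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(directory: Path) -> Dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        str(path.relative_to(directory)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }
```

Persisting the latest snapshot between runs (for example as a JSON file) is what lets each pass see only the delta since the previous one.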

The Incremental Processing Pipeline

All graph mutations flow through IncrementalGraphUpdater.process_incremental_update() (line 886), which orchestrates eight distinct sub-operations. Each step has a dedicated helper method to maintain modularity:

1. Ingesting New Documents

The process_new_files method (line 100) extracts entities and text chunks from newly added source files, computes vector embeddings, and creates the initial node and edge structures in memory before committing to Neo4j.

2. Entity Deduplication and Integration

integrate_new_entities (line 315) performs batch creation of entity nodes while deduplicating against existing graph data via unique properties. This prevents duplicate nodes when modified files contain entities already present in the knowledge graph.
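The deduplication idea can be sketched as follows, assuming each entity carries a unique id property. The dedupe_entities helper, the Entity label, and the Cypher text are illustrative assumptions rather than code taken from the repository. MERGE is the standard Cypher clause for match-or-create semantics.

```python
from typing import Dict, Iterable, List

# Hypothetical Cypher for an idempotent batch upsert keyed on a unique `id`.
# MERGE matches an existing node or creates one; SET += merges properties.
UPSERT_ENTITIES = """
UNWIND $rows AS row
MERGE (e:Entity {id: row.id})
SET e += row.properties
"""


def dedupe_entities(entities: Iterable[dict], key: str = "id") -> List[dict]:
    """Collapse entities sharing the same key, merging their properties.

    Later occurrences overwrite earlier values for the same property name.
    """
    merged: Dict[str, dict] = {}
    for entity in entities:
        slot = merged.setdefault(entity[key], {key: entity[key], "properties": {}})
        slot["properties"].update(entity.get("properties", {}))
    return list(merged.values())
```

With the official Neo4j Python driver, the deduplicated rows could then be sent in one round trip with session.run(UPSERT_ENTITIES, rows=rows). A uniqueness constraint on the key property is advisable so that concurrent MERGE calls cannot create duplicates.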

3. Relationship Construction

integrate_new_relationships (line 365) builds edges between newly created entities based on semantic relations extracted during document processing, linking the fresh sub-graph into the existing topology.

4. Graph Structure Merging

The merge_graph_structures method (line 455) resolves conflicts between the newly generated sub-graph and the persistent Neo4j store, handling property updates and node reconciliation through atomic Cypher merge operations.

5. Embedding Synchronization for Modifications

When documents are modified, update_changed_file_embeddings (line 526) identifies affected chunks, recomputes their vector representations, and replaces the obsolete embeddings in the vector store to maintain search accuracy.
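One way to identify affected chunks is to keep a content hash next to each stored vector and re-embed only when the hash changes. The sketch below assumes that approach. The sync_embeddings function, the in-memory store dictionary, and the embed callable are hypothetical stand-ins for the repository's vector store and embedding model.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def chunk_hash(text: str) -> str:
    """Return a stable digest used to detect changed chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_embeddings(
    store: Dict[str, Tuple[str, Vector]],
    chunks: Dict[str, str],
    embed: Callable[[str], Vector],
) -> List[str]:
    """Re-embed only chunks whose text changed; drop chunks that vanished.

    `store` maps chunk id -> (content hash, vector) and is updated in place.
    Returns the ids that were embedded or re-embedded.
    """
    refreshed = []
    for chunk_id, text in chunks.items():
        digest = chunk_hash(text)
        cached = store.get(chunk_id)
        if cached is None or cached[0] != digest:
            store[chunk_id] = (digest, embed(text))
            refreshed.append(chunk_id)
    # Remove vectors for chunks that no longer exist in the document.
    for stale_id in set(store) - set(chunks):
        del store[stale_id]
    return refreshed
```

Unchanged chunks never reach the embedding model, which is usually the most expensive step of an update.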

6. Handling Deletions

process_deleted_files (line 572) identifies all entities, chunks, and relationships associated with removed source files and executes deletions to prevent stale data from persisting in the graph.
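The cascade can be illustrated with a small in-memory model. Two points here are assumptions, not confirmed behavior of the repository: the Graph dataclass and delete_files function are hypothetical, and so is the rule that an entity is removed only when no surviving chunk still mentions it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Graph:
    chunk_source: Dict[str, str] = field(default_factory=dict)   # chunk id -> source file
    mentions: Dict[str, Set[str]] = field(default_factory=dict)  # entity id -> chunk ids
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)  # (src, type, dst)


def delete_files(graph: Graph, deleted: Set[str]) -> Dict[str, int]:
    """Remove chunks of deleted files, entities left unmentioned, and their edges."""
    dead_chunks = {c for c, src in graph.chunk_source.items() if src in deleted}
    for chunk_id in dead_chunks:
        del graph.chunk_source[chunk_id]

    # An entity dies only when none of its mentioning chunks survive.
    dead_entities = set()
    for entity_id, chunk_ids in graph.mentions.items():
        chunk_ids -= dead_chunks  # in-place set difference
        if not chunk_ids:
            dead_entities.add(entity_id)
    for entity_id in dead_entities:
        del graph.mentions[entity_id]

    # Drop every relationship that touches a removed entity.
    dead_edges = {r for r in graph.relationships
                  if r[0] in dead_entities or r[2] in dead_entities}
    graph.relationships -= dead_edges
    return {"chunks": len(dead_chunks), "entities": len(dead_entities),
            "relationships": len(dead_edges)}
```

Keeping entities that other files still mention matters because a shared entity would otherwise disappear as soon as any one of its source documents was removed.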

7. Backup and Export Operations

export_graph_structure and import_graph_structure (lines 677-733) provide serialization capabilities for creating point-in-time backups or migrating the updated graph to downstream systems.
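As a rough sketch of what such a round trip involves, the example below serializes nodes and edges to JSON and validates the document on the way back in. The JSON layout and the export_graph and import_graph names are assumptions for illustration; the repository's actual serialization format is not described here.

```python
import json
from typing import Dict, List


def export_graph(nodes: List[dict], edges: List[dict]) -> str:
    """Serialize nodes and edges to a deterministic JSON document."""
    payload = {"nodes": nodes, "edges": edges}
    return json.dumps(payload, sort_keys=True, ensure_ascii=False, indent=2)


def import_graph(document: str) -> Dict[str, List[dict]]:
    """Parse a document produced by export_graph and check its shape."""
    payload = json.loads(document)
    missing = {"nodes", "edges"} - set(payload)
    if missing:
        raise ValueError(f"backup is missing sections: {sorted(missing)}")
    return payload
```

Sorting keys makes two exports of the same graph byte-identical, which keeps backups easy to diff.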

8. Statistics and Reporting

Finally, display_graph_statistics (line 859) computes aggregate counts of nodes and edges, generating a concise summary dictionary that reports the scope of the incremental changes applied.

Graph Consistency Validation

After processing deletions, the system validates structural integrity to prevent orphaned relationships. The IncrementalUpdateManager instantiates a GraphConsistencyValidator from graphrag_agent/graph/graph_consistency_validator.py and exposes it through verify_graph_consistency() (lines 173-186).

When the deletion count exceeds zero, the manager automatically triggers validation:

if deleted_count > 0:
    self.verify_graph_consistency()

Source: graphrag_agent/integrations/build/incremental_update.py, lines 92-99
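The kind of check a consistency validator performs can be sketched on an in-memory edge list. This is not the GraphConsistencyValidator implementation: find_dangling_edges and this standalone verify_consistency are hypothetical, and a real validator would express the same test as Cypher queries against Neo4j.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, relationship type, target id)


def find_dangling_edges(node_ids: Set[str], edges: List[Edge]) -> List[Edge]:
    """Return edges whose source or target node no longer exists."""
    return [e for e in edges if e[0] not in node_ids or e[2] not in node_ids]


def verify_consistency(
    node_ids: Set[str], edges: List[Edge], repair: bool = False
) -> Dict[str, object]:
    """Report dangling edges and, when `repair` is true, remove them in place."""
    dangling = find_dangling_edges(node_ids, edges)
    if repair:
        edges[:] = [e for e in edges if e not in dangling]
    return {
        "consistent": not dangling,
        "dangling": dangling,
        "repaired": repair and bool(dangling),
    }
```

Separating detection from repair lets the same routine serve as a read-only health check or as a cleanup step.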

Scheduling and Daemon Mode

For production deployments requiring continuous synchronization, IncrementalUpdateManager supports background execution via IncrementalUpdateScheduler. The constructor initializes the scheduler at line 421:

self.scheduler = IncrementalUpdateScheduler(self.config)

Individual pipeline components register for periodic execution:

self.scheduler.schedule_component("graph_consistency", self.verify_graph_consistency)

When the CLI entry point receives the --daemon flag (lines 560-571), the scheduler starts a background thread that executes the incremental update loop at configurable intervals rather than running a single pass.
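A background scheduler of this kind can be approximated with the standard library alone. The PeriodicScheduler class below is a hypothetical sketch, not the repository's IncrementalUpdateScheduler. It borrows the schedule_component name from the snippet above, and the single shared interval is an assumption.

```python
import threading
from typing import Callable, Dict


class PeriodicScheduler:
    """Run registered callables every `interval` seconds on a daemon thread."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._components: Dict[str, Callable[[], object]] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def schedule_component(self, name: str, func: Callable[[], object]) -> None:
        """Register a callable to run on every tick."""
        self._components[name] = func

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join()

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called,
        # so it serves as both the sleep and the shutdown check.
        while not self._stop.wait(self.interval):
            for name, func in list(self._components.items()):
                try:
                    func()
                except Exception as exc:  # keep the loop alive if one component fails
                    print(f"component {name!r} failed: {exc}")
```

Waiting on an Event instead of calling time.sleep means stop() takes effect immediately, without waiting out the rest of the interval.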

Implementation Examples

One-Off Incremental Update

Execute a single synchronization cycle from a Python script:

from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

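# Initialise the manager (uses default FILES_DIR from settings)
updater = IncrementalUpdateManager()

# Detect changes and run the update pipeline. This assumes detect_file_changes()
# returns the 'added'/'modified'/'deleted' dict; a dict with three keys is always
# truthy, so test the lists inside it rather than the dict itself.
changes = updater.detect_file_changes()
if any(changes.values()):
    summary = updater.updater.process_incremental_update()
    print("Update summary:", summary)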

Key lines: Construction at incremental_update.py lines 32-40; pipeline execution at incremental_graph_builder.py line 886.

Continuous Daemon Execution

Run the updater as a persistent background service:

import argparse
from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

parser = argparse.ArgumentParser()
parser.add_argument("--daemon", action="store_true", help="Run continuously")
args = parser.parse_args()

manager = IncrementalUpdateManager()
if args.daemon:
    manager.scheduler.start()          # starts the periodic scheduler
else:
    manager.detect_file_changes()
    manager.updater.process_incremental_update()

Key lines: CLI handling and daemon start at incremental_update.py lines 560-571.

Manual Consistency Repair

Force a validation and repair operation on demand:

from graphrag_agent.integrations.build.incremental_update import IncrementalUpdateManager

mgr = IncrementalUpdateManager()

# Force a consistency check (repairs if `repair=True`)
report = mgr.verify_graph_consistency(repair=True)
print(report)

Key lines: verify_graph_consistency method definition at incremental_update.py lines 173-186.

Summary

  • The incremental update mechanism processes only the changes (added, modified, or deleted files) rather than rebuilding the entire Neo4j knowledge graph.
  • IncrementalUpdateManager serves as the orchestration layer while IncrementalGraphUpdater executes the eight-step mutation pipeline defined in incremental_graph_builder.py.
  • FileChangeManager detects filesystem diffs that trigger targeted updates to entities, relationships, and vector embeddings.
  • Automatic graph consistency validation runs after deletions to remove orphaned relationships using GraphConsistencyValidator.
  • The architecture supports both ad-hoc execution and daemon mode via IncrementalUpdateScheduler, which synchronizes continuously at configurable intervals.

Frequently Asked Questions

How does the system handle deleted source files?

When files are removed from the monitored directory, the process_deleted_files method (line 572 in incremental_graph_builder.py) identifies all associated entities and chunks and removes them from Neo4j. The IncrementalUpdateManager then automatically invokes verify_graph_consistency() to eliminate any dangling relationships that reference the deleted nodes, ensuring referential integrity.

What is the difference between IncrementalUpdateManager and IncrementalGraphUpdater?

IncrementalUpdateManager acts as the high-level orchestrator that handles configuration, scheduling, and exposes the public API methods such as detect_file_changes() and verify_graph_consistency(). IncrementalGraphUpdater is the lower-level execution engine that implements the actual graph mutations, embedding computations, and Cypher query generation against the Neo4j database.

Can the incremental update run automatically on a schedule?

Yes. The system includes IncrementalUpdateScheduler, which triggers the update pipeline at configurable intervals. When the CLI entry point is run with the --daemon flag, the scheduler starts in a background thread and keeps checking the source directory and applying changes, with no manual intervention or cron jobs required.

How does the system prevent duplicate entities during updates?

During the integrate_new_entities step (line 315), the system deduplicates entities via unique properties before batch insertion into Neo4j. This ensures that modified documents containing existing entities update the current nodes rather than creating duplicates, maintaining a canonical representation of each unique concept in the knowledge graph.
