HugeGraph-LLM Module Architecture: Directory Structure and Key Components Explained

The hugegraph-llm module organizes its source code into eleven purpose-driven directories (utils/, state/, operators/, nodes/, indices/, flows/, config/, api/, document/, enums/, and tests/). Together they form a layered pipeline that turns natural language into graph queries and retrieval-augmented generation (RAG) outputs.

The hugegraph-llm module is the core intelligence layer of the Apache HugeGraph-AI project and lives under hugegraph-llm/src/hugegraph_llm/. Its architecture cleanly separates concerns, so developers can extend LLM capabilities, swap vector backends, or modify graph schemas with minimal friction. You need to understand this architecture to customize pipelines or debug the text2gremlin and RAG workflows.

The Core Directory Layout

The module's root package contains distinct functional groups that handle everything from low-level logging to high-level REST API exposure.

Utility and Configuration Layers

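The foundational layers provide cross-cutting services and runtime settings:
  • utils/: shared helpers, such as centralized logging (utils/log.py) and vector calculation helpers (utils/embedding_utils.py).
  • config/: Pydantic-based runtime settings, including LLM provider and model configuration (config/llm_config.py).
  • enums/: shared enumerations that provide type safety across the module.
  • document/: text segmentation utilities, such as chunk splitting (document/chunk_split.py).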
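The sketch below shows one way a settings layer like config/ can centralize provider and model options. It is only an illustration: it uses a stdlib dataclass with environment-variable overrides as a stand-in for the module's Pydantic-based classes. The class name LLMSettings, its fields, the placeholder defaults, and the HG_LLM_ prefix are all hypothetical and are not taken from config/llm_config.py.

```python
import os
from dataclasses import dataclass, fields


# Hypothetical settings class; the real config/ layer uses Pydantic models.
@dataclass
class LLMSettings:
    provider: str = "openai"      # placeholder default
    model: str = "gpt-4o-mini"    # placeholder default
    temperature: float = 0.0
    max_tokens: int = 1024

    @classmethod
    def from_env(cls, prefix: str = "HG_LLM_") -> "LLMSettings":
        """Build settings, letting PREFIX + FIELD_NAME environment variables override defaults."""
        overrides = {}
        for f in fields(cls):
            raw = os.environ.get(prefix + f.name.upper())
            if raw is not None:
                # Cast the environment string to the type of the field's default value.
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)


os.environ["HG_LLM_TEMPERATURE"] = "0.3"
print(LLMSettings.from_env().temperature)  # 0.3
```

Keeping every tunable in one typed object means operators can receive settings instead of reading the environment themselves.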

State and Index Management

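These directories manage runtime context and similarity search capabilities:
  • state/: execution context management for running pipelines (state/ai_state.py).
  • indices/: pluggable vector indexes for FAISS, Milvus, and Qdrant under indices/vector_index/, plus a graph index that stores Gremlin examples (indices/graph_index.py).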

The Operator Layer

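The operators/ directory contains the core LLM-driven logic, subdivided by function:
  • llm_op/: LLM-centric tasks such as keyword extraction, schema building, property-graph extraction, and Gremlin generation.
  • index_op/: index construction, e.g. build_semantic_index.py.
  • document_op/: text processing.
  • common_op/: shared utilities.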

Node Abstractions

The nodes/ directory wraps operators into graph-compatible execution units used by the scheduler. Each node exposes a uniform run() interface and manages input/output conversion. Examples include hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py for Gremlin generation workflows and hugegraph-llm/src/hugegraph_llm/nodes/index_node/vector_query_node.py for vector similarity queries.
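Here is a minimal sketch of the operator/node split. It is an illustration, not the module's code. BaseOperator, OperatorNode, and KeywordExtractOperator are hypothetical names, and the keyword logic is a toy stand-in for an LLM call. The node owns input/output conversion and exposes run(), while the operator only sees the fields it needs.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseOperator(ABC):
    """Hypothetical operator: business logic on plain values, no scheduler knowledge."""

    @abstractmethod
    def run(self, **kwargs: Any) -> Dict[str, Any]:
        ...


class KeywordExtractOperator(BaseOperator):
    """Toy stand-in for an LLM keyword extractor: keeps capitalized words."""

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        words = [w.strip(".,?!") for w in kwargs["query"].split()]
        return {"keywords": [w for w in words if w[:1].isupper()]}


class OperatorNode:
    """Hypothetical node: adapts a shared context dict to one operator call."""

    def __init__(self, operator: BaseOperator, input_keys: List[str]) -> None:
        self.operator = operator
        self.input_keys = input_keys

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        # Input conversion: select only the keys this operator consumes.
        missing = [k for k in self.input_keys if k not in context]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        args = {k: context[k] for k in self.input_keys}
        # Output conversion: merge the operator's results into a new context dict.
        return {**context, **self.operator.run(**args)}


node = OperatorNode(KeywordExtractOperator(), input_keys=["query"])
print(node.run({"query": "find all people who studied at MIT"})["keywords"])  # ['MIT']
```

Because every node takes and returns the same kind of context, a scheduler can chain nodes without knowing what each operator does internally.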

Pipeline Orchestration

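The flows/ directory defines high-level pipeline compositions that stitch nodes into end-to-end services. Scheduler-driven flows include:
  • rag_graph_only (flows/rag_flow_graph_only.py): RAG answers drawn from graph retrieval only.
  • build_semantic_index: embedding documents and persisting them to a vector index.
  • text2gremlin: translating natural-language questions into Gremlin queries.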

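Interface and Validation

The api/ directory exposes FastAPI endpoints (api/rag_api.py) that forward requests to scheduler flows. The tests/ directory verifies the service interface.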

Architectural Layers Explained

The hugegraph-llm module architecture follows a six-layer stack:

  1. Foundation: utils/, enums/, and config/ provide shared services and type safety.
  2. Context: state/ and indices/ manage execution state and vector/graph search capabilities.
  3. Logic: operators/ implements the actual LLM processing steps.
  4. Adaptation: nodes/ adapts operators to the scheduler's graph execution model.
  5. Orchestration: flows/ composes nodes into complete pipelines (RAG, text2gremlin).
  6. Surface: api/ and tests/ expose and verify the service interface.
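The orchestration layers can be made concrete with a minimal, self-contained sketch of the pattern the examples that follow rely on: a lazily created singleton scheduler that maps flow names to ordered lists of nodes. It only mimics the calling shape of SchedulerSingleton.get_instance().schedule_flow(...). MiniScheduler, MiniSchedulerSingleton, register_flow(), and the context-dict contract are hypothetical and do not reproduce flows/scheduler.py.

```python
import threading
from typing import Any, Callable, Dict, List, Optional

# A node maps a context dict to an updated context dict (hypothetical contract).
Node = Callable[[Dict[str, Any]], Dict[str, Any]]


class MiniScheduler:
    """Toy orchestration layer: a flow is an ordered list of nodes."""

    def __init__(self) -> None:
        self._flows: Dict[str, List[Node]] = {}

    def register_flow(self, name: str, nodes: List[Node]) -> None:
        self._flows[name] = list(nodes)

    def schedule_flow(self, name: str, **inputs: Any) -> Dict[str, Any]:
        if name not in self._flows:
            raise ValueError(f"unknown flow: {name}")
        context: Dict[str, Any] = dict(inputs)
        for node in self._flows[name]:
            context = node(context)  # each node reads from and extends the context
        return context


class MiniSchedulerSingleton:
    """Lazily creates one shared MiniScheduler, guarded by a lock."""

    _instance: Optional[MiniScheduler] = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls) -> MiniScheduler:
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:  # re-check after acquiring the lock
                    cls._instance = MiniScheduler()
        return cls._instance


scheduler = MiniSchedulerSingleton.get_instance()
scheduler.register_flow("shout", [lambda ctx: {**ctx, "answer": ctx["query"].upper()}])
print(scheduler.schedule_flow("shout", query="hello graph")["answer"])  # HELLO GRAPH
```

A single shared scheduler gives every caller (API handlers, UIs, scripts) the same registry of flows.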

Practical Code Examples

Executing a Graph-Only RAG Flow

The scheduler singleton orchestrates flows defined in the flows/ directory:

from hugegraph_llm.flows.scheduler import SchedulerSingleton

scheduler = SchedulerSingleton.get_instance()

result = scheduler.schedule_flow(
    "rag_graph_only",
    query="Tell me about the movie Inception.",
    graph_only_answer=True,
    vector_only_answer=False,
)

print("Graph-only answer:", result.get("graph_only_answer"))

This utilizes flows/rag_flow_graph_only.py, nodes/, operators/, and indices/ to execute the query.

Building a Semantic Vector Index

Index construction flows leverage operators and index implementations:

from hugegraph_llm.flows.scheduler import SchedulerSingleton

documents = [
    {"id": "doc1", "content": "Apache HugeGraph is a graph database."},
    {"id": "doc2", "content": "Large Language Models can reason over graphs."},
]

scheduler = SchedulerSingleton.get_instance()
index_res = scheduler.schedule_flow("build_semantic_index", documents)

print("Index built:", index_res)

This calls operators/index_op/build_semantic_index.py and persists vectors via indices/vector_index/faiss_vector_store.py.
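Conceptually, an index-building flow embeds each document and hands the vectors to a store. The sketch below shows that shape under two simplifications: a toy hashed bag-of-words embedding replaces a real embedding model, and a plain dict replaces the FAISS store. The helpers embed() and build_index() and the DIM value are hypothetical; they are not functions from operators/index_op/build_semantic_index.py.

```python
import hashlib
import math
from typing import Dict, List

DIM = 64  # toy embedding size (assumption)


def embed(text: str, dim: int = DIM) -> List[float]:
    """Toy embedding: hash each normalized token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for raw in text.lower().split():
        token = raw.strip(".,!?;:")
        if not token:
            continue
        # sha256 keeps bucket assignment stable across processes (unlike hash()).
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(documents: List[Dict[str, str]]) -> Dict[str, List[float]]:
    """Map each document id to the embedding of its content."""
    return {doc["id"]: embed(doc["content"]) for doc in documents}


documents = [
    {"id": "doc1", "content": "Apache HugeGraph is a graph database."},
    {"id": "doc2", "content": "Large Language Models can reason over graphs."},
]
index = build_index(documents)
print(sorted(index), len(index["doc1"]))  # ['doc1', 'doc2'] 64
```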

Converting Natural Language to Gremlin

The text2gremlin flow demonstrates how the architecture handles complex multi-step LLM operations:

from hugegraph_llm.flows.scheduler import SchedulerSingleton

scheduler = SchedulerSingleton.get_instance()
gremlin_res = scheduler.schedule_flow(
    "text2gremlin",
    "find all people who studied at MIT",
    2,                                    # number of examples
    "hugegraph",                          # schema name
    None,                                 # custom prompt
    ["template_gremlin", "raw_gremlin"],  # requested outputs
)

print("Gremlin template:", gremlin_res.get("template_gremlin"))

Underlying this are nodes/llm_node/text2gremlin.py and operators/llm_op/gremlin_generate.py.
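The text2gremlin call above passes an example count, a schema name, and an optional custom prompt. One plausible way these inputs come together is few-shot prompt assembly: pick the k stored (question, Gremlin) examples closest to the query, then place them with the schema ahead of the question. The sketch below assumes that design. The prompt template, the word-overlap ranking (a stand-in for vector similarity), and build_gremlin_prompt() are hypothetical and do not reproduce operators/llm_op/gremlin_generate.py.

```python
from typing import List, Optional, Tuple

# Hypothetical default template; a custom prompt may replace it.
DEFAULT_TEMPLATE = (
    "Translate the question into Gremlin for this schema:\n{schema}\n\n"
    "Examples:\n{examples}\n\nQuestion: {query}\nGremlin:"
)


def _overlap(a: str, b: str) -> int:
    """Count shared lowercase words; a toy stand-in for vector similarity."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def build_gremlin_prompt(
    query: str,
    examples: List[Tuple[str, str]],  # (question, gremlin) pairs
    example_num: int,
    schema: str,
    custom_prompt: Optional[str] = None,
) -> str:
    # Rank stored examples by similarity to the query and keep the top example_num.
    ranked = sorted(examples, key=lambda ex: _overlap(query, ex[0]), reverse=True)
    shots = "\n".join(f"Q: {q}\nA: {g}" for q, g in ranked[:example_num])
    template = custom_prompt or DEFAULT_TEMPLATE  # None falls back to the default
    return template.format(schema=schema, examples=shots, query=query)


examples = [
    ("who knows marko", "g.V().has('name','marko').in('knows')"),
    ("find all people who studied at Stanford",
     "g.V().hasLabel('person').where(out('studied_at').has('name','Stanford'))"),
]
print(build_gremlin_prompt("find all people who studied at MIT", examples, 1, "person -studied_at-> school"))
```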

Key Source Files by Function

  • Logging: hugegraph-llm/src/hugegraph_llm/utils/log.py (centralized logging utilities)
  • Embeddings: hugegraph-llm/src/hugegraph_llm/utils/embedding_utils.py (vector calculation helpers)
  • Runtime State: hugegraph-llm/src/hugegraph_llm/state/ai_state.py (execution context management)
  • Keyword Extraction: hugegraph-llm/src/hugegraph_llm/operators/llm_op/keyword_extract.py (NLP keyword extraction)
  • Gremlin Generation: hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py (LLM-based query generation)
  • Vector Storage: hugegraph-llm/src/hugegraph_llm/indices/vector_index/faiss_vector_store.py (FAISS backend implementation)
  • Graph Index: hugegraph-llm/src/hugegraph_llm/indices/graph_index.py (Gremlin example storage)
  • RAG API: hugegraph-llm/src/hugegraph_llm/api/rag_api.py (FastAPI endpoint definitions)
  • Configuration: hugegraph-llm/src/hugegraph_llm/config/llm_config.py (provider and model settings)
  • Document Chunking: hugegraph-llm/src/hugegraph_llm/document/chunk_split.py (text segmentation utilities)

Summary

  • The hugegraph-llm module architecture separates concerns across eleven directories, from low-level utilities to high-level API endpoints.
  • operators/ contains the core LLM logic, subdivided into llm_op/, index_op/, document_op/, and common_op/.
  • nodes/ wraps operators for the scheduler, while flows/ orchestrates them into complete pipelines.
  • indices/ abstracts vector (FAISS, Milvus, Qdrant) and graph indexes, enabling pluggable similarity search.
  • Configuration classes in config/ are Pydantic-based, and the api/ layer exposes FastAPI endpoints for external integration.

Frequently Asked Questions

What is the role of the operators directory in hugegraph-llm?

The operators/ directory implements the actual LLM-driven processing steps, including keyword extraction, schema building, property-graph extraction, and Gremlin query generation. It is subdivided into llm_op/ for LLM-centric tasks, index_op/ for index construction, document_op/ for text processing, and common_op/ for shared utilities. Each operator is a discrete unit of work that can be chained together via the scheduler.
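document_op/ handles text processing. A typical step of that kind is splitting a document into overlapping chunks before embedding. The sketch below shows that step as a hypothetical example, not the module's code. split_chunks() and its word-based windowing are assumptions, and the module's own chunking helpers in document/chunk_split.py may split differently, for example by characters or sentences.

```python
from typing import List


def split_chunks(text: str, chunk_words: int = 100, overlap_words: int = 20) -> List[str]:
    """Split text into windows of chunk_words words; consecutive windows share overlap_words words."""
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    if not 0 <= overlap_words < chunk_words:
        raise ValueError("overlap_words must be in [0, chunk_words)")
    words = text.split()
    step = chunk_words - overlap_words
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


print(split_chunks("a b c d e f g h i j", chunk_words=4, overlap_words=1))
# ['a b c d', 'd e f g', 'g h i j']
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk.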

How does the nodes directory differ from the operators directory?

While operators/ contains the raw business logic for LLM interactions, nodes/ wraps these operators into graph-compatible execution units that conform to the scheduler's interface. Nodes manage input/output conversion and expose a uniform run() method, allowing the scheduler in flows/ to treat diverse operations as interchangeable vertices in an execution graph.

Which directory contains the REST API endpoints?

The api/ directory houses FastAPI-style REST endpoints that expose RAG and text2gremlin services to external callers. The primary entry point is hugegraph-llm/src/hugegraph_llm/api/rag_api.py, which forwards HTTP requests to the appropriate scheduler flows and returns JSON responses suitable for frontend consumption (e.g., Gradio UIs).
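You can sketch the request path from api/ to the scheduler without FastAPI: parse the JSON body, validate it, call schedule_flow, and serialize the result. handle_rag_request(), the request and response fields, and FakeScheduler are hypothetical. In the real module this routing lives in api/rag_api.py and uses FastAPI.

```python
import json
from typing import Any, Dict


class FakeScheduler:
    """Hypothetical stand-in with a schedule_flow(name, **kwargs) shape."""

    def schedule_flow(self, flow: str, **kwargs: Any) -> Dict[str, Any]:
        return {"flow": flow, "graph_only_answer": f"stub answer for: {kwargs['query']}"}


def handle_rag_request(body: str, scheduler: Any) -> str:
    """Translate a JSON request body into a flow call and a JSON response string."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"status": 400, "error": "invalid JSON"})
    query = payload.get("query") if isinstance(payload, dict) else None
    if not isinstance(query, str) or not query.strip():
        return json.dumps({"status": 400, "error": "'query' must be a non-empty string"})
    result = scheduler.schedule_flow("rag_graph_only", query=query, graph_only_answer=True)
    return json.dumps({"status": 200, "graph_only_answer": result.get("graph_only_answer")})


print(handle_rag_request('{"query": "Tell me about the movie Inception."}', FakeScheduler()))
```

Keeping validation in the handler means flows can assume well-formed inputs.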

Where are vector indexes implemented in the hugegraph-llm module?

Vector indexes reside in indices/vector_index/, with concrete implementations for FAISS (faiss_vector_store.py), Milvus, and Qdrant. These classes provide the storage and retrieval mechanisms used by operators/index_op/build_semantic_index.py and the vector query nodes, abstracting the underlying vector database specifics from the rest of the pipeline.
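The pluggable-backend idea can be sketched as a small abstract interface with one brute-force implementation. VectorStore, InMemoryVectorStore, and retrieve() are hypothetical names, not the classes in indices/vector_index/. A FAISS, Milvus, or Qdrant adapter would implement the same two methods, so callers never change when the backend does.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class VectorStore(ABC):
    """Hypothetical backend-agnostic interface for vector storage and search."""

    @abstractmethod
    def add(self, ids: List[str], vectors: List[List[float]]) -> None: ...

    @abstractmethod
    def search(self, query: List[float], top_k: int) -> List[Tuple[str, float]]: ...


class InMemoryVectorStore(VectorStore):
    """Brute-force cosine-similarity store, suitable for tests or tiny corpora."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def add(self, ids: List[str], vectors: List[List[float]]) -> None:
        if len(ids) != len(vectors):
            raise ValueError("ids and vectors must have the same length")
        self._data.update(zip(ids, vectors))

    def search(self, query: List[float], top_k: int) -> List[Tuple[str, float]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in self._data.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
        return scored[:top_k]


def retrieve(store: VectorStore, query: List[float], top_k: int = 3) -> List[str]:
    """Callers depend only on the interface, so backends can be swapped freely."""
    return [doc_id for doc_id, _ in store.search(query, top_k)]


store = InMemoryVectorStore()
store.add(["a", "b", "c"], [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(retrieve(store, [1.0, 0.1], top_k=2))  # ['a', 'c']
```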
