Graph Machine Learning Algorithms in HugeGraph-ML: Complete Model Catalog

HugeGraph-ML ships 21 production-ready graph neural networks including GCN, GraphSAGE, DiffPool, and self-supervised methods like BGRL and DGI for node classification, link prediction, and graph-level prediction tasks.

The hugegraph-ml module in the Apache HugeGraph AI incubator provides a comprehensive toolkit for graph machine learning on large-scale property graphs. Built on top of the Deep Graph Library (DGL), the library implements state-of-the-art GNN architectures that integrate directly with HugeGraph server instances, supporting tasks ranging from fraud detection to hierarchical graph classification.

Node-Level Prediction Algorithms

Convolutional and Attention-Based Models

The library includes several variants of Graph Convolutional Networks (GCNs) and attention mechanisms for node classification:

  • GCN (Graph Convolutional Network): Implements the semi-supervised learning approach for node classification and link prediction. Found in hugegraph-ml/src/hugegraph_ml/models/seal.py as class GCN (line 51).
  • AGNN (Attention-based Graph Neural Network): Uses attention mechanisms to weigh neighbor features adaptively. Located in agnn.py as class AGNN (line 33).
  • APPNP (Approximate Personalized Propagation of Neural Predictions): Combines neural networks with personalized PageRank for semi-supervised node classification. Implemented in appnp.py as class APPNP (line 32).
  • ARMA (Auto-Regressive Moving Average GNN): Provides a generic graph convolutional layer through ARMA filters for both node and graph classification tasks. Found in arma.py as class ARMAConv (line 49).
  • DAGNN (Deep Adaptive Graph Neural Network): Decouples feature transformation from propagation and adaptively weights information gathered at different propagation depths for node classification. Implemented in dagnn.py as class DAGNN (line 100).
  • DeeperGCN: Enables training of very deep GCNs with residual connections. Located in deepergcn.py as class DeeperGCN (line 38).
  • GRAND (Graph Random Neural Network): Uses random propagation for semi-supervised node classification. Found in grand.py as class GRAND (line 35).
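
The personalized-PageRank propagation that APPNP applies to its neural predictions can be sketched in a few lines. This is a minimal illustration in plain Python (no DGL or PyTorch), not the code in appnp.py; the function name appnp_propagate, the dense adjacency-list-of-lists input, and the default values for alpha and k are assumptions made for this example.

```python
def appnp_propagate(adj, preds, alpha=0.1, k=10):
    """Run k steps of Z <- (1 - alpha) * A_hat @ Z + alpha * H.

    adj   : dense symmetric adjacency matrix as a list of lists
    preds : per-node predictions H from any base model, shape [n][c]
    alpha : teleport (restart) probability of personalized PageRank
    """
    n = len(adj)
    # Add self-loops, then normalize symmetrically: D^-1/2 (A + I) D^-1/2.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    a_hat = [[a[i][j] / ((deg[i] * deg[j]) ** 0.5) for j in range(n)] for i in range(n)]

    z = [row[:] for row in preds]
    num_classes = len(preds[0])
    for _ in range(k):
        z = [
            [
                (1 - alpha) * sum(a_hat[i][j] * z[j][c] for j in range(n))
                + alpha * preds[i][c]
                for c in range(num_classes)
            ]
            for i in range(n)
        ]
    return z


# Path graph 0 - 1 - 2; only the two end nodes carry a confident prediction.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
preds = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
smoothed = appnp_propagate(adj, preds, alpha=0.1, k=10)
```

After propagation the middle node, which started with an all-zero prediction, holds a mix of both classes, while the restart term keeps each end node anchored to its own initial prediction.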

Advanced Architectures and Baselines

  • JKNet (Jumping Knowledge Network): Aggregates representations from different layers to adapt neighborhood ranges. Supports both node and graph classification tasks. Implemented in jknet.py as class JKNet (line 33).
  • MLPClassifier: A simple multi-layer perceptron baseline for node classification without graph structure. Found in mlp.py as class MLPClassifier (line 22).
  • Correct & Smooth: A post-processing technique that improves node classification accuracy through error correction and label smoothing. Implemented in correct_and_smooth.py as class CorrectAndSmooth (line 160).
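
The layer aggregation idea behind JKNet can be illustrated without any framework. The sketch below assumes the per-layer node representations have already been computed and only shows how they might be combined; jk_aggregate and its mode argument are hypothetical names for this example and are not taken from jknet.py.

```python
def jk_aggregate(layer_outputs, mode="max"):
    """Combine per-layer node representations, jumping-knowledge style.

    layer_outputs : list of L matrices, each shaped [num_nodes][dim]
    mode          : "concat" stacks all layers, "max" takes an element-wise max
    """
    num_nodes = len(layer_outputs[0])
    if mode == "concat":
        return [
            [x for layer in layer_outputs for x in layer[i]]
            for i in range(num_nodes)
        ]
    if mode == "max":
        dim = len(layer_outputs[0][0])
        return [
            [max(layer[i][c] for layer in layer_outputs) for c in range(dim)]
            for i in range(num_nodes)
        ]
    raise ValueError(f"unsupported mode: {mode}")


# Two nodes, two layers, two features per layer.
h1 = [[1.0, 0.0], [0.5, 2.0]]   # output of layer 1 (1-hop information)
h2 = [[0.0, 3.0], [1.5, 1.0]]   # output of layer 2 (2-hop information)

jk_max = jk_aggregate([h1, h2], mode="max")        # [[1.0, 3.0], [1.5, 2.0]]
jk_cat = jk_aggregate([h1, h2], mode="concat")     # [[1.0, 0.0, 0.0, 3.0], [0.5, 2.0, 1.5, 1.0]]
```

Because each layer summarizes a different neighborhood radius, letting every node pick from (or keep) all layers is what allows the effective neighborhood range to vary per node.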

Link Prediction Algorithms

For link prediction and edge-level tasks, HugeGraph-ML provides specialized architectures:

  • P-GNN (Position-aware Graph Neural Network): Designed for link prediction, encoding each node's position through its distances to sampled anchor sets. Located in pgnn.py as class PGNN (line 79).
  • BGNN (Bipartite Graph Neural Network): Handles link prediction on bipartite graph structures. Implemented in bgnn.py as class GNNModelDGL (line 528).
  • GCN (SEAL variant): The SEAL implementation of GCN supports link prediction through subgraph extraction and learning.
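
The subgraph extraction step that SEAL-style link prediction relies on can be sketched as a breadth-first search around the two endpoints of a candidate link. This is an illustrative plain-Python version under simple assumptions (an undirected graph stored as a dict of neighbor lists, comparable node IDs); k_hop_nodes and enclosing_subgraph are hypothetical helper names and do not come from seal.py.

```python
from collections import deque


def k_hop_nodes(adj_list, source, k):
    """Return the set of nodes within k hops of `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand past the hop limit
        for nbr in adj_list.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def enclosing_subgraph(adj_list, u, v, k=1):
    """Nodes and edges of the k-hop enclosing subgraph around link (u, v).

    The candidate link itself is left out of the edge set so that a model
    cannot simply read the answer off the extracted subgraph.
    """
    nodes = k_hop_nodes(adj_list, u, k) | k_hop_nodes(adj_list, v, k)
    edges = {
        (a, b)
        for a in nodes
        for b in adj_list.get(a, ())
        if b in nodes and a < b and {a, b} != {u, v}
    }
    return nodes, edges


# Small undirected graph: 2 - 0 - 1 - 3 - 4
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
nodes, edges = enclosing_subgraph(graph, 0, 1, k=1)
```

With k=1, node 4 falls outside the extracted subgraph because it is two hops from the nearest endpoint; the remaining nodes and edges are what a downstream GNN would score.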

Graph-Level Classification Algorithms

For whole-graph prediction tasks, the library offers hierarchical and pooling-based methods:

  • DiffPool (Differentiable Pooling): Hierarchically coarsens graphs through learnable pooling for graph classification. Found in diffpool.py as class DiffPool (line 36).
  • DGCNN (Deep Graph Convolutional Neural Network): Stacks graph convolutions and uses sort pooling to turn variable-size graphs into fixed-size representations for graph classification. Implemented in seal.py as class DGCNN (line 175).
  • GIN (Graph Isomorphism Network): A powerful architecture for distinguishing graph structures, supporting both node and graph classification. Located in gin_global_pool.py as class GIN (line 26).
  • ARMA and JKNet: As noted above, these also support graph-level tasks when combined with global pooling layers.
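
The coarsening step at the heart of DiffPool can be written out directly: a soft assignment matrix S maps nodes to clusters, pooled features are S^T X, and the pooled adjacency is S^T A S. The sketch below uses plain Python lists and takes the assignment logits as given (in the real model a GNN produces them); diffpool_coarsen and the helper names are assumptions for this illustration, not the API of diffpool.py.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def diffpool_coarsen(adj, feats, assign_logits):
    """One DiffPool-style coarsening step.

    adj           : [n][n] adjacency matrix
    feats         : [n][d] node features (or embeddings)
    assign_logits : [n][c] unnormalized cluster scores per node
    """
    s = [softmax(row) for row in assign_logits]      # soft assignment, rows sum to 1
    s_t = transpose(s)
    pooled_feats = matmul(s_t, feats)                # [c][d] = S^T X
    pooled_adj = matmul(matmul(s_t, adj), s)         # [c][c] = S^T A S
    return s, pooled_feats, pooled_adj


# Path graph 0 - 1 - 2 - 3, pooled into two clusters.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
assign_logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
s, pooled_feats, pooled_adj = diffpool_coarsen(adj, feats, assign_logits)
```

Because every row of S sums to one, the total edge weight is preserved by the coarsening; it is only redistributed among the clusters, which is why the pooled graph can be fed to another GNN layer.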

Self-Supervised and Unsupervised Learning

For representation learning without labeled data:

  • DGI (Deep Graph Infomax): Learns node representations by maximizing mutual information between local patches and global summaries. Implemented in dgi.py as class DGI (line 35).
  • GRACE (Graph Contrastive Learning): Uses contrastive learning to generate node embeddings without supervision. Found in grace.py as class GRACE (line 36).
  • BGRL (Bootstrapped Graph Latents): A self-supervised approach that trains an online network to predict the output of a slowly updated target network, without negative samples. Located in bgrl.py as class BGRL (line 93).
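
The online/target mechanism used by BGRL comes down to two small pieces: a cosine-similarity objective between the online prediction and the target embedding, and an exponential moving average (EMA) update of the target weights. The following plain-Python sketch shows one common formulation (2 - 2 * cosine, as used in BYOL-style objectives); the function names, the flat parameter lists, and the decay value are assumptions for illustration and not the interface of bgrl.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def bootstrap_loss(online_preds, target_embs):
    """Mean of (2 - 2 * cosine) over nodes; no negative pairs are involved."""
    sims = [cosine_similarity(p, t) for p, t in zip(online_preds, target_embs)]
    return 2.0 - 2.0 * sum(sims) / len(sims)


def ema_update(target_params, online_params, decay=0.99):
    """Move each target parameter a small step toward its online counterpart."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target_params, online_params)]


# Two nodes with 2-dimensional embeddings.
online_preds = [[1.0, 0.0], [0.0, 2.0]]
aligned = bootstrap_loss(online_preds, [[3.0, 0.0], [0.0, 1.0]])     # same directions
opposed = bootstrap_loss(online_preds, [[-1.0, 0.0], [0.0, -1.0]])   # opposite directions

new_target = ema_update([1.0, 1.0], [0.0, 2.0], decay=0.9)
```

The loss is close to 0 when predictions and targets point the same way and close to 4 when they point in opposite directions. Only the online network receives gradients; the target network changes solely through the EMA step.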

Specialized and Heterogeneous Graph Algorithms

  • GraphSAGE (SAGE): Supports inductive learning on unseen nodes. Implemented in hugegraph-ml/src/hugegraph_ml/models/cluster_gcn.py as class SAGE (line 34).
  • GATNE (General Attributed Multiplex HeTerogeneous Network Embedding): Learns node embeddings on heterogeneous, multi-edge-type graphs using attention over edge types. Found in gatne.py as class DGLGATNE (line 66).
  • CARE-GNN (CAmouflage-REsistant Graph Neural Network): Specialized for fraud detection on multi-relation graphs, filtering out dissimilar neighbors that fraudsters use as camouflage. Implemented in care_gnn.py as class CAREGNN (line 127).
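
The reason GraphSAGE can handle unseen nodes is that its update depends only on a node's own features and an aggregate of its neighbors' features, not on a per-node embedding table. The sketch below shows one mean-aggregator update in plain Python; sage_mean_layer, the dict-based feature store, and the identity weight matrices are assumptions chosen to keep the example readable, and they do not mirror the SAGE class in cluster_gcn.py.

```python
def matvec(mat, vec):
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]


def sage_mean_layer(feats, neighbors, node, w_self, w_neigh):
    """One GraphSAGE-style update: ReLU(W_self h_v + W_neigh mean(h_u)).

    feats     : dict node -> feature vector
    neighbors : dict node -> list of neighbor IDs
    """
    h_self = feats[node]
    nbrs = neighbors.get(node, [])
    if nbrs:
        dim = len(h_self)
        h_mean = [sum(feats[u][c] for u in nbrs) / len(nbrs) for c in range(dim)]
    else:
        h_mean = [0.0] * len(h_self)  # isolated node: no neighbor signal
    combined = [a + b for a, b in zip(matvec(w_self, h_self), matvec(w_neigh, h_mean))]
    return [max(0.0, x) for x in combined]


# Nodes 0 and 1 were seen in training; node 3 arrives later with its own features.
feats = {0: [2.0, 0.0], 1: [0.0, 1.0], 3: [1.0, -2.0]}
neighbors = {3: [0, 1]}
identity = [[1.0, 0.0], [0.0, 1.0]]   # stand-in for learned weights

h_new = sage_mean_layer(feats, neighbors, 3, identity, identity)   # [2.0, 0.0]
```

Since the same weight matrices are shared by every node, the layer applies to node 3 even though it was absent during training.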

Practical Usage Examples

Link Prediction with SEAL-GCN

The following example demonstrates link prediction using the GCN implementation from the SEAL (Subgraph Extraction and Learning) framework:

from hugegraph_ml.models.seal import GCN
import torch

# Hyper-parameters
num_layers = 2
hidden_units = 64
num_nodes = 1000

# Dummy inputs
g = ...                                            # a DGLGraph instance (placeholder)
z = torch.randint(0, 100, (num_nodes, 1))          # structural node labels
node_id = torch.arange(num_nodes).unsqueeze(1)     # node IDs

# Model instantiation
gcn = GCN(num_layers, hidden_units, gcn_type="gcn",
          pooling_type="sum", num_nodes=num_nodes, dropout=0.5)

# Forward pass produces link existence logits
logits = gcn(g, z, node_id=node_id)

Source: hugegraph-ml/src/hugegraph_ml/models/seal.py (line 51).

Hierarchical Graph Classification with DiffPool

For graph-level prediction tasks, DiffPool learns a differentiable soft assignment of nodes to clusters:

from hugegraph_ml.models.diffpool import DiffPool
import dgl

# Assume batched DGLGraph `bg` with node features `h`

model = DiffPool(
    input_dim=128,
    hidden_dim=64,
    assign_dim=30,
    num_classes=3,
    num_layers=3,
    dropout=0.5,
)

# Output logits for graph classification

logits = model(bg, h)   # shape: (batch_size, num_classes)

Source: hugegraph-ml/src/hugegraph_ml/models/diffpool.py (line 36).

Self-Supervised Node Embedding with BGRL

BGRL enables learning node representations without labels through bootstrap latent representations:

from hugegraph_ml.models.bgrl import BGRL, GCN as EncoderGCN
import dgl

# Configure encoder and BGRL framework

encoder = EncoderGCN(num_layers=2, hidden_units=256, gcn_type="sage")
bgrl = BGRL(encoder, predictor_hidden_dim=256, pred_hidden_dim=256)

# Generate two augmented views of the same graph
g1, g2 = dgl.heterograph(...), dgl.heterograph(...)
z1 = bgrl(g1)
z2 = bgrl(g2)

# Compute the bootstrapping loss (BGRL uses no negative samples)
loss = bgrl.loss(z1, z2)

Source: hugegraph-ml/src/hugegraph_ml/models/bgrl.py (line 93).

Summary

  • HugeGraph-ML provides 21 graph machine learning algorithms spanning node classification, link prediction, and graph classification tasks.
  • Core implementations reside in hugegraph-ml/src/hugegraph_ml/models/ with specific files like seal.py, diffpool.py, and bgrl.py containing the model classes.
  • Architectural diversity includes convolutional networks (GCN, ARMA), attention mechanisms (AGNN, GATNE), hierarchical methods (DiffPool), and self-supervised approaches (DGI, GRACE, BGRL).
  • Specialized support exists for heterogeneous graphs (GATNE, CARE-GNN), deep adaptive propagation (DAGNN), and inductive learning (GraphSAGE).
  • Integration with DGL allows direct usage of these models with HugeGraph server data through the provided task wrappers in hugegraph_ml/tasks/.

Frequently Asked Questions

Which algorithms in hugegraph-ml support heterogeneous graphs?

GATNE (class DGLGATNE in gatne.py) and CARE-GNN (class CAREGNN in care_gnn.py) are specifically designed for heterogeneous graphs. GATNE handles general heterogeneous node embedding via attention over edge types, while CARE-GNN targets fraud detection scenarios with similarity-aware neighbor selection that resists camouflaged fraudsters.

What is the difference between GCN and GraphSAGE implementations?

According to the source code, GCN (in seal.py) operates transductively and is typically used for semi-supervised node classification or link prediction within fixed graphs. GraphSAGE (implemented as class SAGE in cluster_gcn.py) supports inductive learning, allowing inference on unseen nodes by sampling and aggregating features from local neighborhoods.

How do I choose between supervised and self-supervised algorithms?

Use supervised algorithms like GCN, APPNP, or DAGNN when labeled training data is available for node or graph classification tasks. Choose self-supervised methods like DGI, GRACE, or BGRL when you need to learn node embeddings from unlabeled data. These methods learn representations through mutual information maximization (DGI), contrastive (GRACE), or bootstrapping (BGRL) objectives, and the resulting embeddings can later be fine-tuned or fed to a simple classifier for downstream tasks.

Which models support graph-level classification versus node-level tasks?

For graph-level classification, use DiffPool (hierarchical pooling), DGCNN (deep graph CNN with sort pooling), GIN (Graph Isomorphism Network), or JKNet with global pooling. For node-level prediction, use GCN, AGNN, APPNP, GRAND, or DAGNN. Some models like ARMA and JKNet support both tasks depending on the output head configuration.
