Graph Machine Learning Algorithms in HugeGraph-ML: Complete Model Catalog
HugeGraph-ML ships 21 production-ready graph neural networks including GCN, GraphSAGE, DiffPool, and self-supervised methods like BGRL and DGI for node classification, link prediction, and graph-level prediction tasks.
The hugegraph-ml module in the Apache HugeGraph AI incubator provides a comprehensive toolkit for graph machine learning on large-scale property graphs. Built on top of the Deep Graph Library (DGL), the library implements state-of-the-art GNN architectures that integrate directly with HugeGraph server instances, supporting tasks ranging from fraud detection to hierarchical graph classification.
Node-Level Prediction Algorithms
Convolutional and Attention-Based Models
The library includes several variants of Graph Convolutional Networks (GCNs) and attention mechanisms for node classification:
- GCN (Graph Convolutional Network): Implements the semi-supervised learning approach for node classification and link prediction. Found in hugegraph-ml/src/hugegraph_ml/models/seal.py as class GCN (line 51).
- AGNN (Attention-based Graph Neural Network): Uses attention mechanisms to weigh neighbor features adaptively. Located in agnn.py as class AGNN (line 33).
- APPNP (Approximate Personalized Propagation of Neural Predictions): Combines neural networks with personalized PageRank for semi-supervised node classification. Implemented in appnp.py as class APNP (line 32).
- ARMA (Auto-Regressive Moving Average GNN): Provides a generic graph convolutional layer through ARMA filters for both node and graph classification tasks. Found in arma.py as class ARMAConv (line 49).
- DAGNN (Deep Adaptive Graph Neural Network): Decouples feature transformation from propagation so that node classification can adaptively draw on large receptive fields. Implemented in dagnn.py as class DAGNN (line 100).
- DeeperGCN: Enables training of very deep GCNs with residual connections. Located in deepergcn.py as class DeeperGCN (line 38).
- GRAND (Graph Random Neural Network): Uses random propagation for semi-supervised node classification. Found in grand.py as class GRAND (line 35).
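To make the propagation idea behind APPNP more concrete, here is a minimal pure-Python sketch of personalized-PageRank-style propagation. It is not the code in appnp.py: the function name `appnp_propagate`, the teleport probability `alpha`, the step count `k`, and the toy graph are all assumptions chosen for illustration.

```python
def appnp_propagate(adj, h, alpha=0.1, k=10):
    """Approximate personalized-PageRank propagation of predictions h.

    adj: dict mapping node -> list of neighbours (undirected, no self-loops)
    h:   dict mapping node -> list of floats (per-node predictions)
    """
    # Add self-loops; edge weight is the symmetric normalisation 1/sqrt(d_u*d_v)
    nbrs = {u: set(vs) | {u} for u, vs in adj.items()}
    deg = {u: len(vs) for u, vs in nbrs.items()}
    z = {u: list(vec) for u, vec in h.items()}
    for _ in range(k):
        new_z = {}
        for u in nbrs:
            agg = [0.0] * len(h[u])
            for v in nbrs[u]:
                w = 1.0 / (deg[u] * deg[v]) ** 0.5
                for i, val in enumerate(z[v]):
                    agg[i] += w * val
            # Mix the propagated signal with the original prediction (teleport)
            new_z[u] = [(1 - alpha) * a + alpha * b for a, b in zip(agg, h[u])]
        z = new_z
    return z


# Toy path graph 0 - 1 - 2 with one-hot predictions on the two end nodes
adj = {0: [1], 1: [0, 2], 2: [1]}
h = {0: [1.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 1.0]}
z = appnp_propagate(adj, h, alpha=0.1, k=10)
```

With `alpha=1.0` the propagation is switched off and the original predictions come back unchanged; smaller values let each node borrow more from its neighborhood.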
Advanced Architectures and Baselines
- JKNet (Jumping Knowledge Network): Aggregates representations from different layers to adapt neighborhood ranges. Supports both node and graph classification tasks. Implemented in jknet.py as class JKNet (line 33).
- MLPClassifier: A simple multi-layer perceptron baseline for node classification without graph structure. Found in mlp.py as class MLPClassifier (line 22).
- Correct & Smooth: A post-processing technique that improves node classification accuracy through error correction and label smoothing. Implemented in correct_and_smooth.py as class CorrectAndSmooth (line 160).
Edge-Level and Link Prediction Algorithms
For link prediction and edge-level tasks, HugeGraph-ML provides specialized architectures:
- PGNN (Position-aware Graph Neural Network): Designed for link prediction, with embeddings that encode each node's distances to sampled anchor sets. Located in pgnn.py as class PGNN (line 79).
- BGNN (Bipartite Graph Neural Network): Handles link prediction on bipartite graph structures. Implemented in bgnn.py as class GNNModelDGL (line 528).
- GCN (SEAL variant): The SEAL implementation of GCN supports link prediction through subgraph extraction and learning.
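The subgraph-extraction step that SEAL-style link prediction relies on can be sketched in plain Python. This is an illustrative sketch rather than the code in seal.py: the helper names `k_hop` and `enclosing_subgraph` and the adjacency-dict representation are assumptions.

```python
from collections import deque


def k_hop(adj, source, k):
    """Return the set of nodes within k hops of source (breadth-first search)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def enclosing_subgraph(adj, u, v, k=1):
    """Nodes and edges of the k-hop enclosing subgraph around candidate link (u, v)."""
    nodes = k_hop(adj, u, k) | k_hop(adj, v, k)
    # Keep each undirected edge once (a < b) and hide the target link itself,
    # so a classifier cannot simply read the answer off the subgraph.
    edges = {
        (a, b)
        for a in nodes
        for b in adj.get(a, ())
        if b in nodes and a < b and {a, b} != {u, v}
    }
    return nodes, edges


# Toy undirected graph as an adjacency dict
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
nodes, edges = enclosing_subgraph(adj, 0, 3, k=1)
```

A graph-level classifier is then trained on such subgraphs, one per candidate link, to predict whether the hidden link exists.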
Graph-Level Classification Algorithms
For whole-graph prediction tasks, the library offers hierarchical and pooling-based methods:
- DiffPool (Differentiable Pooling): Hierarchically coarsens graphs through learnable pooling for graph classification. Found in diffpool.py as class DiffPool (line 36).
- DGCNN (Deep Graph Convolutional Neural Network): Sorts node embeddings with a sort-pooling layer and applies 1-D convolutions for graph classification. Implemented in seal.py as class DGCNN (line 175).
- GIN (Graph Isomorphism Network): A powerful architecture for distinguishing graph structures, supporting both node and graph classification. Located in gin_global_pool.py as class GIN (line 26).
- ARMA and JKNet: As noted above, these also support graph-level tasks when combined with global pooling layers.
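The coarsening step at the heart of DiffPool can be written down in a few lines. The sketch below uses plain Python lists and a hard 0/1 assignment matrix for readability; in DiffPool itself the assignment is a soft, learned matrix. The helper names are assumptions, and this is not the code in diffpool.py.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def diffpool_coarsen(adj, feats, assign):
    """One pooling step: X' = S^T X and A' = S^T A S."""
    s_t = transpose(assign)
    pooled_feats = matmul(s_t, feats)
    pooled_adj = matmul(matmul(s_t, adj), assign)
    return pooled_adj, pooled_feats


# 4 nodes on a path 0-1-2-3, pooled into 2 clusters {0, 1} and {2, 3}
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0], [2.0], [3.0], [4.0]]
assign = [[1, 0], [1, 0], [0, 1], [0, 1]]  # hard assignment, for illustration only
pooled_adj, pooled_feats = diffpool_coarsen(adj, feats, assign)
```

The pooled adjacency matrix counts connections within each cluster on its diagonal and connections between clusters off the diagonal, so stacking several such steps yields a hierarchy of progressively smaller graphs.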
Self-Supervised and Unsupervised Learning
For representation learning without labeled data:
- DGI (Deep Graph Infomax): Learns node representations by maximizing mutual information between local patches and global summaries. Implemented in dgi.py as class DGI (line 35).
- GRACE (Graph Contrastive Learning): Uses contrastive learning to generate node embeddings without supervision. Found in grace.py as class GRACE (line 36).
- BGRL (Bootstrap Graph Representation Learning): A self-supervised approach using online and target networks for node embedding. Located in bgrl.py as class BGRL (line 93).
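As a rough illustration of the contrastive objective used by GRACE-style methods, the sketch below computes a simplified InfoNCE loss between two views of the same nodes. It is an assumption-laden simplification: only cross-view negatives are used, the temperature `tau` is arbitrary, and the function names are hypothetical rather than taken from grace.py.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(view1, view2, tau=0.5):
    """Average InfoNCE loss: node i in view1 should match node i in view2."""
    losses = []
    for i, u in enumerate(view1):
        scores = [math.exp(cosine(u, v) / tau) for v in view2]
        # Positive pair in the numerator, all cross-view pairs in the denominator
        losses.append(-math.log(scores[i] / sum(scores)))
    return sum(losses) / len(losses)


# Two views of a 2-node graph: matched embeddings vs. swapped embeddings
view1 = [[1.0, 0.0], [0.0, 1.0]]
matched = info_nce(view1, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(view1, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is low when each node's two views agree and high when a node looks more like some other node's view, which is what pushes embeddings of the same node together.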
Specialized and Heterogeneous Graph Algorithms
- GraphSAGE (SAGE): Supports inductive learning on unseen nodes. Implemented in hugegraph-ml/src/hugegraph_ml/models/cluster_gcn.py as class SAGE (line 34).
- GATNE (General Attributed Multiplex HeTerogeneous Network Embedding): Handles heterogeneous node embedding via attention mechanisms. Found in gatne.py as class DGLGATNE (line 66).
- CARE-GNN (CAmouflage-REsistant Graph Neural Network): Specialized for fraud detection on heterogeneous graphs. Implemented in care_gnn.py as class CAREGNN (line 127).
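Why GraphSAGE-style models generalize to unseen nodes can be seen in a small sketch of a mean-aggregator layer. The function `sage_mean_layer`, the weight matrices, and the toy data are assumptions for illustration and do not mirror the SAGE class in cluster_gcn.py.

```python
def sage_mean_layer(adj, feats, w_self, w_neigh):
    """One GraphSAGE-style layer with a mean aggregator and ReLU.

    adj:   dict node -> list of neighbours
    feats: dict node -> feature vector (list of floats)
    w_self, w_neigh: weight matrices as lists of rows (out_dim x in_dim)
    """
    def matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    out = {}
    for u, nbrs in adj.items():
        dim = len(feats[u])
        # Mean of neighbour features (zero vector for isolated nodes)
        mean = [sum(feats[v][i] for v in nbrs) / len(nbrs) if nbrs else 0.0
                for i in range(dim)]
        combined = [a + b for a, b in zip(matvec(w_self, feats[u]),
                                          matvec(w_neigh, mean))]
        out[u] = [max(0.0, c) for c in combined]
    return out


# The same weights apply to a node never seen during training, because the
# layer depends only on local features and not on node identity.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, 0.5]]
adj = {"new": ["a", "b"], "a": ["new"], "b": ["new"]}
feats = {"new": [1.0, -4.0], "a": [2.0, 2.0], "b": [4.0, 0.0]}
h = sage_mean_layer(adj, feats, w_self, w_neigh)
```

Because nothing in the layer is indexed by node ID, the weights can be reused on nodes or whole graphs that did not exist at training time.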
Practical Usage Examples
Link Prediction with SEAL-GCN
The following example demonstrates link prediction using the GCN implementation from the SEAL (Subgraph Extraction and Learning) framework:
```python
import torch
from hugegraph_ml.models.seal import GCN

# Hyper-parameters
num_layers = 2
hidden_units = 64
num_nodes = 1000

# Dummy inputs
g = ...  # placeholder: a DGLGraph instance
z = torch.randint(0, 100, (num_nodes, 1))  # node labels
node_id = torch.arange(num_nodes).unsqueeze(1)  # node IDs

# Model instantiation
gcn = GCN(
    num_layers,
    hidden_units,
    gcn_type="gcn",
    pooling_type="sum",
    num_nodes=num_nodes,
    dropout=0.5,
)

# Forward pass produces link existence logits
logits = gcn(g, z, node_id=node_id)
```
Source: hugegraph-ml/src/hugegraph_ml/models/seal.py (line 51).
Hierarchical Graph Classification with DiffPool
For graph-level prediction tasks, DiffPool learns a differentiable soft assignment of nodes to clusters:
```python
import dgl
from hugegraph_ml.models.diffpool import DiffPool

# Assume a batched DGLGraph `bg` with node features `h`
model = DiffPool(
    input_dim=128,
    hidden_dim=64,
    assign_dim=30,
    num_classes=3,
    num_layers=3,
    dropout=0.5,
)

# Output logits for graph classification
logits = model(bg, h)  # shape: (batch_size, num_classes)
```
Source: hugegraph-ml/src/hugegraph_ml/models/diffpool.py (line 36).
Self-Supervised Node Embedding with BGRL
BGRL enables learning node representations without labels through bootstrap latent representations:
```python
import dgl
from hugegraph_ml.models.bgrl import BGRL, GCN as EncoderGCN

# Configure encoder and BGRL framework
encoder = EncoderGCN(num_layers=2, hidden_units=256, gcn_type="sage")
bgrl = BGRL(encoder, predictor_hidden_dim=256, pred_hidden_dim=256)

# Two augmented views of the same graph (placeholders: build these from your data)
g1, g2 = dgl.heterograph(...), dgl.heterograph(...)
z1 = bgrl(g1)
z2 = bgrl(g2)

# Compute the bootstrap loss between the two views (no negative samples required)
loss = bgrl.loss(z1, z2)
```
Source: hugegraph-ml/src/hugegraph_ml/models/bgrl.py (line 93).
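The two ingredients that distinguish bootstrap methods such as BGRL from contrastive ones, a cosine-similarity objective without negative samples and a slowly moving target network, can be sketched as follows. The function names, the `2 - 2 * cos` scaling, and the `decay` value are assumptions for illustration; the loss in bgrl.py may be scaled or organized differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def bootstrap_loss(online_pred, target_emb):
    """Mean of 2 - 2 * cos(q_i, z_i) over nodes; 0 when perfectly aligned."""
    terms = [2 - 2 * cosine(q, z) for q, z in zip(online_pred, target_emb)]
    return sum(terms) / len(terms)


def ema_update(target_params, online_params, decay=0.99):
    """Move the target weights slowly toward the online weights."""
    return [decay * t + (1 - decay) * o
            for t, o in zip(target_params, online_params)]


aligned = bootstrap_loss([[1.0, 0.0]], [[2.0, 0.0]])    # same direction
opposite = bootstrap_loss([[1.0, 0.0]], [[-1.0, 0.0]])  # opposite direction
new_target = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.5)
```

Only the online network receives gradients; the target network is updated through the moving average, which is what keeps the embeddings from collapsing without any negative pairs.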
Summary
- HugeGraph-ML provides 21 graph machine learning algorithms spanning node classification, link prediction, and graph classification tasks.
- Core implementations reside in hugegraph-ml/src/hugegraph_ml/models/ with specific files like seal.py, diffpool.py, and bgrl.py containing the model classes.
- Architectural diversity includes convolutional networks (GCN, ARMA), attention mechanisms (AGNN, GATNE), hierarchical methods (DiffPool), and self-supervised approaches (DGI, BGRL).
- Specialized support exists for heterogeneous graphs (GATNE, CARE-GNN), deep adaptive propagation (DAGNN), and inductive learning (GraphSAGE).
- Integration with DGL allows direct usage of these models with HugeGraph server data through the provided task wrappers in hugegraph_ml/tasks/.
Frequently Asked Questions
Which algorithms in hugegraph-ml support heterogeneous graphs?
GATNE (class DGLGATNE in gatne.py) and CARE-GNN (class CAREGNN in care_gnn.py) are specifically designed for heterogeneous graphs. GATNE handles general heterogeneous node embedding via attention mechanisms, while CARE-GNN targets fraud detection scenarios by filtering neighbors with a learned similarity measure, so that camouflaged fraudsters have less influence on aggregation.
What is the difference between GCN and GraphSAGE implementations?
According to the source code, GCN (in seal.py) operates transductively and is typically used for semi-supervised node classification or link prediction within fixed graphs. GraphSAGE (implemented as class SAGE in cluster_gcn.py) supports inductive learning, allowing inference on unseen nodes by sampling and aggregating features from local neighborhoods.
How do I choose between supervised and self-supervised algorithms?
Use supervised algorithms like GCN, APPNP, or DAGNN when labeled training data is available for node or graph classification tasks. Choose self-supervised methods like DGI, GRACE, or BGRL when you need to learn node embeddings from unlabeled data. These methods learn representations through mutual-information maximization (DGI), contrastive (GRACE), or bootstrapping (BGRL) objectives, and the resulting embeddings can later be fine-tuned on downstream tasks.
Which models support graph-level classification versus node-level tasks?
For graph-level classification, use DiffPool (hierarchical pooling), DGCNN (Deep Graph CNN with sort pooling), GIN (Graph Isomorphism Network), or JKNet with global pooling. For node-level prediction, use GCN, AGNN, APPNP, GRAND, or DAGNN. Some models like ARMA and JKNet support both tasks depending on the output head configuration.
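The difference between a node-level and a graph-level output head often comes down to a readout step over the node embeddings. The sketch below shows that step in plain Python; the function name `readout` and the supported modes are assumptions for illustration, not an API of hugegraph-ml.

```python
def readout(node_embeddings, mode="sum"):
    """Collapse per-node embeddings into one graph-level vector."""
    columns = list(zip(*node_embeddings))
    if mode == "sum":
        return [sum(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown readout mode: {mode}")


# Per-node embeddings from some GNN backbone (3 nodes, 2 dimensions)
h = [[1.0, 4.0], [2.0, 0.0], [3.0, 2.0]]

node_level_output = h                     # one vector per node
graph_level_output = readout(h, "mean")   # one vector per graph
```

A node classifier applies its final layer to every row of the embeddings, while a graph classifier applies it once to the pooled vector.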