# Graph Machine Learning Tasks in HugeGraph-ML: A Complete Guide to Node, Link, and Graph-Level Learning

> Explore nine production-ready graph machine learning tasks in HugeGraph-ML, including node classification, link prediction, and graph-level learning.

- Repository: [The Apache Software Foundation/incubator-hugegraph-ai](https://github.com/apache/incubator-hugegraph-ai)
- Tags: tutorial
- Published: 2026-02-24

---

**HugeGraph-ML supports nine production-ready graph machine learning tasks ranging from node classification and link prediction to graph-level classification and fraud detection, all built on Deep Graph Library (DGL) with consistent APIs for training, evaluation, and device management.**

Apache HugeGraph-ML is a modular Python library in the `apache/incubator-hugegraph-ai` repository that simplifies complex graph analytics by providing high-level task wrappers around DGL models. Whether you need to classify nodes in billion-edge networks or detect fraudulent transactions in heterogeneous graphs, the library abstracts away boilerplate code for device placement, early stopping, and metric logging. All graph machine learning tasks follow a unified three-stage pattern: data conversion via `hugegraph_ml.data`, model definition in `hugegraph_ml.models`, and task orchestration through specialized classes in `hugegraph_ml.tasks`.

## Node-Level Graph Machine Learning Tasks

Node-level tasks focus on learning representations or predicting labels for individual vertices. HugeGraph-ML provides four distinct approaches depending on your supervision requirements and graph scale.

### Unsupervised Node Embedding

The **Node Embedding** task learns low-dimensional vectors for every node without requiring labeled data. Implemented in [`hugegraph_ml/tasks/node_embed.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/node_embed.py), the `NodeEmbed` class trains models such as GATNE (General Attributed Multiplex HeTerogeneous Network Embedding) to produce dense node representations suitable for downstream clustering or visualization.

### Node Classification (Full-Graph)

For supervised learning on graphs that fit in memory, the `NodeClassify` task in [`hugegraph_ml/tasks/node_classify.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/node_classify.py) predicts categorical labels using node features and training masks. The class handles the full training loop, automatically moving data to GPU via `.to(self._device)` and monitoring validation metrics with the `EarlyStopping` utility from [`hugegraph_ml/utils/early_stopping.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/utils/early_stopping.py).
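The early-stopping behavior mentioned above can be illustrated with a small, dependency-free sketch. It is not the code in `hugegraph_ml/utils/early_stopping.py`; the class name `EarlyStoppingSketch`, the `patience` and `min_delta` parameters, and the helper `run_until_stopped` are assumptions made for illustration.

```python
class EarlyStoppingSketch:
    """Stop training when a monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # a real implementation would also save weights here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_until_stopped(val_losses, patience=3):
    """Feed validation losses in order; return (stop_epoch, best_epoch, best_loss)."""
    stopper = EarlyStoppingSketch(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch, stopper.best_loss
    return len(val_losses) - 1, stopper.best_epoch, stopper.best_loss
```

With the losses `[1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]` and `patience=3`, the loop stops at epoch 5 and reports epoch 2 as the best, so the later improvement at epoch 6 is never seen. That is the usual trade-off when choosing a patience value.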

### Scalable Node Classification with Sampling

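When working with massive graphs, `NodeClassifyWithSample` in [`hugegraph_ml/tasks/node_classify_with_sample.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/node_classify_with_sample.py) scales training using DGL's `ClusterGCNSampler`. The sampler partitions the graph into clusters and builds each mini-batch from the subgraph induced by a few of those clusters, so per-step memory is bounded by the batch rather than by the full graph. This makes training on very large graphs (up to billion-edge datasets) feasible, although accuracy relative to full-graph training depends on the dataset and the quality of the partition.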
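To make the cluster-based idea concrete, here is a minimal pure-Python sketch of how such mini-batches might be formed. It is not DGL's `ClusterGCNSampler`: the contiguous split in `partition_nodes` is a simple stand-in for a real graph partitioner such as METIS, and both function names are hypothetical.

```python
import random


def partition_nodes(num_nodes, num_parts):
    """Split node IDs into at most `num_parts` contiguous chunks (stand-in for METIS)."""
    size = -(-num_nodes // num_parts)  # ceiling division
    return [list(range(s, min(s + size, num_nodes))) for s in range(0, num_nodes, size)]


def cluster_batches(edges, num_nodes, num_parts, parts_per_batch, seed=0):
    """Yield (nodes, induced_edges) mini-batches built from groups of clusters."""
    parts = partition_nodes(num_nodes, num_parts)
    rng = random.Random(seed)
    rng.shuffle(parts)  # a different grouping of clusters for each seed
    for i in range(0, len(parts), parts_per_batch):
        nodes = sorted(n for part in parts[i:i + parts_per_batch] for n in part)
        node_set = set(nodes)
        # Keep only edges whose endpoints both fall inside the batch.
        induced = [(u, v) for u, v in edges if u in node_set and v in node_set]
        yield nodes, induced
```

Edges that cross batch boundaries are dropped within a step, which is why the quality of the partition matters for the accuracy of this style of training.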

### Edge-Aware Node Classification

The `NodeClassifyWithEdge` task, found in [`hugegraph_ml/tasks/node_classify_with_edge.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/node_classify_with_edge.py), extends standard classification by incorporating **edge features** into the learning process. This is critical for relational domains where the connection type or strength between nodes provides predictive signal beyond node attributes.
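As a rough illustration of how an edge feature can influence node representations, the sketch below averages neighbor features using a scalar edge weight. It assumes a single positive weight per edge and is not the model used by `NodeClassifyWithEdge`; the function name `edge_weighted_mean` is hypothetical.

```python
def edge_weighted_mean(node_feats, edges):
    """Aggregate neighbor features per destination node, weighted by an edge feature.

    node_feats: {node_id: [float, ...]}
    edges: iterable of (src, dst, weight) with weight > 0
    """
    dim = len(next(iter(node_feats.values())))
    sums = {n: [0.0] * dim for n in node_feats}
    totals = {n: 0.0 for n in node_feats}
    for src, dst, w in edges:
        for i, x in enumerate(node_feats[src]):
            sums[dst][i] += w * x
        totals[dst] += w
    # Nodes with no incoming edges keep a zero vector.
    return {
        n: [s / totals[n] for s in sums[n]] if totals[n] > 0 else sums[n]
        for n in node_feats
    }
```

A neighbor connected by a weight-3 edge contributes three times as much as one connected by a weight-1 edge, which is the kind of signal that plain node-feature aggregation would miss.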

## Graph-Level Prediction Tasks

### Graph Classification

For problems requiring whole-graph predictions (such as molecular property prediction), `GraphClassify` in [`hugegraph_ml/tasks/graph_classify.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/graph_classify.py) manages batched graph learning. The task uses DGL's `GraphDataLoader` to handle multiple graphs per batch, aggregating node representations into graph-level embeddings before feeding them to classification heads.
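The aggregation step described above is often called a readout. A minimal sketch, assuming mean pooling and a flat list of node features with one graph index per node, could look like this (the function name `mean_readout` is hypothetical, and the actual pooling used by a given model may differ):

```python
def mean_readout(node_feats, graph_ids, num_graphs):
    """Average node feature vectors per graph in a batch.

    node_feats: list of feature vectors, one per node in the batch
    graph_ids: graph_ids[i] is the index of the graph that node i belongs to
    """
    dim = len(node_feats[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feat, gid in zip(node_feats, graph_ids):
        counts[gid] += 1
        for i, x in enumerate(feat):
            sums[gid][i] += x
    # Graphs with no nodes keep a zero vector.
    return [
        [s / counts[g] for s in sums[g]] if counts[g] else sums[g]
        for g in range(num_graphs)
    ]
```

Each graph ends up with one fixed-size vector regardless of how many nodes it has, which is what allows a standard classification head to consume it.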

## Link Prediction and Edge Analysis

Link prediction tasks identify missing or future connections in the graph structure. HugeGraph-ML implements two distinct algorithmic approaches.

### SEAL-Based Link Prediction

The `LinkPredictionSeal` class in [`hugegraph_ml/tasks/link_prediction_seal.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/link_prediction_seal.py) implements the SEAL (Subgraphs, Embeddings, and Attributes for Link prediction) framework. It extracts **enclosing subgraphs** around target edges, encodes them using a DGCNN (Deep Graph Convolutional Neural Network) defined in [`hugegraph_ml/models/seal.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/models/seal.py), and trains with binary cross-entropy loss. The task automatically computes Hits@K metrics during evaluation.
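For readers unfamiliar with Hits@K, the following sketch follows the definition commonly used for OGB link prediction benchmarks: a positive edge counts as a hit when its score exceeds the K-th highest score among the negative edges. The function name `hits_at_k` is hypothetical, and this is not the evaluation code from the repository.

```python
def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive edges scored above the k-th highest negative score."""
    if len(neg_scores) < k:
        return 1.0  # fewer than k negatives: every positive counts as a hit
    threshold = sorted(neg_scores, reverse=True)[k - 1]
    return sum(s > threshold for s in pos_scores) / len(pos_scores)
```

For example, with positive scores `[0.9, 0.8, 0.3, 0.1]` and negative scores `[0.85, 0.5, 0.2, 0.05]`, the second-highest negative is 0.5, so two of the four positives count as hits at K=2.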

### PGNN Link Prediction

`LinkPredictionPGNN` in [`hugegraph_ml/tasks/link_prediction_pgnn.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/link_prediction_pgnn.py) employs Position-aware Graph Neural Networks (P-GNN), pre-selecting anchor nodes and using shortest-path distances to those anchors to encode each node's position in the graph. This approach suits scenarios that require distance-aware relational reasoning.
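The role of anchors and shortest-path distances can be sketched in a few lines. The example below assumes an unweighted graph stored as an adjacency dict, uses single anchor nodes rather than anchor sets, and applies the `1 / (d + 1)` distance transform; the function names are hypothetical and this is a simplification, not the repository's implementation.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def anchor_position_features(adj, anchors):
    """For each node, one feature per anchor: 1 / (d + 1), or 0.0 if unreachable."""
    per_anchor = [bfs_distances(adj, a) for a in anchors]
    return {
        node: [1.0 / (d[node] + 1) if node in d else 0.0 for d in per_anchor]
        for node in adj
    }
```

Two nodes with identical local neighborhoods but different distances to the anchors receive different feature vectors, which is the positional signal that message passing alone cannot provide.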

## Specialized Heterogeneous and Fraud Detection Tasks

### Heterogeneous Graph Embedding with GATNE

Multiplex heterogeneous networks require specialized handling. The `HeteroSampleEmbedGATNE` task in [`hugegraph_ml/tasks/hetero_sample_embed_gatne.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/hetero_sample_embed_gatne.py) implements the GATNE algorithm, learning type-specific embeddings for different edge types and aggregating them with attention mechanisms. This handles graphs with multiple relation types (e.g., social networks with "friend," "colleague," and "family" edges).
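The attention-based aggregation over edge types can be approximated by a softmax-weighted sum. The sketch below is deliberately simplified: it takes the attention logits as given inputs, whereas GATNE learns them from the embeddings, and the function names are hypothetical rather than part of `DGLGATNE`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_edge_type_embeddings(type_embeddings, attention_logits):
    """Attention-weighted sum of one node's per-edge-type embeddings.

    type_embeddings: {edge_type: [float, ...]}
    attention_logits: {edge_type: float}  (assumed given; learned in a real model)
    """
    edge_types = list(type_embeddings)
    weights = softmax([attention_logits[t] for t in edge_types])
    dim = len(type_embeddings[edge_types[0]])
    combined = [0.0] * dim
    for t, w in zip(edge_types, weights):
        for i, x in enumerate(type_embeddings[t]):
            combined[i] += w * x
    return combined, dict(zip(edge_types, weights))
```

With equal logits every edge type contributes equally; raising the logit for "friend" shifts the combined vector toward the friend-specific embedding.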

### Fraud Detection using CARE-GNN

For financial security applications, `DetectorCaregnn` in [`hugegraph_ml/tasks/fraud_detector_caregnn.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/fraud_detector_caregnn.py) provides binary classification optimized for transaction graphs using the CARE-GNN architecture. This specialized task includes sampling strategies designed to handle highly imbalanced fraud detection datasets.
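One generic way to cope with class imbalance is to down-sample the majority class when choosing training nodes. The sketch below shows that idea only; it is not CARE-GNN's own neighbor-selection mechanism or the sampling code in `DetectorCaregnn`, and the function name `balanced_undersample` is hypothetical.

```python
import random


def balanced_undersample(labels, seed=0):
    """Return sorted training indices with the majority class cut to the minority size.

    labels: list of 0/1 values, where 1 marks a fraudulent node
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    sampled = rng.sample(majority, len(minority))
    return sorted(minority + sampled)
```

On a label list with 18 benign and 2 fraudulent nodes, the result contains both fraudulent nodes plus 2 randomly chosen benign ones, so each class makes up half of the training indices.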

## Common Architecture and Task Implementation Pattern

All graph machine learning tasks in HugeGraph-ML share consistent architectural components that accelerate development:

- **Unified Model Interface**: Every model exposes `forward`, `inference`, and `loss` methods, enabling plug-and-play substitution of architectures like `MLPClassifier`, `DGCNN`, or `DGLGATNE`.
- **Automatic Device Management**: Each task class initializes `self._device` based on a user-provided `gpu` index, automatically transferring both the DGL graph and model parameters to CPU or CUDA devices.
- **Early Stopping Integration**: The `EarlyStopping` utility monitors specified metrics (loss or accuracy) and restores the best model checkpoint automatically, preventing overfitting without manual intervention.
- **Data Loading Abstraction**: Tasks abstract DGL's `GraphDataLoader`, `ClusterGCNSampler`, and `NeighborSampler`, providing mini-batching for both homogeneous and heterogeneous graphs without requiring users to implement collate functions.

## Practical Code Examples

### Node Classification with MLP

```python
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify

# Convert HugeGraph data to DGL format
hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(
    vertex_label="my_vertex",
    edge_label="my_edge",
    vertex_feature="feat",
)

# Initialize model with input/output dimensions matching your data
model = MLPClassifier(
    n_in_feat=graph.ndata["feat"].shape[1],
    n_out_feat=5,  # number of target classes in this example
)

# Execute training with automatic GPU placement and early stopping
task = NodeClassify(graph, model)
task.train(lr=1e-3, n_epochs=100, gpu=0)
metrics = task.evaluate()
print(metrics)  # illustrative output: {'accuracy': 0.82, 'loss': 0.34}
```

### Unsupervised Node Embedding with GATNE

```python
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.gatne import DGLGATNE
from hugegraph_ml.tasks.hetero_sample_embed_gatne import HeteroSampleEmbedGATNE

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="person", edge_label="relationship")

# Configure GATNE with embedding dimensions for heterogeneous types
gatne = DGLGATNE(
    num_nodes=graph.num_nodes(),
    embedding_size=128,
    embedding_u_size=64,
    edge_types=graph.etypes,
    edge_type_count=len(graph.etypes),
    dim_a=16,
)

task = HeteroSampleEmbedGATNE(graph, gatne)
task.train_and_embed(lr=5e-4, n_epochs=50, gpu=0)

# Extract learned embeddings
embeddings = graph.ndata["feat"]  # shape: (num_nodes, 128)
```

### Link Prediction with SEAL

```python
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.seal import DGCNN, data_prepare
from hugegraph_ml.tasks.link_prediction_seal import LinkPredictionSeal

# Load an OGB-style graph together with its train/validation/test edge split
hg2d = HugeGraph2DGL()
graph, split_edge = hg2d.convert_graph_ogb(
    vertex_label="ogbl-collab_vertex",
    edge_label="ogbl-collab_edge",
    split_label="ogbl-collab_split_edge",
)

node_attr, edge_weight = data_prepare(graph, split_edge)

model = DGCNN(
    num_layers=3,
    hidden_units=32,
    k=30,
    gcn_type="gcn",
    node_attributes=node_attr,
    edge_weights=edge_weight,
    use_embedding=True,
    num_nodes=graph.num_nodes(),
    dropout=0.5,
)

# Hits@50 and other metrics are computed automatically during training
task = LinkPredictionSeal(graph, split_edge, model)
task.train(lr=5e-3, n_epochs=200, gpu=0)
```

### Graph-Level Classification

```python
from hugegraph_ml.data.hugegraph_dataset import HugeGraphDataset
from hugegraph_ml.models.gnn import GCN
from hugegraph_ml.tasks.graph_classify import GraphClassify

# Each dataset item is one graph with a graph-level label
dataset = HugeGraphDataset(root="data/ogbg_molhiv")
model = GCN(
    in_dim=dataset.num_node_features,
    hidden_dim=128,
    out_dim=dataset.num_classes,
)

task = GraphClassify(dataset, model)
task.train(batch_size=32, n_epochs=150, gpu=0)

test_metrics = task.evaluate()
print(test_metrics)  # illustrative output: {'accuracy': 0.78, 'loss': 0.31}
```

## Summary

- **Nine Core Tasks**: HugeGraph-ML implements `NodeEmbed`, `NodeClassify`, `NodeClassifyWithSample`, `NodeClassifyWithEdge`, `GraphClassify`, `LinkPredictionSeal`, `LinkPredictionPGNN`, `HeteroSampleEmbedGATNE`, and `DetectorCaregnn` to cover node, link, and graph-level graph machine learning.
- **Consistent API**: All tasks in `hugegraph_ml/tasks/` expose uniform training methods and handle device placement, early stopping, and logging internally.
- **DGL Foundation**: Built on Deep Graph Library, the library supports both full-graph and sampled training modes for scalability.
- **Heterogeneous Support**: Specialized tasks like GATNE embedding and CARE-GNN fraud detection handle complex multi-relational graphs and imbalanced classification scenarios.

## Frequently Asked Questions

### What is the difference between NodeClassify and NodeClassifyWithSample?

**`NodeClassify`** performs full-graph training: the entire graph is placed on the target device and every epoch runs over all of it, which is efficient for smaller datasets. **`NodeClassifyWithSample`**, implemented in [`hugegraph_ml/tasks/node_classify_with_sample.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/tasks/node_classify_with_sample.py), uses DGL's `ClusterGCNSampler` to train on sampled node clusters, enabling node classification on billion-edge graphs that exceed single-GPU memory constraints.

### How does HugeGraph-ML handle GPU acceleration?

Each task class automatically configures `self._device` based on the `gpu` parameter passed to `train()` methods. The implementation calls `.to(self._device)` on both the model and the DGL graph object, ensuring tensors reside on the specified CUDA device without manual tensor relocation code.
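The device-selection step can be pictured with a tiny sketch. It assumes that a negative `gpu` index means "use the CPU" and that CUDA availability is passed in explicitly; in real code that flag would typically come from `torch.cuda.is_available()`. The function name `resolve_device` is hypothetical.

```python
def resolve_device(gpu: int, cuda_available: bool) -> str:
    """Map a GPU index to a PyTorch-style device string, falling back to CPU."""
    if gpu >= 0 and cuda_available:
        return f"cuda:{gpu}"
    return "cpu"
```

The resulting string is the kind of value that can be passed to `.to(...)` on a model or a DGL graph, so the same training code runs unchanged on machines with and without a GPU.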

### Can I use custom models with the task classes?

Yes, provided your model implements the three required methods: `forward()` for computing node/graph representations, `inference()` for prediction without gradient computation, and `loss()` for calculating the optimization objective. The `MLPClassifier` in [`hugegraph_ml/models/mlp.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/models/mlp.py) demonstrates this interface for reference.
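One way to check a custom model against this three-method contract before handing it to a task is a structural type check. The sketch below is an illustration only: the `TaskModel` protocol, the method signatures, and the toy `ConstantModel` are assumptions, not definitions from the library.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TaskModel(Protocol):
    """Hypothetical structural type for the three methods named above."""

    def forward(self, graph, feats): ...
    def inference(self, graph, feats): ...
    def loss(self, logits, labels): ...


class ConstantModel:
    """Toy model that always predicts class 0; not a real GNN."""

    def forward(self, graph, feats):
        return [[1.0, 0.0] for _ in feats]

    def inference(self, graph, feats):
        return [row.index(max(row)) for row in self.forward(graph, feats)]

    def loss(self, logits, labels):
        # 0-1 loss, chosen only to keep the sketch dependency-free.
        preds = [row.index(max(row)) for row in logits]
        return sum(p != y for p, y in zip(preds, labels)) / len(labels)


def check_model(model) -> bool:
    """True when the object exposes forward, inference, and loss."""
    return isinstance(model, TaskModel)
```

Note that a `runtime_checkable` protocol only verifies that the methods exist, not that their signatures or return types match what a given task expects, so it catches missing methods early but is no substitute for a short training run.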

### What data formats does HugeGraph-ML support?

The library primarily converts HugeGraph database instances to DGL graphs via [`hugegraph_ml/data/hugegraph2dgl.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/data/hugegraph2dgl.py), supporting both property graphs and OGB (Open Graph Benchmark) formatted datasets with train/validation/test splits. For graph-level tasks, [`hugegraph_ml/data/hugegraph_dataset.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph_ml/data/hugegraph_dataset.py) provides PyTorch Dataset-compatible wrappers.