# How to Use Text2Gremlin in HugeGraph AI to Convert Natural Language to Gremlin

> Convert natural language questions into Gremlin queries with Text2Gremlin in HugeGraph AI. This guide covers the pipeline architecture, the Python and REST interfaces, and prompt and schema customization.

- Repository: [The Apache Software Foundation/incubator-hugegraph-ai](https://github.com/apache/incubator-hugegraph-ai)
- Tags: how-to-guide
- Published: 2026-02-24

---

**Text2Gremlin in HugeGraph AI is a production-ready pipeline that automatically transforms natural language questions into executable Gremlin queries by orchestrating schema retrieval, few-shot example matching, LLM-based code generation, and database execution.**

The apache/incubator-hugegraph-ai repository provides **Text2Gremlin** as an integrated solution within the `hugegraph-llm` module. This capability allows developers to query HugeGraph databases using plain English instead of manual Gremlin syntax, leveraging a four-stage pipeline that handles everything from context retrieval to result execution.

## Understanding the Text2Gremlin Pipeline Architecture

The pipeline consists of four specialized nodes orchestrated by `Text2GremlinFlow` in [`hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py). Each node handles a specific transformation step:

### SchemaNode

The **SchemaNode** ([`hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py)) loads the graph schema definition, including vertex labels, edge labels, and property keys. It makes this structural metadata available to downstream nodes so the LLM understands the graph topology before generating queries.

### GremlinExampleIndexQueryNode

The **GremlinExampleIndexQueryNode** ([`hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py)) retrieves relevant few-shot examples from an indexed corpus. It matches the user's natural language question against stored query-Gremlin pairs, providing context that improves the LLM's accuracy for similar query patterns.
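The idea of matching a question against stored query-Gremlin pairs can be illustrated with a small stand-in. The sketch below is not the node's implementation: it swaps the real index lookup for plain word-overlap (Jaccard) scoring, and the `EXAMPLES` corpus, `tokens`, and `top_examples` names are hypothetical.

```python
import re

# Hypothetical corpus of (natural-language question, Gremlin query) pairs.
EXAMPLES = [
    ("Who are the friends of Alice?", "g.V().has('person', 'name', 'Alice').out('friend')"),
    ("How many computers are there?", "g.V().hasLabel('computer').count()"),
    ("List all person names", "g.V().hasLabel('person').values('name')"),
]


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def top_examples(query, examples, example_num):
    """Return the example_num pairs whose question overlaps most with the query."""
    query_tokens = tokens(query)

    def jaccard(pair):
        question_tokens = tokens(pair[0])
        union = query_tokens | question_tokens
        return len(query_tokens & question_tokens) / len(union) if union else 0.0

    return sorted(examples, key=jaccard, reverse=True)[:example_num]


matches = top_examples("How many computers are in the network?", EXAMPLES, 1)
print(matches[0][1])
```

Word overlap is only an assumption chosen to keep the sketch dependency-free; it ranks by shared vocabulary, whereas an index-backed search can also match paraphrases.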

### Text2GremlinNode

The **Text2GremlinNode** ([`hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py)) serves as the LLM interface. It initializes the `GremlinGenerateSynthesize` operator from [`hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py), which constructs prompts from the schema, the retrieved examples, and the user query. The operator runs two prompts in parallel (one raw, one with the matched examples), extracts the Gremlin code block from the LLM response, and stores both raw and processed results in the pipeline state.
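Extracting a Gremlin code block from an LLM response can be sketched with a regular expression. This is a minimal illustration that assumes the model wraps its answer in a Markdown code fence, optionally tagged `gremlin`; the `extract_gremlin` helper is hypothetical and is not the operator's actual code.

```python
import re

# Three backticks, built programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3

# Assumed response shape: an opening fence (optionally tagged "gremlin"),
# the query on the following lines, then a closing fence.
PATTERN = re.compile(FENCE + r"(?:gremlin)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_gremlin(response):
    """Return the first fenced query in the response, or None if there is none."""
    match = PATTERN.search(response)
    if match is None:
        return None
    return match.group(1).strip()


response = (
    "Here is the query:\n"
    + FENCE + "gremlin\n"
    + "g.V().hasLabel('person').count()\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)
print(extract_gremlin(response))
```

Returning `None` when no fence is found lets the caller decide whether to fall back to the unprocessed response or treat the generation as failed.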

### GremlinExecuteNode

The **GremlinExecuteNode** ([`hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py)) sends the generated Gremlin statement to the HugeGraph server, executes it, and returns the results to complete the flow.

### Pipeline Orchestration

The `Text2GremlinFlow` class wires these nodes together using a `GPipeline` graph structure:

```python
# Excerpt: the second argument is the set of nodes that must finish first.
pipeline.registerGElement(schema_node, set(), "schema_node")
pipeline.registerGElement(ieq_node, set(), "gremlin_example_index_query")
pipeline.registerGElement(tgn_node, {schema_node, ieq_node}, "text2gremlin")
pipeline.registerGElement(exe_node, {tgn_node}, "gremlin_execute")
```

Dependencies ensure `text2gremlin` waits for both schema loading and example retrieval, while `gremlin_execute` waits for code generation.
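The ordering these dependencies imply can be reproduced with Python's standard `graphlib` module (Python 3.9+). This sketch only models the dependency graph described above; it does not use `GPipeline`, and the stage grouping it prints is a property of the graph, not a claim about how the real scheduler runs nodes.

```python
from graphlib import TopologicalSorter

# Each node name maps to the set of nodes it depends on,
# mirroring the registration calls shown above.
dependencies = {
    "schema_node": set(),
    "gremlin_example_index_query": set(),
    "text2gremlin": {"schema_node", "gremlin_example_index_query"},
    "gremlin_execute": {"text2gremlin"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Collect nodes stage by stage: every node in a stage has all of its
# dependencies satisfied by earlier stages.
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

The first stage contains both `schema_node` and `gremlin_example_index_query` because neither depends on the other.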

## Running Text2Gremlin in Python

To use **Text2Gremlin** programmatically, instantiate `Text2GremlinFlow` and invoke `build_flow()` followed by `run()`:

```python
from hugegraph_llm.flows.text2gremlin import Text2GremlinFlow

# Initialize the pipeline
flow = Text2GremlinFlow()

# Configure the query parameters
query = "Show the names of all persons who are friends of 'Alice'"
schema = ""               # Optional: override auto-loaded schema
gremlin_prompt = None     # Optional: custom prompt template
example_num = 3           # Number of few-shot examples (0-10)

# Execute the pipeline
result = flow.build_flow(
    query=query,
    example_num=example_num,
    schema_input=schema,
    gremlin_prompt_input=gremlin_prompt,
).run()

# Access the generated query and results
print(result["template_gremlin"])          # Generated Gremlin code
print(result["template_execution_result"]) # Database output
```

The `result` dictionary always contains five standardized keys: `match_result`, `template_gremlin`, `raw_gremlin`, `template_execution_result`, and `raw_execution_result`.
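When consuming the output downstream, it can help to check for those five keys before reading them. The helper below is a hypothetical convenience, not part of `hugegraph-llm`, and the sample dictionary is a made-up stand-in whose value types are assumptions.

```python
# The five keys named above.
EXPECTED_KEYS = (
    "match_result",
    "template_gremlin",
    "raw_gremlin",
    "template_execution_result",
    "raw_execution_result",
)


def missing_keys(result):
    """Return the expected keys that are absent from a result dictionary."""
    return [key for key in EXPECTED_KEYS if key not in result]


# Stand-in for the dictionary returned by run(); all values are made up.
sample_result = {
    "match_result": [],
    "template_gremlin": "g.V().hasLabel('person').values('name')",
    "raw_gremlin": "g.V().hasLabel('person').values('name')",
    "template_execution_result": "",
    "raw_execution_result": "",
}

print(missing_keys(sample_result))
```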

## Deploying via REST API

For service-oriented architectures, the repository exposes **Text2Gremlin** through [`hugegraph-llm/src/hugegraph_llm/api/rag_api.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/api/rag_api.py). Send a POST request to the `/api/v1/text2gremlin` endpoint:

```bash
curl -X POST https://your-hg-llm-service/api/v1/text2gremlin \
     -H "Content-Type: application/json" \
     -d '{
           "query": "How many computers are in the network?",
           "example_num": 2,
           "schema": "",
           "gremlin_prompt": null
         }'
```

The API internally constructs a `Text2GremlinFlow` instance and returns the same JSON structure as the Python interface, making it suitable for frontend applications or microservices.
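The same request can be assembled from Python using only the standard library. The sketch below mirrors the curl call; `build_text2gremlin_request` is a hypothetical helper, and the host name is the same placeholder used above, so the request is built but not sent.

```python
import json
import urllib.request


def build_text2gremlin_request(base_url, query, example_num=2, schema="", gremlin_prompt=None):
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "query": query,
        "example_num": example_num,
        "schema": schema,
        "gremlin_prompt": gremlin_prompt,  # None is serialized as JSON null
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/text2gremlin",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text2gremlin_request(
    "https://your-hg-llm-service",  # placeholder host from the curl example
    "How many computers are in the network?",
)
print(request.get_method(), request.full_url)

# To send it against a running service:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```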

## Customizing Prompts and Schemas

You can override the default behavior by providing custom schemas or prompts. The default prompt lives in [`hugegraph-llm/src/hugegraph_llm/config/prompt_config.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py) as `gremlin_generate_prompt_EN`, which instructs the LLM to detect complex multi-step queries and return `g.V().limit(0)` as a safety placeholder for unsupported logic.

To inject a custom graph model:

```python
from hugegraph_llm.flows.text2gremlin import Text2GremlinFlow

flow = Text2GremlinFlow()

# Custom schema, passed as a JSON string
custom_schema = """
{
  "vertices": [
    {"vertex_label": "person", "properties": ["name", "age"]},
    {"vertex_label": "computer", "properties": ["hostname", "ip"]}
  ],
  "edges": [
    {"edge_label": "owns", "source_vertex_label": "person", "target_vertex_label": "computer"}
  ]
}
"""

result = flow.build_flow(
    query="List all computers owned by people older than 30",
    example_num=2,
    schema_input=custom_schema,
    gremlin_prompt_input=None,
).run()
```

The schema string replaces the `{schema}` placeholder in the prompt template, allowing the LLM to generate accurate property keys and edge labels for your specific data model.
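The substitution itself can be illustrated with a toy template. In this sketch, `TEMPLATE` is a hypothetical, heavily shortened stand-in for the real prompt, and the `{query}` placeholder and `render_prompt` helper are assumptions made for illustration; only the `{schema}` placeholder comes from the description above.

```python
import json

# Hypothetical, shortened template; the real prompt is much longer.
TEMPLATE = "Graph schema:\n{schema}\n\nQuestion: {query}\nGremlin:"


def render_prompt(template, schema, query):
    """Fill the {schema} and {query} placeholders with plain string replacement."""
    # str.replace is used instead of str.format so that any literal braces
    # elsewhere in a template would not be treated as format fields.
    return template.replace("{schema}", schema).replace("{query}", query)


schema = json.dumps(
    {
        "vertices": [{"vertex_label": "person", "properties": ["name", "age"]}],
        "edges": [],
    }
)

prompt = render_prompt(TEMPLATE, schema, "List all person names")
print(prompt)
```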

## Core Implementation Files

The following source files define the **Text2Gremlin** behavior:

- **[`hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py)** – Orchestrates the four-node pipeline and post-processes results via `post_deal()`.
- **[`hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py)** – Node implementation that prepares LLM inputs and invokes the synthesize operator.
- **[`hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py)** – Core operator handling prompt formatting, LLM invocation (sync/async), and Gremlin extraction.
- **[`hugegraph-llm/src/hugegraph_llm/config/prompt_config.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py)** – Contains `gremlin_generate_prompt_EN` and `gremlin_generate_prompt_CN` templates.
- **[`hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py)** – Schema retrieval and formatting logic.
- **[`hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py)** – Vector/keyword search for few-shot examples.
- **[`hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py)** – Query execution against the HugeGraph server.

## Summary

- **Text2Gremlin** combines four specialized nodes (schema loading, example retrieval, LLM generation, and query execution) into a unified pipeline.
- The `Text2GremlinFlow` class in [`hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py) manages dependencies and output normalization.
- You can invoke the pipeline via direct Python API or REST endpoint, with full control over example counts, custom schemas, and prompt templates.
- Generated results include both the final Gremlin code (`template_gremlin`) and raw LLM outputs (`raw_gremlin`) for debugging and auditing.
- The system includes safety mechanisms to return empty traversals for queries detected as too complex for reliable generation.

## Frequently Asked Questions

### What is Text2Gremlin in HugeGraph AI?

**Text2Gremlin in HugeGraph AI** is an automated pipeline that converts natural language questions into executable Gremlin graph traversal statements. It uses a series of processing nodes to fetch graph schemas, retrieve relevant query examples, prompt an LLM for code generation, and execute the resulting Gremlin against a HugeGraph database.

### How does Text2Gremlin handle complex queries?

According to the default prompt configuration in [`hugegraph-llm/src/hugegraph_llm/config/prompt_config.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/config/prompt_config.py), the LLM analyzes whether a query requires multiple reasoning steps, conditional logic, or nested traversals. If the query is deemed too complex, the system returns `g.V().limit(0)` as a safe placeholder rather than generating potentially incorrect Gremlin syntax.
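Callers may want to distinguish this placeholder from a real query that happens to return no rows. A minimal sketch of such a check follows; `is_placeholder` is a hypothetical helper, and the whitespace and trailing-semicolon normalization is an assumption about how the model might format its answer.

```python
PLACEHOLDER = "g.V().limit(0)"


def is_placeholder(gremlin):
    """Return True if the generated query is just the empty-traversal placeholder."""
    # Drop all whitespace and any trailing semicolon before comparing.
    normalized = "".join(gremlin.split()).rstrip(";")
    return normalized == PLACEHOLDER


print(is_placeholder(" g.V().limit(0) "))
print(is_placeholder("g.V().hasLabel('person').limit(0)"))
```

With a check like this, an application can show a "query too complex" message instead of an empty result set.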

### Can I use custom graph schemas with Text2Gremlin?

Yes. You can pass a custom schema JSON string via the `schema_input` parameter in `build_flow()` or the `schema` field in the REST API. This overrides the automatic schema retrieval from the HugeGraph server and injects your custom vertex and edge definitions directly into the LLM prompt.

### Which LLM models are compatible with Text2Gremlin?

The system uses the LLM factory in [`hugegraph-llm/src/hugegraph_llm/models/llms/init_llm.py`](https://github.com/apache/incubator-hugegraph-ai/blob/main/hugegraph-llm/src/hugegraph_llm/models/llms/init_llm.py) (specifically `get_text2gql_llm()`) to initialize the language model. As implemented in apache/incubator-hugegraph-ai, it supports any LLM provider configured in the environment, including OpenAI GPT models, local models via vLLM, or other compatible APIs, as long as they are registered in the LLM initialization module.