How to Use Text2Gremlin in HugeGraph AI to Convert Natural Language to Gremlin

Text2Gremlin in HugeGraph AI is a pipeline that transforms natural language questions into executable Gremlin queries by chaining schema retrieval, few-shot example matching, LLM-based code generation, and query execution against the database.

The apache/incubator-hugegraph-ai repository provides Text2Gremlin as part of the hugegraph-llm module. It lets developers query HugeGraph databases in plain English instead of writing Gremlin by hand, using a four-stage pipeline that covers everything from context retrieval to query execution.

Understanding the Text2Gremlin Pipeline Architecture

The pipeline consists of four specialized nodes orchestrated by Text2GremlinFlow in hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py. Each node handles a specific transformation step:

SchemaNode

The SchemaNode (hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py) loads the graph schema definition, including vertex labels, edge labels, and property keys. It makes this structural metadata available to downstream nodes so the LLM understands the graph topology before generating queries.
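The real SchemaNode fetches the schema from the HugeGraph server. To give a rough idea of the structural metadata an LLM needs, the sketch below renders a schema dictionary as compact text. The helper schema_to_prompt_text and the dictionary layout are hypothetical, chosen to match the custom-schema JSON used later in this article rather than SchemaNode's actual output.

```python
from typing import Any


def schema_to_prompt_text(schema: dict[str, Any]) -> str:
    """Render vertex and edge labels as compact lines for an LLM prompt (hypothetical helper)."""
    lines = []
    for vertex in schema.get("vertices", []):
        props = ", ".join(vertex.get("properties", []))
        lines.append(f"vertex {vertex['vertex_label']}({props})")
    for edge in schema.get("edges", []):
        lines.append(
            f"edge {edge['edge_label']}: "
            f"{edge['source_vertex_label']} -> {edge['target_vertex_label']}"
        )
    return "\n".join(lines)


# Assumed layout, mirroring the custom-schema JSON in this article.
example_schema = {
    "vertices": [
        {"vertex_label": "person", "properties": ["name", "age"]},
        {"vertex_label": "computer", "properties": ["hostname", "ip"]},
    ],
    "edges": [
        {"edge_label": "owns", "source_vertex_label": "person",
         "target_vertex_label": "computer"},
    ],
}
print(schema_to_prompt_text(example_schema))
```

This prints one line per label, ending with `edge owns: person -> computer`, which gives the model the label and property names it must use verbatim.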

GremlinExampleIndexQueryNode

The GremlinExampleIndexQueryNode (hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py) retrieves relevant few-shot examples from an indexed corpus. It matches the user's natural language question against stored query-Gremlin pairs, providing context that improves the LLM's accuracy for similar query patterns.
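The node searches an indexed corpus of stored query/Gremlin pairs. The sketch below shows only the top-k matching idea and swaps in a much simpler technique, bag-of-words cosine similarity, in place of a real embedding index. EXAMPLES, top_k_examples, and the sample pairs are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical corpus of query/Gremlin pairs.
EXAMPLES = [
    {"query": "Who are Alice's friends?",
     "gremlin": "g.V().has('person','name','Alice').out('knows').values('name')"},
    {"query": "How many computers are there?",
     "gremlin": "g.V().hasLabel('computer').count()"},
    {"query": "List people older than 30",
     "gremlin": "g.V().hasLabel('person').has('age', gt(30)).values('name')"},
]


def _vector(text: str) -> Counter:
    # Lower-cased word counts stand in for a real embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_examples(question: str, k: int = 2) -> list[dict]:
    """Return the k stored examples most similar to the question."""
    q = _vector(question)
    ranked = sorted(EXAMPLES, key=lambda ex: _cosine(q, _vector(ex["query"])), reverse=True)
    return ranked[:k]


for ex in top_k_examples("How many computers are in the network?", k=1):
    print(ex["gremlin"])
```

The `k` argument plays the same role as `example_num` in the flow: it caps how many matched pairs end up in the prompt.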

Text2GremlinNode

The Text2GremlinNode (hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py) serves as the LLM interface. It initializes the GremlinGenerateSynthesize operator from hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py, which builds prompts from the schema, the retrieved examples, and the user query. The operator sends two prompts concurrently, a raw prompt without the retrieved examples and a prompt that includes them, extracts the Gremlin code block from each LLM response, and stores both generated queries in the pipeline state.
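The concurrent-prompt and extraction steps can be sketched with asyncio. This is an illustration under stated assumptions, not the operator's code: fake_llm stands in for a real async LLM client, and build_prompt, generate, and the prompt layout are hypothetical. The regex pulls the body out of a gremlin-fenced block and falls back to the whole response.

```python
import asyncio
import re

FENCE = "`" * 3  # three backticks, built at runtime to keep this snippet readable
GREMLIN_BLOCK = re.compile(FENCE + r"gremlin\s*(.*?)" + FENCE, re.DOTALL)


def extract_gremlin(response: str) -> str:
    """Return the body of the first gremlin-fenced block, else the stripped response."""
    match = GREMLIN_BLOCK.search(response)
    return match.group(1).strip() if match else response.strip()


async def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for an async LLM call.
    await asyncio.sleep(0)
    if "Examples:\nNone" in prompt:
        body = "g.V().hasLabel('person')"
    else:
        body = "g.V().has('person','name','Alice').out('knows').values('name')"
    return f"Here is the query:\n{FENCE}gremlin\n{body}\n{FENCE}"


def build_prompt(query: str, schema: str, examples: list[str]) -> str:
    shown = "\n".join(examples) if examples else "None"
    return f"Schema:\n{schema}\nExamples:\n{shown}\nQuestion: {query}"


async def generate(query: str, schema: str, examples: list[str]) -> dict:
    raw_prompt = build_prompt(query, schema, [])               # no retrieved examples
    template_prompt = build_prompt(query, schema, examples)    # with retrieved examples
    # Send both prompts concurrently and wait for both answers.
    raw, template = await asyncio.gather(fake_llm(raw_prompt), fake_llm(template_prompt))
    return {"raw_gremlin": extract_gremlin(raw), "template_gremlin": extract_gremlin(template)}


state = asyncio.run(generate(
    "Show the names of all persons who are friends of 'Alice'",
    "vertex person(name, age)",
    ["g.V().has('person','name','Bob').out('knows')"],
))
print(state)
```

Running both prompts with `asyncio.gather` means the example-free baseline costs little extra latency, and comparing the two outputs shows how much the retrieved examples changed the answer.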

GremlinExecuteNode

The GremlinExecuteNode (hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py) sends the generated Gremlin statement to the HugeGraph server, executes it, and returns the results to complete the flow.
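A minimal sketch of an execution step, assuming the generated queries sit in a state dictionary under the documented template_gremlin and raw_gremlin keys. The submit callable is hypothetical; in practice it would wrap whatever HugeGraph client you use. Errors are recorded instead of raised so one bad query does not abort the flow.

```python
from typing import Any, Callable


def execute_gremlin(state: dict[str, Any], submit: Callable[[str], Any]) -> dict[str, Any]:
    """Run each generated query through `submit` and store its result or error message."""
    for src, dst in (("template_gremlin", "template_execution_result"),
                     ("raw_gremlin", "raw_execution_result")):
        gremlin = state.get(src)
        if not gremlin:
            state[dst] = None  # nothing was generated for this variant
            continue
        try:
            state[dst] = submit(gremlin)
        except Exception as exc:  # record server-side failures instead of crashing
            state[dst] = f"error: {exc}"
    return state


def fake_submit(gremlin: str) -> list:
    # Hypothetical client: only understands count() traversals.
    if "count()" in gremlin:
        return [4]
    raise ValueError("unsupported step")


result_state = execute_gremlin(
    {"template_gremlin": "g.V().hasLabel('computer').count()", "raw_gremlin": "g.V().bad()"},
    fake_submit,
)
print(result_state)
```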

Pipeline Orchestration

The Text2GremlinFlow class wires these nodes together using a GPipeline graph structure:

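# Schema loading and example retrieval have no prerequisites (empty dependency sets)
pipeline.registerGElement(schema_node, set(), "schema_node")
pipeline.registerGElement(ieq_node, set(), "gremlin_example_index_query")
# Generation waits for both; execution waits for generation
pipeline.registerGElement(tgn_node, {schema_node, ieq_node}, "text2gremlin")
pipeline.registerGElement(exe_node, {tgn_node}, "gremlin_execute")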

Dependencies ensure text2gremlin waits for both schema loading and example retrieval, while gremlin_execute waits for code generation.
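To see how these dependency sets translate into an execution order, the sketch below feeds the same graph to Python's standard-library graphlib.TopologicalSorter (Python 3.9+). This is not GPipeline's scheduler; it only shows which nodes become runnable at each stage, using the element names from the registration calls above.

```python
from graphlib import TopologicalSorter

# node name -> set of node names it waits for
dependencies = {
    "schema_node": set(),
    "gremlin_example_index_query": set(),
    "text2gremlin": {"schema_node", "gremlin_example_index_query"},
    "gremlin_execute": {"text2gremlin"},
}


def execution_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into stages; nodes in one stage have all their dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


for i, stage in enumerate(execution_stages(dependencies)):
    print(i, stage)
```

Schema loading and example retrieval land in the same first stage, so a scheduler is free to run them in parallel before generation starts.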

Running Text2Gremlin in Python

To use Text2Gremlin programmatically, instantiate Text2GremlinFlow and invoke build_flow() followed by run():

from hugegraph_llm.flows.text2gremlin import Text2GremlinFlow

# Initialize the pipeline
flow = Text2GremlinFlow()

# Configure the query parameters
query = "Show the names of all persons who are friends of 'Alice'"
schema = ""               # Optional: override auto-loaded schema
gremlin_prompt = None     # Optional: custom prompt template
example_num = 3           # Number of few-shot examples (0-10)

# Execute the pipeline
result = flow.build_flow(
    query=query,
    example_num=example_num,
    schema_input=schema,
    gremlin_prompt_input=gremlin_prompt,
).run()

# Access the generated query and results
print(result["template_gremlin"])          # Generated Gremlin code
print(result["template_execution_result"]) # Database output

The result dictionary always contains five standardized keys: match_result, template_gremlin, raw_gremlin, template_execution_result, and raw_execution_result.
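If your own code wraps the flow, a small helper can guarantee the same five-key shape for downstream consumers. The sketch below is hypothetical (normalize_result and RESULT_KEYS are not repository names); it keeps only the documented keys and fills any missing ones with None.

```python
from typing import Any

RESULT_KEYS = (
    "match_result",
    "template_gremlin",
    "raw_gremlin",
    "template_execution_result",
    "raw_execution_result",
)


def normalize_result(state: dict[str, Any]) -> dict[str, Any]:
    """Keep only the five documented keys, using None for any that are missing."""
    return {key: state.get(key) for key in RESULT_KEYS}


print(normalize_result({"template_gremlin": "g.V().limit(0)", "call_count": 2}))
```

Dropping internal keys such as the hypothetical `call_count` keeps API responses stable even if the pipeline state gains extra fields.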

Deploying via REST API

For service-oriented architectures, the repository exposes Text2Gremlin through hugegraph_llm/api/rag_api.py. Send a POST request to the /api/v1/text2gremlin endpoint:

curl -X POST https://your-hg-llm-service/api/v1/text2gremlin \
     -H "Content-Type: application/json" \
     -d '{
           "query": "How many computers are in the network?",
           "example_num": 2,
           "schema": "",
           "gremlin_prompt": null
         }'

The API internally constructs a Text2GremlinFlow instance and returns the same JSON structure as the Python interface, making it suitable for frontend applications or microservices.
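The same call can be made from Python with only the standard library. This sketch builds a POST request with the JSON body from the curl example; the base URL is the same placeholder and must be replaced with your deployment. Sending is left commented out because it needs a running service.

```python
import json
import urllib.request
from typing import Optional

# Placeholder URL copied from the curl example; replace with your service address.
ENDPOINT = "https://your-hg-llm-service/api/v1/text2gremlin"


def build_text2gremlin_request(query: str, example_num: int = 2, schema: str = "",
                               gremlin_prompt: Optional[str] = None) -> urllib.request.Request:
    payload = {
        "query": query,
        "example_num": example_num,
        "schema": schema,
        "gremlin_prompt": gremlin_prompt,  # serialized as JSON null when None
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_text2gremlin_request("How many computers are in the network?")
# To send it (requires a reachable service):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     result = json.load(resp)
#     print(result["template_gremlin"])
```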

Customizing Prompts and Schemas

You can override the default behavior by providing custom schemas or prompts. The default prompt lives in hugegraph_llm/config/prompt_config.py as gremlin_generate_prompt_EN, which instructs the LLM to detect complex multi-step queries and return g.V().limit(0) as a safety placeholder for unsupported logic.
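Because the placeholder is an ordinary query string, callers may want to detect it before executing or displaying results. A minimal sketch, assuming you only need to recognize g.V().limit(0) with optional whitespace and a trailing semicolon; is_placeholder is a hypothetical helper, not part of the repository.

```python
import re

# g.V().limit(0), tolerating whitespace around tokens and an optional trailing ';'
_PLACEHOLDER = re.compile(r"^\s*g\s*\.\s*V\s*\(\s*\)\s*\.\s*limit\s*\(\s*0\s*\)\s*;?\s*$")


def is_placeholder(gremlin: str) -> bool:
    """True if the generated query is the 'too complex' placeholder."""
    return bool(_PLACEHOLDER.match(gremlin))


for q in ["g.V().limit(0)", " g.V( ).limit( 0 ) ", "g.V().limit(10)", "g.V().hasLabel('person')"]:
    print(repr(q), is_placeholder(q))
```

A UI could then show "query too complex to translate" instead of an empty result set, which otherwise looks like a real answer.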

To inject a custom graph model:

custom_schema = """
{
  "vertices": [
    {"vertex_label": "person", "properties": ["name", "age"]},
    {"vertex_label": "computer", "properties": ["hostname", "ip"]}
  ],
  "edges": [
    {"edge_label": "owns", "source_vertex_label": "person", "target_vertex_label": "computer"}
  ]
}
"""

result = flow.build_flow(
    query="List all computers owned by people older than 30",
    example_num=2,
    schema_input=custom_schema,
    gremlin_prompt_input=None,
).run()

The schema string replaces the {schema} placeholder in the prompt template, allowing the LLM to generate accurate property keys and edge labels for your specific data model.
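The substitution step can be sketched with str.format. The template below is a heavily simplified, hypothetical stand-in for gremlin_generate_prompt_EN, which contains more instructions and placeholders. The sketch also parses the custom schema with json.loads so that malformed JSON fails before any LLM call; that validation is an addition for illustration, not something this article attributes to the repository.

```python
import json

# Hypothetical, simplified template with {schema} and {query} placeholders.
TEMPLATE = (
    "You are a Gremlin expert.\n"
    "Graph schema:\n{schema}\n"
    "Question: {query}\n"
    "Return only a gremlin code block."
)


def fill_prompt(schema_json: str, query: str) -> str:
    parsed = json.loads(schema_json)  # raises json.JSONDecodeError on malformed input
    compact = json.dumps(parsed, separators=(",", ":"))
    # Braces inside the substituted schema value are not re-interpreted by format().
    return TEMPLATE.format(schema=compact, query=query)


schema_json = '{"vertices": [{"vertex_label": "person", "properties": ["name", "age"]}], "edges": []}'
print(fill_prompt(schema_json, "List all people older than 30"))
```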

Core Implementation Files

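The following source files define the Text2Gremlin behavior:

  • hugegraph-llm/src/hugegraph_llm/flows/text2gremlin.py: Text2GremlinFlow, which wires the nodes together and shapes the output.
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/schema.py: SchemaNode, which loads the graph schema.
  • hugegraph-llm/src/hugegraph_llm/nodes/index_node/gremlin_example_index_query.py: GremlinExampleIndexQueryNode, which retrieves few-shot examples.
  • hugegraph-llm/src/hugegraph_llm/nodes/llm_node/text2gremlin.py: Text2GremlinNode, the LLM interface.
  • hugegraph-llm/src/hugegraph_llm/operators/llm_op/gremlin_generate.py: GremlinGenerateSynthesize, which builds prompts and extracts the Gremlin code.
  • hugegraph-llm/src/hugegraph_llm/nodes/hugegraph_node/gremlin_execute.py: GremlinExecuteNode, which runs the query on HugeGraph.
  • hugegraph_llm/config/prompt_config.py: the default gremlin_generate_prompt_EN template.
  • hugegraph_llm/api/rag_api.py: the REST endpoint.
  • hugegraph-llm/src/hugegraph_llm/models/llms/init_llm.py: get_text2gql_llm(), which initializes the LLM.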

Summary

  • Text2Gremlin combines four specialized nodes—schema loading, example retrieval, LLM generation, and query execution—into a unified pipeline.
  • The Text2GremlinFlow class in flows/text2gremlin.py manages dependencies and output normalization.
  • You can invoke the pipeline via direct Python API or REST endpoint, with full control over example counts, custom schemas, and prompt templates.
  • Generated results include both the example-guided Gremlin (template_gremlin) and a baseline generated without the retrieved examples (raw_gremlin), which helps with comparison and debugging.
  • The default prompt instructs the LLM to return an empty traversal (g.V().limit(0)) for queries it judges too complex to generate reliably.

Frequently Asked Questions

What is Text2Gremlin in HugeGraph AI?

Text2Gremlin in HugeGraph AI is an automated pipeline that converts natural language questions into executable Gremlin graph traversal statements. It uses a series of processing nodes to fetch graph schemas, retrieve relevant query examples, prompt an LLM for code generation, and execute the resulting Gremlin against a HugeGraph database.

How does Text2Gremlin handle complex queries?

According to the default prompt in prompt_config.py, the LLM is asked to judge whether a query requires multiple reasoning steps, conditional logic, or nested traversals. If it judges the query too complex, it is instructed to return g.V().limit(0) as a safe placeholder rather than generate potentially incorrect Gremlin. This is a prompt-level instruction, so it depends on the model following it.

Can I use custom graph schemas with Text2Gremlin?

Yes. You can pass a custom schema JSON string via the schema_input parameter in build_flow() or the schema field in the REST API. This overrides the automatic schema retrieval from the HugeGraph server and injects your custom vertex and edge definitions directly into the LLM prompt.

Which LLM models are compatible with Text2Gremlin?

The system uses the LLM factory in hugegraph-llm/src/hugegraph_llm/models/llms/init_llm.py (specifically get_text2gql_llm()) to initialize the language model. It works with whichever provider is configured for this role, provided that provider is registered in the LLM initialization module; examples include OpenAI GPT models, local models served through vLLM, and other compatible APIs.
