How Cross-Language Query Support Works in RAGFlow: A Technical Deep Dive

RAGFlow enables cross-language query support by automatically translating user questions into multiple target languages using LLM-driven prompt templates before executing vector search, allowing seamless retrieval across multilingual document collections.

Cross-language query support bridges the gap between user questions and document corpora indexed in different languages. In RAGFlow, this capability is implemented as a translation layer that plugs into the retrieval pipeline through prompt-driven LLM calls. This article walks through the implementation, from API parameters to prompt templates, based on the current RAGFlow source code.

Architecture Overview

The cross-language retrieval workflow in RAGFlow follows a translation-then-retrieve pattern. When a user submits a query with the cross_languages parameter enabled, the system:

  1. Accepts the original query and target language list via API or SDK
  2. Invokes the cross_languages helper function in rag/prompts/generator.py to translate the query into each specified language
  3. Executes vector searches using the translated queries
  4. Aggregates results across all language variants while preserving relevance ranking

This design ensures that documents indexed in English, Chinese, Spanish, or any other supported language remain accessible regardless of the query language.
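
The four steps above can be sketched in a few lines. This is a minimal illustration of the translate-then-retrieve pattern, not RAGFlow's internal code: the function name translate_then_retrieve, the callable signatures, and the "keep the best score per chunk" merge rule are all assumptions made for the example, and the translator and search backends are in-memory stand-ins.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures for illustration; not RAGFlow's internal types.
Translator = Callable[[str, str], str]                # (query, language) -> translated query
Searcher = Callable[[str], List[Tuple[str, float]]]   # query -> [(chunk_id, score)]


def translate_then_retrieve(
    query: str,
    languages: List[str],
    translate: Translator,
    search: Searcher,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    # Step 2: the original query plus one translated variant per target language.
    variants = [query] + [translate(query, lang) for lang in languages]

    # Steps 3 and 4: search every variant and keep the best score seen per chunk.
    best: Dict[str, float] = {}
    for variant in variants:
        for chunk_id, score in search(variant):
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score

    # Rank the merged hits by score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# In-memory stand-ins for the LLM translator and the vector index.
FAKE_TRANSLATIONS = {("hello", "es"): "hola", ("hello", "zh"): "你好"}
FAKE_INDEX = {
    "hello": [("doc-en", 0.9)],
    "hola": [("doc-es", 0.8), ("doc-en", 0.3)],
    "你好": [("doc-zh", 0.95)],
}


def fake_translate(query: str, language: str) -> str:
    return FAKE_TRANSLATIONS[(query, language)]


def fake_search(query: str) -> List[Tuple[str, float]]:
    return FAKE_INDEX.get(query, [])


results = translate_then_retrieve("hello", ["es", "zh"], fake_translate, fake_search)
print(results)
```

Passing the translator and searcher in as callables keeps the sketch independent of any particular LLM or vector store; a real deployment would likely use a more careful score fusion than a per-chunk maximum.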

The Translation Engine: Prompt-Driven LLM Calls

At the core of RAGFlow's cross-language support is the cross_languages function located at line 258 in rag/prompts/generator.py. This function orchestrates query translation by constructing specific LLM prompts.

The implementation loads two distinct prompt templates:

  • cross_languages_sys_prompt.txt (the system prompt)
  • cross_languages_user_prompt.txt (the user prompt)

These templates reside in rag/prompts/templates/ and are loaded at runtime using the load_prompt utility. The function then invokes the configured LLM (identified by llm_id) to generate translations for each language specified in the cross_languages parameter.
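
As a rough sketch of that flow, the snippet below loads two template files from a directory and issues one chat call per target language. Everything here is an assumption for illustration: make_prompt_loader stands in for a load_prompt-style utility, translate_query is a hypothetical name, the template contents are placeholders written to a temporary directory, and chat is any callable taking a system and a user message. The actual function in rag/prompts/generator.py may be structured differently.

```python
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Callable, Dict, List


def make_prompt_loader(template_dir: str) -> Callable[[str], str]:
    """Return a cached loader; a stand-in for a load_prompt-style utility."""

    @lru_cache(maxsize=None)
    def load(name: str) -> str:
        path = Path(template_dir) / f"{name}.txt"
        return path.read_text(encoding="utf-8").strip()

    return load


def translate_query(
    query: str,
    languages: List[str],
    load: Callable[[str], str],
    chat: Callable[[str, str], str],
) -> Dict[str, str]:
    """Translate the query into each language with one chat call per language."""
    system_prompt = load("cross_languages_sys_prompt")
    user_template = load("cross_languages_user_prompt")
    return {
        lang: chat(system_prompt, user_template.format(language=lang, query=query))
        for lang in languages
    }


def echo_chat(system: str, user: str) -> str:
    # Stand-in for the configured LLM: echoes the user prompt it received.
    return f"<{user}>"


# Demo with placeholder templates written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cross_languages_sys_prompt.txt").write_text(
        "You are a professional translator.\n", encoding="utf-8"
    )
    Path(tmp, "cross_languages_user_prompt.txt").write_text(
        "Translate the following query into {language}: {query}\n", encoding="utf-8"
    )
    loader = make_prompt_loader(tmp)
    translations = translate_query("What is RAG?", ["zh", "es"], loader, echo_chat)

print(translations)
```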

API and SDK Integration

The cross_languages parameter propagates through multiple entry points in the RAGFlow architecture, ensuring consistent behavior across interfaces.

HTTP API Endpoints

The retrieval endpoints in api/apps/sdk/session.py (lines 1083-1135) and api/apps/sdk/doc.py extract the cross_languages field from incoming JSON payloads. When present, this list of language codes (e.g., ["en", "zh", "es"]) is forwarded to the retrieval engine.

According to the API reference documentation in docs/references/http_api_reference.md (lines 2385-2452), the parameter accepts an array of ISO language codes that determine which translations the system generates before searching.
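
To make the payload handling concrete, here is a small sketch of pulling cross_languages out of a JSON request body. The helper extract_cross_languages and its normalization rules (trimming, lowercasing, de-duplicating) are assumptions for illustration; the actual endpoints may simply forward the list as received.

```python
import json
from typing import List


def extract_cross_languages(raw_body: str) -> List[str]:
    """Return a cleaned list of target languages from a JSON request body."""
    payload = json.loads(raw_body)
    langs = payload.get("cross_languages") or []  # a missing field means no translation

    if not isinstance(langs, list) or not all(isinstance(item, str) for item in langs):
        raise ValueError("cross_languages must be a list of strings")

    cleaned: List[str] = []
    for item in langs:
        code = item.strip().lower()
        if code and code not in cleaned:  # drop blanks and duplicates, keep order
            cleaned.append(code)
    return cleaned


body = '{"question": "How are indexes built?", "cross_languages": ["en", " ZH ", "en"]}'
print(extract_cross_languages(body))
```

Treating an absent field as an empty list keeps plain single-language retrieval working unchanged when callers never send the parameter.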

Python SDK

The Python SDK implementation in sdk/python/ragflow_sdk/ragflow.py (lines 203-221) exposes the cross_languages parameter in the retrieve() method. Users pass a list of language strings, which the SDK serializes into the HTTP request body.

The SDK reference in docs/references/python_api_reference.md (lines 1001-1051) documents this parameter as enabling "multilingual expansion" of the input query.

Agent Tools

Within the agent framework, the retrieval tool in agent/tools/retrieval.py (lines 172-173) utilizes the same cross-language helper when executing retrieval actions. This ensures that agent-based workflows maintain parity with direct API calls.

Configuration and Prompt Templates

The translation quality depends on the prompt templates stored in rag/prompts/templates/. The system uses two specific files:

  • cross_languages_sys_prompt.txt: Establishes the LLM's role as a professional translator, instructing it to preserve semantic meaning while converting the query into the specified target language.
  • cross_languages_user_prompt.txt: Provides the template structure: Translate the following query into {language}: {query}

These templates are loaded by the load_prompt function and combined with the runtime parameters (original query and target language code) before being sent to the LLM. This modular design allows operators to customize translation behavior by editing the prompt files without modifying the core Python code in rag/prompts/generator.py.
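
Operators who edit the prompt files may want to confirm that a customized user template still contains the placeholders the runtime fills in. The sketch below assumes the template uses Python str.format-style {language} and {query} fields, as the template text quoted above suggests; render_user_prompt and template_fields are hypothetical helpers, not part of RAGFlow.

```python
from string import Formatter
from typing import Set

REQUIRED_FIELDS = {"language", "query"}


def template_fields(template: str) -> Set[str]:
    """Collect the named replacement fields that appear in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_user_prompt(template: str, query: str, language: str) -> str:
    """Fill in the template, failing early if a required placeholder is gone."""
    missing = REQUIRED_FIELDS - template_fields(template)
    if missing:
        raise ValueError(f"template is missing placeholders: {sorted(missing)}")
    return template.format(language=language, query=query)


template = "Translate the following query into {language}: {query}"
prompt = render_user_prompt(template, "How are indexes built?", "es")
print(prompt)
```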

Practical Usage Examples

Basic Cross-Language Retrieval via Python SDK

from ragflow_sdk import RAGFlow

# Initialize the client
client = RAGFlow(
    base_url="https://your-ragflow-instance.com",
    api_key="YOUR_API_KEY",
)

# Search with the original English query plus Chinese and Spanish translations
results = client.retrieve(
    question="How does neural machine translation work?",
    dataset_ids=["YOUR_DATASET_ID"],  # dataset IDs are strings
    cross_languages=["zh", "es"],  # translate the query into Chinese and Spanish
)

# Print the first 100 characters of each returned chunk
for chunk in results:
    print(f"{chunk.content[:100]}...")

The SDK packs the cross_languages list into the JSON payload sent to /api/v1/retrieval. The backend translates the query into Chinese and Spanish via the LLM, runs separate searches, and returns a unified ranked list.

Direct HTTP API Call

curl -X POST https://your-ragflow-instance.com/api/v1/retrieval \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Quel est le processus de création d’index ?",
    "dataset_ids": ["YOUR_DATASET_ID"],
    "cross_languages": ["en", "de"]
  }'

The API responds with JSON containing the matching chunks from all three language-specific queries (the original French query plus its English and German translations), merged and ranked by relevance score.
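
For readers who prefer Python's standard library over curl, the same request can be assembled with urllib.request. This is a sketch only: build_retrieval_request is a hypothetical helper, the endpoint path and dataset ID are placeholders to check against your deployment's API reference, and the request is built but not sent so the example runs without a server.

```python
import json
import urllib.request
from typing import List


def build_retrieval_request(
    base_url: str,
    api_key: str,
    question: str,
    dataset_ids: List[str],
    cross_languages: List[str],
) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for cross-language retrieval."""
    body = {
        "question": question,
        "dataset_ids": dataset_ids,
        "cross_languages": cross_languages,
    }
    return urllib.request.Request(
        # Assumed endpoint path; verify it against your RAGFlow version.
        url=f"{base_url.rstrip('/')}/api/v1/retrieval",
        # ensure_ascii=False keeps accented characters readable in the UTF-8 body.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_retrieval_request(
    "https://your-ragflow-instance.com",
    "YOUR_API_KEY",
    "Quel est le processus de création d’index ?",
    ["YOUR_DATASET_ID"],
    ["en", "de"],
)
print(request.get_method(), request.full_url)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```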

Summary

  • Cross-language query support in RAGFlow works by translating user queries into multiple target languages before retrieval using LLM-driven prompt templates.
  • The cross_languages parameter is accepted across the HTTP API (/api/v1/retrieval), Python SDK (ragflow_sdk.RAGFlow.retrieve), and agent tools (agent/tools/retrieval.py).
  • Translation logic resides in rag/prompts/generator.py (lines 258-285), which loads templates from cross_languages_sys_prompt.txt and cross_languages_user_prompt.txt.
  • The system supports any LLM configured in RAGFlow (OpenAI, Anthropic, etc.), making it model-agnostic.
  • Results from all language-specific queries are merged and ranked before being returned to the client.

Frequently Asked Questions

How does RAGFlow handle translation quality for technical terminology?

RAGFlow delegates translation to the configured LLM (specified by llm_id) using the prompts in cross_languages_sys_prompt.txt and cross_languages_user_prompt.txt. Because it uses the same models that power the chat functionality (such as GPT-4 or Claude), technical terminology is generally preserved well, although quality ultimately depends on the chosen model and the language pair. Operators can further refine translation quality by customizing the prompt templates without modifying the core Python code in rag/prompts/generator.py.

Can I use cross-language retrieval with the RAGFlow agent framework?

Yes. The agent retrieval tool in agent/tools/retrieval.py (lines 172-173) invokes the same cross_languages helper function used by the direct API. When building an agent workflow, you can configure the retrieval action to include the cross_languages parameter, ensuring that agent-based searches also benefit from multilingual query expansion.

What languages are supported for cross-language queries?

RAGFlow's cross-language support is not tied to a fixed list of languages, because it relies on the underlying LLM's translation capabilities rather than hardcoded language pairs. You can pass any ISO language code (such as "en", "zh", "es", "de", "ja") to the cross_languages parameter. In practice, the usable set depends on how well your configured LLM translates between those specific language pairs.

Is there a performance penalty when using cross-language retrieval?

Yes, there is a latency cost proportional to the number of languages specified. The system must perform an LLM call per target language in rag/prompts/generator.py to generate translations, followed by multiple vector searches. However, the translations can be cached, and the searches execute in parallel, mitigating the overhead for production workloads. The merged results preserve the relevance ranking across all language variants.
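
The two mitigations mentioned above, caching translations and running searches in parallel, can be sketched at the application level as follows. This is an illustration under stated assumptions rather than a description of RAGFlow's internals: cached_translate and search are stand-ins for the LLM call and the vector search, and the call counter exists only to show the cache taking effect.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List, Tuple

CALLS = {"translate": 0}  # counts how often the (simulated) LLM is actually invoked


@lru_cache(maxsize=1024)
def cached_translate(query: str, language: str) -> str:
    """Stand-in for an LLM translation call; repeated inputs hit the cache."""
    CALLS["translate"] += 1
    return f"[{language}] {query}"


def search(variant: str) -> List[Tuple[str, float]]:
    """Stand-in for one vector search over a single query variant."""
    return [(f"chunk for {variant}", 1.0)]


def search_all(query: str, languages: List[str]) -> List[List[Tuple[str, float]]]:
    variants = [query] + [cached_translate(query, lang) for lang in languages]
    # Run one search per variant concurrently; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        return list(pool.map(search, variants))


first = search_all("How are indexes built?", ["zh", "es"])
second = search_all("How are indexes built?", ["zh", "es"])  # served from the cache
print(len(first), CALLS["translate"])
```

Threads suit this case because both translation and search are I/O-bound calls to external services; the cache only helps when identical queries recur.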
