
How RAGFlow Implements Grounded Citations for Retrieved Information: A Technical Deep Dive

February 23, 2026 · infiniflow/ragflow

RAGFlow implements grounded citations through a three-stage pipeline that combines static prompt templates, lexical chunk matching via insert_citations(), and LLM-driven citation enrichment to ensure every answer is traceable to its source documents.

Grounded citations transform opaque AI responses into verifiable, trustworthy outputs by anchoring every claim to its original source. In the infiniflow/ragflow open-source repository, the implementation of grounded citations for retrieved information relies on a tightly coupled architecture that binds retrieval results to generated answers through explicit source markers and LLM formatting instructions.

The Three-Stage Citation Pipeline

RAGFlow’s citation system operates as a sequential pipeline that bridges retrieval and generation. The goal is that each sentence in the final answer can be mapped back to specific document chunks or knowledge graph entities.

Stage 1: Citation Prompt Definition

The foundation of RAGFlow’s citation system rests on static prompt templates that enforce strict formatting rules. The files rag/prompts/citation_prompt.md and rag/prompts/citation_plus.md define the exact citation format the LLM must follow, including instructions to place citations at the end of sentences and limit them to a maximum of four per sentence.
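The two formatting rules named above (citations at the end of a sentence, at most four per sentence) can be checked mechanically. The following is a minimal sketch, not code from the RAGFlow repository: it assumes the final citations are numbered markers such as [1], and the names check_sentence, CITATION, and TRAILING_CITATIONS are hypothetical.

```python
import re

MAX_CITATIONS_PER_SENTENCE = 4  # limit quoted in the text above

# Assumption: final citations are numbered markers such as [1] or [12].
CITATION = re.compile(r"\[\d+\]")
# A run of citations, optionally followed by terminal punctuation, at the very end.
TRAILING_CITATIONS = re.compile(r"(?:\s*\[\d+\])+\s*[.!?]?$")


def check_sentence(sentence: str) -> list[str]:
    """Return a list of rule violations for one sentence (empty if none)."""
    problems = []
    if len(CITATION.findall(sentence)) > MAX_CITATIONS_PER_SENTENCE:
        problems.append("more than 4 citations")
    # Remove the trailing citation run; any marker left over sits mid-sentence.
    body = TRAILING_CITATIONS.sub("", sentence.strip())
    if CITATION.search(body):
        problems.append("citation not at end of sentence")
    return problems


print(check_sentence("Water boils at 100 C [1][2]."))
print(check_sentence("Water [1] boils at 100 C."))
```

A check like this could run on model output as a sanity test; the prompt templates themselves remain the mechanism the article describes for steering the LLM.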

These templates are loaded by the prompt generator in rag/prompts/generator.py and exposed through two helper functions: citation_prompt() and citation_plus() (lines 159-191). The citation_prompt() function retrieves the base style guide, while citation_plus() loads the instruction set that asks the LLM to replace placeholder markers with properly formatted citations.
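The pattern of loading markdown templates from disk and exposing them through small helper functions might look roughly like the sketch below. It is an illustration under stated assumptions rather than the contents of rag/prompts/generator.py: the helper names (make_loader, citation_prompt_sketch, citation_plus_sketch), the {answer} placeholder, and the template text are all hypothetical, and the demo writes its own templates to a temporary directory so that it runs on its own.

```python
import tempfile
from functools import lru_cache
from pathlib import Path


def make_loader(prompt_dir: Path):
    """Return a cached function that reads '<name>.md' from prompt_dir."""

    @lru_cache(maxsize=None)
    def load(name: str) -> str:
        return (prompt_dir / f"{name}.md").read_text(encoding="utf-8").strip()

    return load


# Demo templates (hypothetical content, written to a temp dir for the example).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "citation_prompt.md").write_text(
    "Place citations at the end of sentences.\n", encoding="utf-8"
)
(demo_dir / "citation_plus.md").write_text(
    "Add citations to the answer below.\n\n{answer}\n", encoding="utf-8"
)

load_prompt = make_loader(demo_dir)


def citation_prompt_sketch() -> str:
    """Static style guide, appended to the system prompt."""
    return load_prompt("citation_prompt")


def citation_plus_sketch(answer: str) -> str:
    """Instruction template with the marked-up answer substituted in."""
    # str.replace avoids str.format errors when the answer contains braces.
    return load_prompt("citation_plus").replace("{answer}", answer)


print(citation_plus_sketch("RAGFlow parses PDF files. [DC] docs/parsing.md"))
```

Keeping the templates as files and caching the reads means the prompt wording can change without touching the calling code.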

Stage 2: Retrieval and Chunk Matching

When a user query enters the system, the Retriever class—implemented in rag/nlp/search.py—returns the most relevant text chunks alongside their vector similarity scores. After the LLM generates a raw answer, the method insert_citations() (lines 177-190) processes the text sentence-by-sentence.

This method performs lexical overlap analysis to match each sentence against the retrieved chunks. When a match is detected, insert_citations() injects a citation marker directly into the text. The system uses two distinct marker types:

[DC] markers for document chunks, formatted as [DC] <file_path>
[KG] markers for knowledge graph entities, formatted as [KG] <entity_id>

These markers serve as temporary placeholders that survive the initial generation phase but require further processing to become human-readable citations.
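To make the sentence-level matching concrete, here is a minimal sketch of lexical overlap scoring followed by marker injection. It is not the insert_citations() implementation: it uses plain token overlap only and ignores the vector scores the retriever also returns, and the function names, the 0.6 threshold, and the (file_path, text) chunk shape are assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(sentence: str, chunk_text: str) -> float:
    """Fraction of the sentence's tokens that also occur in the chunk."""
    sent = tokens(sentence)
    if not sent:
        return 0.0
    return len(sent & tokens(chunk_text)) / len(sent)


def insert_markers(answer: str, chunks: list[tuple[str, str]], threshold: float = 0.6) -> str:
    """Append '[DC] <file_path>' after each sentence whose best chunk clears the threshold."""
    marked = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        best_path, best_score = None, 0.0
        for path, text in chunks:
            score = overlap(sentence, text)
            if score > best_score:
                best_path, best_score = path, score
        if best_path is not None and best_score >= threshold:
            sentence = f"{sentence} [DC] {best_path}"
        marked.append(sentence)
    return " ".join(marked)


chunks = [
    ("docs/parsing.md", "RAGFlow parses PDF files with deep document understanding."),
    ("docs/chat.md", "The chat assistant streams answers to the user."),
]
print(insert_markers("RAGFlow parses PDF files. Bananas are yellow.", chunks))
```

In this example only the first sentence receives a marker, because the second shares no tokens with either chunk. Sentences that fall below the threshold are left uncited rather than attributed to a weak match.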

Stage 3: LLM-Driven Citation Enrichment

The raw answer containing placeholder markers is not returned directly to the user. Instead, the system performs a second LLM invocation for citation formatting. This step is orchestrated in api/db/services/dialog_service.py (lines 607-631), where the service concatenates the system prompt with the output of citation_prompt() and streams the response through the LLM.

During this phase, the citation_plus prompt instructs the model to replace all [DC] and [KG] placeholders with numbered citations (e.g., [1], [2]) and append a "References" section listing the concrete sources. The final streamed response contains the fully formatted answer with clickable references to supporting documents, as documented in the project’s README (lines 121-124).
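The transformation requested from the LLM in this stage can be illustrated with a deterministic stand-in. The sketch below is not how RAGFlow performs the step (the article describes a second LLM call); it only shows the expected input and output shape, assuming markers of the form [DC] <file_path> and [KG] <entity_id> with no spaces in the path or identifier. The names MARKER and number_citations are hypothetical.

```python
import re

# Assumption: a marker is "[DC] " or "[KG] " followed by a whitespace-free source id.
MARKER = re.compile(r"\[(DC|KG)\] (\S+)")


def number_citations(text: str) -> str:
    """Replace placeholder markers with [n] and append a References section."""
    numbers: dict[tuple[str, str], int] = {}  # (kind, source) -> citation number

    def replace(match: re.Match) -> str:
        key = (match.group(1), match.group(2))
        if key not in numbers:
            numbers[key] = len(numbers) + 1  # number by first appearance
        return f"[{numbers[key]}]"

    body = MARKER.sub(replace, text)
    if not numbers:
        return body
    references = [f"[{n}] {source}" for (_kind, source), n in numbers.items()]
    return body + "\n\nReferences\n" + "\n".join(references)


marked = (
    "RAGFlow parses PDF files. [DC] docs/parsing.md "
    "It links entities. [KG] entity_42 "
    "Parsing is layout-aware. [DC] docs/parsing.md"
)
print(number_citations(marked))
```

A repeated source reuses its number, so the reference list contains each source once, in order of first appearance.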

Implementation Details and Code Example

The following simplified Python pattern sketches the citation workflow described above. MyRetriever, llm, and query are placeholders for illustration, not names taken from the RAGFlow source code:

from rag.prompts.generator import citation_prompt, citation_plus

# NOTE: MyRetriever, llm, and query are placeholders for illustration;
# substitute the retriever, model client, and user question you actually use.

# 1️⃣ Build the system prompt and append the citation guidelines
system_prompt = "You are a helpful assistant.\n"
system_prompt += citation_prompt()

# 2️⃣ Retrieve relevant chunks and their vectors (simplified)
retriever = MyRetriever(...)
chunks, vectors = retriever.search(query)

# 3️⃣ Generate a raw answer (LLM call simplified)
raw_answer = llm.generate(system_prompt, query)

# 4️⃣ Insert placeholder citations based on chunk overlap
answer_with_marks, idx = retriever.insert_citations(
    raw_answer, chunks, vectors
)

# 5️⃣ Ask the LLM to replace placeholders with formatted citations.
#    answer_with_marks is assumed to be a single string, so it is passed
#    as-is; joining it with "\n" would split it character by character.
final_answer = llm.generate(
    citation_plus(answer_with_marks),  # wraps the answer in the "add citations" prompt
    answer_with_marks,
)

print(final_answer)  # expected: answer with "[1] ..." markers and a "References" list

Key Source Files and Their Roles

Understanding the grounded citations implementation requires familiarity with these specific components:

rag/prompts/citation_prompt.md – Defines the base citation style guide that constrains how the LLM formats source references.
rag/prompts/citation_plus.md – Contains the instruction template that explicitly asks the LLM to replace placeholder markers with final citation numbers and generate the reference list.
rag/prompts/generator.py – Houses the citation_prompt() and citation_plus() functions (lines 159-191) that load markdown templates and expose them to the retrieval pipeline.
rag/nlp/search.py – Implements the insert_citations() method (lines 177-190) that performs lexical matching between answer sentences and retrieved chunks to inject [DC] or [KG] markers.
api/db/services/dialog_service.py – Orchestrates the end-to-end citation injection during chat sessions (lines 607-631), managing the transition from raw generation to citation-enriched output.
README.md – Documents the user-facing benefits of traceable citations and the "quick view" functionality for source documents (lines 121-124).

Summary

RAGFlow’s approach to grounded citations aims for verifiable, hallucination-resistant AI responses through a three-stage process:

Centralized format control via version-controlled markdown templates that standardize citation appearance across all outputs.
Lexical chunk matching through insert_citations(), which binds specific sentences to their supporting evidence using [DC] and [KG] markers.
Explicit LLM formatting using the citation_plus prompt to convert technical placeholders into readable reference numbers and bibliography entries.

This architecture gives each generated answer traceable provenance, which supports user trust and enables direct verification against source materials.

Frequently Asked Questions

What is the difference between citation_prompt.md and citation_plus.md?

citation_prompt.md defines the static style guidelines that constrain how citations should appear in the final output, such as placement rules and maximum counts per sentence. citation_plus.md contains the active instruction set that tells the LLM to perform the replacement operation—converting placeholder markers like [DC] into numbered citations [1] and generating the "References" section. The former is used during initial generation; the latter drives the enrichment phase.

How does RAGFlow match answer sentences to specific chunks?

The system uses the insert_citations() method in rag/nlp/search.py (lines 177-190) to perform lexical overlap analysis. This method iterates through the raw answer sentence-by-sentence, compares the text against the content of retrieved chunks, and injects [DC] or [KG] markers where the lexical overlap indicates a supporting relationship. This approach ties citations to the actual content of the retrieved documents rather than leaving them to be invented by the LLM.

What do the [DC] and [KG] citation markers represent?

[DC] stands for "Document Chunk" and is formatted as [DC] <file_path>, linking the citation to a specific segment of an uploaded document. [KG] stands for "Knowledge Graph" and appears as [KG] <entity_id>, connecting the citation to structured entities within the system’s knowledge graph. These markers serve as temporary placeholders during the generation pipeline before the LLM converts them into sequential numbered citations.

Which component orchestrates the final citation formatting?

The dialog_service.py file in api/db/services/ (lines 607-631) manages the orchestration. This service handles the conversation state, concatenates the system prompt with citation instructions, and invokes the LLM stream twice: once for the initial answer generation and again for the citation enrichment phase using the citation_plus() prompt template. This centralization ensures consistent citation handling across all chat interactions in the RAGFlow system.
