RAGFlow Agent Tools: Complete Catalog and Extension Guide

RAGFlow ships about 20 built-in agent tools in agent/tools/, including Wikipedia, Google Search, GitHub, and ArXiv. Each tool is implemented as a *Param metadata class paired with an execution class of the same base name (for example, WikipediaParam and Wikipedia), and new tools are auto-discovered when a Python file is added to the directory.

The RAGFlow repository (infiniflow/ragflow) provides a modular agent framework where capabilities are encapsulated as reusable tools. Located in agent/tools/, these components follow a strict dual-class architecture that separates JSON schema definition from runtime execution, enabling seamless LLM function calling without manual registration overhead.

Built-in Tools in agent/tools/

The RAGFlow agent toolkit includes search engines, academic databases, financial data sources, and utility executors. Every tool follows the same pattern: a *Param class that defines metadata and a matching execution class that implements the logic.

| Tool | File | Param Class | Tool Class |
|---|---|---|---|
| Wikipedia | wikipedia.py | WikipediaParam | Wikipedia |
| Google Search | google.py | GoogleParam | Google |
| DuckDuckGo | duckduckgo.py | DuckDuckGoParam | DuckDuckGo |
| GitHub | github.py | GitHubParam | GitHub |
| ArXiv | arxiv.py | ArXivParam | ArXiv |
| PubMed | pubmed.py | PubMedParam | PubMed |
| Google Scholar | googlescholar.py | GoogleScholarParam | GoogleScholar |
| Tavily Search | tavily.py | TavilySearchParam | TavilySearch |
| Tavily Extract | tavily.py | TavilyExtractParam | TavilyExtract |
| SearXNG | searxng.py | SearXNGParam | SearXNG |
| Web Crawler | crawler.py | CrawlerParam | Crawler |
| Code Execution | code_exec.py | CodeExecParam | CodeExec |
| Email | email.py | EmailParam | Email |
| DeepL Translation | deepl.py | DeepLParam | DeepL |
| Yahoo Finance | yahoofinance.py | YahooFinanceParam | YahooFinance |
| AkShare | akshare.py | AkShareParam | AkShare |
| TuShare | tushare.py | TuShareParam | TuShare |
| QWeather | qweather.py | QWeatherParam | QWeather |
| Jin10 | jin10.py | Jin10Param | Jin10 |
| WenCai | wencai.py | WenCaiParam | WenCai |
| Retrieval | retrieval.py | RetrievalParam | Retrieval |
| ExeSQL | exesql.py | ExeSQLParam | ExeSQL |

The Retrieval tool provides internal RAG document search, while ExeSQL enables SQL execution against connected databases. Financial data tools like AkShare, TuShare, and Yahoo Finance target quantitative analysis workflows.

Tool Architecture and Base Classes

All RAGFlow agent tools inherit from two abstract base classes defined in agent/tools/base.py:

  • ToolParamBase – Declares the tool’s OpenAI-compatible function schema, including name, description, and parameters JSON schema. It also handles default values and input validation via an optional check() method.
  • ToolBase – Provides the runtime environment, exposing _invoke() for synchronous execution or _invoke_async() for asynchronous operations, plus canvas interaction methods like set_output() and _retrieve_chunks().
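The division of labour between the two base classes can be sketched with stand-ins. MiniParamBase and MiniToolBase below are hypothetical simplifications written for this article, not the classes in agent/tools/base.py; they only mirror the method names mentioned above (check(), _invoke(), set_output()), and the output() accessor is an assumption added for the demo.

```python
from typing import Any


class MiniParamBase:
    """Stand-in for ToolParamBase: holds declarative metadata only."""

    def __init__(self) -> None:
        # Hypothetical layout mirroring the name/description/parameters triple.
        self.meta: dict[str, Any] = {"name": "", "description": "", "parameters": {}}

    def check(self) -> bool:
        # Optional validation hook; subclasses may tighten it.
        return bool(self.meta["name"])


class MiniToolBase:
    """Stand-in for ToolBase: owns execution and output storage."""

    def __init__(self, param: MiniParamBase) -> None:
        self._param = param
        self._outputs: dict[str, Any] = {}

    def set_output(self, key: str, value: Any) -> None:
        self._outputs[key] = value

    def output(self, key: str) -> Any:
        return self._outputs.get(key)

    def invoke(self, **kwargs: Any) -> Any:
        # Public entry point: validate first, then delegate to the subclass hook.
        if not self._param.check():
            raise ValueError("invalid tool parameters")
        return self._invoke(**kwargs)

    def _invoke(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class EchoParam(MiniParamBase):
    def __init__(self) -> None:
        super().__init__()
        self.meta.update(name="echo", description="Return the input text.")


class Echo(MiniToolBase):
    def _invoke(self, **kwargs: Any) -> Any:
        text = kwargs.get("text", "")
        self.set_output("text", text)  # stored for later lookup
        return text


tool = Echo(EchoParam())
result = tool.invoke(text="hi")
```

Keeping the schema in one class and the behaviour in another means the metadata can be serialized for the LLM without instantiating any runtime state.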

The module agent/tools/__init__.py implements automatic discovery. It walks the agent/tools/ directory, imports every Python module (excluding base.py and itself), and registers all public classes in __all__. This means adding a new tool requires zero configuration changes to the registry.
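The directory walk described above can be approximated with the standard library. This is a minimal sketch, not the code in agent/tools/__init__.py: discover_classes, the demo_tools package, and the rule that only classes defined in the module itself are kept are all assumptions made for illustration.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_classes(package_name: str, skip: frozenset = frozenset({"base"})) -> dict:
    """Import every module in a package and collect its public classes."""
    package = importlib.import_module(package_name)
    found = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name in skip:
            continue
        module = importlib.import_module(f"{package_name}.{info.name}")
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Keep public classes defined in this module, not re-imported ones.
            if obj.__module__ == module.__name__ and not name.startswith("_"):
                found[name] = obj
    return found


# Demo: build a throwaway package on disk, then discover its classes.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tools"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "base.py").write_text("class ToolBase:\n    pass\n")
(pkg / "greet.py").write_text(
    "class GreetParam:\n    pass\n\n\nclass Greet:\n    pass\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

classes = discover_classes("demo_tools")
print(sorted(classes))  # ['Greet', 'GreetParam']
```

Because base.py is skipped and the file name is never hard-coded, dropping another module into the package directory is enough for its classes to appear in the result.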

When the LLM decides to invoke a tool, LLMToolPluginCallSession.tool_call_async retrieves the concrete tool instance from the global tools_map, executes its invoke method, and records the result back to the agent canvas for downstream components to reference.
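The dispatch step can be illustrated with a small name-to-callable registry. The ToolCallSession class below is a hypothetical stand-in, not LLMToolPluginCallSession itself; its signature, the JSON-string arguments, and the history list are assumptions chosen to keep the sketch self-contained.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# Hypothetical registry type: tool name -> async callable taking keyword arguments.
ToolFn = Callable[..., Awaitable[Any]]


class ToolCallSession:
    def __init__(self, tools_map: dict[str, ToolFn]) -> None:
        self.tools_map = tools_map
        self.history: list[dict[str, Any]] = []

    async def tool_call_async(self, name: str, arguments: str) -> Any:
        if name not in self.tools_map:
            raise KeyError(f"unknown tool: {name}")
        kwargs = json.loads(arguments)  # arguments assumed to arrive as a JSON string
        result = await self.tools_map[name](**kwargs)
        # Record the call so later steps can refer back to it.
        self.history.append({"tool": name, "arguments": kwargs, "result": result})
        return result


async def greet(who: str = "world") -> str:
    return f"Hello, {who}!"


session = ToolCallSession({"greet": greet})
reply = asyncio.run(session.tool_call_async("greet", '{"who": "RAGFlow"}'))
```

Looking the tool up by name keeps the LLM-facing layer independent of concrete tool classes: anything registered in the map is callable the same way.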

How to Extend RAGFlow with Custom Tools

Creating a new RAGFlow agent tool involves defining two classes in a new file within agent/tools/. The framework handles registration automatically.

Step 1: Create the Parameter Class

Define a class inheriting from ToolParamBase that specifies the tool’s metadata in a self.meta dictionary. This schema dictates how the LLM constructs function calls.

from abc import ABC  # used by the execution class in Step 2 (same file)

from agent.tools.base import ToolParamBase, ToolMeta

class GreetParam(ToolParamBase):
    def __init__(self):
        self.meta: ToolMeta = {
            "name": "greet",
            "description": "Generate a friendly greeting for a specified entity.",
            "parameters": {
                "who": {
                    "type": "string",
                    "description": "Person or entity to greet",
                    "default": "world",
                    "required": True,
                }
            },
        }
        super().__init__()

The meta dictionary must include name (the function identifier), description (visible to the LLM), and parameters (JSON Schema properties). Use {sys.user} or other template strings in defaults to reference runtime context variables.
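To make the relationship between meta and the schema the LLM sees concrete, the sketch below converts a meta dictionary with per-parameter "required" flags into an OpenAI-style function definition. The helper to_openai_function is hypothetical and is not RAGFlow's own serializer; it only shows one plausible mapping.

```python
from typing import Any


def to_openai_function(meta: dict[str, Any]) -> dict[str, Any]:
    """Map a meta dict onto an OpenAI-style function definition."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, spec in meta["parameters"].items():
        # JSON Schema lists required names separately, so hoist the flag out.
        properties[name] = {k: v for k, v in spec.items() if k != "required"}
        if spec.get("required"):
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": meta["name"],
            "description": meta["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


greet_meta = {
    "name": "greet",
    "description": "Generate a friendly greeting for a specified entity.",
    "parameters": {
        "who": {
            "type": "string",
            "description": "Person or entity to greet",
            "default": "world",
            "required": True,
        }
    },
}

schema = to_openai_function(greet_meta)
```

The per-parameter "required" flag is convenient to write but is not where JSON Schema expects it, which is why a conversion step of this kind is needed before the definition is sent to a model.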

Step 2: Implement the Execution Class

Create a class inheriting from ToolBase (and optionally ABC) that implements _invoke(). Use set_output() to push results back to the agent canvas so downstream nodes can reference them.

from agent.tools.base import ToolBase

class Greet(ToolBase, ABC):
    component_name = "Greet"

    def _invoke(self, **kwargs):
        who = kwargs.get("who", "world")
        message = f"👋 Hello, {who}!"

        # Store outputs so downstream canvas nodes can reference them
        self.set_output("message", message)
        self.set_output("formalized_content", message)
        return message

Step 3: Verify Auto-Discovery

Save both classes in agent/tools/greet.py. Because agent/tools/__init__.py automatically imports all modules and exposes their classes, Greet and GreetParam become available from the agent.tools package with no registry changes. Restart the agent process so the new module is loaded.

Async Implementation Pattern

For tools requiring network I/O or heavy computation, implement _invoke_async() instead:

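from abc import ABC

import aiohttp

from agent.tools.base import ToolBase

class AsyncWeather(ToolBase, ABC):
    component_name = "AsyncWeather"

    async def _invoke_async(self, **kwargs):
        city = kwargs.get("city")
        if not city:
            raise ValueError("city is required")
        # Illustrative endpoint and response shape; substitute a real weather API.
        url = f"https://api.weather.com/v1/{city}"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()  # fail on HTTP errors instead of parsing an error body
                data = await resp.json()
        self.set_output("temperature", data["temp"])
        return data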

The base class invoke_async wrapper will route calls to this method when running in async contexts.
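One way such routing could work is shown below. RoutingToolBase is a hypothetical stand-in, and falling back to a worker thread for synchronous tools is an assumption of this sketch rather than a description of RAGFlow's wrapper.

```python
import asyncio
from typing import Any


class RoutingToolBase:
    """Stand-in base class that routes to whichever hook a subclass overrides."""

    def _invoke(self, **kwargs: Any) -> Any:
        raise NotImplementedError

    async def _invoke_async(self, **kwargs: Any) -> Any:
        # Default: run the blocking hook in a worker thread so the loop stays free.
        return await asyncio.to_thread(self._invoke, **kwargs)

    async def invoke_async(self, **kwargs: Any) -> Any:
        return await self._invoke_async(**kwargs)


class SyncTool(RoutingToolBase):
    def _invoke(self, **kwargs: Any) -> Any:
        return f"sync:{kwargs['x']}"


class AsyncTool(RoutingToolBase):
    async def _invoke_async(self, **kwargs: Any) -> Any:
        await asyncio.sleep(0)  # placeholder for real awaited I/O
        return f"async:{kwargs['x']}"


async def main() -> list:
    return await asyncio.gather(
        SyncTool().invoke_async(x=1),
        AsyncTool().invoke_async(x=2),
    )


results = asyncio.run(main())
```

With this arrangement callers always await invoke_async(), and tool authors override only the hook that matches their workload.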

Key Implementation Details for Extension

When building production-grade RAGFlow tools, reference these critical source locations:

  • agent/tools/base.py – Contains ToolBase._retrieve_chunks() for RAG integration and ToolParamBase.generate() for schema serialization.
  • agent/component/base.py – Provides check_if_canceled() for long-running operations and get_component_name() for logging.
  • agent/canvas.py – Manages the execution graph; tools write references here via add_reference() so the agent can cite sources.
  • common/mcp_tool_call_conn.py – Implements MCPToolCallSession for sandboxing tools in separate processes; use this pattern for untrusted code execution.

Implement the check() method in your *Param class to validate configuration before execution:

def check(self):
    if not self.meta.get("parameters"):
        raise ValueError("Parameters schema is required")
    return True
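A slightly stricter validation pass might also inspect each parameter entry. The sketch below works on a plain dictionary so it runs without RAGFlow installed; check_meta, is_valid, and the specific rules (a known JSON Schema type and a non-empty description) are assumptions for illustration, not requirements enforced by ToolParamBase.

```python
from typing import Any

JSON_TYPES = {"string", "number", "integer", "boolean", "array", "object", "null"}


def check_meta(meta: dict[str, Any]) -> bool:
    """Raise ValueError if the meta dict is incomplete or malformed."""
    for key in ("name", "description", "parameters"):
        if not meta.get(key):
            raise ValueError(f"meta is missing '{key}'")
    for pname, spec in meta["parameters"].items():
        if spec.get("type") not in JSON_TYPES:
            raise ValueError(f"parameter '{pname}' has an invalid type: {spec.get('type')!r}")
        if not spec.get("description"):
            raise ValueError(f"parameter '{pname}' needs a description")
    return True


def is_valid(meta: dict[str, Any]) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        return check_meta(meta)
    except ValueError:
        return False


good = {
    "name": "greet",
    "description": "Generate a greeting.",
    "parameters": {"who": {"type": "string", "description": "Entity to greet"}},
}
bad = {
    "name": "greet",
    "description": "Generate a greeting.",
    "parameters": {"who": {"type": "str", "description": "Entity to greet"}},
}
```

Here is_valid(good) is True, while is_valid(bad) is False because "str" is a Python type name rather than a JSON Schema type. Catching this before execution avoids sending the LLM a schema it may interpret incorrectly.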

Summary

  • RAGFlow provides 20+ ready-to-use agent tools in agent/tools/, covering search, finance, translation, and code execution.
  • Each tool requires two classes: a *Param class (inheriting ToolParamBase) for schema definition and a matching execution class (inheriting ToolBase) for the logic.
  • Discovery is automatic: agent/tools/__init__.py registers any new Python file in the directory without manual imports.
  • Execution methods are _invoke() for synchronous work and _invoke_async() for I/O-bound operations.
  • Canvas integration uses set_output() to persist results for downstream agent components.

Frequently Asked Questions

What is the difference between ToolBase and ToolParamBase in RAGFlow?

ToolParamBase handles the declaration side—it defines the JSON schema, parameter defaults, and validation rules that the LLM uses to construct function calls. ToolBase handles the execution side—it provides the _invoke() or _invoke_async() methods that run when the LLM calls the tool, plus utilities like set_output() to write results back to the agent canvas. Together, they separate "what the tool accepts" from "what the tool does."

How does RAGFlow automatically discover new tools in agent/tools/?

The file agent/tools/__init__.py walks the directory at import time, dynamically imports every .py module (excluding base.py and itself), and collects all public classes using inspect.isclass(). These classes populate the global __all_classes list and __all__ export, making them available to the tool registry without requiring manual imports or configuration entries.

Can I create asynchronous tools in RAGFlow?

Yes. While the base ToolBase class provides _invoke() for synchronous execution, you can override _invoke_async() for coroutine-based logic. The framework’s invoke_async method automatically detects and awaits async implementations, which suits HTTP requests, database queries, and other I/O-bound operations that should not block the agent event loop.

How do I pass structured results from a custom tool back to the agent?

Use the self.set_output(key, value) method provided by ToolBase. This stores data in the agent canvas where downstream components can reference it. For example, self.set_output("stock_price", 150.00) allows subsequent agent nodes to access the value via the canvas reference system. You should also return the primary result from _invoke() to ensure immediate usability in the conversation flow.
