RAGFlow Agent Tools: Complete Catalog and Extension Guide
RAGFlow ships approximately 20 built-in agent tools in agent/tools/, including Wikipedia, Google Search, GitHub, and ArXiv. Each is implemented as a *Param metadata class paired with a same-named execution class (for example, WikipediaParam and Wikipedia), and new tools are auto-discovered when a Python file is added to the directory.
The RAGFlow repository (infiniflow/ragflow) provides a modular agent framework where capabilities are encapsulated as reusable tools. Located in agent/tools/, these components follow a strict dual-class architecture that separates JSON schema definition from runtime execution, enabling seamless LLM function calling without manual registration overhead.
Built-in Tools in agent/tools/
The RAGFlow agent toolkit includes search engines, academic databases, financial data sources, and utility executors. Every tool follows the pattern of a *Param class (defining metadata) and a * class (implementing logic).
| Tool | File | Param Class | Tool Class |
|---|---|---|---|
| Wikipedia | wikipedia.py | WikipediaParam | Wikipedia |
| Google Search | google.py | GoogleParam | Google |
| DuckDuckGo | duckduckgo.py | DuckDuckGoParam | DuckDuckGo |
| GitHub | github.py | GitHubParam | GitHub |
| ArXiv | arxiv.py | ArXivParam | ArXiv |
| PubMed | pubmed.py | PubMedParam | PubMed |
| Google Scholar | googlescholar.py | GoogleScholarParam | GoogleScholar |
| Tavily Search | tavily.py | TavilySearchParam | TavilySearch |
| Tavily Extract | tavily.py | TavilyExtractParam | TavilyExtract |
| SearXNG | searxng.py | SearXNGParam | SearXNG |
| Web Crawler | crawler.py | CrawlerParam | Crawler |
| Code Execution | code_exec.py | CodeExecParam | CodeExec |
| Email | email.py | EmailParam | Email |
| DeepL Translation | deepl.py | DeepLParam | DeepL |
| Yahoo Finance | yahoofinance.py | YahooFinanceParam | YahooFinance |
| AkShare | akshare.py | AkShareParam | AkShare |
| TuShare | tushare.py | TuShareParam | TuShare |
| QWeather | qweather.py | QWeatherParam | QWeather |
| Jin10 | jin10.py | Jin10Param | Jin10 |
| WenCai | wencai.py | WenCaiParam | WenCai |
| Retrieval | retrieval.py | RetrievalParam | Retrieval |
| ExeSQL | exesql.py | ExeSQLParam | ExeSQL |
The Retrieval tool provides internal RAG document search, while ExeSQL enables SQL execution against connected databases. Financial data tools like AkShare, TuShare, and Yahoo Finance target quantitative analysis workflows.
Tool Architecture and Base Classes
All RAGFlow agent tools inherit from two abstract base classes defined in agent/tools/base.py:
- ToolParamBase: Declares the tool’s OpenAI-compatible function schema, including name, description, and the parameters JSON schema. It also handles default values and input validation via an optional check() method.
- ToolBase: Provides the runtime environment, exposing _invoke() for synchronous execution or _invoke_async() for asynchronous operations, plus canvas interaction methods like set_output() and _retrieve_chunks().
The module agent/tools/__init__.py implements automatic discovery. It walks the agent/tools/ directory, imports every Python module (excluding base.py and itself), and registers all public classes in __all__. This means adding a new tool requires zero configuration changes to the registry.
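The discovery step described above can be approximated with the standard library alone. The sketch below is not RAGFlow's actual `__init__.py`; the helper name `discover_classes` and the throwaway directory it scans are assumptions made for illustration. It imports each `.py` file in a folder, skips `base.py` and `__init__.py`, and collects the public classes defined there.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def discover_classes(package_dir: Path, skip=("__init__.py", "base.py")) -> dict:
    """Import every .py file in package_dir and collect its public classes."""
    found = {}
    for path in sorted(package_dir.glob("*.py")):
        if path.name in skip:
            continue
        # Load the file as a standalone module named after the file stem.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Keep classes defined in this module, not ones it merely imported.
            if obj.__module__ == module.__name__ and not name.startswith("_"):
                found[name] = obj
    return found


# Demo: build a throwaway "tools" directory and scan it.
with tempfile.TemporaryDirectory() as tmp:
    tools_dir = Path(tmp)
    (tools_dir / "base.py").write_text("class ToolBase:\n    pass\n")
    (tools_dir / "greet.py").write_text(
        "class GreetParam:\n    pass\n\nclass Greet:\n    pass\n"
    )
    discovered = discover_classes(tools_dir)

print(sorted(discovered))
```

Filtering on `obj.__module__` keeps a tool file from re-exporting base classes it imported, which is one plausible way to keep a registry free of duplicates.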
When the LLM decides to invoke a tool, LLMToolPluginCallSession.tool_call_async retrieves the concrete tool instance from the global tools_map, executes its invoke method, and records the result back to the agent canvas for downstream components to reference.
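To make the dispatch flow concrete, here is a simplified sketch of a name-to-instance lookup followed by invocation and result recording. The classes `EchoTool` and `ToolCallSession` are hypothetical stand-ins, not RAGFlow's `LLMToolPluginCallSession`, and the `history` list only imitates writing results back to the canvas.

```python
import asyncio


class EchoTool:
    """Hypothetical stand-in for a concrete tool instance."""

    def invoke(self, **kwargs):
        return f"echo: {kwargs.get('text', '')}"


class ToolCallSession:
    """Simplified stand-in for a tool-call session; not RAGFlow's code."""

    def __init__(self, tools_map: dict):
        self.tools_map = tools_map
        self.history = []  # stand-in for results recorded on the canvas

    async def tool_call_async(self, name: str, arguments: dict):
        if name not in self.tools_map:
            raise KeyError(f"Unknown tool: {name}")
        tool = self.tools_map[name]
        # Run the blocking invoke() in a worker thread so the event loop stays free.
        result = await asyncio.to_thread(tool.invoke, **arguments)
        self.history.append((name, arguments, result))
        return result


session = ToolCallSession({"echo": EchoTool()})
result = asyncio.run(session.tool_call_async("echo", {"text": "hi"}))
print(result)
```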
How to Extend RAGFlow with Custom Tools
Creating a new RAGFlow agent tool involves defining two classes in a new file within agent/tools/. The framework handles registration automatically.
Step 1: Create the Parameter Class
Define a class inheriting from ToolParamBase that specifies the tool’s metadata in a self.meta dictionary. This schema dictates how the LLM constructs function calls.
from agent.tools.base import ToolParamBase, ToolMeta
from abc import ABC


class GreetParam(ToolParamBase):
    def __init__(self):
        self.meta: ToolMeta = {
            "name": "greet",
            "description": "Generate a friendly greeting for a specified entity.",
            "parameters": {
                "who": {
                    "type": "string",
                    "description": "Person or entity to greet",
                    "default": "world",
                    "required": True,
                }
            },
        }
        super().__init__()
The meta dictionary must include name (the function identifier), description (visible to the LLM), and parameters (JSON Schema properties). Use {sys.user} or other template strings in defaults to reference runtime context variables.
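For orientation, the OpenAI function-calling format lists required parameters in a `required` array on the parameters object, whereas the meta dictionary above flags each parameter individually. The sketch below shows one way such a meta dictionary could be mapped to that format. The helper `to_function_schema` is hypothetical, and RAGFlow's own serialization may differ in detail.

```python
import json


def to_function_schema(meta: dict) -> dict:
    """Map a tool meta dict to an OpenAI-style function definition (illustrative)."""
    properties, required = {}, []
    for param_name, spec in meta["parameters"].items():
        spec = dict(spec)  # copy so the caller's meta is left untouched
        if spec.pop("required", False):
            required.append(param_name)
        properties[param_name] = spec
    return {
        "type": "function",
        "function": {
            "name": meta["name"],
            "description": meta["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


greet_meta = {
    "name": "greet",
    "description": "Generate a friendly greeting for a specified entity.",
    "parameters": {
        "who": {
            "type": "string",
            "description": "Person or entity to greet",
            "default": "world",
            "required": True,
        }
    },
}

schema = to_function_schema(greet_meta)
print(json.dumps(schema, indent=2))
```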
Step 2: Implement the Execution Class
Create a class inheriting from ToolBase (and optionally ABC) that implements _invoke(). Use set_output() to push results back to the agent canvas so downstream nodes can reference them.
from agent.tools.base import ToolBase


class Greet(ToolBase, ABC):
    component_name = "Greet"

    def _invoke(self, **kwargs):
        who = kwargs.get("who", "world")
        message = f"👋 Hello, {who}!"
        # Store outputs for canvas reference
        self.set_output("message", message)
        self.set_output("formalized_content", message)
        return message
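The tool's logic can be exercised without a running RAGFlow instance by swapping in a small stub for the base class. The sketch below assumes nothing about ToolBase beyond a `set_output(key, value)` method; `StubToolBase` and its `output()` accessor are hypothetical test helpers, not part of RAGFlow.

```python
class StubToolBase:
    """Hypothetical stand-in for ToolBase: stores outputs in a plain dict."""

    def __init__(self):
        self._outputs = {}

    def set_output(self, key, value):
        self._outputs[key] = value

    def output(self, key):
        return self._outputs.get(key)


class Greet(StubToolBase):
    component_name = "Greet"

    def _invoke(self, **kwargs):
        who = kwargs.get("who", "world")
        message = f"Hello, {who}!"
        # Both the named output and the formatted content hold the same text.
        self.set_output("message", message)
        self.set_output("formalized_content", message)
        return message


tool = Greet()
returned = tool._invoke(who="RAGFlow")
print(returned, "|", tool.output("formalized_content"))
```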
Step 3: Verify Auto-Discovery
Save both classes in agent/tools/greet.py. Because agent/tools/__init__.py automatically imports all modules and exposes their classes, the Greet and GreetParam classes are immediately available in the global namespace. Restart the agent process to load the new module.
Async Implementation Pattern
For tools requiring network I/O or heavy computation, implement _invoke_async() instead:
from abc import ABC

import aiohttp

from agent.tools.base import ToolBase


class AsyncWeather(ToolBase, ABC):
    component_name = "AsyncWeather"

    async def _invoke_async(self, **kwargs):
        city = kwargs.get("city")
        # Placeholder endpoint; substitute your weather provider's URL.
        async with aiohttp.ClientSession() as session:
            async with session.get(f"https://api.weather.com/v1/{city}") as resp:
                data = await resp.json()
        self.set_output("temperature", data["temp"])
        return data
The base class invoke_async wrapper will route calls to this method when running in async contexts.
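One plausible routing rule for such a wrapper is sketched below: await the subclass's coroutine if it defines one, otherwise run the synchronous method in a worker thread. `MiniToolBase` is a hypothetical stand-in, and the real ToolBase may route calls differently.

```python
import asyncio
import inspect


class MiniToolBase:
    """Hypothetical stand-in for ToolBase showing one possible routing rule."""

    def _invoke(self, **kwargs):
        raise NotImplementedError

    async def invoke_async(self, **kwargs):
        own_async = getattr(type(self), "_invoke_async", None)
        if own_async is not None and inspect.iscoroutinefunction(own_async):
            # The subclass supplied a coroutine: await it directly.
            return await self._invoke_async(**kwargs)
        # Fall back to the sync method in a thread to avoid blocking the loop.
        return await asyncio.to_thread(self._invoke, **kwargs)


class SyncTool(MiniToolBase):
    def _invoke(self, **kwargs):
        return "sync:" + kwargs["x"]


class AsyncTool(MiniToolBase):
    async def _invoke_async(self, **kwargs):
        await asyncio.sleep(0)  # stands in for real network I/O
        return "async:" + kwargs["x"]


async def main():
    return [
        await SyncTool().invoke_async(x="a"),
        await AsyncTool().invoke_async(x="b"),
    ]


results = asyncio.run(main())
print(results)
```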
Key Implementation Details for Extension
When building production-grade RAGFlow tools, reference these critical source locations:
- agent/tools/base.py: Contains ToolBase._retrieve_chunks() for RAG integration and ToolParamBase.generate() for schema serialization.
- agent/component/base.py: Provides check_if_canceled() for long-running operations and get_component_name() for logging.
- agent/canvas.py: Manages the execution graph; tools write references here via add_reference() so the agent can cite sources.
- common/mcp_tool_call_conn.py: Implements MCPToolCallSession for sandboxing tools in separate processes; use this pattern for untrusted code execution.
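As an illustration of the cancellation hook mentioned above, a long-running tool can poll a cancel flag between units of work and stop early. The sketch uses a hypothetical `CancellableJob` class backed by `threading.Event`; the signature and behavior of RAGFlow's own check_if_canceled() are not shown in this article and may differ.

```python
import threading


class CancellableJob:
    """Hypothetical stand-in; RAGFlow's check_if_canceled() may differ."""

    def __init__(self):
        self._cancel = threading.Event()

    def cancel(self):
        self._cancel.set()

    def check_if_canceled(self) -> bool:
        return self._cancel.is_set()

    def run(self, items, cancel_after=None):
        processed = []
        for i, item in enumerate(items):
            if cancel_after is not None and i == cancel_after:
                self.cancel()  # simulate a user cancelling mid-run
            # Poll the flag before each unit of work and stop early if set.
            if self.check_if_canceled():
                break
            processed.append(item.upper())
        return processed


job = CancellableJob()
done = job.run(["a", "b", "c", "d"], cancel_after=2)
print(done)
```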
Implement the check() method in your *Param class to validate configuration before execution:
def check(self):
    if not self.meta.get("parameters"):
        raise ValueError("Parameters schema is required")
    return True
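A stricter check could also verify each parameter entry. The standalone sketch below validates a meta dictionary against a few rules chosen for illustration: required top-level keys, a recognized JSON Schema type, and a non-empty description. The function `validate_meta` and the rule set are assumptions, not RAGFlow requirements.

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "array", "object"}


def validate_meta(meta: dict) -> bool:
    """Raise ValueError if the meta dict breaks any of the illustrative rules."""
    for key in ("name", "description", "parameters"):
        if not meta.get(key):
            raise ValueError(f"meta is missing '{key}'")
    for param_name, spec in meta["parameters"].items():
        if spec.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"parameter '{param_name}' has an unsupported type")
        if not spec.get("description"):
            raise ValueError(f"parameter '{param_name}' needs a description")
    return True


good = {
    "name": "greet",
    "description": "Generate a friendly greeting.",
    "parameters": {"who": {"type": "string", "description": "Entity to greet"}},
}
bad = {
    "name": "greet",
    "description": "Generate a friendly greeting.",
    "parameters": {"who": {"type": "text", "description": "Entity to greet"}},
}

print(validate_meta(good))
try:
    validate_meta(bad)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)
    print("rejected:", bad_error)
```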
Summary
- RAGFlow provides 20+ ready-to-use agent tools in agent/tools/, covering search, finance, translation, and code execution.
- Each tool requires two classes: a *Param class (inheriting ToolParamBase) for schema definition and a * class (inheriting ToolBase) for execution logic.
- Discovery is automatic: agent/tools/__init__.py registers any new Python file in the directory without manual imports.
- Execution methods are _invoke() for synchronous work and _invoke_async() for I/O-bound operations.
- Canvas integration uses set_output() to persist results for downstream agent components.
Frequently Asked Questions
What is the difference between ToolBase and ToolParamBase in RAGFlow?
ToolParamBase handles the declaration side—it defines the JSON schema, parameter defaults, and validation rules that the LLM uses to construct function calls. ToolBase handles the execution side—it provides the _invoke() or _invoke_async() methods that run when the LLM calls the tool, plus utilities like set_output() to write results back to the agent canvas. Together, they separate "what the tool accepts" from "what the tool does."
How does RAGFlow automatically discover new tools in agent/tools/?
The file agent/tools/__init__.py walks the directory at import time, dynamically imports every .py module (excluding base.py and itself), and collects all public classes using inspect.isclass(). These classes populate the global __all_classes list and __all__ export, making them available to the tool registry without requiring manual imports or configuration entries.
Can I create asynchronous tools in RAGFlow?
Yes. While the base ToolBase class provides _invoke() for synchronous execution, you can override _invoke_async() for coroutine-based logic. The framework’s invoke_async method automatically detects and awaits async implementations, making it ideal for HTTP requests, database queries, or other I/O-bound operations that should not block the agent event loop.
How do I pass structured results from a custom tool back to the agent?
Use the self.set_output(key, value) method provided by ToolBase. This stores data in the agent canvas where downstream components can reference it. For example, self.set_output("stock_price", 150.00) allows subsequent agent nodes to access the value via the canvas reference system. You should also return the primary result from _invoke() to ensure immediate usability in the conversation flow.