# RAGFlow Agent Tools: Complete Catalog and Extension Guide

> Explore RAGFlow agent tools: discover built-in tools like Wikipedia, Google Search & more, and learn how to extend RAGFlow by adding new tools to its agent tools directory.

- Repository: [InfiniFlow/ragflow](https://github.com/infiniflow/ragflow)
- Tags: api-reference
- Published: 2026-02-23

---

**RAGFlow ships about 20 built-in agent tools in `agent/tools/` (22 are cataloged below), including Wikipedia, Google Search, GitHub, and ArXiv. Each tool is implemented as a `*Param` metadata class paired with an execution class of the same base name, and new tools are auto-discovered when a Python file is added to the directory.**

The RAGFlow repository (`infiniflow/ragflow`) provides a modular agent framework where capabilities are encapsulated as reusable tools. Located in `agent/tools/`, these components follow a strict dual-class architecture that separates JSON schema definition from runtime execution, enabling seamless LLM function calling without manual registration overhead.

## Built-in Tools in agent/tools/

The RAGFlow agent toolkit includes search engines, academic databases, financial data sources, and utility executors. Every tool pairs a `*Param` class, which defines metadata, with an execution class of the same name minus the `Param` suffix, which implements the logic.

| Tool | File | Param Class | Tool Class |
|------|------|-------------|------------|
| **Wikipedia** | [`wikipedia.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/wikipedia.py) | `WikipediaParam` | `Wikipedia` |
| **Google Search** | [`google.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/google.py) | `GoogleParam` | `Google` |
| **DuckDuckGo** | [`duckduckgo.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/duckduckgo.py) | `DuckDuckGoParam` | `DuckDuckGo` |
| **GitHub** | [`github.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/github.py) | `GitHubParam` | `GitHub` |
| **ArXiv** | [`arxiv.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/arxiv.py) | `ArXivParam` | `ArXiv` |
| **PubMed** | [`pubmed.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/pubmed.py) | `PubMedParam` | `PubMed` |
| **Google Scholar** | [`googlescholar.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/googlescholar.py) | `GoogleScholarParam` | `GoogleScholar` |
| **Tavily Search** | [`tavily.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/tavily.py) | `TavilySearchParam` | `TavilySearch` |
| **Tavily Extract** | [`tavily.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/tavily.py) | `TavilyExtractParam` | `TavilyExtract` |
| **SearXNG** | [`searxng.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/searxng.py) | `SearXNGParam` | `SearXNG` |
| **Web Crawler** | [`crawler.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/crawler.py) | `CrawlerParam` | `Crawler` |
| **Code Execution** | [`code_exec.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/code_exec.py) | `CodeExecParam` | `CodeExec` |
| **Email** | [`email.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/email.py) | `EmailParam` | `Email` |
| **DeepL Translation** | [`deepl.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/deepl.py) | `DeepLParam` | `DeepL` |
| **Yahoo Finance** | [`yahoofinance.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/yahoofinance.py) | `YahooFinanceParam` | `YahooFinance` |
| **AkShare** | [`akshare.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/akshare.py) | `AkShareParam` | `AkShare` |
| **TuShare** | [`tushare.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/tushare.py) | `TuShareParam` | `TuShare` |
| **QWeather** | [`qweather.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/qweather.py) | `QWeatherParam` | `QWeather` |
| **Jin10** | [`jin10.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/jin10.py) | `Jin10Param` | `Jin10` |
| **WenCai** | [`wencai.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/wencai.py) | `WenCaiParam` | `WenCai` |
| **Retrieval** | [`retrieval.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/retrieval.py) | `RetrievalParam` | `Retrieval` |
| **ExeSQL** | [`exesql.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/exesql.py) | `ExeSQLParam` | `ExeSQL` |

The **Retrieval** tool provides internal RAG document search, while **ExeSQL** enables SQL execution against connected databases. Financial data tools like **AkShare**, **TuShare**, and **Yahoo Finance** target quantitative analysis workflows.

## Tool Architecture and Base Classes

All RAGFlow agent tools inherit from two abstract base classes defined in [`agent/tools/base.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/base.py):

- **`ToolParamBase`** – Declares the tool’s OpenAI-compatible function schema, including `name`, `description`, and `parameters` JSON schema. It also handles default values and input validation via an optional `check()` method.
- **`ToolBase`** – Provides the runtime environment, exposing `_invoke()` for synchronous execution or `_invoke_async()` for asynchronous operations, plus canvas interaction methods like `set_output()` and `_retrieve_chunks()`.
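
To make the split concrete, the following is a minimal, self-contained sketch of the dual-class pattern. `SketchParamBase`, `SketchToolBase`, `EchoParam`, and `Echo` are hypothetical stand-ins written for illustration; they only mimic the responsibilities described above and are not the real classes from `agent/tools/base.py`.

```python
from typing import Any


class SketchParamBase:
    """Hypothetical stand-in for ToolParamBase: holds the declaration."""

    meta: dict[str, Any] = {}

    def check(self) -> bool:
        # Declaration-side validation: a schema needs a name and parameters.
        return bool(self.meta.get("name")) and "parameters" in self.meta


class SketchToolBase:
    """Hypothetical stand-in for ToolBase: runs the logic, stores outputs."""

    def __init__(self, param: SketchParamBase):
        self.param = param
        self._outputs: dict[str, Any] = {}

    def set_output(self, key: str, value: Any) -> None:
        # In this sketch outputs live on the instance; RAGFlow stores them on the canvas.
        self._outputs[key] = value

    def output(self, key: str) -> Any:
        return self._outputs.get(key)

    def invoke(self, **kwargs: Any) -> Any:
        # The public entry point delegates to the subclass hook.
        return self._invoke(**kwargs)

    def _invoke(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class EchoParam(SketchParamBase):
    def __init__(self):
        self.meta = {
            "name": "echo",
            "description": "Echo the given text.",
            "parameters": {"text": {"type": "string"}},
        }


class Echo(SketchToolBase):
    def _invoke(self, **kwargs: Any) -> Any:
        self.set_output("text", kwargs["text"])
        return kwargs["text"]


tool = Echo(EchoParam())
result = tool.invoke(text="hi")
```

The point of the sketch is the division of labor: the param class knows nothing about execution, and the tool class reads its configuration from the param object rather than redefining it.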

The module [`agent/tools/__init__.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/__init__.py) implements automatic discovery. It walks the `agent/tools/` directory, imports every Python module (excluding [`base.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/base.py) and itself), and registers all public classes in `__all__`. This means adding a new tool requires zero configuration changes to the registry.
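
The directory walk described above can be sketched with the standard library alone. This is an illustrative approximation, not a copy of RAGFlow's `__init__.py`: the `discover_classes` helper and the throwaway `demo_tools` package are assumptions made so the example runs anywhere.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_classes(package_name: str, skip: set[str]) -> dict[str, type]:
    """Import every module in a package and collect its public classes."""
    package = importlib.import_module(package_name)
    found: dict[str, type] = {}
    for info in pkgutil.iter_modules(package.__path__):
        if info.name in skip:
            continue
        module = importlib.import_module(f"{package_name}.{info.name}")
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Keep public classes defined in this module, not ones it imported.
            if obj.__module__ == module.__name__ and not name.startswith("_"):
                found[name] = obj
    return found


# Build a throwaway package on disk (hypothetical names) to exercise the helper.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tools"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "base.py").write_text("class DemoBase:\n    pass\n")
(pkg / "greet.py").write_text("class GreetParam:\n    pass\n\n\nclass Greet:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

classes = discover_classes("demo_tools", skip={"base"})
print(sorted(classes))
```

Here `base.py` is skipped, so only the classes from `greet.py` are collected. Dropping another module into the package would extend the result without touching the helper.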

When the LLM decides to invoke a tool, `LLMToolPluginCallSession.tool_call_async` retrieves the concrete tool instance from the global `tools_map`, executes its `invoke` method, and records the result back to the agent canvas for downstream components to reference.
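
The dispatch step can be pictured as a name-to-instance lookup followed by an awaited call. The sketch below assumes a simplified session; `SketchToolSession`, `UpperTool`, and the `history` list are hypothetical and stand in for `LLMToolPluginCallSession` and the canvas bookkeeping, whose real signatures are not reproduced here.

```python
import asyncio
from typing import Any


class UpperTool:
    """Hypothetical tool exposing an async entry point."""

    async def invoke_async(self, **kwargs: Any) -> str:
        return str(kwargs["text"]).upper()


class SketchToolSession:
    """Hypothetical session: maps tool names to instances and dispatches calls."""

    def __init__(self, tools_map: dict[str, Any]):
        self.tools_map = tools_map
        self.history: list[tuple[str, Any]] = []

    async def tool_call_async(self, name: str, arguments: dict[str, Any]) -> Any:
        if name not in self.tools_map:
            raise KeyError(f"Unknown tool: {name}")
        result = await self.tools_map[name].invoke_async(**arguments)
        # Record the result so later steps can refer back to it.
        self.history.append((name, result))
        return result


session = SketchToolSession({"upper": UpperTool()})
result = asyncio.run(session.tool_call_async("upper", {"text": "ragflow"}))
```

Keeping the lookup table separate from the tools means the LLM only ever supplies a tool name and a dictionary of arguments; it never holds a reference to the tool object itself.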

## How to Extend RAGFlow with Custom Tools

Creating a new RAGFlow agent tool involves defining two classes in a new file within `agent/tools/`. The framework handles registration automatically.

### Step 1: Create the Parameter Class

Define a class inheriting from `ToolParamBase` that specifies the tool’s metadata in a `self.meta` dictionary. This schema dictates how the LLM constructs function calls.

```python
from agent.tools.base import ToolParamBase, ToolMeta
from abc import ABC

class GreetParam(ToolParamBase):
    def __init__(self):
        self.meta: ToolMeta = {
            "name": "greet",
            "description": "Generate a friendly greeting for a specified entity.",
            "parameters": {
                "who": {
                    "type": "string",
                    "description": "Person or entity to greet",
                    "default": "world",
                    "required": True,
                }
            },
        }
        super().__init__()

```

The `meta` dictionary must include `name` (the function identifier), `description` (visible to the LLM), and `parameters` (JSON Schema properties). Use `{sys.user}` or other template strings in defaults to reference runtime context variables.
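
For orientation, the sketch below shows one way a `meta` dictionary of this shape could be turned into an OpenAI-style function definition. The `meta_to_function_schema` helper is hypothetical; how `ToolParamBase` actually serializes its metadata may differ in detail.

```python
from typing import Any


def meta_to_function_schema(meta: dict[str, Any]) -> dict[str, Any]:
    """Convert a tool meta dictionary into an OpenAI-style function definition."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, spec in meta["parameters"].items():
        # "required" is a per-parameter flag in meta but a top-level list in JSON Schema.
        properties[name] = {k: v for k, v in spec.items() if k != "required"}
        if spec.get("required"):
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": meta["name"],
            "description": meta["description"],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


greet_meta = {
    "name": "greet",
    "description": "Generate a friendly greeting for a specified entity.",
    "parameters": {
        "who": {
            "type": "string",
            "description": "Person or entity to greet",
            "default": "world",
            "required": True,
        }
    },
}

schema = meta_to_function_schema(greet_meta)
```

The main transformation is moving the per-parameter `required` flag into the schema-level `required` list, which is where function-calling models expect to find it.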

### Step 2: Implement the Execution Class

Create a class inheriting from `ToolBase` (and optionally `ABC`) that implements `_invoke()`. Use `set_output()` to push results back to the agent canvas so downstream nodes can reference them.

```python
from abc import ABC

from agent.tools.base import ToolBase


class Greet(ToolBase, ABC):
    component_name = "Greet"

    def _invoke(self, **kwargs):
        # Fall back to the same default declared in GreetParam.
        who = kwargs.get("who", "world")
        message = f"👋 Hello, {who}!"

        # Store outputs on the canvas so downstream nodes can reference them.
        self.set_output("message", message)
        self.set_output("formalized_content", message)
        return message
```

### Step 3: Verify Auto-Discovery

Save both classes in `agent/tools/greet.py`. Because [`agent/tools/__init__.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/__init__.py) automatically imports every module in the directory and exposes its classes, `Greet` and `GreetParam` need no further registration. They become available once the agent process is restarted and the new module is loaded.

### Async Implementation Pattern

For tools that spend most of their time waiting on network or other I/O, implement `_invoke_async()` instead:

```python
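from abc import ABC
from urllib.parse import quote

import aiohttp

from agent.tools.base import ToolBase


class AsyncWeather(ToolBase, ABC):
    component_name = "AsyncWeather"

    async def _invoke_async(self, **kwargs):
        # URL-encode the city so spaces and non-ASCII names form a valid path.
        city = quote(kwargs.get("city", ""))
        async with aiohttp.ClientSession() as session:
            # Illustrative endpoint and response shape; substitute a real weather API.
            async with session.get(f"https://api.weather.com/v1/{city}") as resp:
                resp.raise_for_status()
                data = await resp.json()
                self.set_output("temperature", data["temp"])
                return data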

```

The base class `invoke_async` wrapper will route calls to this method when running in async contexts.
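
One plausible shape for such a wrapper is sketched below. It is an assumption about the routing logic, not RAGFlow's actual implementation: `SketchAsyncBase` is hypothetical, and the fallback to `asyncio.to_thread` for synchronous tools is a design choice made for this example.

```python
import asyncio
import inspect
from typing import Any


class SketchAsyncBase:
    """Hypothetical base: prefers a coroutine hook, else runs the sync hook in a thread."""

    async def invoke_async(self, **kwargs: Any) -> Any:
        hook = getattr(self, "_invoke_async", None)
        if hook is not None and inspect.iscoroutinefunction(hook):
            return await hook(**kwargs)
        # Run blocking code off the event loop so other tasks keep moving.
        return await asyncio.to_thread(self._invoke, **kwargs)

    def _invoke(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class SyncTool(SketchAsyncBase):
    def _invoke(self, **kwargs: Any) -> str:
        return f"sync:{kwargs['x']}"


class AsyncTool(SketchAsyncBase):
    async def _invoke_async(self, **kwargs: Any) -> str:
        await asyncio.sleep(0)  # stands in for real network I/O
        return f"async:{kwargs['x']}"


async def main() -> list[str]:
    # Both kinds of tool are awaited through the same entry point.
    return list(
        await asyncio.gather(
            SyncTool().invoke_async(x=1),
            AsyncTool().invoke_async(x=2),
        )
    )


results = asyncio.run(main())
```

With this arrangement callers never need to know whether a tool is synchronous or asynchronous; they always await `invoke_async`.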

## Key Implementation Details for Extension

When building production-grade RAGFlow tools, reference these critical source locations:

- **[`agent/tools/base.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/base.py)** – Contains `ToolBase._retrieve_chunks()` for RAG integration and `ToolParamBase.generate()` for schema serialization.
- **[`agent/component/base.py`](https://github.com/infiniflow/ragflow/blob/main/agent/component/base.py)** – Provides `check_if_canceled()` for long-running operations and `get_component_name()` for logging.
- **[`agent/canvas.py`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py)** – Manages the execution graph; tools write references here via `add_reference()` so the agent can cite sources.
- **[`common/mcp_tool_call_conn.py`](https://github.com/infiniflow/ragflow/blob/main/common/mcp_tool_call_conn.py)** – Implements `MCPToolCallSession` for sandboxing tools in separate processes; use this pattern for untrusted code execution.

Implement the `check()` method in your `*Param` class to validate configuration before execution:

```python
def check(self):
    if not self.meta.get("parameters"):
        raise ValueError("Parameters schema is required")
    return True

```
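
Configuration checks like the one above run before execution; validating the arguments of an individual call is a separate concern. The sketch below shows how declared defaults and `required` flags could be applied at call time. `resolve_arguments` and the `language` parameter are hypothetical and not part of RAGFlow's API.

```python
from typing import Any


def resolve_arguments(meta: dict[str, Any], supplied: dict[str, Any]) -> dict[str, Any]:
    """Fill in declared defaults and reject calls that miss a required argument."""
    resolved: dict[str, Any] = {}
    for name, spec in meta["parameters"].items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        elif spec.get("required"):
            raise ValueError(f"Missing required parameter: {name}")
    return resolved


meta = {
    "name": "greet",
    "parameters": {
        "who": {"type": "string", "default": "world", "required": True},
        "language": {"type": "string", "required": True},
    },
}

# "who" falls back to its default; "language" is supplied explicitly.
args = resolve_arguments(meta, {"language": "en"})

# Omitting "language" fails because it is required and has no default.
try:
    resolve_arguments(meta, {})
    missing_error = None
except ValueError as exc:
    missing_error = str(exc)
```

Failing early with a clear message is usually preferable to letting a `KeyError` surface from deep inside `_invoke()`.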

## Summary

- **RAGFlow provides about 20 ready-to-use agent tools** in `agent/tools/` (22 appear in the catalog above), covering search, academic literature, finance, translation, and code execution.
- **Each tool requires two classes**: a `*Param` class (inheriting `ToolParamBase`) for schema definition and a `*` class (inheriting `ToolBase`) for execution logic.
- **Discovery is automatic**: [`agent/tools/__init__.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/__init__.py) registers any new Python file in the directory without manual imports.
- **Execution methods** are `_invoke()` for synchronous work and `_invoke_async()` for I/O-bound operations.
- **Canvas integration** uses `set_output()` to persist results for downstream agent components.

## Frequently Asked Questions

### What is the difference between ToolBase and ToolParamBase in RAGFlow?

`ToolParamBase` handles the **declaration** side—it defines the JSON schema, parameter defaults, and validation rules that the LLM uses to construct function calls. `ToolBase` handles the **execution** side—it provides the `_invoke()` or `_invoke_async()` methods that run when the LLM calls the tool, plus utilities like `set_output()` to write results back to the agent canvas. Together, they separate "what the tool accepts" from "what the tool does."

### How does RAGFlow automatically discover new tools in agent/tools/?

The file [`agent/tools/__init__.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/__init__.py) walks the directory at import time, dynamically imports every `.py` module (excluding [`base.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/base.py) and itself), and collects all public classes using `inspect.isclass()`. These classes populate the global `__all_classes` list and `__all__` export, making them available to the tool registry without requiring manual imports or configuration entries.

### Can I create asynchronous tools in RAGFlow?

Yes. While the base `ToolBase` class provides `_invoke()` for synchronous execution, you can override `_invoke_async()` for coroutine-based logic. The framework’s `invoke_async` method automatically detects and awaits async implementations. This makes async tools a good fit for HTTP requests, database queries, and other I/O-bound operations that should not block the agent event loop.

### How do I pass structured results from a custom tool back to the agent?

Use the `self.set_output(key, value)` method provided by `ToolBase`. This stores data in the agent canvas where downstream components can reference it. For example, `self.set_output("stock_price", 150.00)` allows subsequent agent nodes to access the value via the canvas reference system. You should also return the primary result from `_invoke()` to ensure immediate usability in the conversation flow.