Debugging RAGFlow Applications: 8 Essential Techniques for Micro-Service Troubleshooting

Effective debugging of RAGFlow applications relies on leveraging the centralized init_root_logger system, controlling verbosity dynamically via the LOG_LEVELS environment variable, and attaching remote debuggers using the built-in debugpy integration when step-through inspection is required.

RAGFlow is a complex, micro-service-based Retrieval-Augmented Generation engine maintained by infiniflow. When failures occur across its distributed architecture—from the API server to background task executors—systematic debugging practices grounded in the source code are essential for rapid resolution. This guide covers the specific debugging primitives implemented in the repository to diagnose issues efficiently.

Initialize Centralized Logging with init_root_logger

All long-running processes in RAGFlow must establish consistent logging early in their lifecycle. In api/ragflow_server.py at line 78, the server initializes logging by calling:

init_root_logger("ragflow_server")

This function, defined in common/log_utils.py (lines 25-44), configures a rotating file handler that writes to logs/ragflow_server.log alongside a console stream handler. Each entry includes timestamps, process IDs, and severity levels, enabling you to trace request flows across the API server, admin service, and background workers. Centralizing logs in a single location per service eliminates fragmentation when correlating events across the micro-service architecture.
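To make the handler setup described above concrete, here is a minimal sketch of a logger initializer that pairs a rotating file handler with a console handler. It is not the code in common/log_utils.py: the function name init_logger_sketch, the format string, and the size and backup defaults are all assumptions chosen for illustration.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def init_logger_sketch(name, log_dir="logs", max_bytes=10 * 1024 * 1024, backups=5):
    """Attach a rotating file handler and a console handler to the root logger.

    Illustrative only: the name, format, and limits are assumptions,
    not RAGFlow's actual values.
    """
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.abspath(os.path.join(log_dir, f"{name}.log"))

    root = logging.getLogger()
    # Avoid attaching duplicate handlers if called twice for the same file.
    if any(getattr(h, "baseFilename", None) == log_path for h in root.handlers):
        return log_path

    # Timestamp, severity, and process ID on every line.
    formatter = logging.Formatter("%(asctime)s %(levelname)-8s %(process)d %(message)s")

    file_handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
    file_handler.setFormatter(formatter)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)

    root.addHandler(file_handler)
    root.addHandler(console_handler)
    root.setLevel(logging.INFO)
    return log_path
```

Calling init_logger_sketch("my_service") once at process start would send every subsequent logging call in that process to logs/my_service.log and to the console.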

Control Log Verbosity Dynamically via LOG_LEVELS

Rather than modifying source code to adjust granularity, use the LOG_LEVELS environment variable to set package-specific verbosity at runtime. As implemented in common/log_utils.py (lines 48-67), this variable accepts comma-separated package assignments:

export LOG_LEVELS=peewee=DEBUG,rag=INFO,root=INFO

Setting peewee=DEBUG surfaces all database queries without flooding logs with root-level noise. Configure this in your .env file or docker-compose environment to debug connection issues or slow queries in production without redeploying code.
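The comma-separated format above can be parsed in a few lines. The following is a sketch of one way to do it, assuming the pkg=LEVEL syntax shown in the export example; apply_log_levels is a hypothetical name, and the real parsing in common/log_utils.py may handle defaults and edge cases differently.

```python
import logging
import os


def apply_log_levels(spec=None):
    """Parse 'pkg=LEVEL,pkg=LEVEL' and set each named logger's level.

    Hypothetical helper: reads LOG_LEVELS from the environment when
    no spec is passed. Returns the assignments that were applied.
    """
    if spec is None:
        spec = os.environ.get("LOG_LEVELS", "")
    applied = {}
    for item in spec.split(","):
        name, sep, level_name = item.strip().partition("=")
        name = name.strip()
        if not sep or not name:
            continue  # skip empty or malformed entries
        level = logging.getLevelName(level_name.strip().upper())
        if not isinstance(level, int):
            continue  # unknown level name such as "LOUD"
        # Treat "root" as the root logger rather than a logger named "root".
        target = logging.getLogger() if name == "root" else logging.getLogger(name)
        target.setLevel(level)
        applied[name] = level
    return applied
```

Skipping malformed entries instead of raising is a design choice for this sketch: a typo in one assignment should not prevent a service from starting.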

Capture Full Exception Tracebacks

RAGFlow's exception handling preserves full stack traces. The log_exception helper appears throughout the codebase, including in tools/firecrawl/firecrawl_connector.py (line 236), and is imported in rag/svr/task_executor.py (line 41).

When called inside an except block, logging.exception logs the message at ERROR level and appends the active traceback. Custom context objects may include a text attribute for additional debugging metadata. Implement this pattern in your own extensions:

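import logging
from common.log_utils import init_root_logger

init_root_logger("my_plugin")
logger = logging.getLogger(__name__)

def retrieve_chunks():
    # Placeholder: replace with your plugin's retrieval logic
    raise NotImplementedError

def process_document():
    try:
        # Critical retrieval logic
        return retrieve_chunks()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Unhandled error during chunk retrieval")
        raise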
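The text attribute mentioned above is useful when the failing call returns an object that carries its own error details, such as an HTTP response. Below is a sketch of a helper in that spirit. The name log_with_context and its behavior are assumptions for illustration; it is not a copy of RAGFlow's log_exception.

```python
import logging

logger = logging.getLogger("my_plugin")


def log_with_context(exc, *context):
    """Log the active traceback, then each context item, and re-raise.

    Hypothetical helper. Call it from inside an except block so that
    logger.exception has a traceback to attach.
    """
    logger.exception(exc)
    for item in context:
        # Prefer a .text attribute (e.g. an HTTP response body) when present.
        logger.error(getattr(item, "text", str(item)))
    raise exc
```

A caller would pass the caught exception plus whatever objects explain the failure, for example log_with_context(e, response), so the response body lands in the log right after the traceback.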

Attach Remote Debuggers Using debugpy

For scenarios where log analysis is insufficient, RAGFlow includes built-in support for remote debugging. When the environment variable RAGFLOW_DEBUGPY_LISTEN is set to a non-zero port, the server initializes a debugpy listener before starting the main loop (see api/ragflow_server.py, lines 97-101):

export RAGFLOW_DEBUGPY_LISTEN=5678
python api/ragflow_server.py

Attach VS Code using this configuration:

{
  "name": "Attach to RAGFlow",
  "type": "python",
  "request": "attach",
  "connect": { "host": "localhost", "port": 5678 }
}

This technique is essential for diagnosing race conditions in the task executor or stepping through complex retrieval logic in the RAG pipeline.
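The conditional listener described in this section can be sketched as follows. This is an approximation of the pattern, not the code at the cited lines: maybe_start_debugger is a hypothetical name, and binding to 0.0.0.0 is an assumption made so the port is reachable from outside a container.

```python
import logging
import os


def maybe_start_debugger(env_var="RAGFLOW_DEBUGPY_LISTEN"):
    """Start a debugpy listener if env_var holds a non-zero port.

    Returns the port that was opened, or None if debugging is disabled.
    """
    try:
        port = int(os.environ.get(env_var, "0"))
    except ValueError:
        logging.warning("Ignoring non-numeric value in %s", env_var)
        return None
    if port <= 0:
        return None

    # Imported lazily so debugpy is only required when debugging is enabled.
    import debugpy

    # Assumption: listen on all interfaces so a mapped container port works.
    debugpy.listen(("0.0.0.0", port))
    logging.info("debugpy listening on port %d", port)
    return port
```

Because an open debugpy port allows arbitrary code execution in the process, a listener like this should only be enabled on trusted networks.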

Verify Runtime Configuration at Startup

Configuration errors often manifest as cryptic failures in downstream components. RAGFlow dumps runtime settings during initialization to aid debugging. In api/ragflow_server.py (lines 94-96), the application emits the full configuration loaded from common/settings.py when launched with the --debug flag:

python api/ragflow_server.py --debug

This output includes host IPs, port bindings, Redis connection parameters, and feature toggles. Verify these values against your environment variables when services fail to communicate or connect to external dependencies.
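Comparing the dumped values against the environment by eye is error-prone, so a small script can help. The sketch below assumes you have copied the relevant settings into a dictionary; find_config_mismatches, the setting keys, and the environment variable names in the usage note are all hypothetical and not taken from common/settings.py.

```python
import os


def find_config_mismatches(loaded, expected_env, environ=None):
    """Report settings whose loaded value differs from the environment.

    loaded:       dict of setting name -> value the service reported
    expected_env: dict of setting name -> environment variable name
    Returns {setting: (loaded_value, env_value)} for each disagreement.
    """
    environ = os.environ if environ is None else environ
    mismatches = {}
    for key, env_name in expected_env.items():
        env_value = environ.get(env_name)
        # Unset variables are ignored; values are compared as strings.
        if env_value is not None and str(loaded.get(key)) != env_value:
            mismatches[key] = (loaded.get(key), env_value)
    return mismatches
```

For example, calling it with loaded={"redis_host": "localhost"} and expected_env={"redis_host": "REDIS_HOST"} while REDIS_HOST is set to redis would flag redis_host, which points to the service not picking up the variable.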

Monitor Background Workers and Signal Handling

RAGFlow relies on background workers for document processing and indexing. The update_progress thread acquires distributed Redis locks and logs failures via logging.exception (referenced at line 60 in the implementation). Check logs for "update_progress" entries to diagnose Redis connectivity issues or document indexing failures.

Additionally, verify that the signal handlers allow a graceful shutdown. In api/ragflow_server.py (lines 69-74), handlers for SIGINT and SIGTERM close MCP (Model Context Protocol) sessions and halt background threads. When debugging startup crashes, confirm these handlers are not suppressing errors: logging.info calls within the handlers indicate that the shutdown path was triggered.
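The shutdown pattern described here, a handler that logs and then signals background threads to stop, can be sketched with a threading.Event. The names stop_event, handle_shutdown, and worker_loop are hypothetical; the actual handlers in api/ragflow_server.py perform additional cleanup.

```python
import logging
import signal
import threading

# Shared flag that background threads poll to know when to exit.
stop_event = threading.Event()


def handle_shutdown(signum, frame):
    """Log which signal arrived, then ask background threads to stop."""
    logging.info("Received %s, shutting down", signal.Signals(signum).name)
    stop_event.set()


def install_signal_handlers():
    # signal.signal may only be called from the main thread.
    signal.signal(signal.SIGINT, handle_shutdown)
    signal.signal(signal.SIGTERM, handle_shutdown)


def worker_loop(interval=0.1):
    """Hypothetical background loop that exits once stop_event is set."""
    ticks = 0
    while not stop_event.is_set():
        ticks += 1
        # wait() returns early if the event is set during the pause.
        stop_event.wait(interval)
    return ticks
```

If the log line from the handler appears during a startup crash, the process was told to stop by a signal rather than failing on its own, which narrows the search considerably.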

Leverage the Test Suite as a Debugging Harness

The test/ directory contains an extensive pytest suite that serves as an isolated reproduction environment. When investigating bugs, run specific tests with verbose logging enabled:

uv run pytest test/testcases/test_sdk_api/test_session_management/test_create_session_with_chat_assistant.py::test_create_session_failure -vv --log-level=DEBUG

The -vv flag provides detailed assertion output, while --log-level=DEBUG lowers pytest's log-capture threshold so that debug records appear in the report for failing tests; add --log-cli-level=DEBUG to stream them to the console as the test runs. Add temporary logging.debug statements to narrow the failure scope without restarting the entire application stack.
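Temporary debug statements can also be asserted on directly using pytest's built-in caplog fixture. The sketch below uses a hypothetical function, split_into_chunks, and a hypothetical logger name; neither exists in the RAGFlow test suite.

```python
import logging

logger = logging.getLogger("my_plugin.chunker")


def split_into_chunks(text, size):
    """Hypothetical function under test: split text into fixed-size pieces."""
    logger.debug("splitting %d characters into chunks of %d", len(text), size)
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    logger.debug("produced %d chunks", len(chunks))
    return chunks


def test_split_into_chunks_logs_progress(caplog):
    # caplog is pytest's log-capture fixture; at_level lowers the threshold
    # for the named logger only while the block runs.
    with caplog.at_level(logging.DEBUG, logger="my_plugin.chunker"):
        chunks = split_into_chunks("abcdefg", 3)
    assert chunks == ["abc", "def", "g"]
    assert "produced 3 chunks" in caplog.text
```

Scoping the level change to one logger keeps unrelated debug output out of the captured text, which makes the assertion on caplog.text less brittle.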

Summary

  • Initialize init_root_logger early in every service entry point to ensure consistent log formatting and rotation to logs/
  • Control verbosity dynamically via the LOG_LEVELS environment variable (e.g., peewee=DEBUG) rather than hardcoding values
  • Capture full tracebacks using logger.exception() in exception handlers to preserve stack context
  • Attach remote debuggers by setting RAGFLOW_DEBUGPY_LISTEN for step-through debugging of race conditions
  • Verify configuration using the --debug startup flag to dump runtime settings from common/settings.py
  • Monitor background workers through update_progress thread logs and Redis lock status indicators
  • Utilize the pytest suite with verbose flags for isolated bug reproduction without full system restarts

Frequently Asked Questions

How do I enable debug logging for database queries in RAGFlow?

Set the environment variable LOG_LEVELS=peewee=DEBUG before starting the server. This configures the peewee ORM to log all SQL queries to the rotating log file without modifying source code, as parsed in common/log_utils.py (lines 48-67).

Where does RAGFlow store its log files?

By default, logs are written to the logs/ directory relative to the project root, with filenames derived from the service name (e.g., logs/ragflow_server.log). The rotating file handler defined in common/log_utils.py prevents disk space exhaustion by automatically rotating files when they reach size limits.

Can I debug a RAGFlow instance running inside a Docker container?

Yes. Expose port 5678 (or your chosen port) in your Docker Compose configuration, set the environment variable RAGFLOW_DEBUGPY_LISTEN=5678, and attach your debugger to the container's exposed port. Ensure your development machine has network access to the container and that the port mapping is correctly configured in docker-compose.yml.

How should I handle exceptions in custom RAGFlow plugins?

Import init_root_logger from common/log_utils and initialize it with your plugin name at module load time. Wrap critical operations in try-except blocks and use logger.exception() to capture full stack traces before re-raising. This ensures your plugin's errors appear in the centralized logs with the same formatting and rotation rules as core RAGFlow components.
