Debugging RAGFlow Applications: 8 Essential Techniques for Micro-Service Troubleshooting
Effective debugging of RAGFlow applications relies on leveraging the centralized init_root_logger system, controlling verbosity dynamically via the LOG_LEVELS environment variable, and attaching remote debuggers using the built-in debugpy integration when step-through inspection is required.
RAGFlow is a complex, micro-service-based Retrieval-Augmented Generation engine maintained by infiniflow. When failures occur across its distributed architecture, from the API server to the background task executors, systematic debugging practices grounded in the source code are essential for rapid resolution. This guide covers the specific debugging primitives implemented in the repository and how to use them to diagnose issues efficiently.
Initialize Centralized Logging with init_root_logger
All long-running processes in RAGFlow must establish consistent logging early in their lifecycle. In api/ragflow_server.py at line 78, the server initializes logging by calling:
init_root_logger("ragflow_server")
This function, defined in common/log_utils.py (lines 25-44), configures a rotating file handler that writes to logs/ragflow_server.log alongside a console stream handler. Each entry includes timestamps, process IDs, and severity levels, enabling you to trace request flows across the API server, admin service, and background workers. Centralizing logs in a single location per service eliminates fragmentation when correlating events across the micro-service architecture.
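Here is a minimal standard-library sketch of a root-logger setup with the same shape: a rotating file plus a console stream. It is not RAGFlow's implementation. The name setup_root_logger, the format string, the 10 MB size limit, and the five-file retention are illustrative assumptions.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_root_logger(service_name: str, log_dir: str = "logs") -> str:
    """Hypothetical stand-in for init_root_logger: rotating file + console on the root logger."""
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, f"{service_name}.log")

    # Assumed format: timestamp, level, process ID, message
    formatter = logging.Formatter("%(asctime)-15s %(levelname)-8s %(process)d %(message)s")

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    # Detach handlers from any earlier setup so records are not written twice
    for handler in list(root.handlers):
        root.removeHandler(handler)

    # Rotate at about 10 MB and keep five old files (illustrative limits)
    file_handler = RotatingFileHandler(log_path, maxBytes=10 * 1024 * 1024, backupCount=5)
    file_handler.setFormatter(formatter)
    root.addHandler(file_handler)

    # Mirror every record to the console (stderr by default)
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    root.addHandler(console_handler)
    return log_path
```

Calling setup_root_logger("ragflow_server") and then logging.info("server starting") would write the line to logs/ragflow_server.log and echo it to the console. Clearing existing root handlers first prevents repeated calls from duplicating every record.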
Control Log Verbosity Dynamically via LOG_LEVELS
Rather than modifying source code to adjust granularity, use the LOG_LEVELS environment variable to set package-specific verbosity when a process starts. As implemented in common/log_utils.py (lines 48-67), this variable accepts comma-separated package=LEVEL pairs, where root addresses the root logger:
export LOG_LEVELS=peewee=DEBUG,rag=INFO,root=INFO
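Setting peewee=DEBUG surfaces every SQL query issued through the peewee ORM, while the root logger stays at INFO so the queries are not buried in unrelated debug noise. Because the value is read when logging is initialized at startup, set it in your .env file or docker-compose environment and restart the service. This lets you chase connection issues or slow queries in production without changing or redeploying code.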
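The following minimal sketch parses and applies such a specification with the standard logging module. The function name apply_log_levels and the fallback to INFO for unrecognized level names are assumptions for illustration. common/log_utils.py remains the reference for RAGFlow's exact rules, including any defaults it applies to noisy packages.

```python
import logging
import os
from typing import Optional


def apply_log_levels(spec: Optional[str] = None) -> dict[str, int]:
    """Parse 'pkg=LEVEL,pkg=LEVEL' and apply each level to the named logger (sketch)."""
    if spec is None:
        spec = os.environ.get("LOG_LEVELS", "")
    levels: dict[str, int] = {}
    for item in spec.split(","):
        name, sep, level_name = item.partition("=")
        name, level_name = name.strip(), level_name.strip().upper()
        if not sep or not name:
            continue  # skip empty or malformed entries such as a bare "peewee"
        level = logging.getLevelName(level_name)  # "DEBUG" -> 10; unknown names -> a str
        if not isinstance(level, int):
            level = logging.INFO  # assumed fallback for unrecognized level names
        levels[name] = level
    for name, level in levels.items():
        # "root" targets the root logger; anything else is a named package logger
        target = logging.getLogger() if name == "root" else logging.getLogger(name)
        target.setLevel(level)
    return levels
```

With the export above, apply_log_levels() would return {"peewee": 10, "rag": 20, "root": 20}, since DEBUG and INFO correspond to 10 and 20 in the logging module.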
Capture Full Exception Tracebacks
RAGFlow provides robust exception handling patterns that preserve full stack traces. The log_exception helper appears throughout the codebase, including in tools/firecrawl/firecrawl_connector.py (line 236) and imported in rag/svr/task_executor.py (line 41).
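The underlying mechanism is logging.exception, which logs at ERROR level and appends the active traceback when called inside an except block. Context objects may also carry a text attribute (for example, an HTTP response body) that is logged as additional debugging metadata. You can apply the same pattern in your own extensions. In the following example, retrieve_chunks is a hypothetical placeholder for your plugin's retrieval logic:
import logging

from common.log_utils import init_root_logger

init_root_logger("my_plugin")
logger = logging.getLogger(__name__)

def retrieve_chunks():
    # Hypothetical placeholder: replace with your plugin's retrieval logic
    raise NotImplementedError("retrieve_chunks is a placeholder")

def process_document():
    try:
        # Critical retrieval logic
        retrieve_chunks()
    except Exception:
        # logger.exception logs at ERROR level and includes the full traceback
        logger.exception("Unhandled error during chunk retrieval")
        raise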
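The following hypothetical helper shows how extra context can accompany a traceback. It is modeled loosely on the text-attribute behavior described above and is not RAGFlow's code. The name log_with_context and its exact behavior are assumptions: it logs the traceback, logs the text attribute of any context object that has one (or the object itself otherwise), and then re-raises.

```python
import logging

logger = logging.getLogger(__name__)


def log_with_context(exc: BaseException, *context: object) -> None:
    """Hypothetical helper: log a traceback plus context, then re-raise. Call from an except block."""
    # logging.exception logs at ERROR level and appends the active traceback
    logger.exception("Unhandled error: %s", exc)
    for item in context:
        # Objects such as HTTP responses often expose their body via .text
        detail = getattr(item, "text", None)
        logger.error("Context: %s", detail if detail is not None else item)
    # Re-raise the original exception so the caller still fails
    raise exc
```

Called as log_with_context(err, response) inside an except block, this leaves the traceback and the response body next to each other in the log.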
Attach Remote Debuggers Using debugpy
For scenarios where log analysis is insufficient, RAGFlow includes built-in support for remote debugging. When the environment variable RAGFLOW_DEBUGPY_LISTEN is set to a positive port number, the server starts a debugpy listener on that port before entering the main loop (see api/ragflow_server.py, lines 97-101). The debugpy package must be available in the server's Python environment:
export RAGFLOW_DEBUGPY_LISTEN=5678
python api/ragflow_server.py
Attach VS Code using this configuration:
{
"name": "Attach to RAGFlow",
"type": "python",
"request": "attach",
"connect": { "host": "localhost", "port": 5678 }
}
This technique is essential for diagnosing race conditions in the task executor or stepping through complex retrieval logic in the RAG pipeline.
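The environment-variable gate described above can be sketched as follows. This is a hypothetical reconstruction, not a copy of api/ragflow_server.py. The helper names, the 0.0.0.0 bind address, and the optional wait are assumptions. debugpy is imported lazily, so the process starts normally when the variable is unset or 0, even if debugpy is not installed.

```python
import logging
import os
from typing import Mapping, Optional


def debugpy_port(environ: Mapping[str, str] = os.environ) -> int:
    """Read RAGFLOW_DEBUGPY_LISTEN; 0 (the default) means remote debugging is off."""
    return int(environ.get("RAGFLOW_DEBUGPY_LISTEN", "0"))


def maybe_start_debugpy(environ: Mapping[str, str] = os.environ, wait: bool = False) -> Optional[int]:
    """Start a debugpy listener when the configured port is positive; return the port or None."""
    port = debugpy_port(environ)
    if port <= 0:
        return None
    import debugpy  # imported lazily: only needed when a debugger is requested

    # Bind on all interfaces so a debugger outside a container can attach (assumption)
    debugpy.listen(("0.0.0.0", port))
    logging.info("debugpy listening on port %d", port)
    if wait:
        debugpy.wait_for_client()  # block until VS Code or another client attaches
    return port
```

Passing wait=True pauses startup until a client attaches, which helps when the bug occurs during initialization. An attached debugger can run arbitrary code in the process, so expose the port only on trusted networks.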
Verify Runtime Configuration at Startup
Configuration errors often manifest as cryptic failures in downstream components. RAGFlow dumps runtime settings during initialization to aid debugging. In api/ragflow_server.py (lines 94-96), the application emits the full configuration loaded from common/settings.py when launched with the --debug flag:
python api/ragflow_server.py --debug
This output includes host IPs, port bindings, Redis connection parameters, and feature toggles. Verify these values against your environment variables when services fail to communicate or connect to external dependencies.
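The values you need to check may include credentials, so a debug dump should print settings in a stable order with secrets masked. The sketch below is hypothetical. It mimics a --debug flag that gates the dump, but the settings dictionary, its keys and values, and the name-based masking rule are illustrative assumptions, not the contents of common/settings.py.

```python
import argparse
import logging
from typing import Any, Mapping, Optional

SECRET_MARKERS = ("password", "secret", "token", "key")  # assumed naming convention


def mask(name: str, value: Any) -> Any:
    """Hide values whose setting name suggests a credential."""
    return "******" if any(m in name.lower() for m in SECRET_MARKERS) else value


def dump_settings(settings: Mapping[str, Any]) -> list[str]:
    """Return 'name = value' lines in sorted order with credentials masked."""
    return [f"{name} = {mask(name, settings[name])!r}" for name in sorted(settings)]


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true", help="log runtime settings at startup")
    args = parser.parse_args(argv)
    # Hypothetical settings; a real service would load these from its config module
    settings = {"host_ip": "0.0.0.0", "redis_host": "redis:6379", "redis_password": "example"}
    if args.debug:
        for line in dump_settings(settings):
            logging.info(line)
```

Sorting the keys makes two dumps easy to diff, for example one from a working container and one from a failing one.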
Monitor Background Workers and Signal Handling
RAGFlow relies on background workers for document processing and indexing. In api/ragflow_server.py, the update_progress thread acquires a distributed Redis lock before updating document progress and logs any failure via logging.exception (around line 60). Search the logs for "update_progress" entries to diagnose Redis connectivity issues or document indexing failures.
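The lock-then-work loop can be sketched with the standard library. This is an illustration, not RAGFlow's code. A threading.Lock stands in for the Redis distributed lock, and progress_loop, do_work, and the default interval are hypothetical. For debugging, the important property is that each failure is logged with logging.exception while the loop keeps running.

```python
import logging
import threading
from typing import Callable


def progress_loop(lock: threading.Lock, stop_event: threading.Event,
                  do_work: Callable[[], None], interval: float = 6.0) -> None:
    """Repeatedly run do_work under a lock until stop_event is set (sketch)."""
    while not stop_event.is_set():
        # Non-blocking acquire: skip this round if another worker holds the lock
        if lock.acquire(blocking=False):
            try:
                do_work()
            except Exception:
                # Full traceback goes to the log; the loop survives the failure
                logging.exception("progress_loop iteration failed")
            finally:
                lock.release()
        # Event.wait doubles as an interruptible sleep, so shutdown is prompt
        stop_event.wait(interval)
```

Releasing the lock in finally matters: if an exception left the lock held, every later iteration would silently skip its work, which looks like stalled progress rather than an error.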
Additionally, ensure graceful shutdowns by verifying signal handlers. In api/ragflow_server.py (lines 69-74), handlers for SIGINT and SIGTERM close MCP (Model Context Protocol) sessions and stop background threads. When debugging crashes around startup or shutdown, confirm these handlers are not suppressing errors. Their logging.info calls appear in the log when the shutdown path is triggered.
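Here is a minimal standard-library sketch of shutdown handlers in the same spirit. close_sessions is a hypothetical placeholder for cleanup such as closing MCP sessions. The handler logs at INFO level so the log confirms that the shutdown path ran, and it logs cleanup failures rather than swallowing them.

```python
import logging
import signal
import threading
from typing import Callable

stop_event = threading.Event()


def make_shutdown_handler(close_sessions: Callable[[], None]):
    """Build a SIGINT/SIGTERM handler that logs, cleans up, and tells worker threads to stop."""

    def handler(signum, frame):
        logging.info("Received %s, shutting down", signal.Signals(signum).name)
        try:
            close_sessions()
        except Exception:
            # Surface cleanup failures instead of hiding them
            logging.exception("Error while closing sessions")
        stop_event.set()  # background loops polling this event will exit

    return handler


def install_handlers(close_sessions: Callable[[], None]) -> None:
    handler = make_shutdown_handler(close_sessions)
    # signal.signal may only be called from the main thread
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

Keeping the handler in a factory function lets you unit-test it by calling it directly, without sending a real signal to the process.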
Leverage the Test Suite as a Debugging Harness
The test/ directory contains an extensive pytest suite that serves as an isolated reproduction environment. When investigating bugs, run specific tests with verbose logging enabled:
uv run pytest test/testcases/test_sdk_api/test_session_management/test_create_session_with_chat_assistant.py::test_create_session_failure -vv --log-level=DEBUG
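The -vv flag provides detailed assertion output. --log-level=DEBUG is a pytest option, separate from RAGFlow's LOG_LEVELS variable, that sets the level at which pytest captures log records. Captured records are shown in the report for failing tests, and adding --log-cli-level=DEBUG streams them to the terminal as the test runs. For tests that call a running server through the SDK, server-side messages still land in that server's own log file under logs/. Add temporary logging.debug statements to narrow the failure scope without restarting the entire application stack.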
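Asserting on log output can pinpoint the failing step faster than rereading tracebacks. The sketch below uses pytest's built-in caplog fixture. normalize_query and the logger name my_plugin are hypothetical stand-ins for the code under investigation.

```python
import logging

logger = logging.getLogger("my_plugin")


def normalize_query(query: str) -> str:
    """Hypothetical function under investigation; emits one debug record per call."""
    cleaned = " ".join(query.split())
    logger.debug("normalized %r -> %r", query, cleaned)
    return cleaned


def test_normalize_query_logs_debug(caplog):
    # Capture DEBUG records from this logger only, regardless of global settings
    with caplog.at_level(logging.DEBUG, logger="my_plugin"):
        assert normalize_query("  hybrid   search ") == "hybrid search"
    assert any("normalized" in record.getMessage() for record in caplog.records)
```

Running uv run pytest on the file with -k normalize_query -vv executes only this case, and caplog.records gives structured access to every captured message.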
Summary
- Initialize init_root_logger early in every service entry point to ensure consistent log formatting and rotation to logs/
- Control verbosity dynamically via the LOG_LEVELS environment variable (e.g., peewee=DEBUG) rather than hardcoding values
- Capture full tracebacks using logger.exception() in exception handlers to preserve stack context
- Attach remote debuggers by setting RAGFLOW_DEBUGPY_LISTEN for step-through debugging of race conditions
- Verify configuration using the --debug startup flag to dump runtime settings from common/settings.py
- Monitor background workers through update_progress thread logs and Redis lock status indicators
- Utilize the pytest suite with verbose flags for isolated bug reproduction without full system restarts
Frequently Asked Questions
How do I enable debug logging for database queries in RAGFlow?
Set the environment variable LOG_LEVELS=peewee=DEBUG before starting the server. This configures the peewee ORM to log all SQL queries to the rotating log file without modifying source code, as parsed in common/log_utils.py (lines 48-67).
Where does RAGFlow store its log files?
By default, logs are written to the logs/ directory relative to the project root, with filenames derived from the service name (e.g., logs/ragflow_server.log). The rotating file handler defined in common/log_utils.py prevents disk space exhaustion by automatically rotating files when they reach size limits.
Can I debug a RAGFlow instance running inside a Docker container?
Yes. Expose port 5678 (or your chosen port) in your Docker Compose configuration, set the environment variable RAGFLOW_DEBUGPY_LISTEN=5678, and attach your debugger to the container's exposed port. Ensure your development machine has network access to the container and that the port mapping is correctly configured in docker-compose.yml.
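Before attaching, it can save time to confirm that the debugger port is reachable from your machine. The following is a minimal standard-library sketch. port_is_open is a hypothetical helper, and a successful TCP connect shows only that something is listening on that port, not that it is debugpy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

If port_is_open("localhost", 5678) returns False from your workstation, the cause is often a missing port mapping or a firewall rule rather than a problem with debugpy itself.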
How should I handle exceptions in custom RAGFlow plugins?
Import init_root_logger from common.log_utils and initialize it with your plugin name at module load time. Wrap critical operations in try-except blocks and use logger.exception() to capture full stack traces before re-raising. This ensures your plugin's errors appear in the centralized logs with the same formatting and rotation rules as core RAGFlow components.