How to Configure Celery Workers for Asynchronous Query Execution in Apache Superset

March 3, 2026 · apache/superset

To configure Celery workers for asynchronous query execution in Apache Superset, define the broker URL in superset/config.py, start workers with celery -A superset.tasks.celery_app.celery_app worker, and ensure the execute_sql_task in superset/sql/execution/celery_task.py processes queries from the message queue.

Apache Superset relies on Celery to offload long-running SQL queries and background tasks from the web application, ensuring the UI remains responsive during heavy workloads. This distributed task queue architecture requires proper configuration of message brokers, worker processes, and result backends to function correctly in production environments. Understanding how to configure Celery workers for asynchronous query execution allows administrators to scale query processing horizontally across multiple worker nodes.

Architecture of Asynchronous Query Execution

The implementation spans three core components that interact to process queries outside the request-response cycle.

Celery App Initialization

The global Celery application is defined in superset/tasks/celery_app.py and registered during Flask-AppBuilder initialization within superset/initialization/__init__.py. This module reads configuration values from superset/config.py (specifically lines 1359–1419), establishing the connection to the message broker and result backend before workers begin consuming tasks.

SQL Execution Task Logic

The actual query processing logic resides in superset/sql/execution/celery_task.py, specifically within the execute_sql_task function decorated with @celery_app.task(name="query_execution.execute_sql"). When invoked via SQLExecutor.execute_async(), this task manages the complete query lifecycle: transitioning status to running, executing statements via _execute_sql_statements, finalizing successful queries through _finalize_successful_query, and optionally storing results via _store_results_in_backend.
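The lifecycle described above can be sketched in plain Python. This is a simplified stand-in rather than Superset's code: QueryRecord, run_query_task, and the injected execute callable are hypothetical names, and the real task also deals with logging, cancellation, and result storage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QueryRecord:
    """Hypothetical stand-in for Superset's persisted query model."""
    sql: str
    status: str = "pending"
    rows: List[tuple] = field(default_factory=list)
    error: Optional[str] = None
    history: List[str] = field(default_factory=lambda: ["pending"])

    def set_status(self, status: str) -> None:
        # Record every transition so the lifecycle can be inspected later.
        self.status = status
        self.history.append(status)


def run_query_task(query: QueryRecord, execute: Callable[[str], List[tuple]]) -> QueryRecord:
    """Mimic the task flow: mark running, execute, then finalize or fail."""
    query.set_status("running")
    try:
        query.rows = execute(query.sql)
    except Exception as exc:  # any execution error marks the query as failed
        query.error = str(exc)
        query.set_status("failed")
    else:
        query.set_status("success")
    return query


ok = run_query_task(QueryRecord("SELECT 1"), lambda sql: [(1,)])
bad = run_query_task(QueryRecord("SELECT boom"), lambda sql: 1 / 0)
```

Passing the executor in as a callable keeps the status handling testable without a database connection.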

Result Backend Integration

After successful execution, query payloads are serialized and written to the configured results backend, commonly Redis or S3. This store is configured through Superset's RESULTS_BACKEND setting and is separate from Celery's own result backend. The RESULTS_BACKEND_USE_MSGPACK setting controls the serialization format: when enabled, result sets are encoded with msgpack and Apache Arrow rather than JSON, which makes large result sets cheaper to transfer.

Configuration Steps

Setting up Celery requires configuring the message broker, defining worker parameters, and launching processes with appropriate queue isolation.

Configure the Message Broker

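Superset deployments typically use Redis for both the broker and the result backend. Rather than editing superset/config.py directly, put these values in a superset_config.py override file on your PYTHONPATH, or read them from environment variables: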

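# Redis connection details
REDIS_HOST = "redis"
REDIS_PORT = 6379
REDIS_CELERY_DB = 2  # Redis logical database reserved for Celery

# The broker and the result backend share the same Redis database here
CELERY_BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
CELERY_RESULT_BACKEND = CELERY_BROKER_URL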

These values construct the connection strings used by the Celery app during initialization.
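For the environment-variable route, one possible sketch is shown below. The variable names mirror the settings above; build_redis_url is a hypothetical helper, and the fallback values are simply the ones used in this guide.

```python
import os
from typing import Mapping


def build_redis_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the Celery Redis URL, falling back to the defaults used in this guide."""
    host = env.get("REDIS_HOST", "redis")
    port = int(env.get("REDIS_PORT", "6379"))
    db = int(env.get("REDIS_CELERY_DB", "2"))
    return f"redis://{host}:{port}/{db}"


CELERY_BROKER_URL = build_redis_url()
CELERY_RESULT_BACKEND = CELERY_BROKER_URL
```

Accepting the environment as a parameter makes the helper easy to exercise with a plain dict in tests.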

Enable Celery Configuration

Ensure your custom configuration properly references the Celery configuration class:

from superset.config import CeleryConfig

CELERY_CONFIG = CeleryConfig

For advanced setups, subclass CeleryConfig to override broker URLs or serialization settings while maintaining the base configuration structure.
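The snippet below sketches such an override. So that it runs on its own, it declares a local BaseCeleryConfig instead of importing superset.config.CeleryConfig. The attribute names broker_url, result_backend, worker_prefetch_multiplier, and task_acks_late are Celery's standard lowercase settings; the host names and values are illustrative assumptions.

```python
class BaseCeleryConfig:
    """Local stand-in for superset.config.CeleryConfig (values are assumed)."""
    broker_url = "redis://redis:6379/2"
    result_backend = "redis://redis:6379/2"
    worker_prefetch_multiplier = 1
    task_acks_late = False


class ProductionCeleryConfig(BaseCeleryConfig):
    """Override only what differs; everything else is inherited."""
    broker_url = "redis://redis-prod:6379/2"      # hypothetical host
    result_backend = "redis://redis-prod:6379/3"  # separate database for results
    task_acks_late = True  # acknowledge messages after the task finishes


CELERY_CONFIG = ProductionCeleryConfig
```

In a real superset_config.py, the base class would be the imported CeleryConfig, and CELERY_CONFIG would point at the subclass in the same way.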

Start Celery Workers

Launch worker processes on each host designated for query processing:

celery -A superset.tasks.celery_app.celery_app worker \
    --loglevel=INFO \
    --concurrency=4 \
    --queues=queries

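The -A flag points to the Superset Celery app instance. The --queues=queries flag restricts these workers to the queries queue, so SQL Lab asynchronous queries do not compete for resources with other background tasks such as email reports or cache warming. The flag only has an effect if the query task is actually routed to that queue; a worker started without --queues consumes Celery's default queue, which is named celery.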
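One way to express that routing is Celery's task_routes setting, sketched below together with a tiny resolver that mimics the lookup for exact task names. The task name is the one registered by execute_sql_task as described earlier; resolve_queue is a hypothetical helper, and whether your Superset version already routes this task should be verified in its configuration.

```python
from typing import Dict

# Celery's task_routes setting: task name -> routing options.
task_routes: Dict[str, Dict[str, str]] = {
    "query_execution.execute_sql": {"queue": "queries"},
}

DEFAULT_QUEUE = "celery"  # Celery's default queue name


def resolve_queue(task_name: str, routes: Dict[str, Dict[str, str]] = task_routes) -> str:
    """Return the queue a task would be sent to under exact-name routing."""
    return routes.get(task_name, {}).get("queue", DEFAULT_QUEUE)
```

Any task without an explicit route falls through to the default queue, so at least one worker still needs to consume celery for reports and other background jobs.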

Optional: Run the Celery Beat Scheduler

For periodic tasks such as scheduled reports or cache refreshes, run the beat scheduler:

celery -A superset.tasks.celery_app.celery_app beat \
    --loglevel=INFO

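The beat process does not run jobs itself. It dispatches the scheduled tasks, which are defined in superset/tasks/scheduler.py, to available workers at the configured intervals. Run only one beat instance per deployment to avoid dispatching duplicate jobs.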
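A schedule entry for beat might look like the sketch below. beat_schedule is Celery's setting for periodic tasks, and a timedelta is one of the schedule types it accepts. The entry label and the reports.scheduler task name are assumptions to check against your Superset version.

```python
from datetime import timedelta

beat_schedule = {
    "reports-every-minute": {          # arbitrary label for this entry
        "task": "reports.scheduler",   # assumed task name
        "schedule": timedelta(minutes=1),
    },
}
```

This dictionary would normally live as an attribute on the Celery configuration class so that beat picks it up at startup.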

Critical Configuration Parameters

Understanding these settings ensures stable operation under varying workload conditions:

CELERY_BROKER_URL: The connection string for the message broker (Redis, RabbitMQ, etc.). Defaults are constructed from REDIS_HOST, REDIS_PORT, and REDIS_CELERY_DB in superset/config.py.
SQLLAB_ASYNC_TIME_LIMIT_SEC: Soft time limit for query execution. Tasks exceeding this duration raise SoftTimeLimitExceeded, allowing graceful termination without killing the worker process.
SQLLAB_PAYLOAD_MAX_MB: Maximum serialized payload size permitted for storage in the results backend. Increase this for large result sets, but monitor storage consumption.
CELERY_ALWAYS_EAGER: When set to True, tasks execute synchronously in the calling process instead of being sent to a worker. This is useful for testing but should stay disabled in production so that queries are processed asynchronously. Current Celery releases name this setting task_always_eager.
RESULTS_BACKEND_USE_MSGPACK: When True, query results are serialized with msgpack and Apache Arrow instead of JSON, which makes transfer between workers and the web application more efficient.
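Taken together, overrides for the SQL Lab settings could look like the following sketch. The values are placeholders chosen for illustration, not recommendations.

```python
from datetime import timedelta

# Soft limit for async SQL Lab queries, expressed in seconds.
SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=1).total_seconds())

# Cap on the serialized result payload, in megabytes.
SQLLAB_PAYLOAD_MAX_MB = 100

# Use the binary serialization path for results.
RESULTS_BACKEND_USE_MSGPACK = True
```

Deriving the limit from a timedelta keeps the intent readable compared with a bare number of seconds.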

Production Tuning Recommendations

Optimize worker performance based on infrastructure constraints and query patterns.

Concurrency and Resource Allocation

Set --concurrency to match available CPU cores while respecting database connection pool limits. Each concurrent worker process maintains database connections; exceeding pool capacity causes connection failures.
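The trade-off can be made concrete with a small calculation. The helper below is a hypothetical sizing aid, assuming each worker process holds a fixed number of database connections and that the connection budget is shared evenly across worker hosts.

```python
import os
from typing import Optional


def suggest_concurrency(db_connection_budget: int, worker_hosts: int,
                        connections_per_process: int = 1,
                        cpu_cores: Optional[int] = None) -> int:
    """Pick a per-host --concurrency bounded by CPU cores and DB connections.

    Never returns less than 1, even if the connection budget is too small.
    """
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    per_host_budget = db_connection_budget // (worker_hosts * connections_per_process)
    return max(1, min(cores, per_host_budget))
```

For example, with a budget of 20 connections across 4 hosts of 8 cores each, the connection budget is the binding constraint and the function suggests 5 processes per host.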

Payload Size Management

Monitor SQLLAB_PAYLOAD_MAX_MB when enabling RESULTS_BACKEND_USE_MSGPACK. While msgpack reduces serialization overhead in superset/sql/execution/celery_task.py, extremely large payloads may still overwhelm Redis memory or S3 transfer limits.
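A size guard of this kind can be sketched as follows. PayloadTooLarge and check_payload_size are hypothetical names; the sketch only assumes that the cap is compared against the serialized byte length and that a value of None disables the check.

```python
import json
from typing import Optional


class PayloadTooLarge(Exception):
    """Hypothetical error for payloads exceeding the configured cap."""


def check_payload_size(serialized: bytes, max_mb: Optional[int]) -> float:
    """Return the payload size in MB, raising if it exceeds max_mb.

    Passing max_mb=None disables the check.
    """
    size_mb = len(serialized) / (1024 * 1024)
    if max_mb is not None and size_mb > max_mb:
        raise PayloadTooLarge(f"{size_mb:.2f} MB exceeds limit of {max_mb} MB")
    return size_mb


small = json.dumps({"data": [[1, "a"]]}).encode("utf-8")
large = b"x" * (2 * 1024 * 1024)  # 2 MB of filler bytes
```

Checking the size before writing to the backend fails fast and avoids pushing an oversized blob into Redis or S3.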

Summary

Configure CELERY_BROKER_URL in superset/config.py using Redis or RabbitMQ to enable message passing between the web application and workers.
Launch workers with celery -A superset.tasks.celery_app.celery_app worker, optionally specifying --queues=queries for dedicated SQL Lab processing.
The execute_sql_task in superset/sql/execution/celery_task.py handles query lifecycle management, impersonating users within a Flask request context for security compliance.
Tune SQLLAB_ASYNC_TIME_LIMIT_SEC and SQLLAB_PAYLOAD_MAX_MB to prevent resource exhaustion from long-running queries or oversized result sets.
Use RESULTS_BACKEND_USE_MSGPACK for efficient serialization of large query results.

Frequently Asked Questions

What message brokers does Superset support for Celery?

Superset supports any broker compatible with the Celery framework, including Redis, RabbitMQ, and Amazon SQS. Redis is the default and most common choice, configured via REDIS_HOST, REDIS_PORT, and REDIS_CELERY_DB variables in superset/config.py.

How does Superset handle security context in asynchronous tasks?

The execute_sql_task in superset/sql/execution/celery_task.py wraps execution in a Flask test request context and uses override_user to impersonate the original querier. This ensures all security checks, row-level security filters, and datasource permissions apply exactly as they would in a synchronous web request.

What happens if a query exceeds the async time limit?

When execution duration surpasses SQLLAB_ASYNC_TIME_LIMIT_SEC, Celery raises a SoftTimeLimitExceeded exception inside the running task. Because the limit is soft, the task's error handlers get a chance to clean up and record the final query status, and the worker process itself is not terminated.
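The following sketch shows why a soft limit leaves the worker usable for the next task. It defines a local exception in place of celery.exceptions.SoftTimeLimitExceeded so that it runs without Celery installed; run_with_soft_limit and the timed_out status label are illustrative assumptions.

```python
from typing import Callable, Dict


class SoftTimeLimitExceededSketch(Exception):
    """Stand-in for celery.exceptions.SoftTimeLimitExceeded."""


def run_with_soft_limit(query_id: str, work: Callable[[], None],
                        statuses: Dict[str, str]) -> None:
    """Run one task body and record a final status instead of crashing."""
    statuses[query_id] = "running"
    try:
        work()
    except SoftTimeLimitExceededSketch:
        statuses[query_id] = "timed_out"  # hypothetical status label
    else:
        statuses[query_id] = "success"


def too_slow() -> None:
    raise SoftTimeLimitExceededSketch()  # simulates Celery interrupting the task


statuses: Dict[str, str] = {}
for qid, body in [("q1", too_slow), ("q2", lambda: None)]:
    run_with_soft_limit(qid, body, statuses)
```

The second query still runs to completion after the first one times out, which is the behavior a soft limit is meant to preserve.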

Can I run different workers for SQL Lab queries and scheduled reports?

Yes. Use the --queues parameter when starting workers to isolate responsibilities. For example, launch one worker with --queues=queries for SQL Lab async execution and another with --queues=celery (or specific report queues) for email reports and alerts defined in superset/tasks/scheduler.py.
