# How to Configure Celery Workers for Asynchronous Query Execution in Apache Superset

> Learn to configure Apache Superset Celery workers for asynchronous query execution. Set up your broker, start workers, and process SQL queries efficiently for background tasks.

- Repository: [The Apache Software Foundation/superset](https://github.com/apache/superset)
- Tags: how-to-guide
- Published: 2026-03-03

---

**To configure Celery workers for asynchronous query execution in Apache Superset, define the broker URL in [`superset/config.py`](https://github.com/apache/superset/blob/main/superset/config.py), start workers with `celery -A superset.tasks.celery_app.celery_app worker`, and ensure the `execute_sql_task` in [`superset/sql/execution/celery_task.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/celery_task.py) processes queries from the message queue.**

Apache Superset relies on **Celery** to offload long-running SQL queries and background tasks from the web application, ensuring the UI remains responsive during heavy workloads. This distributed task queue architecture requires proper configuration of message brokers, worker processes, and result backends to function correctly in production environments. Understanding how to configure Celery workers for asynchronous query execution allows administrators to scale query processing horizontally across multiple worker nodes.

## Architecture of Asynchronous Query Execution

The implementation spans three core components that interact to process queries outside the request-response cycle.

### Celery App Initialization

The global **Celery application** is defined in [`superset/tasks/celery_app.py`](https://github.com/apache/superset/blob/main/superset/tasks/celery_app.py) and registered during Flask-AppBuilder initialization within [`superset/initialization/__init__.py`](https://github.com/apache/superset/blob/main/superset/initialization/__init__.py). This module reads configuration values from [`superset/config.py`](https://github.com/apache/superset/blob/main/superset/config.py) (specifically lines 1359–1419), establishing the connection to the message broker and result backend before workers begin consuming tasks.

### SQL Execution Task Logic

The actual query processing logic resides in [`superset/sql/execution/celery_task.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/celery_task.py), specifically within the `execute_sql_task` function decorated with `@celery_app.task(name="query_execution.execute_sql")`. When invoked via `SQLExecutor.execute_async()`, this task manages the complete query lifecycle: transitioning status to **running**, executing statements via `_execute_sql_statements`, finalizing successful queries through `_finalize_successful_query`, and optionally storing results via `_store_results_in_backend`.
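The lifecycle described above can be sketched in plain Python. This is a simplified stand-in rather than Superset's implementation: `Query`, `QueryStatus`, and `run_query_lifecycle` are hypothetical names introduced for illustration, and the real task does considerably more.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class QueryStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Query:
    """Hypothetical, minimal model of a tracked query."""
    sql: str
    status: QueryStatus = QueryStatus.PENDING
    rows: list = field(default_factory=list)
    error: Optional[str] = None


def run_query_lifecycle(
    query: Query,
    execute: Callable[[str], list],
    store_results: Optional[Callable[[Query], None]] = None,
) -> Query:
    """Walk a query through running -> success/failed (illustrative only)."""
    query.status = QueryStatus.RUNNING       # 1. transition to running
    try:
        query.rows = execute(query.sql)      # 2. execute the statements
    except Exception as exc:                 # any failure marks the query failed
        query.status = QueryStatus.FAILED
        query.error = str(exc)
        return query
    query.status = QueryStatus.SUCCESS       # 3. finalize the successful query
    if store_results is not None:            # 4. optionally persist the results
        store_results(query)
    return query
```

Passing `execute` and `store_results` as callables keeps the sketch independent of any database or results backend.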

### Result Backend Integration

After successful execution, query payloads are serialized and written to the configured **results backend**, commonly Redis, S3, or another supported cache store. The `RESULTS_BACKEND_USE_MSGPACK` setting controls whether results are serialized with msgpack and Apache Arrow rather than JSON, which reduces transfer overhead for large result sets.

## Configuration Steps

Setting up Celery requires configuring the message broker, defining worker parameters, and launching processes with appropriate queue isolation.

### Configure the Message Broker

**Redis** is the usual choice for both the broker and the result backend. Set these variables in your configuration (the defaults live in [`superset/config.py`](https://github.com/apache/superset/blob/main/superset/config.py)) or read them from environment variables:

```python
REDIS_HOST = "redis"
REDIS_PORT = 6379
REDIS_CELERY_DB = 2

CELERY_BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
CELERY_RESULT_BACKEND = CELERY_BROKER_URL
```

These values construct the connection strings used by the Celery app during initialization.
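If the Redis coordinates come from the environment, the URL can be assembled with a small helper. This is a sketch: `build_redis_url` is a hypothetical function, and the fallback values simply mirror the example above.

```python
import os
from typing import Mapping, Optional


def build_redis_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a Redis URL from REDIS_* variables, with example fallbacks."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "redis")
    port = int(env.get("REDIS_PORT", "6379"))     # fail fast on a non-numeric port
    db = int(env.get("REDIS_CELERY_DB", "2"))
    return f"redis://{host}:{port}/{db}"


CELERY_BROKER_URL = build_redis_url()
CELERY_RESULT_BACKEND = CELERY_BROKER_URL
```

Converting the port and database index with `int()` surfaces a malformed environment variable at startup instead of at the first connection attempt.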

### Enable Celery Configuration

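Superset hands the class assigned to `CELERY_CONFIG` to Celery, so the broker and result backend URLs must appear as attributes on that class. Subclass the default configuration and wire in the URLs defined in the previous step:

```python
from superset.config import CeleryConfig as DefaultCeleryConfig

class CeleryConfig(DefaultCeleryConfig):
    broker_url = CELERY_BROKER_URL  # URLs from the broker step above
    result_backend = CELERY_RESULT_BACKEND

CELERY_CONFIG = CeleryConfig
```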

For advanced setups, subclass `CeleryConfig` to override broker URLs or serialization settings while maintaining the base configuration structure.
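The kinds of overrides involved can be sketched as follows. To stay self-contained the example uses a plain class instead of subclassing Superset's `CeleryConfig`; the attribute names are standard Celery settings, while the values and the routing of the `query_execution.execute_sql` task to a `queries` queue are assumptions for illustration.

```python
class CeleryConfig:
    # Broker and result backend (example values from this guide).
    broker_url = "redis://redis:6379/2"
    result_backend = "redis://redis:6379/2"

    # Serialization: JSON is Celery's default, stated here explicitly.
    task_serializer = "json"

    # Fetch one task at a time and acknowledge only after completion,
    # which suits long-running queries (assumed tuning, adjust as needed).
    worker_prefetch_multiplier = 1
    task_acks_late = True

    # Assumed routing: send the SQL execution task to a dedicated queue.
    task_routes = {
        "query_execution.execute_sql": {"queue": "queries"},
    }


CELERY_CONFIG = CeleryConfig
```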

### Start Celery Workers

Launch worker processes on each host designated for query processing:

```bash
celery -A superset.tasks.celery_app.celery_app worker \
    --loglevel=INFO \
    --concurrency=4 \
    --queues=queries
```

The `-A` flag points to the Superset Celery app instance. The `--queues=queries` flag isolates workers to process only SQL Lab asynchronous queries, preventing resource contention with other background tasks like email reports or cache warming.

### Optional: Run the Celery Beat Scheduler

For periodic tasks such as scheduled reports or cache refreshes, run the beat scheduler:

```bash
celery -A superset.tasks.celery_app.celery_app beat \
    --loglevel=INFO
```

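The beat process dispatches the scheduled jobs defined in [`superset/tasks/scheduler.py`](https://github.com/apache/superset/blob/main/superset/tasks/scheduler.py) to available workers at configured intervals. Run only one beat instance per deployment; otherwise each scheduled task is dispatched more than once.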
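Beat reads its timetable from the `beat_schedule` Celery setting. The fragment below is a sketch: the entry name, the one-minute interval, and the `reports.scheduler` task name are assumptions, so check the task names registered in your Superset version before using it.

```python
from datetime import timedelta


class CeleryConfig:
    broker_url = "redis://redis:6379/2"

    # Each entry maps an arbitrary label to a task name and an interval.
    beat_schedule = {
        "reports-scheduler": {
            "task": "reports.scheduler",        # assumed task name
            "schedule": timedelta(minutes=1),   # illustrative interval
        },
    }


CELERY_CONFIG = CeleryConfig
```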

## Critical Configuration Parameters

Understanding these settings ensures stable operation under varying workload conditions:

- **CELERY_BROKER_URL**: The connection string for the message broker (Redis, RabbitMQ, etc.). In the Redis setup shown earlier it is built from `REDIS_HOST`, `REDIS_PORT`, and `REDIS_CELERY_DB`.

- **SQLLAB_ASYNC_TIME_LIMIT_SEC**: Soft time limit for query execution. Tasks exceeding this duration raise `SoftTimeLimitExceeded`, allowing graceful termination without killing the worker process.

- **SQLLAB_PAYLOAD_MAX_MB**: Maximum serialized payload size permitted for storage in the results backend. Increase this for large result sets, but monitor storage consumption.

- **CELERY_ALWAYS_EAGER**: When set to `True` in [`superset/tasks/celery_app.py`](https://github.com/apache/superset/blob/main/superset/tasks/celery_app.py), tasks execute synchronously within the web process. This is useful for testing but should stay disabled in production so that queries actually run asynchronously.

- **RESULTS_BACKEND_USE_MSGPACK**: Enables msgpack and Apache Arrow serialization, instead of JSON, for efficient result transfer between workers and the web application.
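A configuration fragment along these lines sets the SQL Lab limits listed above. The values are illustrative assumptions, not recommendations.

```python
from datetime import timedelta

# Soft time limit for async queries, expressed in seconds (1 hour here).
SQLLAB_ASYNC_TIME_LIMIT_SEC = int(timedelta(hours=1).total_seconds())

# Largest serialized result payload accepted by the results backend, in MB.
SQLLAB_PAYLOAD_MAX_MB = 100

# Serialize results with msgpack and Arrow rather than JSON.
RESULTS_BACKEND_USE_MSGPACK = True
```

Deriving the limit from a `timedelta` keeps the intended duration readable.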

## Production Tuning Recommendations

Optimize worker performance based on infrastructure constraints and query patterns.

### Concurrency and Resource Allocation

Set `--concurrency` to match available CPU cores while respecting database connection pool limits. Each concurrent worker process maintains database connections; exceeding pool capacity causes connection failures.
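As a rough sizing aid, the rule above can be written as a small calculation. `suggest_concurrency` is a hypothetical helper, and it assumes that each worker process holds at most one database connection at a time and that the connection budget is shared evenly across worker hosts.

```python
import os
from typing import Optional


def suggest_concurrency(
    db_connection_limit: int,
    worker_hosts: int = 1,
    cpu_count: Optional[int] = None,
) -> int:
    """Return a --concurrency value bounded by CPU cores and DB connections."""
    cores = cpu_count or os.cpu_count() or 1
    # Split the connection budget evenly across hosts (assumption).
    per_host_budget = max(1, db_connection_limit // max(1, worker_hosts))
    return max(1, min(cores, per_host_budget))
```

For example, with 8 cores, 2 worker hosts, and a budget of 6 connections, the connection limit is the binding constraint and the helper returns 3.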

### Payload Size Management

Monitor `SQLLAB_PAYLOAD_MAX_MB` when enabling `RESULTS_BACKEND_USE_MSGPACK`. While msgpack reduces serialization overhead in [`superset/sql/execution/celery_task.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/celery_task.py), extremely large payloads may still overwhelm Redis memory or S3 transfer limits.
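To get a feel for how close typical results come to the limit, payload size can be estimated before storage. This sketch uses JSON from the standard library as a stand-in for the real serializer, so the numbers are only approximate; `payload_size_mb` and `fits_payload_limit` are hypothetical helpers.

```python
import json
from typing import Any, Optional


def payload_size_mb(payload: Any) -> float:
    """Approximate serialized size in MB (JSON used as a stand-in)."""
    raw = json.dumps(payload, default=str).encode("utf-8")
    return len(raw) / (1024 * 1024)


def fits_payload_limit(payload: Any, max_mb: Optional[float]) -> bool:
    """Return True when the payload is within the limit; None means no limit."""
    if max_mb is None:
        return True
    return payload_size_mb(payload) <= max_mb
```

msgpack output is usually more compact than JSON, so a payload that passes this check should also pass with msgpack enabled, though that is not guaranteed for every data shape.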

## Summary

- Configure `CELERY_BROKER_URL` in [`superset/config.py`](https://github.com/apache/superset/blob/main/superset/config.py) using Redis or RabbitMQ to enable message passing between the web application and workers.
- Launch workers with `celery -A superset.tasks.celery_app.celery_app worker`, optionally specifying `--queues=queries` for dedicated SQL Lab processing.
- The `execute_sql_task` in [`superset/sql/execution/celery_task.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/celery_task.py) handles query lifecycle management, impersonating users within a Flask request context for security compliance.
- Tune `SQLLAB_ASYNC_TIME_LIMIT_SEC` and `SQLLAB_PAYLOAD_MAX_MB` to prevent resource exhaustion from long-running queries or oversized result sets.
- Use `RESULTS_BACKEND_USE_MSGPACK` for efficient serialization of large query results.

## Frequently Asked Questions

### What message brokers does Superset support for Celery?

Superset supports any broker compatible with the Celery framework, including **Redis**, **RabbitMQ**, and **Amazon SQS**. Redis is the most common choice; the examples in this guide configure it through the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_CELERY_DB` variables.

### How does Superset handle security context in asynchronous tasks?

The `execute_sql_task` in [`superset/sql/execution/celery_task.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/celery_task.py) wraps execution in a Flask test request context and uses `override_user` to impersonate the original querier. This ensures all security checks, row-level security filters, and datasource permissions apply exactly as they would in a synchronous web request.
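The impersonation pattern can be illustrated without Flask. The `override_user` below is a stand-in written with `contextvars`, not Superset's function of the same name; it only shows the idea of setting the acting user for the duration of a block and restoring the previous value afterwards.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Stand-in for the "current user" that a web framework tracks per request.
_current_user: ContextVar[Optional[str]] = ContextVar("current_user", default=None)


@contextmanager
def override_user(username: Optional[str]):
    """Temporarily act as `username`, restoring the previous user on exit."""
    token = _current_user.set(username)
    try:
        yield
    finally:
        _current_user.reset(token)


def current_username() -> Optional[str]:
    return _current_user.get()


def run_as(username: str, func: Callable[[], T]) -> T:
    """Run `func` so that permission checks inside it see `username`."""
    with override_user(username):
        return func()
```

Because the previous value is restored in a `finally` block, a failing query cannot leave the worker acting as the wrong user for the next task.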

### What happens if a query exceeds the async time limit?

When execution duration surpasses `SQLLAB_ASYNC_TIME_LIMIT_SEC`, Celery raises a `SoftTimeLimitExceeded` exception inside the task. The task then attempts graceful cleanup: a query that had already completed is finalized through `_finalize_successful_query`, and otherwise the error handlers mark it as failed. In both cases the worker process keeps running.

### Can I run different workers for SQL Lab queries and scheduled reports?

Yes. Use the `--queues` parameter when starting workers to isolate responsibilities. For example, launch one worker with `--queues=queries` for SQL Lab async execution and another with `--queues=celery` (or specific report queues) for email reports and alerts defined in [`superset/tasks/scheduler.py`](https://github.com/apache/superset/blob/main/superset/tasks/scheduler.py).
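When several worker types are managed from a deployment script, the commands can be generated from one helper. This is a sketch: `worker_command` is a hypothetical function, and the queue names are the ones used as examples in this guide.

```python
import shlex
from typing import List, Sequence

APP = "superset.tasks.celery_app.celery_app"


def worker_command(
    queues: Sequence[str],
    concurrency: int = 4,
    loglevel: str = "INFO",
) -> List[str]:
    """Build the argv for a Celery worker bound to the given queues."""
    return [
        "celery", "-A", APP, "worker",
        f"--loglevel={loglevel}",
        f"--concurrency={concurrency}",
        f"--queues={','.join(queues)}",
    ]


# One worker dedicated to SQL Lab queries, one for everything else.
sql_lab_worker = worker_command(["queries"])
default_worker = worker_command(["celery"], concurrency=2)

print(shlex.join(sql_lab_worker))
print(shlex.join(default_worker))
```

Returning an argument list rather than a string lets the same value be passed directly to `subprocess.run` without shell quoting concerns.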