# How to Manage Database Connections in Apache Superset: Connection Pooling and Async Execution Guide

> Learn how to manage Apache Superset database connections with connection pooling and async execution. Optimize performance by configuring SQLAlchemy engine pooling and leveraging Celery workers.

- Repository: [The Apache Software Foundation/superset](https://github.com/apache/superset)
- Tags: how-to-guide
- Published: 2026-03-03

---

**Apache Superset manages database connections through SQLAlchemy engine pooling configured via JSON parameters in the Database "Extra" field, while heavy queries execute asynchronously via Celery workers that automatically reset connection pools on startup to prevent stale connections.**

Apache Superset's database layer relies on SQLAlchemy for connection management and Celery for background task processing. Understanding how to manage database connections with connection pooling and async execution is essential for optimizing query performance and ensuring stable connections across UI requests and long-running analytical workloads. The architecture separates immediate synchronous queries from background async execution, each with distinct pooling strategies defined in [`superset/models/core.py`](https://github.com/apache/superset/blob/main/superset/models/core.py).

## Connection Pooling Architecture

Superset implements connection pooling through SQLAlchemy's engine caching mechanism, allowing database connections to be reused across requests while providing isolation options for specific use cases.

### Engine Creation and Caching in superset/models/core.py

The `Database._get_sqla_engine` method serves as the central factory for SQLAlchemy engines. According to the source code in [`superset/models/core.py`](https://github.com/apache/superset/blob/main/superset/models/core.py), lines 48-49 instantiate the engine via `create_engine(sqlalchemy_url, **engine_kwargs)`, where `sqlalchemy_url` is constructed from the database configuration and `engine_kwargs` contains merged parameters.

Engine parameters are extracted from the database record's JSON "Extra" field. Lines 92-94 demonstrate how Superset retrieves these settings: `extra = self.get_extra(source)` followed by `engine_kwargs = extra.get("engine_params", {})`. This allows administrators to inject standard SQLAlchemy pool arguments such as `pool_size`, `max_overflow`, and `pool_timeout` directly into the engine initialization process.
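The extraction step can be sketched in isolation. The helper below is a hypothetical stand-in (`engine_kwargs_from_extra` is not a Superset function); it assumes only that the Extra field is a JSON string with an optional `engine_params` object, as described above.

```python
import json
from typing import Any, Dict, Optional


def engine_kwargs_from_extra(extra_json: Optional[str]) -> Dict[str, Any]:
    """Return a copy of the ``engine_params`` object from an Extra JSON string.

    Hypothetical helper for illustration. An empty or missing Extra field
    yields an empty dict, which would leave create_engine on its defaults.
    """
    extra = json.loads(extra_json) if extra_json else {}
    # Copy so that later changes (such as adding a pool class) do not
    # leak back into the parsed configuration.
    return dict(extra.get("engine_params", {}))


extra_field = '{"engine_params": {"pool_size": 10, "max_overflow": 20}}'
print(engine_kwargs_from_extra(extra_field))
print(engine_kwargs_from_extra(None))
```

Returning a copy keeps the stored configuration untouched when a caller later adjusts the keyword arguments for a single engine.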

### Pool Configuration and NullPool Strategy

Superset dynamically selects the pool class based on the `nullpool` parameter. In [`superset/models/core.py`](https://github.com/apache/superset/blob/main/superset/models/core.py), lines 495-496 enforce connection isolation for UI queries: `if nullpool: engine_kwargs["poolclass"] = NullPool`. When `nullpool=True` (the default for interactive UI queries), Superset uses SQLAlchemy's `NullPool` class, which opens and closes connections for each request rather than maintaining a persistent pool.

For background workers or high-traffic scenarios where connection reuse is beneficial, setting `nullpool=False` allows the `engine_params` configuration to specify alternative pool classes like `QueuePool`. The engine instance is cached within the Flask application context, ensuring that subsequent calls to `get_sqla_engine` reuse the same pooled connections unless explicitly configured otherwise.
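The pool-class override can be illustrated without importing SQLAlchemy. In this sketch `NullPoolStandIn` is a placeholder for `sqlalchemy.pool.NullPool` and `apply_pool_strategy` is a hypothetical name; only the `if nullpool:` override mirrors the snippet quoted from the source.

```python
from typing import Any, Dict


class NullPoolStandIn:
    """Placeholder for sqlalchemy.pool.NullPool so the sketch has no dependencies."""


def apply_pool_strategy(engine_params: Dict[str, Any], nullpool: bool) -> Dict[str, Any]:
    """Return engine kwargs, forcing the no-pooling class when nullpool is set."""
    engine_kwargs = dict(engine_params)  # never mutate the stored configuration
    if nullpool:
        # Overrides any pool class that engine_params may have requested.
        engine_kwargs["poolclass"] = NullPoolStandIn
    return engine_kwargs


ui_kwargs = apply_pool_strategy({"pool_pre_ping": True}, nullpool=True)
worker_kwargs = apply_pool_strategy({"pool_size": 5}, nullpool=False)
print(ui_kwargs["poolclass"].__name__)
print(worker_kwargs)
```

With `nullpool=False` the keyword arguments pass through unchanged, so whatever pooling the `engine_params` object describes is what the engine receives.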

### Worker-Side Pool Management

Celery workers require special handling so that forked processes do not share connections with their parent. The `reset_db_connection_pool` function in [`superset/tasks/celery_app.py`](https://github.com/apache/superset/blob/main/superset/tasks/celery_app.py) (line 47) calls `db.engine.dispose()` when the `worker_process_init` signal fires. Disposing the engine discards its current connection pool; SQLAlchemy then builds a new pool lazily the first time the worker needs a connection, so a forked process never reuses connections inherited from the parent process.
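The dispose-on-startup pattern can be shown with a toy model. Everything here is a stand-in: `ToyPool` is not SQLAlchemy's pool, and `init_hooks` only imitates a worker-init signal. The point is the ordering, where the reset runs before the worker's first checkout.

```python
from typing import Callable, List


class ToyPool:
    """Tiny stand-in for a connection pool; 'connections' are just labels."""

    def __init__(self) -> None:
        self.connections: List[str] = []

    def checkout(self, owner: str) -> str:
        # Reuse an idle connection if one exists, otherwise open a new one.
        if self.connections:
            return self.connections.pop()
        return f"conn-opened-by-{owner}"

    def checkin(self, conn: str) -> None:
        self.connections.append(conn)

    def dispose(self) -> None:
        # Forget every pooled connection; the next checkout opens a new one.
        self.connections = []


init_hooks: List[Callable[[], None]] = []  # stand-in for a worker-init signal
pool = ToyPool()
pool.checkin(pool.checkout("parent"))  # the parent leaves one idle connection

init_hooks.append(pool.dispose)  # register the reset before any worker starts


def start_worker(name: str) -> str:
    """Simulate a worker process starting: run init hooks, then connect."""
    for hook in init_hooks:
        hook()
    return pool.checkout(name)


print(start_worker("worker-1"))  # conn-opened-by-worker-1
```

Without the registered hook, the first checkout would hand the worker the connection the parent opened, which is the situation the reset is meant to avoid.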

## Asynchronous Query Execution

Superset delegates long-running queries to Celery workers through the `execute_async` API, which isolates heavy database operations from the web server request-response cycle.

### The Async Query Flow

The entry point for async execution is `Database.execute_async` defined in [`superset/models/core.py`](https://github.com/apache/superset/blob/main/superset/models/core.py), lines 1312-1326. This method instantiates a `SQLExecutor` and delegates to its `execute_async` implementation. In [`superset/sql/execution/executor.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/executor.py), lines 46-48, the executor prepares the SQL statement through `_prepare_sql`, applying security checks and Jinja templating before submission.

The system supports a dry-run mode for validation. Lines 33-38 in [`superset/sql/execution/executor.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/executor.py) check `if opts.dry_run` and immediately return an `AsyncQueryHandle` without queuing a Celery task, allowing UI components to validate queries without consuming worker resources.
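The early return for dry runs can be sketched as follows. `SketchHandle`, `submit`, and the list used as a queue are hypothetical stand-ins for `AsyncQueryHandle`, the executor, and the Celery queue; the real preparation step (security checks and templating) is reduced to a placeholder.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchHandle:
    """Hypothetical stand-in for AsyncQueryHandle."""

    sql: str
    job_id: Optional[str]  # None when nothing was queued


def submit(sql: str, dry_run: bool, queue: List[str]) -> SketchHandle:
    """Queue the statement unless dry_run is set; return a handle either way."""
    prepared = sql.strip()  # placeholder for the real preparation step
    if dry_run:
        # Early return: the statement was prepared, but no task is enqueued.
        return SketchHandle(sql=prepared, job_id=None)
    job_id = uuid.uuid4().hex
    queue.append(job_id)
    return SketchHandle(sql=prepared, job_id=job_id)


task_queue: List[str] = []
print(submit("SELECT 1", dry_run=True, queue=task_queue).job_id)  # None
print(len(task_queue))  # 0
```

Because preparation happens before the `dry_run` check, a dry run still exercises the validation path while leaving the queue untouched.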

### Celery Worker Integration

When dry-run is disabled, the query is submitted to the async queue via `async_query_manager.submit`. The Celery worker eventually executes `load_chart_data_into_cache` from [`superset/tasks/async_queries.py`](https://github.com/apache/superset/blob/main/superset/tasks/async_queries.py), which calls `Database.execute` within a fresh `get_sqla_engine` context. Because each worker process invokes `reset_db_connection_pool` at startup, it begins with a clean connection pool rather than connections inherited across the fork. Connections that go stale later, for example after a database failover or network interruption, are better guarded against with pool settings such as `pool_pre_ping`.

## Configuration and Implementation

Implementing optimal connection management requires configuring both the JSON parameters in the database UI and understanding the Python API for programmatic access.

### Configuring Connection Pools via JSON

Database-specific pooling parameters are stored in the **Extra** field of the database configuration UI as JSON. Superset merges these values into `engine_kwargs` before calling `create_engine`. A typical production configuration for a PostgreSQL database with persistent pooling includes:

```json
{
  "engine_params": {
    "pool_size": 10,
    "max_overflow": 20,
    "pool_timeout": 30,
    "pool_pre_ping": true,
    "pool_recycle": 1800
  }
}
```

The `pool_pre_ping` parameter enables connection health checks before checkout, while `pool_recycle` forces connection refresh after 30 minutes to handle scenarios where databases drop idle connections.
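When sizing these values it helps to check the connection budget against the database's own limit. With a `QueuePool`-style pool, one engine can hold at most `pool_size + max_overflow` connections open at once, and each process keeps its own pool. The function below is an illustrative calculation, not part of Superset; the `processes` argument stands for however many web or worker processes your deployment runs.

```python
def max_connections(pool_size: int, max_overflow: int, processes: int = 1) -> int:
    """Upper bound on simultaneously open connections across all processes.

    Assumes finite QueuePool-style limits and one independent pool per process.
    """
    if pool_size < 1 or max_overflow < 0 or processes < 1:
        raise ValueError("this sketch only covers finite, positive limits")
    return processes * (pool_size + max_overflow)


print(max_connections(10, 20))               # 30
print(max_connections(10, 20, processes=4))  # 120
```

For the configuration above, four processes could open up to 120 connections in the worst case, which is worth comparing with the server-side connection limit before deploying.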

### Implementing Async Queries in Python

To execute queries asynchronously from custom views or scripts, use the `Database` model's `execute_async` method with a `QueryOptions` configuration:

```python
from superset.models.core import Database
from superset.sql.types import QueryOptions

# Retrieve database configuration
db = Database.get_by_name("production_warehouse")

# Configure execution options
options = QueryOptions(
    dry_run=False,
    timeout_seconds=300,
)

# Submit async query
handle = db.execute_async(
    sql="SELECT * FROM large_table WHERE created_at > now() - interval '1 day'",
    options=options,
)

print(f"Job ID: {handle.job_id}")
print(f"Current Status: {handle.get_status()}")
```

The returned `AsyncQueryHandle` provides methods to poll for completion, retrieve results from the cache backend, and check execution status without blocking the calling thread.
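For scripts that do want to wait for a result, a bounded polling loop is one option. The sketch below takes any zero-argument callable that returns a status string, so `handle.get_status` could be passed in directly. The status values `"success"` and `"failed"` are assumptions made for illustration; check the values your Superset version actually reports.

```python
import time
from typing import Callable, Tuple


def wait_for(
    get_status: Callable[[], str],
    timeout_seconds: float,
    poll_interval: float = 0.5,
    done_states: Tuple[str, ...] = ("success", "failed"),  # assumed status names
) -> str:
    """Poll get_status until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds}s")
        time.sleep(poll_interval)


# Usage with a fake status source standing in for handle.get_status:
fake_states = iter(["pending", "running", "success"])
print(wait_for(lambda: next(fake_states), timeout_seconds=5, poll_interval=0))
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments, and checking the status before the deadline guarantees at least one poll even with a zero timeout.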

### Manual Pool Reset Procedures

For rare scenarios that require immediate pool invalidation, such as after rotating database credentials or during connection troubleshooting, manually dispose of the engine:

```python
from superset import create_app
from superset.extensions import db

app = create_app()
with app.app_context():
    db.engine.dispose()
```

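This operation mimics the automatic behavior in [`superset/tasks/celery_app.py`](https://github.com/apache/superset/blob/main/superset/tasks/celery_app.py), forcing SQLAlchemy to drop all pooled connections and recreate the pool on the next database access. Note that `dispose()` only affects the engine in the process where it runs, so the call must execute inside each running web or worker process (or those processes must be restarted) for the reset to take effect there.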

## Summary

- **Connection pooling** in Superset is controlled via the `engine_params` JSON in the Database "Extra" field, parsed by `Database._get_sqla_engine` in [`superset/models/core.py`](https://github.com/apache/superset/blob/main/superset/models/core.py).
- **NullPool** is enforced for UI queries (`nullpool=True`) to prevent connection sharing across HTTP requests, while background tasks can utilize persistent `QueuePool` configurations.
- **Asynchronous execution** flows through `Database.execute_async` to `SQLExecutor.execute_async`, ultimately dispatching to Celery workers via the async query manager.
- **Worker isolation** is maintained through `reset_db_connection_pool` in [`superset/tasks/celery_app.py`](https://github.com/apache/superset/blob/main/superset/tasks/celery_app.py), which calls `db.engine.dispose()` at worker startup to ensure fresh connection pools.
- **Dry-run mode** allows query validation without consuming Celery worker resources, implemented in [`superset/sql/execution/executor.py`](https://github.com/apache/superset/blob/main/superset/sql/execution/executor.py).

## Frequently Asked Questions

### How do I configure connection pooling for a Superset database connection?

Store SQLAlchemy pool parameters in the **Extra** field of your database configuration as a JSON object under the `engine_params` key. Superset merges these parameters into the `create_engine` call within `Database._get_sqla_engine`. Common settings include `pool_size` for base connections, `max_overflow` for burst capacity, and `pool_pre_ping` to verify connection health before use.

### What is the difference between NullPool and standard connection pooling in Superset?

Superset uses `NullPool` (configured via `nullpool=True`) for interactive UI queries to ensure each HTTP request opens and closes its own database connection, preventing cross-request connection contamination. Standard pooling via `QueuePool` maintains persistent connections suitable for Celery workers or high-throughput scenarios where connection reuse reduces latency, configured by setting `nullpool=False` and defining `poolclass` in `engine_params`.

### How does Superset handle database connections in Celery workers?

Each Celery worker process calls `reset_db_connection_pool` during initialization, which executes `db.engine.dispose()` to discard any pooled connections inherited from the parent process. When the worker subsequently calls `Database.execute` or `get_sqla_engine`, connections are opened fresh using the pool settings from `engine_params`. This pattern keeps forked workers from sharing connections with the parent and ensures each worker maintains isolated database sessions.

### Can I run queries asynchronously in Superset to prevent UI timeouts?

Yes. Use `Database.execute_async` with `QueryOptions(dry_run=False)` to submit queries to the Celery task queue. The method returns an `AsyncQueryHandle` containing a `job_id` for status polling. The query executes in a background worker using the pooling configuration defined in the database's `engine_params`, while the web UI remains responsive. For validation without execution, set `dry_run=True` to check query syntax and permissions without queuing the task.