How Apache Superset's Caching Layer Works: Configure Result Backend and Query Caching

Apache Superset uses Flask-Caching for metadata and query results, while a separate Result Backend stores asynchronous SQL-Lab payloads; both are configured via superset_config.py using keys like CACHE_CONFIG, DATA_CACHE_CONFIG, and RESULTS_BACKEND.

Apache Superset's caching layer is built on Flask-Caching to accelerate dashboard rendering and query performance. The system distinguishes between Flask-Caching caches for metadata and query results and a dedicated Result Backend for asynchronous SQL-Lab execution. To configure caching, override the relevant keys in superset_config.py; the defaults are defined in superset/config.py, and the caches themselves are instantiated by superset/utils/cache_manager.py.

Cache Architecture Overview

Superset implements a dual-layer caching strategy. Flask-Caching handles the application caches (metadata, filter state, explore form data, and thumbnails), while the Result Backend stores the payloads of asynchronous SQL-Lab queries executed by Celery workers.

The following configuration keys control each cache purpose:

Cache purpose                  | Config key                      | Default / typical backend
General UI cache               | CACHE_CONFIG                    | NullCache (no caching)
Query result cache (SQL-Lab)   | DATA_CACHE_CONFIG               | NullCache (no caching)
Filter-state cache (dashboard) | FILTER_STATE_CACHE_CONFIG       | SimpleCache
Explore form-data cache        | EXPLORE_FORM_DATA_CACHE_CONFIG  | SimpleCache
Dashboard thumbnails           | THUMBNAIL_CACHE_CONFIG          | RedisCache (optional)
Async query payloads           | RESULTS_BACKEND                 | None (must be set)
Msgpack toggle for results     | RESULTS_BACKEND_USE_MSGPACK     | True

How the Cache Manager Initializes Caches

When Superset starts, the CacheManager class in superset/utils/cache_manager.py reads the configuration keys and creates a Flask-Caching Cache instance for each cache type. The initialization method _init_cache validates each configuration and raises an explicit error when a required cache is misconfigured.


# superset/utils/cache_manager.py (excerpt)

self._init_cache(app, self._cache, "CACHE_CONFIG")
self._init_cache(app, self._data_cache, "DATA_CACHE_CONFIG")
self._init_cache(app, self._thumbnail_cache, "THUMBNAIL_CACHE_CONFIG")

If a required configuration is missing, the manager prevents the application from running with an invalid cache state, ensuring reliability in production environments.
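
As a rough illustration of that behavior, the sketch below models per-key initialization using only the standard library. It is not Superset's code: init_cache, DictCache, and the required flag are hypothetical names, and the real manager delegates to Flask-Caching backends rather than a dict.

```python
from typing import Any, Dict, Optional


class DictCache:
    """Hypothetical stand-in for a real cache backend."""

    def __init__(self, default_timeout: int) -> None:
        self.default_timeout = default_timeout
        self._store: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)


def init_cache(
    app_config: dict, config_key: str, required: bool = False
) -> Optional[DictCache]:
    """Build one cache from one config key, failing loudly on bad input."""
    cache_config = app_config.get(config_key)
    if cache_config is None:
        if required:
            raise RuntimeError(f"{config_key} must be set")
        return None  # optional cache left disabled
    if "CACHE_TYPE" not in cache_config:
        raise ValueError(f"{config_key} is missing CACHE_TYPE")
    return DictCache(cache_config.get("CACHE_DEFAULT_TIMEOUT", 300))


config = {"CACHE_CONFIG": {"CACHE_TYPE": "SimpleCache", "CACHE_DEFAULT_TIMEOUT": 60}}
ui_cache = init_cache(config, "CACHE_CONFIG", required=True)
thumbnail_cache = init_cache(config, "THUMBNAIL_CACHE_CONFIG")  # not configured
```

Failing at startup, rather than on the first cache access, is the design point: a broken configuration surfaces before any request is served.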

Configuring the Result Backend for Async Queries

The Result Backend is essential for the SQL-Lab async executor located in superset/sql/execution/celery_task.py. It stores large result sets outside the request thread, allowing Celery workers to stream data back to the UI without blocking the web server.

In superset/config.py at line 1516, RESULTS_BACKEND defaults to None and must be explicitly configured in your superset_config.py. The value must be a cache object rather than a configuration dictionary, commonly a RedisCache or FileSystemCache instance.


# superset/config.py (lines 1516-1522)

RESULTS_BACKEND: BaseCache | None = None
RESULTS_BACKEND_USE_MSGPACK: bool = True

The RESULTS_BACKEND_USE_MSGPACK toggle at line 1522 controls whether payloads are serialized with msgpack for compactness and performance. During runtime, the code accesses the backend via the extension registry in superset/extensions/__init__.py:


# superset/extensions/__init__.py (excerpt)

self._results_backend = app.config["RESULTS_BACKEND"]
self._use_msgpack = app.config["RESULTS_BACKEND_USE_MSGPACK"]

If the backend is not configured when running async queries, Superset raises SupersetErrorType.RESULTS_BACKEND_NOT_CONFIGURED_ERROR as defined in the error handling logic.
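
The guard described above can be pictured with a small standalone model. Everything here is hypothetical: InMemoryBackend stands in for a real cache object exposing set and get, ResultsBackendNotConfigured stands in for Superset's error type, and JSON is used in place of msgpack to keep the sketch free of third-party dependencies.

```python
import json
from typing import Dict, Optional


class ResultsBackendNotConfigured(Exception):
    """Hypothetical stand-in for RESULTS_BACKEND_NOT_CONFIGURED_ERROR."""


class InMemoryBackend:
    """Hypothetical backend exposing the set/get pair a cache object offers."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)


def store_payload(backend: Optional[InMemoryBackend], key: str, rows: list) -> None:
    """Serialize rows and store them, refusing to run without a backend."""
    if backend is None:
        raise ResultsBackendNotConfigured("RESULTS_BACKEND is not set")
    backend.set(key, json.dumps(rows).encode("utf-8"))


def load_payload(backend: InMemoryBackend, key: str) -> Optional[list]:
    """Return the stored rows, or None when the key is absent."""
    raw = backend.get(key)
    return None if raw is None else json.loads(raw.decode("utf-8"))


backend = InMemoryBackend()
store_payload(backend, "query-1", [[1, "a"], [2, "b"]])
rows = load_payload(backend, "query-1")
```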

Query Caching Flow and Implementation

The Superset caching layer follows a specific execution path for SQL-Lab requests:

  1. The Flask view checks DATA_CACHE_CONFIG["CACHE_DEFAULT_TIMEOUT"] to determine cache validity.
  2. If a cached entry exists under the computed key (derived from the SQL query, user ID, datasource, and other parameters), Superset returns it immediately. This key generation logic resides in superset/common/query_context_processor.py at line 389.
  3. If no cache hit occurs, the query dispatches to Celery (if async) or executes synchronously.
  4. Upon completion, the result set is saved to RESULTS_BACKEND (for async execution) and simultaneously written to DATA_CACHE_CONFIG for rapid future lookups.
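
The lookup-then-execute pattern in these steps can be sketched in a few lines. This is a simplified model rather than Superset's implementation: make_cache_key and get_or_run are hypothetical names, SHA-256 over sorted JSON is an assumed hashing scheme, and a plain dict stands in for the configured cache.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


def make_cache_key(params: Dict[str, Any]) -> str:
    """Hash the query parameters into a stable key (sorted for determinism)."""
    canonical = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# key -> (expiry timestamp, value); a dict stands in for the real cache
_cache: Dict[str, Tuple[float, Any]] = {}


def get_or_run(
    params: Dict[str, Any], run_query: Callable[[], Any], timeout: int
) -> Tuple[Any, bool]:
    """Return (result, was_cached), running the query only on a miss."""
    key = make_cache_key(params)
    entry = _cache.get(key)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1], True
    result = run_query()
    _cache[key] = (time.monotonic() + timeout, result)
    return result, False


params = {"sql": "SELECT 1", "user_id": 7, "datasource": "examples"}
first = get_or_run(params, lambda: [(1,)], timeout=30)   # miss: runs the query
second = get_or_run(params, lambda: [(1,)], timeout=30)  # hit: served from cache
```

Because the key includes the user ID, two users issuing the same SQL get separate entries in this model.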

Production Configuration Examples

Create or edit superset_config.py (it must be importable from your PYTHONPATH, or its location given by the SUPERSET_CONFIG_PATH environment variable) to enable Redis-backed caching:


# superset_config.py
from cachelib.redis import RedisCache

# General UI cache – 1 hour, Redis backend
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
    "CACHE_DEFAULT_TIMEOUT": 3600,
    "CACHE_KEY_PREFIX": "superset_ui_",
}

# Query result cache – 30 seconds, same Redis instance
DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    "CACHE_DEFAULT_TIMEOUT": 30,
    "CACHE_KEY_PREFIX": "superset_data_",
}

# Async query result payloads (Result Backend).
# RESULTS_BACKEND takes a cache object, not a Flask-Caching config dict.
RESULTS_BACKEND = RedisCache(
    host="localhost",
    port=6379,
    db=1,
    key_prefix="superset_results_",
)
RESULTS_BACKEND_USE_MSGPACK = True

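Restart Superset after modifying the configuration. If you enable asynchronous execution, also point Celery's broker and result backend at Redis; the example uses a separate database number to keep Celery's keys apart from the caches: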

CELERY_CONFIG = {
    "broker_url": "redis://localhost:6379/2",
    "result_backend": "redis://localhost:6379/2",
    "task_ignore_result": True,
}
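
The filter-state and explore form-data caches listed earlier accept the same kind of dictionary. The fragment below is a sketch that assumes the same local Redis instance; the redis_cache helper, the key prefixes, and the timeouts are illustrative choices, not Superset defaults.

```python
# superset_config.py (continued): illustrative values only


def redis_cache(prefix: str, timeout: int) -> dict:
    """Hypothetical helper that builds a Flask-Caching Redis config dict."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_REDIS_URL": "redis://localhost:6379/0",
        "CACHE_DEFAULT_TIMEOUT": timeout,
        "CACHE_KEY_PREFIX": prefix,
    }


# Dashboard filter state, kept for one day
FILTER_STATE_CACHE_CONFIG = redis_cache("superset_filter_", 86400)

# Explore form data, kept for one day
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache("superset_explore_", 86400)
```

Distinct key prefixes keep entries from different caches separate even when they share one Redis database.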

Programmatic Cache Access

Accessing the Data Cache in a View

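The data cache is exposed through the cache_manager object defined in superset/utils/cache_manager.py, so cached entries can be read programmatically from code running inside the Flask application context:

from superset.extensions import cache_manager

def get_cached_dataframe(cache_key):
    """Return the cached entry for cache_key, or None if the key is missing."""
    return cache_manager.data_cache.get(cache_key)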

Storing a Result Set in the Result Backend

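When handling async job results manually, read the results backend and the msgpack flag from the manager set up in superset/extensions/__init__.py:

import json

import msgpack
from superset.extensions import results_backend_manager

def store_async_result(job_id, result_set):
    """Serialize result_set and store it under a per-job key."""
    backend = results_backend_manager.results_backend
    if backend is None:
        raise RuntimeError("RESULTS_BACKEND is not configured")
    key = f"async_result_{job_id}"
    # Use msgpack when RESULTS_BACKEND_USE_MSGPACK is enabled, JSON otherwise
    if results_backend_manager.should_use_msgpack:
        payload = msgpack.packb(result_set, use_bin_type=True)
    else:
        payload = json.dumps(result_set)
    backend.set(key, payload, timeout=3600)  # 1 hour TTL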

Overriding Cache Timeout for a Specific Query

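The cache duration can be adjusted at runtime by modifying the loaded configuration. Note that this changes the default for every subsequent query in the process, not only the one being run, so restore the previous value afterwards if the change is meant to be temporary:

# Inside a view or command
from flask import current_app

current_app.config["DATA_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"] = 120  # 2 min

# Run the query …
# Superset then uses this timeout when caching the result.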
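
One way to keep such an override temporary is to wrap it in a context manager that restores the old value on exit. The sketch below works on a plain dict so it runs on its own; temporary_timeout is a hypothetical helper, not part of Superset, and because it mutates shared state it is not safe when several threads serve requests concurrently.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_timeout(config: Dict[str, dict], seconds: int) -> Iterator[None]:
    """Set DATA_CACHE_CONFIG's default timeout, restoring it on exit."""
    cache_config = config["DATA_CACHE_CONFIG"]
    missing = object()  # sentinel: distinguishes "unset" from a stored None
    previous = cache_config.get("CACHE_DEFAULT_TIMEOUT", missing)
    cache_config["CACHE_DEFAULT_TIMEOUT"] = seconds
    try:
        yield
    finally:
        if previous is missing:
            del cache_config["CACHE_DEFAULT_TIMEOUT"]
        else:
            cache_config["CACHE_DEFAULT_TIMEOUT"] = previous


config = {"DATA_CACHE_CONFIG": {"CACHE_TYPE": "RedisCache", "CACHE_DEFAULT_TIMEOUT": 30}}
with temporary_timeout(config, 120):
    inside = config["DATA_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]
after = config["DATA_CACHE_CONFIG"]["CACHE_DEFAULT_TIMEOUT"]
```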

Summary

  • Flask-Caching powers Superset's metadata, query, filter-state, form-data, and thumbnail caches via CACHE_CONFIG, DATA_CACHE_CONFIG, and the specialized keys for each.
  • The Result Backend (RESULTS_BACKEND) is a separate cache specifically for async SQL-Lab payloads, configured in superset/config.py at lines 1516-1522.
  • Cache initialization occurs in superset/utils/cache_manager.py, which validates configurations and instantiates Flask-Caching Cache objects.
  • Query cache keys are generated in superset/common/query_context_processor.py using SQL, user context, and datasource parameters.
  • Enable msgpack serialization for the Result Backend using RESULTS_BACKEND_USE_MSGPACK to reduce payload size.

Frequently Asked Questions

What is the difference between DATA_CACHE_CONFIG and RESULTS_BACKEND?

DATA_CACHE_CONFIG caches query results for fast retrieval during synchronous requests, while RESULTS_BACKEND specifically stores large payloads from asynchronous SQL-Lab queries executed by Celery workers. The Result Backend acts as temporary storage for data that cannot be held in the web server's memory during long-running queries.

How do I enable caching for SQL-Lab queries?

Set DATA_CACHE_CONFIG in your superset_config.py to a valid Flask-Caching backend such as RedisCache or MemcachedCache. Define CACHE_DEFAULT_TIMEOUT within the dictionary to control how long results persist. For asynchronous queries, you must also set RESULTS_BACKEND to a compatible cache object.

Why am I getting RESULTS_BACKEND_NOT_CONFIGURED_ERROR?

This error occurs when attempting to run asynchronous SQL-Lab queries without defining RESULTS_BACKEND in your configuration. The async executor in superset/sql/execution/celery_task.py requires this backend to store query payloads. Set RESULTS_BACKEND to a RedisCache or FileSystemCache instance to resolve this error.

Can I use different backends for different cache types?

Yes. Each configuration key (CACHE_CONFIG, DATA_CACHE_CONFIG, THUMBNAIL_CACHE_CONFIG, etc.) accepts an independent Flask-Caching configuration. You can use SimpleCache for local development, RedisCache for production query results, and FileSystemCache for thumbnails, depending on your infrastructure requirements and performance characteristics.
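
A mixed setup along those lines might look like the fragment below. It is a sketch only: the timeouts, key prefix, Redis URL, and thumbnail directory are assumed values to adapt to your environment.

```python
# superset_config.py: one backend per cache type (illustrative values)

# Small in-memory cache, suitable for a single-process development setup
CACHE_CONFIG = {
    "CACHE_TYPE": "SimpleCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
}

# Shared Redis cache for query results
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
    "CACHE_DEFAULT_TIMEOUT": 600,
    "CACHE_KEY_PREFIX": "superset_data_",
}

# Thumbnails written to local disk
THUMBNAIL_CACHE_CONFIG = {
    "CACHE_TYPE": "FileSystemCache",
    "CACHE_DIR": "/tmp/superset_thumbnails",  # hypothetical path
    "CACHE_DEFAULT_TIMEOUT": 86400,
}
```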
