How the SQL Lab Query Execution Pipeline Works in Apache Superset: Architecture and Custom Executor Guide

The SQL Lab query execution pipeline uses a command-executor pattern. ExecuteSqlCommand orchestrates validation and Jinja rendering, then delegates execution to a synchronous or asynchronous SqlJsonExecutor implementation. The API layer chooses that implementation from the runAsync flag.

The Apache Superset SQL Lab query execution pipeline is designed around a clean separation of concerns. Its narrow seams let developers plug in custom logic through small, localized changes instead of rewriting core internals. Whether you need to audit queries, route them through a sandbox, or integrate a distributed processing engine, understanding this architecture is essential for extending the apache/superset codebase.

Architecture of the SQL Lab Query Execution Pipeline

The pipeline follows a command pattern that decouples HTTP handling from business logic and database interaction. Each query passes through context construction, command execution, and result conversion before returning to the frontend.

Entry Point: REST API and Execution Context

The journey begins at SqlLabRestApi.execute_sql_query in superset/sqllab/api.py. This POST /api/v1/sqllab/execute/ endpoint validates the incoming JSON payload and constructs a SqlJsonExecutionContext object.

The context object, defined in superset/sqllab/sqllab_execution_context.py, encapsulates the request state. This includes:

- the target database ID
- the SQL text and schema
- the current user
- row limits
- CTAS (Create Table As Select) settings
- template parameters
- whether the query should run asynchronously

The Database object itself is attached later, when ExecuteSqlCommand resolves it. This context travels through the entire pipeline, giving every downstream component access to the original request state.
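To make the shape of the context concrete, here is a hypothetical, much-reduced stand-in. It is not Superset's class, and only a few fields are modeled. The force_run_async argument is an assumption standing in for the SQLLAB_FORCE_RUN_ASYNC feature flag; current Superset source appears to combine that flag with runAsync when the context is constructed. The sketch reflects two points from the text: the async decision is folded into a single flag, and the database object starts out empty.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecutionContextSketch:
    """Hypothetical, trimmed-down analogue of SqlJsonExecutionContext."""

    database_id: int
    sql: str
    schema: Optional[str]
    async_flag: bool
    database: Any = None  # attached later, once the command resolves the DB

    @classmethod
    def from_payload(
        cls, payload: dict[str, Any], force_run_async: bool = False
    ) -> "ExecutionContextSketch":
        # force_run_async stands in for the SQLLAB_FORCE_RUN_ASYNC feature flag
        return cls(
            database_id=int(payload["database_id"]),
            sql=str(payload["sql"]),
            schema=payload.get("schema"),
            async_flag=force_run_async or bool(payload.get("runAsync")),
        )

    def is_run_asynchronous(self) -> bool:
        return self.async_flag


ctx = ExecutionContextSketch.from_payload(
    {"database_id": 5, "sql": "SELECT 1", "schema": "public", "runAsync": False}
)
print(ctx.is_run_asynchronous())  # False
```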


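# superset/sqllab/api.py (lines 495-505, 669-690), simplified: error handling omitted

def execute_sql_query(self) -> Response:
    # Validate the request body against the endpoint's payload schema
    self.execute_model_schema.load(request.json)
    log_params = {"user_agent": request.headers.get("USER_AGENT")}

    # Build the per-request context, then a command wired with its dependencies
    # (DAOs, access validator, renderer, executor, result convertor)
    execution_context = SqlJsonExecutionContext(request.json)
    command = self._create_sql_json_command(execution_context, log_params)
    command_result = command.run()

    # 202 Accepted while a worker is still running the query, 200 OK otherwise
    response_status = (
        202
        if command_result["status"] == SqlJsonExecutionStatus.QUERY_IS_RUNNING
        else 200
    )
    return json_success(command_result["payload"], response_status)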

The Command Pattern: ExecuteSqlCommand

The ExecuteSqlCommand class in superset/commands/sql_lab/execute.py serves as the central orchestrator. Its run method coordinates the entire lifecycle:

  1. Query deduplication – Looks up an existing Query record for the same client via _try_get_existing_query. If one is already being handled, it is returned instead of starting a new run.
  2. Database resolution – Resolves and validates the target database via _get_the_query_db.
  3. Persistence – Saves the new query record via _save_new_query.
  4. Access control – Validates RBAC permissions via _validate_access. This happens before rendering because some Jinja macros execute statements.
  5. Template rendering – Processes Jinja syntax via _sql_query_render.render.
  6. Limit injection – Applies row limits via _set_query_limit_if_required.
  7. Execution – Delegates to the SqlJsonExecutor that the API layer injected.
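The steps above can be condensed into a toy command that receives its collaborators through the constructor. Everything here is hypothetical and only illustrates the sequencing: the class name, the dict used as a query store, and the lambda collaborators. A plain string replace stands in for Jinja rendering, and a naive string check stands in for Superset's real limit handling.

```python
from typing import Any, Callable, Optional


class ToyExecuteSqlCommand:
    """Hypothetical, heavily simplified analogue of ExecuteSqlCommand.run."""

    def __init__(
        self,
        queries: dict[str, dict[str, Any]],  # stand-in for QueryDAO storage
        can_access: Callable[[dict[str, Any]], bool],
        render: Callable[[str, dict[str, Any]], str],
        execute: Callable[[str], str],  # stand-in for the injected executor
        limit: Optional[int] = None,
    ) -> None:
        self.queries = queries
        self.can_access = can_access
        self.render = render
        self.execute = execute
        self.limit = limit

    def run(self, client_id: str, sql: str, params: dict[str, Any]) -> dict[str, Any]:
        # Step 1: a query already recorded for this client_id is not re-run
        if client_id in self.queries:
            return {"status": "QUERY_ALREADY_CREATED", "payload": self.queries[client_id]}
        # Steps 2-3: database resolution is omitted; persist a new query record
        query = {"client_id": client_id, "sql": sql}
        self.queries[client_id] = query
        # Step 4: check access before rendering, since some macros run statements
        if not self.can_access(query):
            raise PermissionError(f"access denied for query {client_id}")
        # Step 5: render the template (any callable works here)
        rendered = self.render(sql, params)
        # Step 6: naive limit injection, only when no LIMIT appears in the SQL
        if self.limit is not None and "limit" not in rendered.lower():
            rendered = f"{rendered} LIMIT {self.limit}"
        # Step 7: hand the final SQL to whichever executor was injected
        return {"status": self.execute(rendered), "payload": {"executed_sql": rendered}}


store: dict[str, dict[str, Any]] = {}
cmd = ToyExecuteSqlCommand(
    queries=store,
    can_access=lambda query: True,
    render=lambda sql, p: sql.replace("{{ table }}", p["table"]),
    execute=lambda rendered: "HAS_RESULTS",
    limit=100,
)
print(cmd.run("tab-1", "SELECT * FROM {{ table }}", {"table": "sales"}))
# {'status': 'HAS_RESULTS', 'payload': {'executed_sql': 'SELECT * FROM sales LIMIT 100'}}
```

Because the executor arrives as a constructor argument, swapping in a different execution strategy never touches the sequencing logic. The real extension point described later relies on the same property.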

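According to the source code in execute.py (lines 94-108), run returns a dictionary with two keys:

- a status key, holding a SqlJsonExecutionStatus member such as HAS_RESULTS, QUERY_IS_RUNNING, or QUERY_ALREADY_CREATED
- a payload key, holding the JSON serialized by ExecutionContextConvertor

For completed queries the payload is the result set. Otherwise it is the Query record the frontend uses to track the job.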

Executor Selection: Synchronous vs Asynchronous

The pipeline supports two execution modes selected in SqlLabRestApi._create_sql_json_executor (lines 691-702):

  • Synchronous execution (SynchronousSqlJsonExecutor): Runs the query in-process with a configurable timeout. Best for lightweight queries that return quickly.
  • Asynchronous execution (ASynchronousSqlJsonExecutor): Offloads the work to a Celery worker and reports the query as still running, which the API turns into an immediate 202 Accepted response. Selected when runAsync is true or the SQLLAB_FORCE_RUN_ASYNC feature flag is enabled.

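Both implementations inherit from SqlJsonExecutorBase in superset/sqllab/sql_json_executer.py (lines 61-68). The base class stores the injected QueryDAO and the get_sql_results task. Each subclass translates failures into Superset's SupersetError-based exceptions:

- The synchronous executor wraps unexpected driver errors in SupersetGenericDBErrorException.
- The asynchronous executor raises SupersetErrorException when it cannot start a Celery task.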
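In isolation, the translation pattern catches whatever the driver raises and re-raises a single structured exception type. StructuredDBError and execute_with_translation are hypothetical names. Passing timeouts through unchanged loosely mirrors how the synchronous executor lets its timeout exception propagate.

```python
from typing import Any, Callable


class StructuredDBError(Exception):
    """Hypothetical stand-in for SupersetGenericDBErrorException."""

    def __init__(self, message: str, error_type: str = "generic_db_error") -> None:
        super().__init__(message)
        self.message = message
        self.error_type = error_type  # illustrative label, not a Superset enum value


def execute_with_translation(query_id: int, run_query: Callable[[], Any]) -> Any:
    """Run a query and convert unexpected driver errors into one structured type."""
    try:
        return run_query()
    except TimeoutError:
        raise  # timeouts already carry a meaningful type, so pass them through
    except Exception as ex:  # e.g. a DB-API driver error
        raise StructuredDBError(f"Query {query_id} failed: {ex}") from ex
```

Chaining with `from ex` keeps the original driver exception available as `__cause__` for logging, while callers only have to handle one error type.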

Result Conversion and Response Handling

After execution, the ExecutionContextConvertor class (defined in superset/sqllab/execution_context_convertor.py) serializes the command's outcome into the JSON structure expected by the SQL Lab frontend. For queries that return results, this includes applying the DISPLAY_MAX_ROW limit so oversized payloads never reach the browser. For queries that are still running, the payload is the Query record instead.
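A display cap of this kind slices the row list and flags that rows were dropped. The sketch below is modeled on that behavior rather than copied from Superset. The "data" and "displayLimitReached" key names are assumptions about the payload shape, and the function returns a new dict instead of mutating its input.

```python
from typing import Any


def apply_display_limit(payload: dict[str, Any], max_rows: int) -> dict[str, Any]:
    """Trim payload["data"] to max_rows and flag that the cap was hit (sketch)."""
    rows = payload.get("data")
    # Only successful result sets with more rows than the cap are touched
    if payload.get("status") == "success" and rows is not None and len(rows) > max_rows:
        # Build a new dict so the caller's payload is left unmodified
        return {**payload, "data": rows[:max_rows], "displayLimitReached": True}
    return payload


full = {"status": "success", "data": [{"n": i} for i in range(5)]}
shown = apply_display_limit(full, max_rows=3)
print(len(shown["data"]), shown["displayLimitReached"])  # 3 True
```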

The API layer then maps the command's return value to HTTP semantics: 200 OK for completed queries and 202 Accepted for asynchronous jobs that are still processing.

Extending the Pipeline with Custom Executors

Because Superset injects its command and executor objects as dependencies, you can add custom execution logic in two steps. First, subclass SqlJsonExecutorBase. Then return your subclass from the executor factory method.

Implementing a Custom Executor

Create a subclass of SqlJsonExecutorBase and implement the execute method. The contract requires accepting (execution_context, rendered_query, log_params) and returning a SqlJsonExecutionStatus. Raise SupersetErrorException or SupersetErrorsException for known failure conditions.

Here is a minimal example that logs each rendered query and returns a static result instead of querying the database. The logger stands in for an external audit store.


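# superset/sqllab/custom_executor.py (hypothetical module)

import logging
from typing import Any, Optional

from superset.sqllab.command_status import SqlJsonExecutionStatus
from superset.sqllab.sql_json_executer import SqlJsonExecutorBase
from superset.sqllab.sqllab_execution_context import SqlJsonExecutionContext

logger = logging.getLogger(__name__)


class LoggingExecutor(SqlJsonExecutorBase):
    """Logs the rendered SQL for auditing and returns a static result set
    instead of querying the database."""

    def execute(
        self,
        execution_context: SqlJsonExecutionContext,
        rendered_query: str,
        log_params: Optional[dict[str, Any]],
    ) -> SqlJsonExecutionStatus:
        logger.info("Audit log: %s", rendered_query)

        # Simulated result, loosely shaped like get_sql_results output
        # (rows as dicts, column metadata separate); the exact keys the
        # frontend expects may differ between Superset versions.
        fake_result = {
            "status": "success",
            "query_id": execution_context.query.id,
            "columns": [
                {"column_name": "audit_col", "name": "audit_col", "type": "INT", "is_dttm": False}
            ],
            "data": [{"audit_col": 1}],
            "query": execution_context.query.to_dict(),
        }
        execution_context.set_execution_result(fake_result)
        return SqlJsonExecutionStatus.HAS_RESULTS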

Registering Your Executor in the API Layer

To activate your executor, modify SqlLabRestApi._create_sql_json_executor in superset/sqllab/api.py. Replace or extend the conditional logic to instantiate your class when specific criteria are met:


# In superset/sqllab/api.py (inside SqlLabRestApi)

from superset.daos.database import DatabaseDAO
from superset.sqllab.custom_executor import LoggingExecutor

@staticmethod
def _create_sql_json_executor(
    execution_context: SqlJsonExecutionContext,
    query_dao: QueryDAO,
) -> SqlJsonExecutor:
    # Custom logic: check a per-database flag. execution_context.database is
    # still None here (ExecuteSqlCommand attaches it later, inside run), so
    # look the database up by ID and read the hypothetical key from its extra JSON.
    database = DatabaseDAO.find_by_id(execution_context.database_id)
    if database is not None and database.get_extra().get("use_logging_executor"):
        return LoggingExecutor(query_dao, get_sql_results)

    # Standard fallback logic
    if execution_context.is_run_asynchronous():
        return ASynchronousSqlJsonExecutor(query_dao, get_sql_results)

    return SynchronousSqlJsonExecutor(
        query_dao,
        get_sql_results,
        app.config.get("SQLLAB_TIMEOUT"),
        is_feature_enabled("SQLLAB_BACKEND_PERSISTENCE"),
    )

Triggering Custom Execution via Database Flags

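You can control executor selection per database by storing a flag in the database's extra JSON or in a custom column. The extra JSON is stored in the Database.extra field and parsed by Database.get_extra(). Note that execution_context.database is still None when the factory runs, because ExecuteSqlCommand attaches the Database object later, inside run. Look the database up by execution_context.database_id instead. When a user submits a query via the standard JSON payload: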

{
  "database_id": 5,
  "sql": "SELECT * FROM large_table",
  "runAsync": false,
  "schema": "public"
}

If database 5 has the use_logging_executor flag set, the factory routes the query through your LoggingExecutor instead of the default synchronous or asynchronous implementation. Because the custom check runs first, the runAsync value is ignored for that database.
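The flag check itself can live in a small, testable helper. This is a sketch: use_logging_executor is the hypothetical key used throughout this guide. The helper takes the raw extra text, the JSON string stored on the Database model, so it can be exercised without a Superset installation. Only a literal JSON true enables the custom executor, and malformed JSON falls back to the default path.

```python
import json
from typing import Optional


def wants_logging_executor(extra: Optional[str]) -> bool:
    """Read the hypothetical use_logging_executor key from a database's extra JSON."""
    if not extra:
        return False
    try:
        parsed = json.loads(extra)
    except json.JSONDecodeError:
        # Malformed extra JSON should not break query execution; use the default path
        return False
    return isinstance(parsed, dict) and parsed.get("use_logging_executor") is True
```

In the factory, the condition could then read `wants_logging_executor(database.extra)`. A strict `is True` comparison keeps values such as the string "yes" from enabling the custom path by accident.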

Summary

  • The SQL Lab query execution pipeline in Apache Superset follows a command-executor pattern that cleanly separates HTTP handling, business logic, and database interaction.
  • ExecuteSqlCommand orchestrates the flow, handling validation, Jinja rendering, and limit injection before delegating to an executor.
  • Executor selection happens in SqlLabRestApi._create_sql_json_executor. It chooses between SynchronousSqlJsonExecutor and ASynchronousSqlJsonExecutor based on the runAsync flag or the SQLLAB_FORCE_RUN_ASYNC feature flag.
  • Custom executors must inherit from SqlJsonExecutorBase, implement the execute method, and be registered in the API factory method to override default behavior.
  • The architecture supports per-database routing: the factory can inspect the target database's configuration (looked up by execution_context.database_id), enabling targeted extensions without global changes.

Frequently Asked Questions

What is the difference between synchronous and asynchronous execution in SQL Lab?

Synchronous execution runs queries within the web server process using SynchronousSqlJsonExecutor, subject to the SQLLAB_TIMEOUT configuration. Asynchronous execution uses ASynchronousSqlJsonExecutor to dispatch work to a Celery worker via a message broker. The API returns a 202 status immediately while the query runs in the background. Asynchronous mode suits long-running queries that might exceed HTTP timeout limits, and it is always used when the SQLLAB_FORCE_RUN_ASYNC feature flag is enabled.

How does SQL Lab handle Jinja templating before query execution?

Before the executor sends SQL to the database, ExecuteSqlCommand calls _sql_query_render.render (implemented in superset/sqllab/query_render.py) to process Jinja2 syntax. This lets users reference template parameters and Superset's built-in macros, such as current_username(), within their queries. The rendered query string is then passed to the executor's execute method as the rendered_query parameter.

Can I use a custom executor for specific databases only?

Yes. In _create_sql_json_executor, look up the target database by execution_context.database_id, since execution_context.database is not populated until ExecuteSqlCommand runs. Then inspect its extra JSON via Database.get_extra(), or a custom column. Return your custom executor subclass only when the relevant flag is present; otherwise fall back to the standard synchronous or asynchronous implementations. This lets you route, for example, all queries against a specific data warehouse through a custom caching or sandbox layer.

Where are query results stored during asynchronous execution?

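When using ASynchronousSqlJsonExecutor, the Celery worker executes the query. Unless it is a CTAS query, the worker stores the serialized results in the Superset results backend (configured via RESULTS_BACKEND in your configuration file). The execute endpoint returns the Query record, including its ID, and the frontend polls for status changes. Once the query succeeds, the frontend requests the stored results through a separate results endpoint (GET /api/v1/sqllab/results/ in recent versions). That endpoint loads the results from the backend and can apply a display row limit before returning them, without going through ExecutionContextConvertor.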
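From the client's side, the asynchronous flow is a poll-then-fetch loop. The sketch below is hypothetical. get_status and fetch_results stand in for the HTTP calls the SQL Lab frontend makes. The status strings mirror Superset's query states ("running", "success", "failed").

```python
import time
from typing import Any, Callable


def poll_until_done(
    get_status: Callable[[int], str],
    fetch_results: Callable[[int], Any],
    query_id: int,
    interval_s: float = 0.05,
    timeout_s: float = 5.0,
) -> Any:
    """Poll a query's status, then fetch its stored results (hypothetical client)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(query_id)
        if status == "success":
            # Results live in the results backend; fetch them in a second call
            return fetch_results(query_id)
        if status == "failed":
            raise RuntimeError(f"query {query_id} failed")
        time.sleep(interval_s)  # still pending or running: wait and retry
    raise TimeoutError(f"query {query_id} still running after {timeout_s}s")
```

Separating the status check from the results fetch keeps each poll cheap. The potentially large payload is transferred only once, after the query has finished.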
