How to Use the Advanced Data Types System in Apache Superset for Custom Data Type Handling

Apache Superset's advanced data types system enables developers to register custom column-level types that automatically handle data conversion, validation, and UI rendering by subclassing AdvancedDataType and applying the @register_type decorator.

Apache Superset ships with a powerful advanced data types system that extends beyond standard SQLAlchemy types to support domain-specific data handling. Located primarily in superset/advanced_data_type/types.py, this framework allows you to define custom types that manage everything from database storage formats to frontend display logic, with configuration persisted in the Column model's advanced_data_type JSON field.

Core Architecture of the Advanced Data Types System

The system operates through four coordinated components that bridge the database, backend, and React frontend.

AdvancedDataType Base Class

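The AdvancedDataType abstract class in superset/advanced_data_type/types.py defines the contract for all custom types. Subclasses implement the following methods; the first two are required, while validate and get_editor_schema are optional hooks: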

  • process_result_value(value, dialect): Converts database values to Python/JSON for UI consumption
  • bind_processor(dialect): Returns a function that transforms Python values to database-compatible formats before storage
  • validate(value): Enforces data integrity constraints and raises ValidationError for invalid inputs
  • get_editor_schema(): Returns JSON Schema to drive dynamic configuration forms in the frontend
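
The contract above can be sketched as a plain Python abstract base class. This is a standalone illustration and not Superset's source: the names AdvancedDataTypeSketch and UpperCaseType and the options constructor argument are hypothetical, and treating validate and get_editor_schema as optional hooks is an assumption taken from the FAQ at the end of this article.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class AdvancedDataTypeSketch(ABC):
    """Hypothetical stand-in for the contract described above."""

    def __init__(self, options: Optional[dict] = None) -> None:
        # Per-column options, e.g. {"symbol": "€"} (assumed shape).
        self.options = options or {}

    @abstractmethod
    def process_result_value(self, value: Any, dialect: Any) -> Any:
        """Database value -> Python/JSON value for the UI."""

    @abstractmethod
    def bind_processor(self, dialect: Any) -> Callable[[Any], Any]:
        """Return a function mapping Python values to database values."""

    def validate(self, value: Any) -> None:
        """Raise on invalid input; this default accepts everything."""

    @staticmethod
    def get_editor_schema() -> dict:
        """JSON Schema for the configuration form; empty by default."""
        return {"type": "object", "properties": {}}


class UpperCaseType(AdvancedDataTypeSketch):
    """Toy subclass: stores lowercase text, displays uppercase."""

    def process_result_value(self, value, dialect):
        return None if value is None else value.upper()

    def bind_processor(self, dialect):
        return lambda value: None if value is None else value.lower()


upper = UpperCaseType()
print(upper.process_result_value("abc", dialect=None))  # ABC
print(upper.bind_processor(None)("ABC"))                # abc
```

Because the two conversion methods are abstract in this sketch, instantiating the base class directly raises TypeError, which surfaces an incomplete subclass early.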

AdvancedDataTypeRegistry

The AdvancedDataTypeRegistry singleton maintains a global mapping of type names to concrete classes. The @register_type decorator adds a class to this mapping when the class is defined, so code that consumes a type can look it up by name instead of importing the class directly. The module that defines the class must still be imported at startup for the decorator to run.
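
The registry-plus-decorator pattern described here can be sketched in a few lines of plain Python. The names below (_REGISTRY, register_type_sketch, get_type_sketch, PercentType) are hypothetical and exist only for illustration; Superset's own registry may be structured differently.

```python
from typing import Callable, Dict

# Hypothetical module-level registry: type name -> class.
_REGISTRY: Dict[str, type] = {}


def register_type_sketch(name: str) -> Callable[[type], type]:
    """Class decorator that records the decorated class under `name`."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"Type {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


def get_type_sketch(name: str) -> type:
    """Look up a registered class, failing loudly for unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"No advanced data type registered as {name!r}") from None


@register_type_sketch("percent")
class PercentType:
    """Toy type used only to exercise the registry."""


print(get_type_sketch("percent") is PercentType)  # True
```

Rejecting duplicate names is a design choice of this sketch; it keeps two plugins from silently overwriting each other.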

Column Metadata Storage

In superset/models/core.py, the Column model stores type configuration in the advanced_data_type JSON field. This persists the selected type name and options (e.g., {"type": "currency", "options": {"symbol": "€"}}), enabling type-aware processing across sessions and queries.
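
As a rough illustration of how such a stored document might be read back, the sketch below parses the JSON shape shown above into a type name and an options dictionary. The helper parse_adt_config is hypothetical, and whether the field arrives as a JSON string or an already-decoded dictionary is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_adt_config(raw: Optional[str]) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Split a stored JSON document into (type_name, options).

    Returns None when the column has no advanced data type configured.
    """
    if not raw:
        return None
    config = json.loads(raw)
    if not isinstance(config, dict) or not isinstance(config.get("type"), str):
        raise ValueError("advanced_data_type must be an object with a string 'type'")
    options = config.get("options") or {}
    if not isinstance(options, dict):
        raise ValueError("'options' must be an object")
    return config["type"], options


stored = json.dumps({"type": "currency", "options": {"symbol": "€"}}, ensure_ascii=False)
print(parse_adt_config(stored))  # ('currency', {'symbol': '€'})
print(parse_adt_config(None))    # None
```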

SQLAlchemy Integration

The superset/connectors/sqla/models.py file hooks custom types into the SQLAlchemy result pipeline. When querying data, Superset automatically invokes process_result_value to transform raw values before they reach charts or exports, while bind_processor handles data insertion and updates.
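
The query-time step can be pictured as mapping a per-column processor over each result row. The sketch below uses plain dictionaries instead of SQLAlchemy result objects; apply_result_processors and cents_to_display are hypothetical helpers, not functions from Superset.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Processor = Callable[[Any], Any]


def apply_result_processors(
    rows: Iterable[Row], processors: Dict[str, Processor]
) -> List[Row]:
    """Return new rows with each configured column passed through its processor."""
    return [
        {
            column: processors[column](value) if column in processors else value
            for column, value in row.items()
        }
        for row in rows
    ]


def cents_to_display(value):
    """Toy result processor: integer cents -> "$12.34"-style string."""
    return None if value is None else f"${value / 100:.2f}"


raw_rows = [{"sku": "A1", "price": 1234}, {"sku": "B2", "price": None}]
for row in apply_result_processors(raw_rows, {"price": cents_to_display}):
    print(row)
# {'sku': 'A1', 'price': '$12.34'}
# {'sku': 'B2', 'price': None}
```

Columns without a registered processor pass through untouched, and the input rows are not mutated.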

Implementing a Custom Advanced Data Type

Creating a custom type involves subclassing AdvancedDataType, implementing the required methods, and registering the class with the decorator.

Step 1: Create the Type Class

Define your class in a new file (e.g., superset/extensions/currency_type.py):

from decimal import Decimal, ROUND_HALF_UP
from superset.advanced_data_type.types import AdvancedDataType, register_type
from marshmallow import ValidationError

@register_type("currency")
class CurrencyType(AdvancedDataType):
    """Store values as integer cents in DB, display as formatted currency strings."""
    
    @staticmethod
    def get_editor_schema() -> dict:
        """Define the configuration UI schema."""
        return {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "enum": ["$", "€", "£"],
                    "default": "$",
                    "title": "Currency Symbol"
                }
            }
        }

    def bind_processor(self, dialect):
        """Convert "$12.34" → integer cents before storage."""
        symbol = self.options.get("symbol", "$")
        
        def processor(value):
            if value is None:
                return None
            try:
                clean = value.replace(symbol, "").strip()
                dec = (Decimal(clean) * 100).to_integral_value(ROUND_HALF_UP)
                return int(dec)
            except Exception as exc:
                raise ValidationError(f"Invalid currency value: {value}") from exc
        
        return processor
    
    def process_result_value(self, value, dialect):
        """Convert integer cents → formatted string for UI."""
        symbol = self.options.get("symbol", "$")
        if value is None:
            return None
        return f"{symbol}{value / 100:.2f}"
    
    def validate(self, value):
        """Ensure value matches expected format."""
        if not isinstance(value, str):
            raise ValidationError("Currency must be a string")
        if not value.startswith(self.options.get("symbol", "$")):
            raise ValidationError("Currency symbol mismatch")
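
The conversion arithmetic in CurrencyType can be checked without a Superset installation. The two helpers below, to_cents and from_cents, are hypothetical extractions of the bodies of bind_processor and process_result_value, written as free functions so they run on their own.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(value: str, symbol: str = "$") -> int:
    """Strip the symbol and round half up to whole cents."""
    clean = value.replace(symbol, "").strip()
    return int((Decimal(clean) * 100).to_integral_value(ROUND_HALF_UP))


def from_cents(cents: int, symbol: str = "$") -> str:
    """Format integer cents with two decimal places."""
    return f"{symbol}{cents / 100:.2f}"


print(to_cents("$12.34"))          # 1234
print(to_cents("$12.345"))         # 1235 (the half cent rounds up)
print(to_cents("€ 0.5", "€"))      # 50
print(from_cents(1234, "€"))       # €12.34
print(from_cents(to_cents("$7")))  # $7.00
```

Parsing with Decimal rather than float keeps inputs such as "1.005" exact, so half-up rounding yields 101 cents for that value.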

Step 2: Configure the Column in the UI

Once registered, apply the type through the Superset interface:

  1. Navigate to Data → Datasets and select your dataset.
  2. Click Edit on the target column.
  3. Select "currency" from the Advanced Data Type dropdown.
  4. Choose the desired symbol (e.g., "€") in the configuration form.
  5. Save the column.

Superset persists this configuration as {"type": "currency", "options": {"symbol": "€"}} in the advanced_data_type field of the Column model.
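
Before options like these are saved, they can be checked against the schema returned by get_editor_schema(). The sketch below is a deliberately small stand-in for a JSON Schema validator: resolve_options is a hypothetical helper that understands only the default and enum keywords used in the CurrencyType schema above.

```python
from typing import Any, Dict

# Schema copied from CurrencyType.get_editor_schema() above.
EDITOR_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "symbol": {
            "type": "string",
            "enum": ["$", "€", "£"],
            "default": "$",
            "title": "Currency Symbol",
        }
    },
}


def resolve_options(schema: Dict[str, Any], submitted: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in defaults and reject values outside a property's enum.

    Keys in `submitted` that the schema does not declare are ignored.
    """
    resolved: Dict[str, Any] = {}
    for name, spec in schema.get("properties", {}).items():
        if name in submitted:
            value = submitted[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            continue  # no value supplied and no default declared
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}={value!r} is not one of {spec['enum']}")
        resolved[name] = value
    return resolved


print(resolve_options(EDITOR_SCHEMA, {"symbol": "€"}))  # {'symbol': '€'}
print(resolve_options(EDITOR_SCHEMA, {}))               # {'symbol': '$'}
```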

Step 3: Access Types Programmatically

Interact with the registry and processing methods directly for ETL or custom visualization logic:

from superset.models.core import Column
from superset.advanced_data_type.types import AdvancedDataTypeRegistry

# `session` is assumed to be an active SQLAlchemy session bound to the
# Superset metadata database; it is not defined in this snippet.

# Retrieve column configuration
col = session.query(Column).filter_by(column_name="price").one()
adt_config = col.advanced_data_type
# Returns: {'type': 'currency', 'options': {'symbol': '€'}}

# Resolve the implementation class
adt_class = AdvancedDataTypeRegistry.get(adt_config["type"])
adt_instance = adt_class(options=adt_config.get("options", {}))

# Process raw database values
raw_value = 1234  # Stored as cents
display_value = adt_instance.process_result_value(raw_value, dialect=None)
# Result: "€12.34"

Key Files in the Advanced Data Types System

File | Purpose
--- | ---
superset/advanced_data_type/types.py | Base class AdvancedDataType, registry singleton, and @register_type decorator
superset/models/core.py | Column model with advanced_data_type JSON field for metadata storage
superset/connectors/sqla/models.py | SQLAlchemy integration for query-time processing
superset-frontend/src/explore/components/PropertiesModal/ColumnEditPopover.tsx | React component for type selection and option configuration

Summary

  • The advanced data types system in Apache Superset, located in superset/advanced_data_type/types.py, provides a registry-based architecture for custom column-level type handling.
  • Extend the system by subclassing AdvancedDataType and implementing process_result_value, bind_processor, and validate methods to control data conversion and validation.
  • Register custom types using the @register_type decorator to make them available in the dataset column editor UI.
  • Configuration persists in the Column.advanced_data_type JSON field in superset/models/core.py, storing type names and options.
  • The system automatically integrates with SQLAlchemy queries through superset/connectors/sqla/models.py, applying transformations during data retrieval and storage.

Frequently Asked Questions

How do I register a custom advanced data type in Superset?

Register your class using the @register_type decorator imported from superset.advanced_data_type.types. Pass a unique string identifier as the argument (e.g., @register_type("my_type")). This adds your class to the global AdvancedDataTypeRegistry and makes it available in the dataset column editor without a separate configuration file, provided the module that defines the class is imported when Superset starts.

What methods must I implement when creating a custom AdvancedDataType subclass?

You must implement process_result_value(value, dialect) to convert database values for UI display, and bind_processor(dialect) to return a function that transforms Python values before database storage. Optionally implement validate(value) for data integrity checks and get_editor_schema() to return JSON Schema that drives dynamic configuration forms in the React frontend.

Where does Superset store advanced data type configuration?

Superset stores the configuration in the advanced_data_type JSON column of the Column model, defined in superset/models/core.py. This field contains the type name and any options (e.g., {"type": "currency", "options": {"symbol": "$"}}), which both the backend processors and frontend components reference to apply transformations and render appropriate controls.

How does the advanced data types system integrate with SQLAlchemy queries?

The integration occurs in superset/connectors/sqla/models.py, where Superset hooks your custom type's processors into the SQLAlchemy result pipeline. When querying data, process_result_value automatically transforms raw database values before they reach charts or exports, while bind_processor handles data insertion and updates, ensuring consistent type handling throughout the data lifecycle.
