How to Use the Advanced Data Types System in Apache Superset for Custom Data Type Handling
Apache Superset's advanced data types system enables developers to register custom column-level types that automatically handle data conversion, validation, and UI rendering by subclassing AdvancedDataType and applying the @register_type decorator.
Apache Superset's advanced data types system extends beyond standard SQLAlchemy types to support domain-specific data handling. Located primarily in superset/advanced_data_type/types.py, this framework lets you define custom types that manage everything from database storage formats to frontend display logic, with configuration persisted in the Column model's advanced_data_type JSON field.
Core Architecture of the Advanced Data Types System
The system operates through four coordinated components that bridge the database, backend, and React frontend.
AdvancedDataType Base Class
The AdvancedDataType abstract class in superset/advanced_data_type/types.py defines the contract for all custom types. Subclasses must implement:
- process_result_value(value, dialect): Converts database values to Python/JSON for UI consumption
- bind_processor(dialect): Returns a function that transforms Python values to database-compatible formats before storage
- validate(value): Enforces data integrity constraints and raises ValidationError for invalid inputs
- get_editor_schema(): Returns JSON Schema to drive dynamic configuration forms in the frontend
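As a rough illustration of that four-method contract, the sketch below defines a stand-in base class with Python's abc module. AdvancedDataTypeSketch and UpperCaseType are hypothetical names made up for this example and are not Superset code; the options constructor argument is an assumption based on how the type is used later in this article.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class AdvancedDataTypeSketch(ABC):
    """Hypothetical stand-in for the contract described above (not Superset code)."""

    def __init__(self, options: Optional[dict] = None) -> None:
        # Assumption: per-column options are passed in at construction time.
        self.options = options or {}

    @abstractmethod
    def process_result_value(self, value: Any, dialect: Any) -> Any:
        """Convert a database value into something the UI can display."""

    @abstractmethod
    def bind_processor(self, dialect: Any) -> Callable[[Any], Any]:
        """Return a function that converts a Python value for storage."""

    @abstractmethod
    def validate(self, value: Any) -> None:
        """Raise an exception when the value is not acceptable."""

    @staticmethod
    @abstractmethod
    def get_editor_schema() -> dict:
        """Return a JSON Schema describing the configurable options."""


class UpperCaseType(AdvancedDataTypeSketch):
    """Toy subclass: stores text lower-cased, displays it upper-cased."""

    def process_result_value(self, value, dialect):
        return None if value is None else value.upper()

    def bind_processor(self, dialect):
        return lambda value: None if value is None else value.lower()

    def validate(self, value):
        if not isinstance(value, str):
            raise ValueError("expected a string")

    @staticmethod
    def get_editor_schema():
        return {"type": "object", "properties": {}}


adt = UpperCaseType()
stored = adt.bind_processor(dialect=None)("Hello")       # value as it would be stored
shown = adt.process_result_value(stored, dialect=None)   # value as it would be displayed
```

Because every method is marked abstract, Python refuses to instantiate a subclass that leaves one out, which surfaces an incomplete type at construction time rather than at query time.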
AdvancedDataTypeRegistry
The AdvancedDataTypeRegistry singleton maintains a global mapping of type names to concrete classes. The @register_type decorator adds a class to this mapping when the module that defines it is imported, so other parts of the application can look up custom types by name instead of importing each class directly.
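The decorator-plus-registry pattern itself is plain Python and can be sketched without Superset. The version below is a minimal illustration under that assumption: _REGISTRY and get_type are hypothetical names, and register_type here is a local stand-in written for the example, not the function from superset/advanced_data_type/types.py.

```python
from typing import Callable, Dict

# Hypothetical module-level registry: type name -> class.
_REGISTRY: Dict[str, type] = {}


def register_type(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under `name`."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"type {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls  # hand the class back unchanged

    return decorator


def get_type(name: str) -> type:
    """Look up a registered class, failing loudly for unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"no advanced data type named {name!r}") from None


@register_type("currency")
class CurrencyType:
    """Placeholder class; a real type would implement the conversion methods."""


resolved = get_type("currency")  # the same class object that was decorated
```

A class decorator only runs when Python executes the module containing the class, so in a pattern like this the module still has to be imported somewhere at startup before the name can be resolved.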
Column Metadata Storage
In superset/models/core.py, the Column model stores type configuration in the advanced_data_type JSON field. This persists the selected type name and options (e.g., {"type": "currency", "options": {"symbol": "€"}}), enabling type-aware processing across sessions and queries.
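To make the stored shape concrete, here is a small sketch that reads a configuration string like the one above and splits it into a type name and an options dictionary. parse_adt_config is a hypothetical helper written for this example; it assumes the field holds a JSON object with a required "type" key and an optional "options" object, as described in this section.

```python
import json
from typing import Any, Dict, Tuple


def parse_adt_config(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Split a stored JSON config into (type name, options)."""
    config = json.loads(raw)
    if not isinstance(config, dict) or "type" not in config:
        raise ValueError("advanced data type config needs a 'type' key")
    options = config.get("options") or {}  # a missing or null options field becomes {}
    if not isinstance(options, dict):
        raise ValueError("'options' must be a JSON object")
    return config["type"], options


# Example payload matching the shape shown in the text.
stored = json.dumps({"type": "currency", "options": {"symbol": "€"}})
type_name, options = parse_adt_config(stored)
```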
SQLAlchemy Integration
The superset/connectors/sqla/models.py file hooks custom types into the SQLAlchemy result pipeline. When querying data, Superset automatically invokes process_result_value to transform raw values before they reach charts or exports, while bind_processor handles data insertion and updates.
Implementing a Custom Advanced Data Type
Creating a custom type involves subclassing AdvancedDataType, implementing the required methods, and registering the class with the decorator.
Step 1: Create the Type Class
Define your class in a new file (e.g., superset/extensions/currency_type.py):
from decimal import Decimal, ROUND_HALF_UP

from superset.advanced_data_type.types import AdvancedDataType, register_type
from marshmallow import ValidationError


@register_type("currency")
class CurrencyType(AdvancedDataType):
    """Store values as integer cents in DB, display as formatted currency strings."""

    @staticmethod
    def get_editor_schema() -> dict:
        """Define the configuration UI schema."""
        return {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "enum": ["$", "€", "£"],
                    "default": "$",
                    "title": "Currency Symbol"
                }
            }
        }

    def bind_processor(self, dialect):
        """Convert "$12.34" → integer cents before storage."""
        symbol = self.options.get("symbol", "$")

        def processor(value):
            if value is None:
                return None
            try:
                clean = value.replace(symbol, "").strip()
                dec = (Decimal(clean) * 100).to_integral_value(ROUND_HALF_UP)
                return int(dec)
            except Exception as exc:
                raise ValidationError(f"Invalid currency value: {value}") from exc

        return processor

    def process_result_value(self, value, dialect):
        """Convert integer cents → formatted string for UI."""
        symbol = self.options.get("symbol", "$")
        if value is None:
            return None
        return f"{symbol}{value / 100:.2f}"

    def validate(self, value):
        """Ensure value matches expected format."""
        if not isinstance(value, str):
            raise ValidationError("Currency must be a string")
        if not value.startswith(self.options.get("symbol", "$")):
            raise ValidationError("Currency symbol mismatch")
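The conversion arithmetic in the class above can be checked in isolation, without a running Superset instance. The sketch below pulls the same parse-and-format logic into two standalone functions; to_cents and from_cents are hypothetical names for this example, and the formatting step uses Decimal instead of the float division shown in the class, a small swap made here to keep the whole round trip in exact decimal arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(text: str, symbol: str = "$") -> int:
    """Parse a string such as "$12.34" into integer cents, rounding half up."""
    clean = text.replace(symbol, "").strip()
    return int((Decimal(clean) * 100).to_integral_value(ROUND_HALF_UP))


def from_cents(cents: int, symbol: str = "$") -> str:
    """Format integer cents as a currency string with two decimal places."""
    return f"{symbol}{Decimal(cents) / 100:.2f}"


cents = to_cents("€12.345", symbol="€")    # 12.345 * 100 = 1234.5, rounds half up to 1235
display = from_cents(cents, symbol="€")    # "€12.35"
```

Storing integer cents and rounding once at the input boundary means repeated reads and writes of the same value do not drift, which is the main reason to prefer this layout over storing floats.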
Step 2: Configure the Column in the UI
Once registered, apply the type through the Superset interface:
- Navigate to Data → Datasets and select your dataset.
- Click Edit on the target column.
- Select "currency" from the Advanced Data Type dropdown.
- Choose the desired symbol (e.g., "€") in the configuration form.
- Save the column.
Superset persists this configuration as {"type": "currency", "options": {"symbol": "€"}} in the advanced_data_type field of the Column model.
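Before a configuration like this is saved, the chosen options can be checked against the schema the type returns. The sketch below is a deliberately small illustration that only understands the "enum" and "default" keywords used by the currency schema in Step 1; check_options is a hypothetical helper, and a full JSON Schema validator would be the more robust choice in practice.

```python
# Schema copied from get_editor_schema() in Step 1.
SCHEMA = {
    "type": "object",
    "properties": {
        "symbol": {
            "type": "string",
            "enum": ["$", "€", "£"],
            "default": "$",
            "title": "Currency Symbol",
        }
    },
}


def check_options(options: dict, schema: dict) -> dict:
    """Fill in defaults and reject values outside each property's enum."""
    resolved = {}
    for name, spec in schema.get("properties", {}).items():
        value = options.get(name, spec.get("default"))
        allowed = spec.get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        resolved[name] = value
    return resolved


chosen = check_options({"symbol": "€"}, SCHEMA)   # explicit choice is kept
fallback = check_options({}, SCHEMA)              # missing option falls back to the default
```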
Step 3: Access Types Programmatically
Interact with the registry and processing methods directly for ETL or custom visualization logic:
from superset.models.core import Column
from superset.advanced_data_type.types import AdvancedDataTypeRegistry

# Retrieve column configuration.
# `session` is assumed to be an active SQLAlchemy session on the metadata database.
col = session.query(Column).filter_by(column_name="price").one()
adt_config = col.advanced_data_type
# Returns: {'type': 'currency', 'options': {'symbol': '€'}}

# Resolve the implementation class
adt_class = AdvancedDataTypeRegistry.get(adt_config["type"])
adt_instance = adt_class(options=adt_config.get("options", {}))

# Process raw database values
raw_value = 1234  # Stored as cents
display_value = adt_instance.process_result_value(raw_value, dialect=None)
# Result: "€12.34"
Key Files in the Advanced Data Types System
| File | Purpose | Location |
|---|---|---|
| superset/advanced_data_type/types.py | Base class AdvancedDataType, registry singleton, and @register_type decorator | Source |
| superset/models/core.py | Column model with advanced_data_type JSON field for metadata storage | Source |
| superset/connectors/sqla/models.py | SQLAlchemy integration for query-time processing | Source |
| superset-frontend/src/explore/components/PropertiesModal/ColumnEditPopover.tsx | React component for type selection and option configuration | Source |
Summary
- The advanced data types system in Apache Superset provides a registry-based architecture for custom column-level type handling located in superset/advanced_data_type/types.py.
- Extend the system by subclassing AdvancedDataType and implementing process_result_value, bind_processor, and validate methods to control data conversion and validation.
- Register custom types using the @register_type decorator to make them available in the dataset column editor UI.
- Configuration persists in the Column.advanced_data_type JSON field in superset/models/core.py, storing type names and options.
- The system automatically integrates with SQLAlchemy queries through superset/connectors/sqla/models.py, applying transformations during data retrieval and storage.
Frequently Asked Questions
How do I register a custom advanced data type in Superset?
Register your class using the @register_type decorator imported from superset.advanced_data_type.types. Pass a unique string identifier as the argument (e.g., @register_type("my_type")). This adds your class to the global AdvancedDataTypeRegistry as soon as the module that defines it is imported, making it available in the dataset column editor without a separate configuration file.
What methods must I implement when creating a custom AdvancedDataType subclass?
You must implement process_result_value(value, dialect) to convert database values for UI display, and bind_processor(dialect) to return a function that transforms Python values before database storage. Optionally implement validate(value) for data integrity checks and get_editor_schema() to return JSON Schema that drives dynamic configuration forms in the React frontend.
Where does Superset store advanced data type configuration?
Superset stores the configuration in the advanced_data_type JSON column of the Column model, defined in superset/models/core.py. This field contains the type name and any options (e.g., {"type": "currency", "options": {"symbol": "$"}}), which both the backend processors and frontend components reference to apply transformations and render appropriate controls.
How does the advanced data types system integrate with SQLAlchemy queries?
The integration occurs in superset/connectors/sqla/models.py, where Superset hooks your custom type's processors into the SQLAlchemy result pipeline. When querying data, process_result_value automatically transforms raw database values before they reach charts or exports, while bind_processor handles data insertion and updates, ensuring consistent type handling throughout the data lifecycle.