# How to Work with the Semantic Layer: Calculated Columns, Custom Metrics, and Virtual Datasets in Apache Superset

> Master Apache Superset's semantic layer. Learn to build calculated columns, custom metrics, and virtual datasets for powerful data insights and reusable data models.

- Repository: [The Apache Software Foundation/superset](https://github.com/apache/superset)
- Tags: how-to-guide
- Published: 2026-03-03

---

**Apache Superset's semantic layer uses Dataset objects to abstract raw database tables into reusable entities that support calculated columns (SQL expressions), custom metrics (reusable aggregations), and virtual datasets (SQL queries), all stored as SQLAlchemy models and processed through Jinja templating before query execution.**

Apache Superset's semantic layer bridges the gap between raw database tables and end-user visualizations through Dataset objects that encapsulate business logic. This layer enables data teams to define calculated columns, reusable metrics, and virtual datasets directly within the platform without modifying source databases. Understanding how these components are stored, rendered, and resolved in the source code is essential for building scalable analytics workflows.

## Understanding the Semantic Layer Architecture

In Superset, the semantic layer is implemented through the `SqlaTable` class in [`superset/connectors/sqla/models.py`](https://github.com/apache/superset/blob/main/superset/connectors/sqla/models.py). This class serves as the central abstraction between physical databases and charts, supporting three primary extension points:

- **Calculated columns**: Virtual columns defined by SQL expressions that do not exist in the source table
- **Custom metrics**: Reusable aggregation definitions (e.g., `SUM(revenue)`) attached to the Dataset
- **Virtual datasets**: Datasets defined entirely by a SQL query rather than a physical table reference

Each element persists within the Dataset's SQLAlchemy relationships and participates in the query building pipeline.

## Calculated Columns: Adding Virtual Columns to Physical Tables

Calculated columns allow analysts to define SQL expressions that transform or derive data without altering the underlying database schema. According to the Apache Superset source code, these are stored as `TableColumn` objects with a populated `expression` attribute.

When a Dataset refreshes its metadata, the `SqlaTable.refresh()` method preserves calculated columns by filtering for those with expressions:

```python
# From superset/connectors/sqla/models.py lines 1829-1831
columns.extend([col for col in old_columns if col.expression])
```

This ensures that physical columns from the database schema merge with user-defined calculated columns. The `expression` field contains raw SQL that Superset injects into queries when the column is referenced.
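The merge can be illustrated outside Superset with a minimal sketch. The `Column` dataclass and `merge_columns` function below are hypothetical stand-ins rather than Superset APIs; only the filter on `expression` mirrors the excerpt above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical stand-in for Superset's TableColumn."""

    column_name: str
    expression: Optional[str] = None  # set only for calculated columns


def merge_columns(physical: list[Column], old_columns: list[Column]) -> list[Column]:
    """Combine freshly discovered physical columns with saved calculated ones."""
    columns = list(physical)
    # Same filter as the excerpt: keep old columns that carry an expression.
    columns.extend([col for col in old_columns if col.expression])
    return columns


old = [
    Column("order_date"),
    Column("legacy_code"),  # physical column that has since been dropped
    Column("order_month", expression="DATE_FORMAT(order_date, '%Y-%m')"),
]
fresh = [Column("order_date"), Column("sales")]

merged = merge_columns(fresh, old)
print([c.column_name for c in merged])
```

This prints `['order_date', 'sales', 'order_month']`: the dropped physical column disappears, the new one is picked up, and the calculated column is carried over unchanged.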

## Custom Metrics: Creating Reusable Aggregations

Custom metrics (`SqlMetric` objects) provide consistent aggregation logic across multiple charts. Stored on the Dataset via the `self.metrics` relationship, metrics survive table refreshes through the `add_missing_metrics()` helper:

```python
# From superset/connectors/sqla/models.py (method of SqlaTable)
def add_missing_metrics(self, metrics: list[SqlMetric]) -> None:
    """Merge missing metrics into the dataset."""
    # Names already attached to the dataset; these are never overwritten.
    existing = {m.metric_name for m in self.metrics}
    for metric in metrics:
        if metric.metric_name not in existing:
            self.metrics.append(metric)
```

This persistence mechanism ensures that business-critical KPIs remain attached to Datasets even after schema updates.
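To see the effect of the name check, here is a small self-contained sketch. `Metric` and `Dataset` are hypothetical stand-ins for `SqlMetric` and `SqlaTable`; the method body follows the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Hypothetical stand-in for SqlMetric."""

    metric_name: str
    expression: str


@dataclass
class Dataset:
    """Hypothetical stand-in for SqlaTable, holding only metrics."""

    metrics: list[Metric] = field(default_factory=list)

    def add_missing_metrics(self, metrics: list[Metric]) -> None:
        existing = {m.metric_name for m in self.metrics}
        for metric in metrics:
            if metric.metric_name not in existing:
                self.metrics.append(metric)


dataset = Dataset(metrics=[Metric("total_sales", "SUM(sales) - SUM(refunds)")])
dataset.add_missing_metrics(
    [
        Metric("total_sales", "SUM(sales)"),  # name clash: saved version is kept
        Metric("count", "COUNT(*)"),  # new name: appended
    ]
)
print({m.metric_name: m.expression for m in dataset.metrics})
```

The user-defined `total_sales` expression is left untouched, and only the metric with a new name is appended.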

## Virtual Datasets: Query-Based Data Sources

Virtual datasets are not tied to a physical table. Instead, they derive their data from a SQL query stored in the Dataset's `sql` attribute (`self.sql` in the model code). The `is_virtual` property identifies these Datasets:

```python
# From superset/connectors/sqla/models.py lines 307-309
@property
def is_virtual(self) -> bool:
    return self.kind == DatasourceKind.VIRTUAL
```

Because virtual datasets execute arbitrary SQL, their cache keys need extra care. The `get_extra_cache_keys()` method incorporates row-level security (RLS) predicates into the cache key for virtual datasets:

```python
# From superset/connectors/sqla/models.py, inside get_extra_cache_keys()
if self.is_virtual and self.sql:
    rls_predicates = collect_rls_predicates_for_sql(...)
    extra_cache_keys.extend(rls_predicates)
```

This ensures that users with different RLS rules never share cached results for the same virtual dataset query.
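The following sketch shows why adding predicates to the key separates users. The `cache_key` helper and its hashing scheme are assumptions made for illustration; Superset's real cache key construction is more involved.

```python
import hashlib
import json


def cache_key(sql: str, extra_cache_keys: list[str]) -> str:
    """Hash the query together with any extra keys (hypothetical helper)."""
    payload = json.dumps(
        {"sql": sql, "extra": sorted(extra_cache_keys)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


sql = "SELECT region, SUM(sales) AS total_sales FROM raw_orders GROUP BY region"

# Two users run the same virtual dataset query under different RLS rules.
key_emea = cache_key(sql, ["region = 'EMEA'"])
key_apac = cache_key(sql, ["region = 'APAC'"])
key_emea_again = cache_key(sql, ["region = 'EMEA'"])

print(key_emea == key_apac)  # different predicates, different cache entries
print(key_emea == key_emea_again)  # same predicates, shared cache entry
```

The first comparison prints `False` and the second prints `True`, so results are shared only between users whose RLS predicates match.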

## Rendering Jinja-Templated Expressions

All three semantic layer elements support Jinja templating for dynamic SQL generation. The REST API processes these templates in `DatasetDAO.render_dataset_fields()` within [`superset/datasets/api.py`](https://github.com/apache/superset/blob/main/superset/datasets/api.py):

```python
# From superset/datasets/api.py lines 1397-1400
items = [
    ("query", "sql", "rendered_sql", processor.process_template),
    ("metric", "metrics", "metrics", render_item_list),
    ("calculated column", "columns", "columns", render_item_list),
]
```

The `TemplateProcessor` evaluates Jinja syntax in SQL expressions, metrics, and calculated columns before query execution. If rendering fails, the system raises a `SupersetTemplateException` to surface errors in the UI.
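The table-driven shape of that rendering step can be sketched as follows. Everything here is a simplified stand-in: `render` is a toy placeholder substitution rather than real Jinja, `TemplateError` stands in for `SupersetTemplateException`, and the `rendered_expression` key is an assumed output name.

```python
from typing import Any, Callable


class TemplateError(Exception):
    """Hypothetical stand-in for SupersetTemplateException."""


def render(template: str, context: dict[str, str]) -> str:
    """Toy renderer: replaces {{ name }} placeholders; not real Jinja."""
    for name, value in context.items():
        template = template.replace("{{ " + name + " }}", value)
    if "{{" in template:
        raise ValueError(f"unresolved placeholder in: {template}")
    return template


def render_dataset_fields(
    data: dict[str, Any], context: dict[str, str]
) -> dict[str, Any]:
    def render_item_list(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Items without an expression (physical columns) pass through as-is.
        return [
            {**item, "rendered_expression": render(item["expression"], context)}
            if item.get("expression")
            else item
            for item in items
        ]

    # (label for error messages, input key, output key, renderer)
    steps: list[tuple[str, str, str, Callable[[Any], Any]]] = [
        ("query", "sql", "rendered_sql", lambda sql: render(sql, context)),
        ("metric", "metrics", "metrics", render_item_list),
        ("calculated column", "columns", "columns", render_item_list),
    ]
    for label, in_key, out_key, func in steps:
        if not data.get(in_key):
            continue
        try:
            data[out_key] = func(data[in_key])
        except Exception as ex:
            raise TemplateError(f"Unable to render {label}: {ex}") from ex
    return data


dataset = {
    "sql": "SELECT * FROM raw_orders WHERE region = '{{ region }}'",
    "metrics": [{"metric_name": "total_sales", "expression": "SUM(sales)"}],
    "columns": [{"column_name": "order_date"}],
}
result = render_dataset_fields(dataset, {"region": "EMEA"})
print(result["rendered_sql"])
```

Keeping the label in each tuple lets a failure report which kind of element could not be rendered, which is what makes the resulting error message actionable in a UI.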

## Query Resolution and Execution

When building queries, Superset must distinguish between physical columns, calculated columns, and ad hoc column expressions. The `has_extra_cache_key_calls()` method in [`superset/connectors/sqla/models.py`](https://github.com/apache/superset/blob/main/superset/connectors/sqla/models.py) does this while gathering every statement that may contain Jinja. It builds a dictionary of calculated expressions and looks up each requested column in it:

```python
calculated_columns = {
    c.column_name: c.expression for c in self.columns if c.expression
}
for column_ in columns:
    if utils.is_adhoc_column(column_):
        templatable_statements.append(column_["sqlExpression"])
    elif isinstance(column_, str) and column_ in calculated_columns:
        templatable_statements.append(calculated_columns[column_])
```

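Physical column names fall through both branches because they carry no expression. Ad hoc columns contribute their `sqlExpression`, and calculated columns contribute the expression saved on the Dataset. The snippet only collects these statements; as the method name indicates, they are then checked for template calls that affect the cache key, while the final SQL is assembled elsewhere in the query-building code.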
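Reading the method name literally, the collected statements are scanned for Jinja macros whose output should vary the cache key. The sketch below follows that reading. The `has_cache_key_macros` function and the `CACHE_KEY_MACROS` pattern are hypothetical, and the macro list is chosen only for illustration.

```python
import re
from typing import Any, Optional

# Hypothetical pattern: matches a few per-user or per-request Jinja macros.
CACHE_KEY_MACROS = re.compile(
    r"\{\{.*(current_user_id|current_username|url_param)\(.*\).*\}\}"
)


def has_cache_key_macros(
    requested_columns: list[Any],
    calculated_columns: dict[str, str],
    dataset_sql: Optional[str] = None,
) -> bool:
    """Return True if any templatable statement calls a cache-relevant macro."""
    templatable_statements: list[str] = []
    if dataset_sql:
        templatable_statements.append(dataset_sql)
    for column_ in requested_columns:
        if isinstance(column_, dict) and "sqlExpression" in column_:
            # Ad hoc column defined in the chart itself.
            templatable_statements.append(column_["sqlExpression"])
        elif isinstance(column_, str) and column_ in calculated_columns:
            # Calculated column saved on the dataset.
            templatable_statements.append(calculated_columns[column_])
        # Plain physical column names have no expression and are skipped.
    return any(CACHE_KEY_MACROS.search(s) for s in templatable_statements)


calculated = {
    "order_month": "DATE_FORMAT(order_date, '%Y-%m')",
    "is_mine": "CASE WHEN owner_id = {{ current_user_id() }} THEN 1 ELSE 0 END",
}
print(has_cache_key_macros(["order_month"], calculated))
print(has_cache_key_macros(["is_mine"], calculated))
print(has_cache_key_macros([{"sqlExpression": "{{ url_param('country') }}"}], calculated))
```

Only the second and third calls return `True`, because only those columns resolve to expressions containing a macro call.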

## Practical Implementation Examples

### Creating a Calculated Column via REST API

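```python
import requests

# Adjust the host, dataset ID (42), and token for your deployment.
payload = {
    "column_name": "order_month",
    "type": "STRING",
    # DATE_FORMAT is MySQL syntax; use your database's equivalent.
    "expression": "DATE_FORMAT(order_date, '%Y-%m')",
}
response = requests.post(
    "http://localhost:8088/api/v1/dataset/42/columns/",
    headers={"Authorization": "Bearer <TOKEN>"},
    json=payload,  # serializes the body and sets the Content-Type header
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
```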

### Defining a Custom Metric

```python
import requests

metric = {
    "metric_name": "total_sales_by_month",
    "expression": "SUM(sales)",
    "verbose_name": "Total Sales (by month)",
}
response = requests.post(
    "http://localhost:8088/api/v1/dataset/42/metrics/",
    headers={"Authorization": "Bearer <TOKEN>"},
    json=metric,  # serializes the body and sets the Content-Type header
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors
```

### Creating a Virtual Dataset

```python
virtual_sql = """
SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS order_month,
    SUM(sales) AS total_sales
FROM raw_orders
GROUP BY 1
"""

# POST to /api/v1/dataset/ with "sql": virtual_sql. `is_virtual` is a read-only
# property on the model, so the presence of SQL is what marks the dataset as virtual.
```

### Using Semantic Layer Elements in Queries

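Charts reference calculated columns and metrics by name rather than through Jinja placeholders. When a chart on a dataset backed by `raw_orders` groups by `order_month` and selects `total_sales_by_month`, Superset substitutes the saved expressions, producing SQL roughly like this (exact aliasing and quoting vary by database):

```sql
SELECT
    DATE_FORMAT(order_date, '%Y-%m') AS order_month,
    SUM(sales) AS total_sales_by_month
FROM raw_orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
```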

## Summary

- **Calculated columns** are persisted as `TableColumn` objects with an `expression` attribute and survive table refreshes through `SqlaTable.refresh()` in [`superset/connectors/sqla/models.py`](https://github.com/apache/superset/blob/main/superset/connectors/sqla/models.py).
- **Custom metrics** are stored as `SqlMetric` objects and merged via `add_missing_metrics()`, providing reusable aggregation logic across charts.
- **Virtual datasets** are identified by `is_virtual` (where `kind == DatasourceKind.VIRTUAL`) and get RLS-aware cache keys through `get_extra_cache_keys()`.
- **Jinja templating** is applied uniformly to all three element types in `DatasetDAO.render_dataset_fields()` within the datasets API.
- **Column references** are sorted into physical, calculated, and ad hoc in `has_extra_cache_key_calls()`, which collects the calculated and ad hoc expressions as templatable statements.

## Frequently Asked Questions

### How do calculated columns differ from custom metrics in Superset?

**Calculated columns** define SQL expressions that create new column values (like `DATE_FORMAT(order_date, '%Y-%m')`), while **custom metrics** define aggregations applied to columns (like `SUM(sales)`). Calculated columns appear as selectable dimensions in the Explore view, whereas metrics appear as measurable values. Under the hood, calculated columns use the `TableColumn` model with an `expression` field, while metrics use the `SqlMetric` model.

### Do virtual datasets support row-level security (RLS)?

Yes. According to the source code in [`superset/connectors/sqla/models.py`](https://github.com/apache/superset/blob/main/superset/connectors/sqla/models.py), virtual datasets automatically incorporate RLS predicates into their cache keys via `get_extra_cache_keys()`. When `self.is_virtual` is true and `self.sql` is defined, the system calls `collect_rls_predicates_for_sql()` and adds the resulting security filters to the cache key, ensuring users with different permissions never share cached results.

### Can I use Jinja templating in calculated columns and metrics?

Yes. Superset supports Jinja templating across all semantic layer elements. The `DatasetDAO.render_dataset_fields()` method in [`superset/datasets/api.py`](https://github.com/apache/superset/blob/main/superset/datasets/api.py) processes templates for queries, metrics, and calculated columns using the `TemplateProcessor`. This allows dynamic referencing of filters, user attributes, and macro functions within your SQL expressions.

### What happens to my calculated columns when the underlying table schema changes?

Calculated columns persist through schema changes. When you refresh a Dataset's metadata via `SqlaTable.refresh()` (triggered manually or automatically), the system extends the new physical column list with existing calculated columns that have an `expression` attribute set. This merge logic at lines 1829-1831 in [`superset/connectors/sqla/models.py`](https://github.com/apache/superset/blob/main/superset/connectors/sqla/models.py) ensures your virtual columns survive schema updates, though you should verify that expressions remain valid against the updated physical structure.