# How to Convert Multiple Columns to datetime in pandas: Handling Heterogeneous Formats

> Convert multiple columns to datetime in pandas efficiently. Learn to handle heterogeneous formats by vectorizing or normalizing strings for seamless date parsing in pandas.

- Repository: [pandas-dev/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-20

---

**Use `pd.to_datetime()` to vectorize the assembly of date components from multiple columns, or normalize heterogeneous string formats individually before combining them into a single ISO-8601 string for bulk parsing.**

When working with temporal data in pandas, you often encounter datasets where dates and times are split across multiple columns or stored in inconsistent textual formats. Converting multiple columns to datetime efficiently requires understanding the vectorized assembly capabilities of `pd.to_datetime()` and choosing a parsing strategy for heterogeneous date and time formats that does not resort to slow Python loops.

## Vectorized Assembly from Component Columns

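The most efficient way to convert multiple columns to datetime applies when your DataFrame already holds canonical temporal components. When you pass a DataFrame directly to `pd.to_datetime()`, pandas assembles one datetime per row from columns named `year`, `month`, and `day` (all three are required) plus any of `hour`, `minute`, `second`, `millisecond`, `microsecond`, and `nanosecond`. Plural spellings such as `days` and the abbreviations `ms`, `us`, and `ns` are accepted too; any column pandas does not recognize raises a `ValueError`.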

According to the pandas source code in [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py), a DataFrame argument is dispatched to the `_assemble_from_unit_mappings` function. In recent versions this function packs `year`, `month`, and `day` into a single `YYYYMMDD` integer per row, parses the whole column with `format="%Y%m%d"`, and then adds each time unit that is present via `to_timedelta`. Every step operates on whole columns, so the conversion runs in **O(N)** time without a Python-level per-row loop.

```python
import pandas as pd

df = pd.DataFrame({
    "year":   [2021, 2022, 2023],
    "month":  [12, 1, 6],
    "day":    [31, 15, 20],
    "hour":   [23, 8, 14],
    "minute": [45, 30, 0],
})

# Vectorized assembly from components
df["timestamp"] = pd.to_datetime(df)
print(df[["timestamp"]])
```

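Time units that are absent are simply not added, so the time of day defaults to midnight. The result is timezone-naive unless you pass `utc=True`. Because every column of the frame is interpreted as a component, select only the component columns (for example `df[["year", "month", "day"]]`) when the DataFrame holds anything else; in the example above, calling `pd.to_datetime(df)` a second time would fail once `timestamp` has been added.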
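A quick way to check the midnight default and the timezone behavior is to assemble a frame that has no time columns at all. The sketch assumes `utc=True` is honored for DataFrame input, which is worth confirming on your installed pandas version.

```python
import pandas as pd

# Only the required components: no hour/minute/second columns at all
dates_only = pd.DataFrame({"year": [2024, 2024], "month": [2, 3], "day": [29, 1]})

naive = pd.to_datetime(dates_only)
print(naive)
print("all midnight:", (naive == naive.dt.normalize()).all())
print("timezone:", naive.dt.tz)  # None means timezone-naive

# Assumption: utc=True is forwarded through the assembly path
aware = pd.to_datetime(dates_only, utc=True)
print("timezone:", aware.dt.tz)
```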

## Handling Different Date and Time Formats Across Columns

Real-world datasets often store temporal information in heterogeneous string formats across multiple columns—for example, dates in `dd/mm/yyyy` format and times in 12-hour `HH:MM AM/PM` format. To convert these multiple columns to datetime in pandas efficiently, you should normalize each column individually before combining them into a single parseable string.

### Normalizing Column-Specific Formats

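Parse each column separately using column-specific format strings to avoid the expensive mixed-format inference mode. An explicit `format` routes parsing through the compiled `array_strptime` routine in `pandas/_libs/tslibs/strptime.pyx` (or a dedicated ISO 8601 parser when the format is ISO-like) instead of per-element inference. Where a column holds more than one pattern, parse with the primary format first and retry only the rows that came back as `NaT`.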

```python
import pandas as pd

df = pd.DataFrame({
    "date_str": ["31/12/2020", "01-02-2021"],  # dd/mm/yyyy vs mm-dd-yyyy
    "time_str": ["02:45 PM", "14:30"],         # 12-hour vs 24-hour
})

# Dates: try the primary pattern first; non-matching rows become NaT
df["date"] = pd.to_datetime(df["date_str"],
                            format="%d/%m/%Y",
                            errors="coerce")

# Retry only the rows that are still NaT with the secondary pattern
mask = df["date"].isna()
df.loc[mask, "date"] = pd.to_datetime(df.loc[mask, "date_str"],
                                      format="%m-%d-%Y",
                                      errors="coerce")

# Times: parse both the 12-hour and the 24-hour pattern, prefer the 12-hour result
time_12h = pd.to_datetime(df["time_str"], format="%I:%M %p", errors="coerce")
time_24h = pd.to_datetime(df["time_str"], format="%H:%M", errors="coerce")
df["time"] = time_12h.fillna(time_24h).dt.time
```
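When a column holds more than two patterns, the mask-and-retry step above generalizes to a loop over candidate formats. The helper `parse_with_formats` below is a hypothetical name used only for illustration, not a pandas API. It is a minimal sketch that assumes the formats are listed in priority order, because an ambiguous string is claimed by the first format that matches.

```python
import pandas as pd


def parse_with_formats(values: pd.Series, formats: list[str]) -> pd.Series:
    """Hypothetical helper: parse with each explicit format in turn.

    The first format is applied to every row; each later format is applied
    only to rows that are still NaT, so the first matching format wins.
    """
    result = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        missing = result.isna()
        if not missing.any():
            break
        retry = pd.to_datetime(values[missing], format=fmt, errors="coerce")
        result = result.fillna(retry)  # fills by index alignment
    return result


dates = pd.Series(["31/12/2020", "01-02-2021", "04.03.2021", "garbage"])
parsed = parse_with_formats(dates, ["%d/%m/%Y", "%m-%d-%Y", "%d.%m.%Y"])
print(parsed)
```

Strings that match none of the formats, like `"garbage"`, stay `NaT`, so you can inspect them afterwards instead of losing them silently.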

### Combining into a Single datetime Column

After normalization, concatenate the date and time components into an ISO-8601 formatted string, then perform a single bulk `to_datetime` call. This minimizes parser overhead by processing the entire dataset in one vectorized operation.

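```python
# Continues from the previous snippet: df has "date" (datetime64) and "time" (datetime.time)

# Combine into ISO-8601 strings such as "2020-12-31T14:45:00"
df["datetime_str"] = (
    df["date"].dt.strftime("%Y-%m-%d") + "T" +
    df["time"].astype(str)
)

# Single bulk parse; rows with a missing date or time become NaT
df["timestamp"] = pd.to_datetime(df["datetime_str"],
                                 format="%Y-%m-%dT%H:%M:%S",
                                 errors="coerce")
print(df[["timestamp"]])
```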
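If you would rather not format and re-parse strings, an alternative sketch stays in datetime arithmetic: keep the time column as full datetimes instead of calling `.dt.time`, subtract each value's own midnight to get a time-of-day offset, and add that offset to the date. It relies on pandas filling in the placeholder date 1900-01-01 when a format has no date fields. This is an alternative to the string-based combination, not a replacement for it.

```python
import pandas as pd

df = pd.DataFrame({
    "date_str": ["31/12/2020", "02/01/2021"],
    "time_str": ["02:45 PM", "09:05 AM"],
})

date = pd.to_datetime(df["date_str"], format="%d/%m/%Y", errors="coerce")

# Keep the parsed time as a full datetime (placeholder date 1900-01-01) ...
time = pd.to_datetime(df["time_str"], format="%I:%M %p", errors="coerce")

# ... and turn it into a time-of-day offset by subtracting its own midnight
time_of_day = time - time.dt.normalize()

# datetime64 + timedelta64 stays vectorized; NaT in either input propagates
df["timestamp"] = date + time_of_day
print(df[["timestamp"]])
```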

## Managing Mixed Formats Within a Single Column

When a single column contains truly heterogeneous date formats that cannot be standardized through column-level normalization (such as a mix of `2020-12-31`, `31/12/2020`, and `12/31/2020`), pandas 2.0+ provides the `format="mixed"` option. It infers the format of each element separately: ISO 8601 strings still take a fast path, but other strings are parsed one at a time with `dateutil`-based logic, which is significantly slower than parsing with an explicit format.

According to the implementation in [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py), this mode iterates through elements to determine the appropriate parser for each row:

```python
import pandas as pd

s = pd.Series([
    "2020-12-31 23:45",
    "31/12/2020 11:45 PM",
    "12/31/20 23:45",
    "20201231T2345",
])

# Mixed format parsing (pandas 2.0+); dayfirst=True is a preference, not a strict rule
out = pd.to_datetime(s, format="mixed", dayfirst=True, errors="coerce")
print(out)
```

**Best practice**: Reserve `format="mixed"` for small datasets or for the residual rows that remain unparsed after column-level normalization, because per-element inference dramatically reduces throughput compared to an explicit `format`, which is parsed in compiled code in `pandas/_libs/tslibs/strptime.pyx`.
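The residual-cleanup pattern can be sketched as follows, assuming pandas 2.0+ for `format="mixed"`: parse the whole column with the dominant explicit format, then run per-element inference only on the rows that came back as `NaT`. The sample strings and the choice of `dayfirst=True` are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([
    "2020-12-31 23:45",
    "2021-01-15 08:30",
    "2021-06-20 14:00",
    "31/12/2020 11:45 PM",  # stray row in a different format
])

# Fast path: one explicit format handles the bulk of the column
out = pd.to_datetime(s, format="%Y-%m-%d %H:%M", errors="coerce")

# Slow path: per-element inference only on rows the fast path rejected
residual = out.isna() & s.notna()
if residual.any():
    out = out.fillna(
        pd.to_datetime(s[residual], format="mixed", dayfirst=True, errors="coerce")
    )
print(out)
```

When every string is ISO 8601 and only the precision varies (some rows with seconds, some without), `format="ISO8601"`, also added in pandas 2.0, is another way to avoid per-element inference.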

## Performance Architecture and Source Code Implementation

Understanding the underlying architecture explains why these methods convert multiple columns to datetime efficiently. The `to_datetime` function, defined in [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py), is the primary entry point.

For DataFrame inputs, `to_datetime` dispatches to `_assemble_from_unit_mappings`, which checks that `year`, `month`, and `day` are present, rejects unrecognized columns, and then builds the result with whole-column operations: a `YYYYMMDD` integer per row parsed with `format="%Y%m%d"`, plus one `to_timedelta` addition per time unit present. No step loops over rows in Python, so the cost grows linearly with the number of rows (**O(N)**).
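The assembly steps described above can be mimicked by hand to make the architecture concrete. This is an illustrative re-implementation based on how `_assemble_from_unit_mappings` works in recent pandas releases, not the library code itself: it omits the validation and error handling, and it parses strings where pandas passes the packed integers.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2021, 2022, 2023],
    "month": [12, 1, 6],
    "day": [31, 15, 20],
    "hour": [23, 8, 14],
    "minute": [45, 30, 0],
})

# Step 1: pack year/month/day into one YYYYMMDD integer per row
ymd = df["year"] * 10000 + df["month"] * 100 + df["day"]

# Step 2: parse all packed values in one call with a single explicit format
# (strings are used here for clarity; pandas itself passes the integers)
dates = pd.to_datetime(ymd.astype(str), format="%Y%m%d")

# Step 3: add one vectorized timedelta per time unit that is present
manual = (
    dates
    + pd.to_timedelta(df["hour"], unit="h")
    + pd.to_timedelta(df["minute"], unit="min")
)

builtin = pd.to_datetime(df)
print((manual == builtin).all())
```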

When parsing string columns with an explicit `format`, pandas compiles the format into a regular expression once and applies it to every element inside a Cython loop (`array_strptime` in `pandas/_libs/tslibs/strptime.pyx`); ISO-like formats take an even faster dedicated path. This contrasts sharply with `format="mixed"`, which must infer the format of each element individually and, for non-ISO strings, hands each one to `dateutil`-based parsing.
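To see the difference between a compiled explicit format and per-element inference, you can time both on data that shares one known format. This is a rough benchmark sketch assuming pandas 2.0+; absolute timings depend on your machine and pandas version, so no expected numbers are given.

```python
import timeit

import pandas as pd

# 20,000 strings that all share one known, non-ISO format
stamps = pd.date_range("2020-01-01", periods=20_000, freq="min")
s = pd.Series(stamps.strftime("%d/%m/%Y %H:%M"))

FMT = "%d/%m/%Y %H:%M"

explicit = pd.to_datetime(s, format=FMT)                  # compiled format, one pass
mixed = pd.to_datetime(s, format="mixed", dayfirst=True)  # per-element inference

# Both strategies agree on the values; only the parsing work differs
print("same values:", (explicit == mixed).all())

t_explicit = timeit.timeit(lambda: pd.to_datetime(s, format=FMT), number=1)
t_mixed = timeit.timeit(lambda: pd.to_datetime(s, format="mixed", dayfirst=True), number=1)
print(f"explicit format: {t_explicit:.3f}s, format='mixed': {t_mixed:.3f}s")
```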

## Summary

- **Vectorized assembly**: Pass a DataFrame with `year`, `month`, `day` (and optional time) columns directly to `pd.to_datetime()` for O(N) performance via `_assemble_from_unit_mappings` in [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py).
- **Heterogeneous formats**: Normalize columns with different date/time patterns individually using explicit `format` strings, concatenate into ISO-8601 format, and parse once with an explicit format so pandas uses its compiled parsers in `pandas/_libs/tslibs/strptime.pyx` instead of per-element inference.
- **Mixed row formats**: Use `format="mixed"` (pandas 2.0+) only when necessary, as it infers and parses each element individually.
- **Avoid loops**: Avoid `apply()` or `map()` with Python-level parsing for datetime conversion; rely on vectorized `pd.to_datetime()` calls to maintain throughput on millions of rows.
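The "avoid loops" point can be illustrated by parsing the same strings row by row with `apply()` and in one vectorized call. The sketch only checks that both produce the same values; wrap each call in `timeit` if you want to measure the gap on your own data.

```python
from datetime import datetime

import pandas as pd

s = pd.Series(["31/12/2020", "15/01/2021", "20/06/2021"] * 1_000)

# Row by row: one Python-level datetime.strptime call per element
looped = s.apply(lambda text: datetime.strptime(text, "%d/%m/%Y"))

# Vectorized: one call, with the per-element loop running in compiled code
vectorized = pd.to_datetime(s, format="%d/%m/%Y")

print("same values:", (looped == vectorized).all())
```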

## Frequently Asked Questions

### How do I convert multiple columns to datetime in pandas without using apply?

Pass a DataFrame containing columns named `year`, `month`, `day`, and optionally `hour`, `minute`, `second`, or `microsecond` directly to `pd.to_datetime(df)`. According to the source code in [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py), this triggers the `_assemble_from_unit_mappings` function, which builds the result with whole-column arithmetic, a single `%Y%m%d` parse, and `to_timedelta` additions instead of a Python loop.
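Real data rarely uses the canonical names. A minimal sketch, assuming hypothetical source columns `yr`, `mo`, `dy`, and `hr` plus an unrelated `sensor` column: rename the components, then pass only those columns, since any unrecognized column makes the assembly raise `ValueError`.

```python
import pandas as pd

raw = pd.DataFrame({
    "yr": [2021, 2022],
    "mo": [12, 1],
    "dy": [31, 15],
    "hr": [23, 8],
    "sensor": ["a", "b"],  # unrelated column that must not reach to_datetime
})

# Map the hypothetical source names onto names pandas recognizes
parts = raw.rename(columns={"yr": "year", "mo": "month", "dy": "day", "hr": "hour"})

# Pass only the component columns
raw["timestamp"] = pd.to_datetime(parts[["year", "month", "day", "hour"]])
print(raw)

# Passing the whole frame fails because "sensor" is not a recognized unit
rejected = False
try:
    pd.to_datetime(parts)
except ValueError as err:
    rejected = True
    print("rejected:", err)
```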

### What is the fastest way to handle different date formats in separate columns?

Normalize each column individually using explicit `format` parameters (e.g., `format='%d/%m/%Y'` for European dates), then combine them into a single ISO-8601 formatted string column before calling `pd.to_datetime()` once. This approach uses the compiled explicit-format parser in `pandas/_libs/tslibs/strptime.pyx` and avoids the expensive per-row inference required by `format='mixed'`.

### When should I use format='mixed' in pandas to_datetime?

Use `format='mixed'` (available in pandas 2.0+) only when a single column contains truly heterogeneous date formats that cannot be standardized through column-level preprocessing, such as mixing `2020-12-31`, `31/12/2020`, and `12/31/2020` in the same column. Be aware that this mode infers the format of every element individually and hands non-ISO strings to `dateutil`-based parsing, making it significantly slower than vectorized approaches.

### How does pandas handle missing time components when assembling datetime from multiple columns?

When using `pd.to_datetime()` with a DataFrame containing `year`, `month`, and `day` columns but missing time components (hour, minute, second, microsecond), pandas defaults the missing values to midnight (00:00:00). This behavior follows from the `_assemble_from_unit_mappings` logic within [`pandas/core/tools/datetimes.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py), which validates the required fields and then adds a timedelta only for the time units that are present, so absent units contribute nothing.
