How to Convert Mixed Data Types to Datetime in Pandas: `pd.to_datetime` Guide

Use pd.to_datetime() with errors='coerce' to convert columns that mix date strings, datetime objects, and missing values into a single datetime dtype, replacing unparsable values with NaT. Numeric epochs (integers or floats) need an explicit unit and are best converted in a separate call.

pandas handles datetime conversion of mixed data types through the pd.to_datetime() function. Whether a column contains ISO strings, native Python datetime objects, np.datetime64 values, or missing values, the function normalizes heterogeneous inputs into a consistent datetime64[ns] dtype. Numeric Unix timestamps can be converted too, provided you state their unit. This guide examines the internal implementation and practical patterns for handling mixed-type columns.

How pd.to_datetime Handles Mixed Types Internally

When you pass a Series or array with heterogeneous data to pd.to_datetime(), the function delegates to _convert_listlike_datetimes in [pandas/core/tools/datetimes.py](https://github.com/pandas-dev/pandas/blob/main/pandas/core/tools/datetimes.py#L688). This internal helper iterates through the elements, normalizing each value according to its detected type.

The conversion pipeline follows this sequence:

  1. Type Detection: Identifies whether each element is a string, numeric timestamp, datetime object, or np.datetime64
  2. Vectorized Parsing: Attempts to apply inferred formats across compatible elements
  3. Error Handling: Routes parsing failures according to the errors parameter strategy

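Because the helper accepts object arrays whose elements have different types, a single call can process ISO-8601 strings, Python datetime instances, and np.datetime64 values together. Numeric Unix epochs are the exception: without unit they are read as nanoseconds since the epoch, so convert them in a separate call with unit='s' (or 'ms') and merge the results.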
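
Before converting, it can help to see which Python types a column actually holds. The snippet below is a small diagnostic sketch, not a view into pandas internals; the sample values and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from datetime import datetime

# Illustrative object-dtype column with several element types
s = pd.Series(
    ["2023-01-15", 1673779200, datetime(2023, 2, 1), np.nan, "invalid date"],
    dtype="object",
)

# Count how many elements of each Python type the column contains
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If the counts show integers or floats next to strings, plan to convert the numeric rows separately with an explicit unit.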

Practical Code Examples

Basic Mixed-Type Conversion with Error Coercion

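The most straightforward approach for messy data uses errors='coerce' to force unparsable entries to NaT (Not a Time). One caveat: a bare number is interpreted as nanoseconds since the epoch unless unit is given, so a Unix timestamp in seconds will not come out as the intended date when it shares a call with strings. The example below converts the epoch row separately with unit='s' and coerces everything else:

import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame({
    "mixed_date": [
        "2023-01-15",          # ISO string
        1673779200,            # Unix timestamp (seconds)
        datetime(2023, 2, 1),  # Native Python datetime
        np.nan,                # Missing value
        "invalid date",        # Unparsable string
    ]
})

# Flag numeric epochs (NaN is a float, so exclude missing values)
is_epoch = df["mixed_date"].map(
    lambda v: isinstance(v, (int, float)) and not pd.isna(v)
)

# Convert epoch seconds explicitly; a bare number would be read as nanoseconds
epochs = pd.to_datetime(pd.to_numeric(df.loc[is_epoch, "mixed_date"]), unit="s")

# Parse the remaining rows, coercing failures to NaT, then fill in the epoch rows
df["parsed"] = pd.to_datetime(df["mixed_date"].where(~is_epoch), errors="coerce")
df.loc[is_epoch, "parsed"] = epochs
print(df)

Expected output (exact column spacing may vary by pandas version):

            mixed_date              parsed
0           2023-01-15 2023-01-15 00:00:00
1           1673779200 2023-01-15 10:40:00
2  2023-02-01 00:00:00 2023-02-01 00:00:00
3                  NaN                 NaT
4         invalid date                 NaT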

Handling Mixed Timezone-Aware and Naive Datetimes

When your data mixes timezone-aware strings with naive datetime objects, use utc=True to normalize everything to UTC:

data = [
    "2023-03-10 12:00+02:00",   # Aware with +02:00 offset
    "2023-03-10 10:00+00:00",   # Aware with UTC offset
    datetime(2023, 3, 10, 9)    # Naive datetime
]

s = pd.Series(data)
s_utc = pd.to_datetime(s, utc=True, errors="coerce")
print(s_utc)

Output:


0   2023-03-10 10:00:00+00:00
1   2023-03-10 10:00:00+00:00
2   2023-03-10 09:00:00+00:00
dtype: datetime64[ns, UTC]
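
Once everything is in UTC, the result can be moved to a local zone or stripped of timezone information. The following is a minimal sketch under stated assumptions: the target zone "Europe/Berlin" is an arbitrary choice for illustration, and the snippet assumes the timezone database is available to pandas.

```python
import pandas as pd

# Two aware strings with different offsets, normalized to UTC
s_utc = pd.to_datetime(
    pd.Series(["2023-03-10 12:00+02:00", "2023-03-10 10:00+00:00"]),
    utc=True,
)

# Convert the UTC instants to a local timezone (illustrative choice)
local = s_utc.dt.tz_convert("Europe/Berlin")

# Drop the timezone, keeping the UTC wall-clock time as naive datetimes
naive_utc = s_utc.dt.tz_localize(None)

print(local)
print(naive_utc)
```

Normalizing to UTC first and converting afterwards keeps the stored instants unambiguous, whatever offsets the raw data used.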

Optimizing Performance with Custom Formats

If your string data follows a consistent pattern, specify the format parameter to bypass inference logic and accelerate parsing:

df = pd.DataFrame({
    "date_str": ["15/01/2023", "20/02/2023", "invalid"]
})

# Force day-first format; bad rows become NaT
df["date"] = pd.to_datetime(
    df["date_str"],
    format="%d/%m/%Y",
    errors="coerce"
)
print(df)

Output:


     date_str       date
0  15/01/2023 2023-01-15
1  20/02/2023 2023-02-20
2     invalid        NaT
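
When a column contains a small, known set of string formats rather than one, a common pattern is to parse once per format with errors='coerce' and combine the results. This is a sketch assuming exactly two formats; the sample data and variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(["15/01/2023", "2023-02-20", "invalid"])

# Each pass parses only the rows matching its format; the rest become NaT
day_first = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
iso = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Fill the gaps of the first pass with the second pass
combined = day_first.fillna(iso)
print(combined)
```

Rows that match neither format stay NaT, so they remain easy to find afterwards.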

Critical Parameters for Mixed-Type Columns

| Parameter | Function | Best Practice |
| --- | --- | --- |
| errors | Controls behavior on parse failures ('raise', 'coerce', 'ignore'; 'ignore' is deprecated in recent pandas releases) | Use 'coerce' for mixed data to avoid exceptions on unparsable values |
| format | Specifies a strftime/strptime pattern (e.g., "%Y-%m-%d") | Apply when string formats are uniform to improve speed |
| utc | Localizes naive timestamps to UTC and converts aware timestamps to UTC | Set True when mixing timezone-aware and naive data |
| unit | Defines the epoch unit for numeric values ('s', 'ms', 'us', 'ns'); defaults to 'ns' | Use 's' for Unix seconds or 'ms' for JavaScript timestamps |
| dayfirst | Resolves ambiguous dates (e.g., 10/11/12 as day/month/year) | Enable for European date formats |
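
The effect of unit and dayfirst is easiest to see on scalar inputs. The values below are illustrative; the sketch only uses the parameters listed in the table.

```python
import pandas as pd

# The same instant expressed in seconds and in milliseconds since the Unix epoch
from_seconds = pd.to_datetime(1673779200, unit="s")
from_millis = pd.to_datetime(1673779200000, unit="ms")

# Without unit, a bare integer is read as nanoseconds since the epoch
as_nanos = pd.to_datetime(1673779200)

# dayfirst only changes how ambiguous day/month strings are read
month_first = pd.to_datetime("10/11/2012")
day_first = pd.to_datetime("10/11/2012", dayfirst=True)

print(from_seconds, from_millis, as_nanos)
print(month_first, day_first)
```

Here from_seconds and from_millis both resolve to 2023-01-15 10:40:00, while as_nanos lands about 1.67 seconds after 1970-01-01, which is why an omitted unit is a frequent source of wrong dates.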

Summary

  • pd.to_datetime() accepts strings, datetime objects, and np.datetime64 values in a single column; integers and floats are treated as epochs in the given unit (nanoseconds by default), so convert them separately
  • The implementation relies on _convert_listlike_datetimes in pandas/core/tools/datetimes.py to normalize elements individually
  • errors='coerce' provides the most straightforward error handling for mixed data, converting failures to NaT
  • Use utc=True to standardize timezone-aware and naive datetime mixtures to UTC
  • Specify format strings when possible to optimize parsing performance on large datasets

Frequently Asked Questions

Can pd.to_datetime handle a column with both string dates and Unix timestamps?

Yes, but not reliably in a single call. Without unit, numbers are interpreted as nanoseconds since the epoch, and unit='s' applies to the whole input, so date strings in the same column may then fail to convert. The dependable pattern is to select the numeric rows, convert them with unit='s', parse the remaining rows with errors='coerce', and combine the two results.

What happens when pd.to_datetime encounters an unparseable value?

By default (errors='raise'), it raises a ValueError or one of its subclasses, such as dateutil's ParserError, depending on the failure and the pandas version. For mixed data containing potential garbage values, set errors='coerce' to replace unparsable entries with NaT (Not a Time). This lets the operation complete while flagging problematic rows as null values.
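
The difference between the two modes, and a way to list the rows that were coerced, can be sketched as follows. The sample data is illustrative; comparing the NaT mask against the original missing values is one possible approach, not the only one.

```python
import pandas as pd

raw = pd.Series(["2023-01-15", "2023-02-20", "invalid date", None])

# Default errors="raise": one bad string aborts the whole conversion
try:
    pd.to_datetime(raw)
    raised = False
except ValueError as exc:
    raised = True
    print(f"conversion failed: {exc}")

# errors="coerce": the conversion completes and failures become NaT
parsed = pd.to_datetime(raw, errors="coerce")

# Rows that had a value in the input but could not be parsed
failed = raw[parsed.isna() & raw.notna()]
print(failed)
```

Excluding rows that were already missing separates genuine parse failures from ordinary nulls.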

Is there a performance difference between parsing mixed types versus uniform strings?

Yes. Mixed-type columns require element-wise inspection via _convert_listlike_datetimes, which is slower than vectorized string parsing. If your strings follow a consistent format, specifying the format parameter bypasses inference logic. Additionally, cache=True (default) improves performance on large datasets with duplicate values by caching conversion results.

How do I convert mixed timezone-aware and naive datetime objects consistently?

Use utc=True in your pd.to_datetime() call. This parameter localizes naive timestamps as UTC and converts timezone-aware timestamps to UTC, resulting in a uniform datetime64[ns, UTC] dtype. Without this flag, the outcome of mixing aware and naive values depends on the pandas version: recent releases raise an error, while older ones may return an object-dtype result.
