How to Use Pandas to_sql to Write a DataFrame to a Database: A Complete Guide

Use DataFrame.to_sql(name, con, if_exists='fail', index=True) to write a pandas DataFrame to a SQL database through a SQLAlchemy engine or connection (or a standard-library sqlite3 connection), with automatic schema inference and configurable insert strategies.

The to_sql method in pandas provides a high-level interface for persisting DataFrame data to relational databases. In the pandas-dev/pandas source code, the SQL machinery lives in pandas/io/sql.py, and the public to_sql method is defined on the shared NDFrame base class in pandas/core/generic.py, which DataFrame inherits. Through SQLAlchemy it can target backends including SQLite, PostgreSQL, MySQL, and Oracle; a plain sqlite3 connection also works without SQLAlchemy, while other raw DB-API connections are not officially supported.

How DataFrame.to_sql Works Under the Hood

Understanding the internal architecture helps optimize database writes and troubleshoot connection issues.

Connection Handling and the SQLDatabase Wrapper

When you pass a connection object to to_sql, pandas wraps it in a helper class defined in pandas/io/sql.py: SQLDatabase for SQLAlchemy engines and connections, and a separate SQLite-specific fallback for standard-library sqlite3 connections. Other raw DB-API connections are not officially supported. With SQLAlchemy, pandas relies on the engine abstraction for dialect-specific SQL generation and connection pooling.
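As a small illustration of the connection handling described above, the sketch below writes a frame through a standard-library sqlite3 connection and reads it back. The in-memory database and the table name people are assumptions made for the example; with a SQLAlchemy engine, only the con argument would change.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

# A plain sqlite3 connection can be passed directly as `con`.
con = sqlite3.connect(":memory:")
df.to_sql("people", con, index=False)

# Read the rows back through the same connection to confirm the write.
round_trip = pd.read_sql("SELECT name, age FROM people ORDER BY age", con)
print(round_trip)
con.close()
```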

Schema Inference and Dtype Mapping

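The method automatically infers SQL column types from pandas dtypes; the mapping logic sits alongside the table-building code in pandas/io/sql.py. With SQLAlchemy, typical mappings are int64 → BIGINT, float64 → a double-precision float (the exact type name depends on the dialect), and object (string) columns → TEXT. You can override these defaults with the dtype parameter, which takes a dict mapping column names to SQLAlchemy types such as DECIMAL(10, 2) or VARCHAR(255). Plain strings like "DECIMAL(10,2)" are accepted only when con is a sqlite3 connection.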
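One way to see the schema pandas actually emits is to write a small frame and inspect the CREATE TABLE statement the database stored. The sketch below assumes a standard-library sqlite3 connection, where type overrides are given as plain strings and the inferred names are SQLite's own (for example INTEGER rather than BIGINT); with a SQLAlchemy engine the overrides would be SQLAlchemy type objects instead. The table name catalog is illustrative.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "product_id": [1, 2],
    "description": ["A", "B"],
    "price": [9.99, 19.99],
})

con = sqlite3.connect(":memory:")

# Override only the price column; the other types are inferred from the dtypes.
df.to_sql("catalog", con, index=False, dtype={"price": "DECIMAL(10,2)"})

# SQLite keeps the original CREATE TABLE text in sqlite_master.
ddl = con.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'catalog'"
).fetchone()[0]
print(ddl)
con.close()
```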

Insert Strategies with if_exists

The if_exists parameter controls table creation behavior:

  • if_exists='fail' (default): Raises a ValueError if the target table already exists.
  • if_exists='replace': Drops the existing table and creates a new schema based on the DataFrame structure.
  • if_exists='append': Inserts rows into the existing table without modifying the schema.
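The three modes can be exercised side by side. This is a minimal sketch against an in-memory SQLite database; the table name items and the helper row_count are hypothetical names introduced for the example.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2, 3]})


def row_count():
    # Count the rows currently stored in the example table.
    return con.execute("SELECT COUNT(*) FROM items").fetchone()[0]


df.to_sql("items", con, index=False)  # first write creates the table

try:
    df.to_sql("items", con, index=False)  # default if_exists='fail'
    failed = False
except ValueError:
    failed = True  # the table already exists

df.to_sql("items", con, index=False, if_exists="append")
after_append = row_count()  # 6: three original rows plus three appended

df.to_sql("items", con, index=False, if_exists="replace")
after_replace = row_count()  # 3: table dropped and recreated

con.close()
```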

Bulk Loading and Performance Optimization

For large datasets, to_sql supports chunked inserts via the chunksize parameter, which writes rows in batches of that size to limit memory usage instead of sending everything at once. The method parameter controls how each batch is inserted: None (the default) issues a standard INSERT per row, typically sent through the driver's executemany; 'multi' packs many rows into a single INSERT statement; and a callable with the signature (pd_table, conn, keys, data_iter) lets you supply your own insertion logic.
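A short sketch of batched writing, assuming an in-memory SQLite database and an illustrative table name big_table. It combines chunksize with the built-in method="multi" option, which sends each batch as one multi-row INSERT; the batch size is kept small here because backends cap the number of bound parameters per statement.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df_big = pd.DataFrame({"col1": range(1000), "col2": ["x"] * 1000})

# 100 rows x 2 columns = 200 bound parameters per INSERT statement.
df_big.to_sql(
    "big_table",
    con,
    index=False,
    chunksize=100,
    method="multi",
)

total = con.execute("SELECT COUNT(*) FROM big_table").fetchone()[0]
print(total)
con.close()
```

Whether 'multi' is faster than the default depends on the backend and driver, so it is worth timing both on real data.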

Index and Transaction Management

By default, pandas writes the DataFrame index as a column, named after the index if it has a name and index otherwise. Set index=False to omit it, or use index_label to specify a custom column name. When using SQLAlchemy engines, the entire write operation executes within a single transaction that commits upon success or rolls back on failure.
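The index options can be compared by writing the same frame three ways and listing the resulting columns. The sketch assumes an in-memory SQLite database; the table names and the columns helper are hypothetical.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
df = pd.DataFrame({"age": [25, 30]}, index=["alice", "bob"])


def columns(table):
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]


df.to_sql("with_index", con)                     # unnamed index kept
df.to_sql("no_index", con, index=False)          # index dropped
df.to_sql("labeled", con, index_label="person")  # index renamed

cols_default = columns("with_index")  # ['index', 'age']
cols_dropped = columns("no_index")    # ['age']
cols_labeled = columns("labeled")     # ['person', 'age']
con.close()
```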

Practical Code Examples

Writing to SQLite with SQLAlchemy Engine

Connect to an SQLite database and write a new table, replacing any existing data:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age":  [25, 30, 35],
    "salary": [70000.0, 80000.0, 90000.0]
})

df.to_sql(name="employees", con=engine, if_exists="replace", index=False)

Appending Data to PostgreSQL Tables

Add new rows to an existing PostgreSQL table without dropping the current schema:

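import pandas as pd
from sqlalchemy import create_engine

# Requires the psycopg2 driver; replace the credentials with your own
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost:5432/mydb"
)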

new_rows = pd.DataFrame({
    "name": ["David", "Eva"],
    "age":  [28, 32],
    "salary": [75000.0, 82000.0]
})

new_rows.to_sql(name="employees", con=engine, if_exists="append", index=False)

Custom SQL Type Mapping for MySQL

Force specific SQL column types when creating tables in MySQL:

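import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import DECIMAL

engine = create_engine("mysql+pymysql://user:pwd@localhost/test")
df = pd.DataFrame({
    "product_id": [1, 2, 3],
    "description": ["A", "B", "C"],
    "price": [9.99, 19.99, 29.99]
})

# With a SQLAlchemy engine, dtype values must be SQLAlchemy types, not strings
dtype_map = {"price": DECIMAL(10, 2)}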
df.to_sql(
    name="catalog",
    con=engine,
    if_exists="replace",
    index=False,
    dtype=dtype_map
)

Optimized Bulk Inserts with Custom Methods

Implement a custom insertion method for maximum control over the bulk loading process:

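import pandas as pd
from sqlalchemy import create_engine

def bulk_insert(table, conn, keys, data_iter):
    # pandas calls this with its table wrapper, a SQLAlchemy connection,
    # the column names, and an iterator of row tuples
    cols = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join(["?"] * len(keys))
    sql = f'INSERT INTO "{table.name}" ({cols}) VALUES ({placeholders})'
    # Use the raw DB-API cursor so sqlite3's "?" placeholders apply
    cursor = conn.connection.cursor()
    cursor.executemany(sql, data_iter)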

engine = create_engine("sqlite:///bulk.db")
df_big = pd.DataFrame(
    {"col1": range(10000), "col2": ["x"] * 10000}
)

df_big.to_sql(
    name="big_table",
    con=engine,
    if_exists="replace",
    index=False,
    method=bulk_insert,
    chunksize=2000
)

Summary

  • pandas/io/sql.py contains the core implementation of to_sql, including the SQLDatabase wrapper and dtype conversion logic.
  • pandas/core/generic.py defines to_sql on the NDFrame base class, which DataFrame inherits as a public method.
  • Connection flexibility: Works with SQLAlchemy engines and connections, plus standard-library sqlite3 connections.
  • Schema control: Automatic type mapping can be overridden with the dtype parameter.
  • Data safety: Use if_exists='append' to add data without destroying existing tables, or 'replace' for fresh schema creation.
  • Performance tuning: Leverage chunksize for memory management and custom method callables for driver-specific optimizations like executemany.

Frequently Asked Questions

What database backends are supported by pandas to_sql?

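The method supports any database that has a SQLAlchemy dialect, including SQLite, PostgreSQL, MySQL, Microsoft SQL Server, Oracle, and cloud variants like Amazon Redshift or Google BigQuery through appropriate dialect drivers. Without SQLAlchemy, standard-library sqlite3 connections are also accepted; other raw DB-API 2.0 connections are not officially supported.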

How do I prevent pandas from writing the DataFrame index to SQL?

Set index=False in the to_sql call. If you want to preserve the index but rename the column, use index_label='custom_name' instead of the default 'index'.

What is the difference between if_exists='replace' and 'append'?

The 'replace' option drops the existing table entirely and recreates it based on the current DataFrame schema, which destroys existing data and constraints. The 'append' option inserts rows into the existing table structure without modifying the schema, preserving existing data and indexes.

How can I improve performance when writing large DataFrames to SQL?

Use the chunksize parameter to process data in batches (e.g., chunksize=10000), and consider passing a custom method callable that utilizes your driver's executemany capability. For massive datasets, database-specific bulk loading tools like PostgreSQL's COPY or MySQL's LOAD DATA INFILE may outperform to_sql.
