
How to Fix Garbled Characters in Pandas DataFrame to CSV with UTF-8 Encoding

February 20, 2026 · pandas-dev/pandas

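Even when you specify encoding="utf-8" in DataFrame.to_csv(), international characters can still appear garbled. pandas delegates file handling to get_handle in pandas/io/common.py, which applies the encoding when it opens or wraps the output stream. Garbling usually comes from a text handle that was opened elsewhere with a different encoding, a lossy errors setting, or a reader (Excel, a terminal, another script) that decodes the file with a different encoding.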

When exporting dataframes containing international text using pandas.DataFrame.to_csv(), developers expect UTF-8 encoding to preserve all characters correctly. However, the pandas source code shows that the encoding is applied deep in the IO stack, and certain parameter combinations, or a reader that assumes a different encoding, can result in mojibake or replacement characters.

How Pandas Handles Encoding in `to_csv`

DataFrame.to_csv does not directly write to files. Instead, it delegates CSV creation to pandas.io.formats.csvs.CSVFormatter, which manages column conversion and row formatting. The critical encoding logic resides in the save method, where CSVFormatter calls pandas.io.common.get_handle to open the target file:

with get_handle(
    self.filepath_or_buffer,
    self.mode,
    encoding=self.encoding,
    errors=self.errors,
    compression=self.compression,
    storage_options=self.storage_options,
) as handles:
    # Writing occurs here

Source: CSVFormatter.__init__ stores the encoding parameter, and CSVFormatter.save passes it to get_handle.

Because CSVFormatter only converts data to strings via _get_values_for_csv before passing rows to Python's standard csv.writer, it never modifies the characters themselves. All encoding enforcement happens inside get_handle when it opens or wraps the file stream. An unrecognized encoding name does not produce a mis-encoded file; Python raises LookupError. Incorrectly encoded bytes appear instead when the stream handed to pandas was already opened as text with another encoding, which the encoding parameter cannot override.
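
The division of labor described above can be sketched with the standard library alone. This is a simplified illustration, not pandas' actual code: csv.writer emits str rows, and a text wrapper around a byte stream does the encoding. The in-memory buffer and the sample row are assumptions made for the demo.

```python
import csv
import io

# Stand-in for the file on disk: a raw byte stream.
raw = io.BytesIO()

# The text layer owns the encoding, much like the handle get_handle returns.
text = io.TextIOWrapper(raw, encoding="utf-8", errors="strict", newline="")

# csv.writer only ever sees and writes str; it never encodes anything itself.
writer = csv.writer(text, lineterminator="\n")
writer.writerow(["city", "value"])
writer.writerow(["München", 1])
text.flush()

# The bytes that would land on disk.
data = raw.getvalue()
print(data)
```

Changing only the encoding argument of the wrapper changes the bytes, while the csv.writer calls stay identical.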

Common Causes of UTF-8 Encoding Errors

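Binary Mode and Pre-Opened Handles

Python's built-in open() rejects an encoding argument in binary mode, so when you pass mode='wb' to to_csv, get_handle opens the file as raw bytes. In pandas 1.2 and later it then wraps that binary stream in a text layer that uses your encoding, so the parameter is still honored; earlier versions did not support binary handles. Encoding information is genuinely lost in a different case: when you pass a file object that you already opened in text mode, the stream keeps the encoding it was opened with (often the locale default, such as Windows-1252 or GBK), and the encoding argument to to_csv cannot override it. Characters like é, ü, or 中文 then come out garbled or raise an error.

Fix: Pass a path and use text mode (mode='w', the default) so pandas opens the file itself. If you must pass your own handle, open it with encoding="utf-8" and newline="", or pass a binary handle. When using compression, keep encoding="utf-8" and ensure your reading application explicitly uses UTF-8.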
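
One way to see which bytes pandas actually produces is to write into an in-memory binary buffer and inspect the result. This is a minimal sketch that assumes pandas 1.2 or later (binary file objects are not supported before that); the sample frame is illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["München", "北京"], "value": [1, 2]})

# A binary buffer: pandas wraps it in a text layer that applies the encoding.
buf = io.BytesIO()
df.to_csv(buf, index=False, encoding="utf-8")

raw = buf.getvalue()        # the exact bytes a file on disk would contain
text = raw.decode("utf-8")  # strict decode fails loudly if the bytes are not UTF-8
print(text)
```

If the strict decode succeeds and the text looks right, the export is sound and any garbling is introduced later by whatever reads the file.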

Incorrect Encoding Spelling

pandas passes the encoding string to Python's codec machinery, which normalizes case and treats hyphens and underscores alike. "UTF8", "utf8", "UTF-8", and "utf_8" all resolve to the same codec, and a genuinely unknown name raises LookupError instead of falling back to the platform default. The real risk is a valid but different codec, such as "utf-16" or "latin-1", selected by mistake.

Fix: Prefer the canonical spelling "utf-8" for readability; the aliases "utf8" and "utf_8" behave identically.
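
Python's codec registry can be queried directly to see how a given spelling resolves. The sketch below uses only codecs.lookup from the standard library; the misspelled name "utf-88" is a made-up example of an unknown codec.

```python
import codecs

names = ["utf-8", "UTF-8", "utf8", "UTF8", "utf_8"]

# Each spelling is normalized to the same registered codec name.
for name in names:
    print(name, "->", codecs.lookup(name).name)

resolved = {codecs.lookup(name).name for name in names}

# An unknown name is rejected outright rather than silently replaced.
try:
    codecs.lookup("utf-88")
    unknown_is_rejected = False
except LookupError:
    unknown_is_rejected = True
```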

The `errors` Parameter Silently Corrupts Data

Setting errors="ignore" or errors="replace" tells Python to drop unencodable characters or substitute them with a replacement marker (? when encoding, U+FFFD when decoding) rather than raising an exception. This can make it appear that the export succeeded when characters were actually lost or mutated.

Fix: Keep errors="strict" (the default) during debugging to surface encoding mismatches immediately.
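
The effect of each error handler is easiest to see on a plain string. This sketch uses str.encode with the ASCII codec purely to force failures; the sample text is an assumption for illustration.

```python
text = "München 北京"

# "replace" substitutes one "?" per character the codec cannot represent.
replaced = text.encode("ascii", errors="replace")

# "ignore" drops those characters without a trace.
ignored = text.encode("ascii", errors="ignore")

# "strict" (the default) raises, which is what you want while debugging.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

print(replaced, ignored, strict_raised)
```

Neither lossy result can be converted back into the original text, which is why the strict handler is the safer default.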

System Locale and Console Misconfiguration

Even a correctly encoded UTF-8 file may display as garbage in terminals or editors configured for different code pages. This is a display issue, not an encoding issue, but it leads developers to believe the export failed.

Fix: Configure your terminal or text editor to use UTF-8, or explicitly open the file with open(path, encoding="utf-8") to verify contents.
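
To separate a display problem from a real encoding problem, check the bytes rather than how they are rendered. The helper below is hypothetical (is_valid_utf8 is not a pandas or standard-library function) and simply attempts a strict decode; the two temporary files are created only for the demo.

```python
import os
import tempfile


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.csv")
    bad = os.path.join(tmp, "bad.csv")

    # Same text, written once as UTF-8 and once as Windows-1252.
    with open(good, "w", encoding="utf-8", newline="") as fh:
        fh.write("city\nMünchen\n")
    with open(bad, "w", encoding="cp1252", newline="") as fh:
        fh.write("city\nMünchen\n")

    good_ok = is_valid_utf8(good)
    bad_ok = is_valid_utf8(bad)

print(good_ok, bad_ok)
```

A file that passes this check but still looks wrong on screen points to the terminal or editor, not to the export.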

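Reading With a Different Encoding

pd.read_csv assumes UTF-8 when no encoding is given, but other readers do not. Python's built-in open() has traditionally used the locale encoding unless told otherwise (a common source of trouble on Windows), and Excel assumes the system code page. Reading a UTF-8 file with any of these, or passing a non-UTF-8 encoding to read_csv, produces mojibake even though the file itself is correctly encoded.

Fix: State the encoding explicitly wherever the file is read: pd.read_csv(path, encoding="utf-8") or open(path, encoding="utf-8").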
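
A mismatched reader can be reproduced in a few lines by decoding the same UTF-8 bytes twice. This is a sketch with an in-memory buffer standing in for a file; cp1252 is chosen as an example of a typical Windows code page.

```python
import io

import pandas as pd

# Bytes exactly as a correct UTF-8 export would contain them.
raw = "city\nMünchen\n".encode("utf-8")

# Matching encoding: the text survives.
good = pd.read_csv(io.BytesIO(raw), encoding="utf-8")

# Mismatched encoding: each UTF-8 byte is read as a separate cp1252 character.
bad = pd.read_csv(io.BytesIO(raw), encoding="cp1252")

print(good["city"].iloc[0], bad["city"].iloc[0])
```

The file never changed between the two reads; only the decoder did.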

Solutions and Code Examples

Export a dataframe with international characters correctly:

import pandas as pd

df = pd.DataFrame({
    "city": ["München", "São Paulo", "北京"],
    "value": [1, 2, 3],
})

# Correct: Explicit UTF-8 text mode
df.to_csv("data_utf8.csv", index=False, encoding="utf-8")

# Correct: UTF-8 with compression
df.to_csv("data_utf8.zip", index=False, compression="zip", encoding="utf-8")

Read the file back safely:


# Always match the encoding when reading
df2 = pd.read_csv("data_utf8.csv", encoding="utf-8")
print(df2)

Prefer text mode over binary mode:

# Unnecessary: binary mode adds nothing for a path target and relies on
# the binary-handle support introduced in pandas 1.2
df.to_csv("binary_mode.csv", mode="wb", encoding="utf-8")

# Preferred: text mode, with pandas opening the file using the given encoding
df.to_csv("good.csv", mode="w", encoding="utf-8")

Key Source Files

Understanding these files clarifies why encoding issues occur:

pandas/io/formats/csvs.py: Implements CSVFormatter, which handles column conversion and delegates file operations to get_handle. The __init__ method stores encoding, and save applies it.
pandas/io/common.py: Provides get_handle, the centralized utility that opens file handles with the specified encoding, mode, and compression. This is where the encoding is attached to the stream, including when a binary handle is wrapped in a text layer.
pandas/tests/io/formats/test_to_csv.py: Contains test cases for encoding parameters, binary handles, and error handling during CSV export.
pandas/tests/io/parser/test_encoding.py: Validates reading CSV data in various encodings, demonstrating the necessity of matching the encoding used at export when importing.

Summary

DataFrame.to_csv delegates encoding to get_handle in pandas/io/common.py, which applies it when opening or wrapping the output stream.
Binary mode (mode='wb') is unnecessary for path targets, and a text handle you opened yourself keeps its own encoding, which the encoding parameter cannot override.
Use encoding="utf-8" when writing and reading CSV files to ensure cross-platform compatibility; aliases such as "utf8" resolve to the same codec.
Keep errors="strict" during development to catch encoding mismatches before they silently corrupt data.
Specify encoding on both sides: Export with to_csv(encoding="utf-8") and import with read_csv(encoding="utf-8") to prevent locale-based misinterpretation.

Frequently Asked Questions

Why does my CSV look correct in Python but shows garbage in Excel?

Excel uses your system's default code page to open CSV files unless you use the import data wizard. Save the file with a UTF-8 BOM (Byte Order Mark) by specifying encoding="utf-8-sig" in to_csv, or import the file through Excel's Data > From Text/CSV menu and manually select UTF-8 encoding.

Does compression affect UTF-8 encoding in pandas?

No, compression algorithms handle bytes, not characters. pandas encodes the text first and compresses the resulting bytes, so the encoding you specify (UTF-8 by default) still determines the file's contents. When using compression="zip" or similar, keep encoding="utf-8" and make sure whatever reads the decompressed data decodes it as UTF-8.
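
The independence of compression and encoding can be checked with the standard library. This sketch uses gzip rather than pandas, on the assumption that the same principle holds for any byte-level compressor; the sample text is illustrative.

```python
import gzip

text = "city,value\nSão Paulo,2\n北京,3\n"

# Step 1: characters become bytes (this is where the encoding matters).
encoded = text.encode("utf-8")

# Step 2: bytes are compressed; the compressor knows nothing about characters.
compressed = gzip.compress(encoded)

# Reversing both steps in the opposite order restores the original text.
restored = gzip.decompress(compressed).decode("utf-8")
print(restored == text)
```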

What is the difference between `utf-8` and `utf-8-sig` in pandas?

utf-8 writes the raw UTF-8 byte sequence, while utf-8-sig prepends a BOM (Byte Order Mark) to the file. The BOM helps some applications (like Excel) recognize the file as UTF-8, but it can interfere with Unix tools that expect plain text. Use utf-8-sig only when targeting applications that require BOM detection.
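
The difference between the two codecs comes down to three leading bytes, which the following standard-library sketch makes visible. The sample text is an assumption for the demo.

```python
import codecs

text = "city\nMünchen\n"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig output is the BOM (EF BB BF) followed by ordinary UTF-8.
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Decoding with plain utf-8 keeps the BOM as a U+FEFF character in the text...
leftover = with_bom.decode("utf-8")

# ...while utf-8-sig strips it on the way in.
stripped = with_bom.decode("utf-8-sig")

print(has_bom, repr(leftover[0]), stripped == text)
```

The leftover U+FEFF is what typically ends up glued to the first column name when a BOM-prefixed file is read as plain UTF-8.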

Why do I get a `UnicodeEncodeError` even with `encoding="utf-8"`?

With encoding="utf-8", this occurs only when the data contains code points that UTF-8 cannot represent, which in practice means lone surrogates (for example "\ud800") left over from a faulty decode. With a narrower encoding such as ASCII or cp1252, it occurs for any character outside that codec's range (for example, emoji in an ASCII file). Verify that your data contains only valid Unicode code points, and use errors="replace" only as a last resort for lossy export.
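
Both failure modes can be reproduced on plain strings without pandas. The sketch below is illustrative; the lone surrogate is constructed by hand to stand in for corrupted input data.

```python
emoji = "😀"
lone_surrogate = "\ud800"  # half of a surrogate pair, not a valid character

# UTF-8 can represent every real character, including emoji.
emoji_bytes = emoji.encode("utf-8")

# A narrower codec cannot.
try:
    emoji.encode("ascii")
    ascii_raised = False
except UnicodeEncodeError:
    ascii_raised = True

# A lone surrogate is rejected even by UTF-8 under errors="strict".
try:
    lone_surrogate.encode("utf-8")
    surrogate_raised = False
except UnicodeEncodeError:
    surrogate_raised = True

# The lossy fallback turns it into a question mark.
lossy = lone_surrogate.encode("utf-8", errors="replace")
```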
