# How to Fix Garbled Characters in Pandas DataFrame to CSV with UTF-8 Encoding

> Fix garbled characters in pandas DataFrame to CSV exports. Learn why UTF-8 encoding fails and discover the real solution for displaying international characters correctly.

- Repository: [pandas/pandas](https://github.com/pandas-dev/pandas)
- Tags: how-to-guide
- Published: 2026-02-20

---

**Even when you specify `encoding="utf-8"` in `DataFrame.to_csv()`, international characters can still appear garbled. pandas delegates file handling to `get_handle` in [`pandas/io/common.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/common.py), which applies `encoding` only when it opens a path itself or wraps a binary handle. A text-mode file object you pass in keeps its own encoding, and a correctly encoded file still looks garbled when it is later read with a different encoding.**

When exporting dataframes containing international text using `pandas.DataFrame.to_csv()`, developers expect UTF-8 encoding to preserve all characters correctly. However, the pandas source code reveals that the actual Unicode handling occurs deep in the IO stack, where certain parameter combinations can silently bypass your encoding specification, resulting in mojibake or replacement characters.

## How Pandas Handles Encoding in `to_csv`

`DataFrame.to_csv` does not directly write to files. Instead, it delegates CSV creation to **`pandas.io.formats.csvs.CSVFormatter`**, which manages column conversion and row formatting. The critical encoding logic resides in the `save` method, where `CSVFormatter` calls **`pandas.io.common.get_handle`** to open the target file:

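```python
# Simplified excerpt from CSVFormatter.save (pandas/io/formats/csvs.py)
with get_handle(
    self.filepath_or_buffer,
    self.mode,
    encoding=self.encoding,
    errors=self.errors,
    compression=self.compression,
    storage_options=self.storage_options,
) as handles:
    # handles.handle is a text stream: csv.writer writes str rows to it,
    # and the stream turns them into bytes with the negotiated encoding
    self.writer = csvlib.writer(
        handles.handle,
        lineterminator=self.lineterminator,
        delimiter=self.sep,
        quoting=self.quoting,
        doublequote=self.doublequote,
        escapechar=self.escapechar,
        quotechar=self.quotechar,
    )
    self._save()
```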

*Source:* [`CSVFormatter.__init__`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/formats/csvs.py) stores the encoding parameter, and [`CSVFormatter.save`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/formats/csvs.py) passes it to `get_handle`.

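Because `CSVFormatter` only converts values to strings (via `_get_values_for_csv`) before handing rows to Python's standard `csv.writer`, it never touches the characters themselves. All encoding happens in the stream that `get_handle` returns. When you pass a path, `get_handle` opens the file with your `encoding`; when you pass a binary handle, it wraps that handle in a text layer that uses your `encoding`. When you pass a file object that is already open in **text mode**, however, pandas writes to it as-is and the handle's own encoding decides the bytes, whatever value you gave `encoding`. An unrecognized encoding name does not fall back silently: it raises `LookupError`.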
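The snippet below is a minimal sketch of that bypass, assuming pandas 1.2 or later. It opens a temporary file in text mode with `cp1252` set explicitly, standing in for a platform default code page, and then passes `encoding="utf-8"` to `to_csv`. The file name is illustrative; the point is that the bytes on disk follow the handle, not the argument.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"city": ["München"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "handle.csv")
    # Text-mode handle with cp1252, standing in for a platform default code page
    with open(path, "w", encoding="cp1252", newline="") as f:
        # encoding="utf-8" is passed, but pandas writes str to the text handle as-is
        df.to_csv(f, index=False, encoding="utf-8")
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # "ü" is stored as the single cp1252 byte 0xFC, not UTF-8's 0xC3 0xBC
```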

## Common Causes of UTF-8 Encoding Errors

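### Pre-Opened Text Handles Override the `encoding` Argument

Since pandas 1.2, `to_csv` accepts binary file handles and `mode="wb"`: `get_handle` wraps the binary stream in a text layer that encodes with your `encoding`, so binary mode on its own does not garble output. The bypass happens with file objects opened in **text mode**. pandas writes to such a handle directly, so the handle's encoding decides the bytes; the pandas documentation notes that `encoding` is not supported for non-binary file objects. A handle created with `open(path, "w")` and no `encoding` uses the platform default (such as Windows-1252 or GBK), so characters like **é**, **ü**, or **中文** are either written in that code page and then misread by UTF-8 readers, or rejected with `UnicodeEncodeError`.

**Fix:** Pass a path and let pandas open the file, or open the handle yourself with `encoding="utf-8"` and `newline=""`. When using compression, keep `encoding="utf-8"` and make sure your reading application explicitly uses UTF-8.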
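A quick way to see which codec actually reaches the bytes is to write to an in-memory binary buffer with `mode="wb"` and inspect the result. This is a minimal sketch assuming pandas 1.2 or later; `cp1252` is used only as a contrasting codec.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["München"]})

# Binary buffer + mode="wb": pandas wraps the buffer in a text layer
# that encodes with the requested codec
utf8_buf = io.BytesIO()
df.to_csv(utf8_buf, mode="wb", index=False, encoding="utf-8")
raw_utf8 = utf8_buf.getvalue()

cp1252_buf = io.BytesIO()
df.to_csv(cp1252_buf, mode="wb", index=False, encoding="cp1252")
raw_cp1252 = cp1252_buf.getvalue()

print(raw_utf8)    # "ü" stored as the two bytes 0xC3 0xBC
print(raw_cp1252)  # "ü" stored as the single byte 0xFC
```

If the two outputs differ as the comments describe, binary mode is honoring `encoding` in your installed version.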

### Incorrect Encoding Spelling

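pandas passes the encoding name to Python's codec machinery, which normalizes the name before looking it up. `"utf-8"`, `"UTF-8"`, `"utf8"`, `"UTF8"`, and `"utf_8"` all resolve to the same codec, so none of them falls back to the platform default. A name Python does not recognize raises `LookupError` instead of being silently replaced. The spellings that do change the output are different codecs, such as `"utf-8-sig"`, which prepends a byte order mark.

**Fix:** Use `"utf-8"` for consistency, and check any unfamiliar name with `codecs.lookup` before relying on it.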
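To check how Python resolves an encoding name, `codecs.lookup` returns a `CodecInfo` whose `.name` is the canonical codec name. This sketch uses only the standard library; `"utf-9"` is a deliberately invalid name.

```python
import codecs

aliases = ["utf-8", "UTF-8", "utf8", "UTF8", "utf_8"]
resolved = {name: codecs.lookup(name).name for name in aliases}
print(resolved)  # every alias maps to "utf-8"

# A BOM-writing variant is a separate codec, not an alias
print(codecs.lookup("utf-8-sig").name)

# An unknown name raises instead of falling back to a default
try:
    codecs.lookup("utf-9")
    unknown_rejected = False
except LookupError as exc:
    unknown_rejected = True
    print("LookupError:", exc)
```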

### The `errors` Parameter Silently Corrupts Data

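Setting `errors="ignore"` or `errors="replace"` tells Python to drop characters the codec cannot encode, or substitute them with `?`, rather than raising an exception. This can make it appear that the export succeeded when characters were actually lost. With UTF-8 this only affects input that no codec can encode, such as lone surrogates, but with a narrower codec like `"ascii"` or `"cp1252"` it can silently turn text like `北京` into `??`.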

**Fix:** Keep `errors="strict"` (the default) during debugging to surface encoding mismatches immediately.
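The difference is easy to see with a deliberately narrow codec. This minimal sketch assumes pandas 1.2 or later (for binary buffers with `mode="wb"`) and uses `"ascii"` only to force an unencodable character.

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["München"]})

# errors="replace": the unencodable "ü" becomes "?" and no exception is raised
lossy = io.BytesIO()
df.to_csv(lossy, mode="wb", index=False, encoding="ascii", errors="replace")
print(lossy.getvalue())  # the data row reads M?nchen

# errors="strict" (the default): the same export fails loudly
try:
    df.to_csv(io.BytesIO(), mode="wb", index=False, encoding="ascii")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)
```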

### System Locale and Console Misconfiguration

Even a correctly encoded UTF-8 file may display as garbage in terminals or editors configured for different code pages. This is a display issue, not an encoding issue, but it leads developers to believe the export failed.

**Fix:** Configure your terminal or text editor to use UTF-8, or explicitly open the file with `open(path, encoding="utf-8")` to verify contents.
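One way to separate a display problem from an encoding problem is to bypass the terminal entirely and decode the file's raw bytes strictly. This is a sketch; the temporary path is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"city": ["München", "São Paulo", "北京"]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "check_utf8.csv"
    df.to_csv(path, index=False, encoding="utf-8")
    raw = path.read_bytes()

# A strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8,
# independent of how any terminal or editor renders the text
text = raw.decode("utf-8")
print(repr(text))
```

If the strict decode succeeds and the expected characters are present, the file is fine and the problem lies in the viewer.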

### Reading Without Specifying Encoding

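`pd.read_csv` itself defaults to UTF-8, but Python's built-in `open()` falls back to the locale's preferred encoding when `encoding` is omitted, and on Windows that is often a legacy code page such as Windows-1252. Reading a UTF-8 file through such a handle, or in any tool that assumes the locale code page, produces mojibake (for example, `München` shown as `MÃ¼nchen`) even though the file itself is correctly encoded.

**Fix:** Specify `encoding="utf-8"` explicitly wherever you read the file: `pd.read_csv(path, encoding="utf-8")` and `open(path, encoding="utf-8")`.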
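The sketch below reproduces that mojibake in memory by decoding UTF-8 bytes with `cp1252`, then shows `pd.read_csv` decoding the same bytes as UTF-8 by default. The printed locale encoding varies by system and is shown only for diagnosis.

```python
import io
import locale

import pandas as pd

utf8_bytes = "city\nMünchen\n".encode("utf-8")

# Decoding UTF-8 bytes with a legacy code page produces classic mojibake
misread = utf8_bytes.decode("cp1252")
print(misread)  # the data row shows MÃ¼nchen

# pd.read_csv decodes as UTF-8 unless told otherwise
df = pd.read_csv(io.BytesIO(utf8_bytes))
print(df["city"][0])

# What Python's open() uses when encoding is omitted (system dependent)
print(locale.getpreferredencoding(False))
```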

## Solutions and Code Examples

Export a dataframe with international characters correctly:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["München", "São Paulo", "北京"],
    "value": [1, 2, 3],
})

# Correct: Explicit UTF-8 text mode
df.to_csv("data_utf8.csv", index=False, encoding="utf-8")

# Correct: UTF-8 with compression
df.to_csv("data_utf8.zip", index=False, compression="zip", encoding="utf-8")
```

Read the file back safely:

```python
# Always match the encoding when reading
df2 = pd.read_csv("data_utf8.csv", encoding="utf-8")
print(df2)
```
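To confirm that both exports survive a full round trip, you can compare what comes back with the original frame. This sketch writes to a temporary directory instead of the working directory and relies on `read_csv` inferring zip compression from the `.zip` suffix.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"city": ["München", "São Paulo", "北京"], "value": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data_utf8.csv")
    zip_path = os.path.join(tmp, "data_utf8.zip")
    df.to_csv(csv_path, index=False, encoding="utf-8")
    df.to_csv(zip_path, index=False, compression="zip", encoding="utf-8")

    from_csv = pd.read_csv(csv_path, encoding="utf-8")
    from_zip = pd.read_csv(zip_path, encoding="utf-8")  # compression inferred

# Raises AssertionError if any value or dtype changed on the way through
pd.testing.assert_frame_equal(from_csv, df)
pd.testing.assert_frame_equal(from_zip, df)
print("round trip OK")
```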

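Avoid the pre-opened text handle pitfall:

```python
# Wrong: a text handle opened without an explicit encoding uses the platform
# default (e.g. cp1252 on many Windows systems). pandas writes str to it as-is,
# so encoding="utf-8" has no effect: the output is cp1252 or the write fails.
with open("bad.csv", "w", newline="") as f:
    df.to_csv(f, index=False, encoding="utf-8")

# Correct: pass a path and let pandas open the file with your encoding
df.to_csv("good.csv", index=False, encoding="utf-8")

# Also correct: open the handle yourself with the encoding you want
with open("good_handle.csv", "w", encoding="utf-8", newline="") as f:
    df.to_csv(f, index=False)
```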

## Key Source Files

Understanding these files clarifies why encoding issues occur:

- **[`pandas/io/formats/csvs.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/formats/csvs.py)**: Implements `CSVFormatter`, which handles column conversion and delegates file operations to `get_handle`. The `__init__` method stores encoding, and `save` applies it.
- **[`pandas/io/common.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/common.py)**: Provides `get_handle`, the centralized utility that opens file handles with the specified `encoding`, `mode`, and `compression`. This is where `encoding` is applied to paths and binary handles, and where a text handle you pass in is used as-is with its own encoding.
- **[`pandas/tests/io/formats/test_to_csv.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/formats/test_to_csv.py)**: Contains test cases for encoding parameters and error handling during CSV export.
- **[`pandas/tests/io/parser/test_encoding.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/parser/test_encoding.py)**: Validates round-trip read/write operations with various encodings, demonstrating the necessity of matching `encoding` parameters on both export and import.

## Summary

- **`DataFrame.to_csv`** delegates encoding to `get_handle` in [`pandas/io/common.py`](https://github.com/pandas-dev/pandas/blob/main/pandas/io/common.py), which applies `encoding` when it opens a path or wraps a binary handle, but not to a text handle you opened yourself.
- **Pre-opened text handles** keep their own encoding; pass a path instead, or open the handle with `encoding="utf-8", newline=""`. Binary mode (`mode="wb"`) is supported and respects `encoding` in pandas 1.2 and later.
- **Any standard spelling of UTF-8 works** (`"utf-8"`, `"UTF-8"`, `"utf8"`, `"utf_8"`); an unknown name raises `LookupError` rather than falling back to a platform default.
- **Keep `errors="strict"`** during development to catch encoding mismatches before they silently corrupt data.
- **Specify encoding on both sides**: Export with `to_csv(encoding="utf-8")` and import with `read_csv(encoding="utf-8")`. `read_csv` already defaults to UTF-8, but stating it explicitly guards against text handles and tools that assume the locale code page.

## Frequently Asked Questions

### Why does my CSV look correct in Python but shows garbage in Excel?

Excel uses your system's default code page to open CSV files unless you use the import data wizard. Save the file with a UTF-8 BOM (Byte Order Mark) by specifying `encoding="utf-8-sig"` in `to_csv`, or import the file through Excel's Data > From Text/CSV menu and manually select UTF-8 encoding.

### Does compression affect UTF-8 encoding in pandas?

No. Compression algorithms operate on bytes, not characters. With `compression="zip"` or similar, `get_handle` opens a binary compressed stream and wraps it in a text layer that uses your `encoding`, so characters are converted to UTF-8 bytes before they reach the compressor. UTF-8 is already the default for `to_csv`, but stating `encoding="utf-8"` makes the intent explicit.
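To check this directly, you can open the archive with the standard-library `zipfile` module and decode its single member strictly. This is a sketch; the archive name is illustrative and the member name is whatever pandas chose, so the code reads it from `namelist()` rather than assuming it.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"city": ["München", "北京"]})

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "cities.zip"
    df.to_csv(archive, index=False, compression="zip", encoding="utf-8")
    with zipfile.ZipFile(archive) as zf:
        member = zf.namelist()[0]  # the single CSV inside the archive
        raw = zf.read(member)      # decompressed bytes, still UTF-8 encoded

decoded = raw.decode("utf-8")  # strict: fails loudly if the bytes are not UTF-8
print(member, repr(decoded))
```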

### What is the difference between `utf-8` and `utf-8-sig` in pandas?

`utf-8` writes the raw UTF-8 byte sequence, while `utf-8-sig` prepends a BOM (Byte Order Mark) to the file. The BOM helps some applications (like Excel) recognize the file as UTF-8, but it can interfere with Unix tools that expect plain text. Use `utf-8-sig` only when targeting applications that require BOM detection.
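The only difference on disk is the three-byte prefix. This minimal sketch, assuming pandas 1.2 or later, writes the same frame to two in-memory binary buffers and compares their first bytes against `codecs.BOM_UTF8`.

```python
import codecs
import io

import pandas as pd

df = pd.DataFrame({"city": ["München"]})

plain = io.BytesIO()
df.to_csv(plain, mode="wb", index=False, encoding="utf-8")

with_bom = io.BytesIO()
df.to_csv(with_bom, mode="wb", index=False, encoding="utf-8-sig")

print(plain.getvalue()[:3])     # starts directly with the header text
print(with_bom.getvalue()[:3])  # starts with the UTF-8 BOM bytes EF BB BF
```

For a file meant to be double-clicked in Excel, the same idea applies to a path: `df.to_csv("for_excel.csv", index=False, encoding="utf-8-sig")`.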

### Why do I get a `UnicodeEncodeError` even with `encoding="utf-8"`?

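With `encoding="utf-8"`, every valid Unicode character can be encoded, so this error almost always means the data contains lone surrogate code points, which typically come from bytes decoded earlier with `errors="surrogateescape"`. It also appears when you pass a text handle opened with a narrower encoding (the handle's encoding wins over the `encoding` argument), or when you choose a narrower codec directly, such as emoji with `encoding="ascii"`. Clean the offending values at the source, or use `errors="replace"` only as a last resort for lossy export.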
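The sketch below creates a lone surrogate the way it usually arises in practice, by decoding an invalid byte with `surrogateescape`, and shows that a UTF-8 export then fails under the default strict error handler. It assumes pandas 1.2 or later for the binary buffer.

```python
import io

import pandas as pd

# The invalid byte 0xFF becomes the lone surrogate U+DCFF
value = b"bad\xff".decode("utf-8", errors="surrogateescape")
df = pd.DataFrame({"name": ["ok", value]})

try:
    df.to_csv(io.BytesIO(), mode="wb", index=False, encoding="utf-8")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc.reason)  # surrogates not allowed
```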