Text Encoding Conversion with iconv: A Practical Command-Line Guide

Use iconv -f [source] -t [target] to convert text encodings directly in the shell, with -c to skip invalid characters and //TRANSLIT for approximate replacements.

Converting text between encodings is a routine part of data processing and file migration work. The jlevy/the-art-of-command-line repository points to iconv as a portable, scriptable way to translate files between character sets directly from the command line. The relevant entry sits at line 287 of the repository's master README.md and is mirrored across its language translations.

Why iconv Matters for Text Encoding Conversion

POSIX-Standard Portability

iconv is specified by POSIX and ships with virtually every Linux distribution and with macOS. Automated scripts and containerized environments can therefore convert text encodings without depending on external libraries.

Explicit Encoding Control

The -f (from) and -t (to) flags name the source and target encodings. Either flag may be omitted, in which case iconv falls back to the encoding of the current locale, so specify both in scripts to avoid the ambiguous defaults that often plague automated text processing pipelines.

Robust Error Handling

When dealing with corrupted or mixed-encoding files, iconv provides two critical flags:

  • -c silently discards characters that cannot be converted
  • -s suppresses the warning messages about invalid characters; it does not change what is converted
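
A small sketch of how strict and lenient runs differ, assuming a throwaway file named sample.txt (a hypothetical name) that contains the byte 0xFF, which is never valid in UTF-8. Exit codes with -c differ between iconv implementations, so the lenient call deliberately ignores its status:

```shell
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
printf 'caf\303\251 \377 ok\n' > "$workdir/sample.txt"   # 0xFF is invalid UTF-8

# Strict run: iconv stops at the invalid byte and exits non-zero.
if iconv -f UTF-8 -t UTF-8 "$workdir/sample.txt" > "$workdir/strict.txt" 2>/dev/null; then
  strict_status=0
else
  strict_status=$?
fi
echo "strict exit status: $strict_status"

# Lenient run: -c drops the invalid byte, -s silences the warnings.
iconv -c -s -f UTF-8 -t UTF-8 "$workdir/sample.txt" > "$workdir/lenient.txt" || true
cat "$workdir/lenient.txt"
```

Checking the exit status of the strict run is also a simple way to test whether a file is valid in the encoding you believe it uses.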

Practical Text Encoding Conversion Commands

Basic Syntax for File Conversion

Convert a UTF-8 encoded file to ISO-8859-1 (Latin-1) using the standard input/output redirection pattern found in README.md at line 287:

iconv -f UTF-8 -t ISO-8859-1 input.txt > output.txt
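
As a quick way to confirm that a conversion did what was intended, the sketch below writes a one-word sample, converts it, and inspects the resulting bytes with od. The directory and file names are placeholders. Note that the output must go to a different file than the input, because the shell truncates the redirect target before iconv reads it.

```shell
workdir=$(mktemp -d)
printf 'caf\303\251\n' > "$workdir/input.txt"    # "café" in UTF-8: é is C3 A9

iconv -f UTF-8 -t ISO-8859-1 "$workdir/input.txt" > "$workdir/output.txt"

# In ISO-8859-1 the é becomes the single byte E9.
latin1_hex=$(od -An -tx1 "$workdir/output.txt" | tr -d ' \n')
echo "$latin1_hex"    # 636166e90a

# Converting back should reproduce the original file byte for byte.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/output.txt" > "$workdir/roundtrip.txt"
cmp "$workdir/input.txt" "$workdir/roundtrip.txt" && echo "round trip OK"
```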

Discovering Available Character Sets

Before converting, verify that iconv supports your target encoding:

iconv -l | grep -i utf
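
The layout of iconv -l output differs between implementations, which makes grepping it fragile inside scripts. One alternative, sketched here with a hypothetical helper named supports_encoding, is to attempt a conversion of empty input and check the exit status:

```shell
# Hypothetical helper: succeeds if this system's iconv accepts the encoding name.
supports_encoding() {
  # Converting empty input exercises only the encoding lookup.
  iconv -f "$1" -t UTF-8 < /dev/null > /dev/null 2>&1
}

for enc in UTF-8 ISO-8859-1 NOT-A-REAL-CHARSET; do
  if supports_encoding "$enc"; then
    echo "$enc: supported"
  else
    echo "$enc: not supported"
  fi
done
```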

Handling Invalid and Unconvertible Characters

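For files containing corrupted bytes or symbols that have no equivalent in the target encoding, use the -c flag to drop them, or append //TRANSLIT to the target encoding to substitute approximate characters. //TRANSLIT is an extension offered by GNU iconv implementations rather than part of POSIX, so it may not be available everywhere: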

iconv -f UTF-8 -t ASCII//TRANSLIT -c corrupted.txt > cleaned.txt
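
To see the difference between the two options, the following sketch feeds the same sample through each. The replacement that //TRANSLIT picks can vary with the iconv implementation and the locale (é may come out as e, ?, or another stand-in), so no exact output is shown for it. The variable names are illustrative.

```shell
sample=$(printf 'caf\303\251 na\303\257ve')   # "café naïve" in UTF-8

# -c alone: characters with no ASCII equivalent are dropped.
dropped=$(printf '%s\n' "$sample" | iconv -c -f UTF-8 -t ASCII 2>/dev/null || true)

# //TRANSLIT: characters are replaced with an approximation where one exists.
approx=$(printf '%s\n' "$sample" | iconv -f UTF-8 -t ASCII//TRANSLIT 2>/dev/null || true)

echo "dropped: $dropped"
echo "approx:  $approx"
```

The || true guards are there because some implementations exit non-zero whenever characters were dropped or replaced, even though the output is still written.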

Advanced Unicode Normalization with uconv

When text encoding conversion requires Unicode-aware transformations beyond simple character set mapping, the repository recommends uconv from the ICU library. The example below keeps the encoding as UTF-8 and applies a transform that lowercases the text, decomposes it (NFD), deletes nonspacing marks such as accents, and recomposes the result (NFC):

uconv -f utf-8 -t utf-8 -x '::Any-Lower; ::Any-NFD; [:Nonspacing Mark:] >; ::Any-NFC;' \
      < input.txt > normalized.txt

Pipeline Integration

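Integrate iconv into processing pipelines to re-encode data streams on the fly. With no file argument iconv reads standard input, and with -f omitted it assumes the encoding of the current locale. The fragment below expects its input from an upstream command or redirect: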

xmlstarlet unesc | fmt -80 | iconv -t US-ASCII > clean.xml
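
Because a byte stream carries no record of its own encoding, it is usually safer in a pipeline to state -f explicitly than to rely on the locale default. A minimal sketch, using printf to stand in for an upstream command that emits ISO-8859-1:

```shell
# printf stands in for any upstream producer of Latin-1 bytes (ï is EF).
utf8_hex=$(printf 'na\357ve\n' \
  | iconv -f ISO-8859-1 -t UTF-8 \
  | od -An -tx1 | tr -d ' \n')

echo "$utf8_hex"    # 6e61c3af76650a: ï is now the two bytes C3 AF
```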

Source Code Locations in the Repository

The text encoding conversion examples reside in the English master README.md at line 287. The jlevy/the-art-of-command-line repository maintains synchronized translations including README-zh.md (line 278) and README-fr.md (line 385), ensuring consistent documentation across languages.

Summary

  • Use iconv -f [encoding] -t [encoding] for explicit text encoding conversion between any supported character sets
  • Add -c to discard unconvertible characters, and -s to suppress the warning messages about them
  • Leverage //TRANSLIT suffixes for approximate character mappings when converting to restricted encodings like ASCII
  • Employ uconv for advanced Unicode operations including lowercasing, normalization, and accent stripping
  • Reference line 287 of README.md in the repository for the canonical implementation examples

Frequently Asked Questions

How do I convert UTF-8 to ISO-8859-1 using iconv?

Execute iconv -f UTF-8 -t ISO-8859-1 input.txt > output.txt to translate a UTF-8 encoded file to Latin-1 format. The -f flag specifies the source encoding while -t defines the target character set, with output redirected to a new file.

What is the difference between iconv and uconv?

iconv performs direct character set translation between encodings, while uconv (from the ICU library) provides advanced Unicode text processing including normalization forms, case folding, and diacritic removal. Use iconv for straightforward text encoding conversion and uconv when you need linguistic transformations.

How can I list all available encodings in iconv?

Run iconv -l to display the complete list of supported character sets. To narrow the list to one family of encodings, pipe it through grep, for example iconv -l | grep -i utf. Checking first confirms that your target encoding is available before you attempt a conversion.

How do I handle corrupted characters during text encoding conversion?

Add the -c flag to iconv to silently discard characters that cannot be converted, or use //TRANSLIT in the target encoding (e.g., ASCII//TRANSLIT) to substitute unconvertible characters with approximate representations. The -s flag suppresses error messages without skipping characters.
