Text Encoding Conversion with iconv: A Practical Command-Line Guide
Use iconv -f [source] -t [target] to convert text encodings directly in the shell, with -c to skip invalid characters and //TRANSLIT for approximate replacements.
Text encoding conversion with iconv is a critical skill for data processing and file migration tasks. According to the jlevy/the-art-of-command-line repository, the iconv utility provides a portable, scriptable solution for translating files between character sets directly from the command line. The repository's master README.md contains the definitive examples at line 287, duplicated across multiple language translations.
Why iconv Matters for Text Encoding Conversion
POSIX-Standard Portability
iconv is specified by POSIX and ships with virtually every Linux distribution and with macOS, so automated scripts and containerized environments can usually perform text encoding conversion without installing extra packages. The set of supported encodings, and extensions such as //TRANSLIT, vary between implementations.
Explicit Encoding Control
The -f (from) and -t (to) flags declare the source and target encodings. If either is omitted, iconv falls back to the encoding of the current locale, so scripts should state both explicitly to avoid the ambiguous defaults that often plague automated text processing pipelines.
Robust Error Handling
When dealing with corrupted or mixed-encoding files, iconv provides two critical flags:
- -c silently discards characters that cannot be converted
- -s suppresses the warnings iconv would otherwise print about invalid characters; it does not skip anything by itself
Practical Text Encoding Conversion Commands
Basic Syntax for File Conversion
Convert a UTF-8 encoded file to ISO-8859-1 (Latin-1) using the standard input/output redirection pattern found in README.md at line 287:
iconv -f UTF-8 -t ISO-8859-1 input.txt > output.txt
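One way to check that a conversion like this did what was intended is to look at the byte counts and convert back. The sketch below is illustrative only; the file names are hypothetical and the sample text is generated with printf so the bytes are known in advance.

```shell
# Create a small UTF-8 sample: "café", where "é" is the two bytes C3 A9.
printf 'caf\303\251\n' > sample_utf8.txt

# Convert to ISO-8859-1; "é" becomes the single byte E9.
iconv -f UTF-8 -t ISO-8859-1 sample_utf8.txt > sample_latin1.txt

# The Latin-1 file should be one byte shorter than the UTF-8 file.
wc -c < sample_utf8.txt
wc -c < sample_latin1.txt

# Convert back and compare with the original to confirm nothing was lost.
iconv -f ISO-8859-1 -t UTF-8 sample_latin1.txt | cmp - sample_utf8.txt \
    && echo "round trip OK"
```

A round trip only succeeds when every character in the input exists in the target character set, which is why the sample sticks to Latin-1 letters.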
Discovering Available Character Sets
Before converting, verify that iconv supports your target encoding:
iconv -l | grep -i utf
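The layout of the iconv -l listing differs between implementations, so parsing it in a script can be brittle. A possible alternative, sketched here under the assumption that iconv exits non-zero when it does not recognize an encoding name, is to attempt a trivial conversion. The function name supports_encoding is made up for this example.

```shell
# Hypothetical helper: succeeds if iconv can convert UTF-8 into the named encoding.
supports_encoding() {
    printf 'a' | iconv -f UTF-8 -t "$1" > /dev/null 2>&1
}

if supports_encoding ISO-8859-1; then
    echo "ISO-8859-1 is available"
fi

if ! supports_encoding NO-SUCH-CHARSET; then
    echo "NO-SUCH-CHARSET is not available"
fi
```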
Handling Invalid and Unconvertible Characters
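For files containing corrupted bytes or incompatible symbols, use the -c flag to skip invalid characters, or append //TRANSLIT to the target encoding to request approximate ASCII representations. //TRANSLIT is an extension to POSIX, supported by GNU iconv among others: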
iconv -f UTF-8 -t ASCII//TRANSLIT -c corrupted.txt > cleaned.txt
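To see how the two options differ, it can help to run them separately on the same input. This is a minimal sketch with hypothetical file names; the substitute that //TRANSLIT picks is implementation- and locale-dependent, so no particular result is assumed for it.

```shell
# Sample input: "café" in UTF-8. The "é" has no ASCII equivalent.
printf 'caf\303\251\n' > accented.txt

# -c drops the unconvertible character. Some iconv builds still exit
# non-zero after discarding input, so the status is ignored here.
iconv -c -f UTF-8 -t ASCII accented.txt > dropped.txt || true
cat dropped.txt

# //TRANSLIT asks for an approximation instead. The substitute chosen
# (for example "e" or "?") depends on the implementation and locale.
iconv -f UTF-8 -t ASCII//TRANSLIT accented.txt > approximated.txt || true
cat approximated.txt
```

Comparing dropped.txt and approximated.txt on a representative sample shows what a given system will do before the flags are applied to real data.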
Advanced Unicode Normalization with uconv
When text encoding conversion requires Unicode-aware transformations beyond simple character set mapping, the repository recommends uconv from the ICU library. This tool supports complex operations like case folding and accent removal:
uconv -f utf-8 -t utf-8 -x '::Any-Lower; ::Any-NFD; [:Nonspacing Mark:] >; ::Any-NFC;' \
< input.txt > normalized.txt
Pipeline Integration
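Integrate iconv into processing pipelines to re-encode data streams on the fly. With no file argument iconv reads standard input, and with no -f it assumes the locale's encoding for that input: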
xmlstarlet unesc | fmt -80 | iconv -t US-ASCII > clean.xml
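The same streaming pattern can be tried without xmlstarlet. The sketch below feeds a known Latin-1 byte sequence through iconv on standard input and dumps the result; the output file name is hypothetical.

```shell
# Latin-1 input on stdin: "naïve", with "ï" as the single byte EF (octal 357).
printf 'na\357ve\n' | iconv -f ISO-8859-1 -t UTF-8 > stream_utf8.txt

# In UTF-8, "ï" is the two bytes C3 AF, so the file grows to 7 bytes.
wc -c < stream_utf8.txt

# Hex dump of the converted stream for inspection.
od -An -tx1 stream_utf8.txt
```

Naming the source encoding with -f keeps the result independent of the locale the pipeline happens to run in.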
Source Code Locations in the Repository
The text encoding conversion examples reside in the English master README.md at line 287. The jlevy/the-art-of-command-line repository maintains synchronized translations including README-zh.md (line 278) and README-fr.md (line 385), ensuring consistent documentation across languages.
Summary
- Use iconv -f [encoding] -t [encoding] for explicit text encoding conversion between any supported character sets
- Add -c to discard unconvertible characters or -s to suppress warnings about invalid characters
- Leverage //TRANSLIT suffixes for approximate character mappings when converting to restricted encodings like ASCII
- Employ uconv for advanced Unicode operations including lowercasing and accent stripping
- Reference line 287 of README.md in the repository for the canonical implementation examples
Frequently Asked Questions
How do I convert UTF-8 to ISO-8859-1 using iconv?
Execute iconv -f UTF-8 -t ISO-8859-1 input.txt > output.txt to translate a UTF-8 encoded file to Latin-1 format. The -f flag specifies the source encoding while -t defines the target character set, with output redirected to a new file.
What is the difference between iconv and uconv?
iconv performs direct character set translation between encodings, while uconv (from the ICU library) provides advanced Unicode text processing including normalization forms, case folding, and diacritic removal. Use iconv for straightforward text encoding conversion and uconv when you need linguistic transformations.
How can I list all available encodings in iconv?
Run iconv -l to display the complete list of supported character sets, and pipe it to grep to filter for specific encodings, for example iconv -l | grep -i utf. This confirms that your target encoding is available before you attempt a conversion.
How do I handle corrupted characters during text encoding conversion?
Add the -c flag to iconv to silently discard characters that cannot be converted, or use //TRANSLIT in the target encoding (e.g., ASCII//TRANSLIT) to substitute unconvertible characters with approximate representations. The -s flag suppresses error messages without skipping characters.