How to Make Python 3 print() Output UTF-8 to Stdout: Fix Encoding Issues and Raw Bytes Problems — pythontutorials.net
Table of Contents#
1. Understanding Stdout Encoding in Python 3
2. Common Encoding Issues and Error Messages
3. Solutions to Force UTF-8 Output in print()
3.1 Use the PYTHONUTF8 Environment Variable (Python 3.7+)
3.2 Reconfigure sys.stdout at Runtime (Python 3.7+)
3.3 Replace sys.stdout with io.TextIOWrapper (Python 3.0+)
4. Handling Raw Bytes: From b'...' to Readable Text
5. Platform-Specific Tips
5.1 Windows
5.2 Linux/macOS
6. Best Practices to Avoid Encoding Headaches
7. Conclusion
8. References
1. Understanding Stdout Encoding in Python 3#
Before diving into fixes, let's clarify what "stdout encoding" means. Standard output (stdout) is the default stream that print() writes to; it may be connected to your terminal, a file, or another program. For text to display correctly, Python must encode each string into bytes using an encoding that the receiving end (e.g., your terminal) understands.
How Python Determines Stdout Encoding#
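Python 3 sets sys.stdout.encoding at startup, roughly as follows:
On Linux/macOS, Python uses the locale encoding (derived from LC_ALL, LC_CTYPE, or LANG), whether stdout is a terminal, a file, or a pipe. Under the bare C/POSIX locale, Python versions before 3.7 typically fall back to ASCII.
On Windows, Python 3.6+ uses UTF-8 when writing to an interactive console, but uses the ANSI code page (e.g., cp1252) when output is redirected to a file or pipe.
On every platform, the PYTHONIOENCODING environment variable overrides this detection.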
The problem? Many environments (especially older Windows consoles, redirected output on Windows, or misconfigured Linux systems) default to legacy encodings such as cp1252 (Windows-1252), cp850 (DOS Latin-1), or ISO-8859-1 instead of UTF-8. UTF-8 can encode every Unicode character, whereas these legacy encodings cover at most 256 characters, so printing anything outside their repertoire (e.g., Chinese, Arabic, or emojis) fails.
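To see which situation applies on a given machine, the relevant values can be inspected directly. The following is a minimal sketch: describe_stdout is a made-up helper name, and sys.flags.utf8_mode assumes Python 3.7 or later.

```python
import locale
import sys


def describe_stdout():
    """Collect the settings that influence how print() encodes text.

    Hypothetical helper for illustration; not part of the standard library.
    """
    return {
        # Encoding Python chose for stdout (may be None for some replaced streams)
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Error handler used when a character cannot be encoded
        "stdout_errors": getattr(sys.stdout, "errors", None),
        # True when stdout is attached to a terminal rather than a file or pipe
        "is_terminal": sys.stdout.isatty(),
        # Encoding derived from the locale settings
        "locale_encoding": locale.getpreferredencoding(False),
        # 1 when UTF-8 Mode is active (Python 3.7+)
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for name, value in describe_stdout().items():
        print(f"{name}: {value}")
```

Running the script once in a terminal and once with output redirected to a file shows whether the two cases differ on your system.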
2. Common Encoding Issues and Error Messages#
If Python tries to encode a string with characters unsupported by the stdout encoding, you’ll hit a UnicodeEncodeError. Here’s an example:
# Printing a non-ASCII character when the stdout encoding cannot represent it
print("Café au lait")  # "é" is Unicode U+00E9
Error Message:
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 3: character maps to <undefined>
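The failure can be reproduced without touching the real terminal by printing to an in-memory stream with a chosen encoding. This is a minimal sketch: try_print is a hypothetical helper, and the encodings are examples only. It also shows that "é" itself is encodable in cp1252, so which character triggers the error depends on the code page in use.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the encoded bytes on success, or None if the text cannot be
    encoded. Hypothetical helper for illustration.
    """
    buffer = io.BytesIO()
    # newline="\n" disables newline translation so the bytes match on every OS
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        # The exception message escapes the offending character, so it is ASCII-safe
        print(f"{encoding}: {exc}")
        return None
    return buffer.getvalue()


if __name__ == "__main__":
    print(try_print("Café au lait", "utf-8"))   # encodes successfully
    print(try_print("Café au lait", "cp1252"))  # "é" exists in cp1252
    print(try_print("Café au lait", "ascii"))   # fails: "é" is not ASCII
    print(try_print("北京", "cp1252"))           # fails: not in cp1252
```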
Another common issue is raw bytes output. If you print a bytes object directly, Python displays its repr with a b'...' prefix instead of decoding it into readable text:
print("Café au lait".encode("utf-8"))  # Output: b'Caf\xc3\xa9 au lait' (not "Café au lait")
3. Solutions to Force UTF-8 Output in print()#
Let’s fix these issues by ensuring Python uses UTF-8 for stdout encoding.
3.1 Use the PYTHONUTF8 Environment Variable (Python 3.7+)#
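The simplest solution (for Python 3.7 and later) is to set the PYTHONUTF8 environment variable to 1, which enables Python's UTF-8 Mode. Python then uses UTF-8 for all standard streams (stdin, stdout, stderr) regardless of the system's locale or terminal settings. The one exception is PYTHONIOENCODING: if it is also set, it takes precedence for the standard streams.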
How to Set It:#
Temporarily (per shell session):
In Linux/macOS bash/zsh:
export PYTHONUTF8=1
python3 your_script.py
In Windows Command Prompt:
set PYTHONUTF8=1
python your_script.py
In Windows PowerShell:
$env:PYTHONUTF8=1
python your_script.py
Permanently:
On Linux/macOS, add the export command to your shell profile (e.g., .bashrc or .zshrc). On Windows, add PYTHONUTF8 with the value 1 to your environment variables (System Properties → Advanced → Environment Variables).
Why It Works:#
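PYTHONUTF8=1 enables UTF-8 Mode, which bypasses Python's locale-based encoding detection and enforces UTF-8 for all standard I/O streams. It also makes UTF-8 the default encoding for open() and other locale-dependent operations, so it affects more than print(). For most users, this is the recommended fix.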
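One way to confirm that the variable takes effect is to launch a child interpreter with PYTHONUTF8=1 and ask it what it chose. This is a minimal sketch, assuming Python 3.7+ and an environment where starting a subprocess is allowed; probe_utf8_mode is a hypothetical name. PYTHONIOENCODING is removed from the child's environment because it would otherwise override the stream encoding.

```python
import os
import subprocess
import sys


def probe_utf8_mode():
    """Run a child interpreter with PYTHONUTF8=1 and report its settings.

    Returns (utf8_mode_flag, stdout_encoding) as strings.
    Hypothetical helper for illustration.
    """
    env = dict(os.environ, PYTHONUTF8="1")
    # PYTHONIOENCODING, if set, would take precedence for the standard streams
    env.pop("PYTHONIOENCODING", None)
    code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # stdout is a pipe here, not a terminal
        text=True,
        check=True,
    )
    flag, encoding = result.stdout.split()
    return flag, encoding


if __name__ == "__main__":
    print(probe_utf8_mode())
```

Because the child's stdout is a pipe rather than a terminal, this also checks the redirected-output case.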
3.2 Reconfigure sys.stdout at Runtime (Python 3.7+)#
If you can’t set environment variables (e.g., in restricted environments), reconfigure sys.stdout directly in your script using sys.stdout.reconfigure() (available in Python 3.7+).
Example Code:#
import sys
# Reconfigure stdout to use UTF-8
sys.stdout.reconfigure(encoding='utf-8')
# Now print non-ASCII characters
print("Café au lait 🍵")  # Works!
Key Parameters:#
encoding='utf-8': Enforce UTF-8 encoding.
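errors='replace': Optional. If a character can't be encoded (with UTF-8 this only happens for lone surrogates such as '\udc80'), Python writes ? in its place instead of raising an error. When you pass encoding without errors, reconfigure() uses errors='strict', which raises UnicodeEncodeError on such characters (useful for debugging).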
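reconfigure() may be missing when sys.stdout has been replaced by another object (an IDE or test runner can do this), so a defensive wrapper can be useful. The sketch below is illustrative only: ensure_utf8 and demo are hypothetical helpers, and the behavior is demonstrated on an in-memory stream rather than the real stdout.

```python
import io
import sys


def ensure_utf8(stream, errors="strict"):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True when the stream was reconfigured, False otherwise.
    Hypothetical helper for illustration.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8", errors=errors)
    return True


def demo():
    """Show the effect on an in-memory stream that starts out as ASCII."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
    ensure_utf8(stream, errors="replace")
    # "\udc80" is a lone surrogate, which UTF-8 cannot encode;
    # with errors="replace" it is written as "?"
    print("Café 🍵 \udc80", file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    ensure_utf8(sys.stdout)
    print(demo())
```

Calling ensure_utf8(sys.stdout) near the top of a script keeps the default errors='strict', so unencodable text still raises instead of being silently altered.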
3.3 Replace sys.stdout with io.TextIOWrapper (Python 3.0+)#
For Python versions before 3.7, use io.TextIOWrapper to wrap the underlying binary stdout stream with UTF-8 encoding.
Example Code:#
import sys
import io
# Replace stdout with a UTF-8 encoded wrapper
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
# Test with non-ASCII text
print("北京欢迎你 (Welcome to Beijing)")  # Works!
Explanation:#
sys.stdout.buffer: The raw binary stream underlying stdout.
io.TextIOWrapper(buffer, encoding='utf-8'): Converts the binary stream into a text stream with UTF-8 encoding.
Note: This replaces sys.stdout entirely, so ensure you do this early in your script (before any print() calls).
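One side effect of this replacement is that a new TextIOWrapper does not use line buffering by default, so interactive output may appear later than expected. The following is a hedged variant, assuming that line_buffering=True suits your script; wrap_utf8 is a hypothetical helper name.

```python
import io
import sys


def wrap_utf8(binary_stream, errors="strict"):
    """Wrap a binary stream in a UTF-8 text stream that flushes on newlines.

    Hypothetical helper for illustration.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        errors=errors,
        line_buffering=True,  # flush whenever a newline is written
    )


if __name__ == "__main__":
    # Only replace stdout when it is not UTF-8 already and exposes a binary buffer
    current = (getattr(sys.stdout, "encoding", "") or "").lower()
    if current not in ("utf-8", "utf8") and hasattr(sys.stdout, "buffer"):
        sys.stdout = wrap_utf8(sys.stdout.buffer)
    print("北京欢迎你 (Welcome to Beijing)")
```

The guard avoids wrapping a stream that is already UTF-8, and skips replaced stdout objects that have no buffer attribute.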
4. Handling Raw Bytes: From b'...' to Readable Text#
A common pitfall is printing bytes objects directly, which results in b'...'...