How to read sys.stdin containing binary data in Python (ignoring errors)?

Question:

How do I read sys.stdin while ignoring decoding errors?
I know that sys.stdin.buffer exists, and that I can read the binary data and then decode it with .decode('utf8', errors='ignore'), but I want to read sys.stdin line by line.
Can I somehow reopen sys.stdin with the errors='ignore' option?

Asked By: g00dds


Answers:

You can set an error handler through the PYTHONIOENCODING environment variable. This affects both sys.stdin and sys.stdout (sys.stderr always uses "backslashreplace"). PYTHONIOENCODING accepts an optional encoding name and an optional error handler name preceded by a colon, so "UTF8", "UTF8:ignore" and ":ignore" are all valid values.

$  cat so73335410.py
import sys

if __name__ == '__main__':
    data = sys.stdin.read()
    print(data)
$
$  echo hello | python so73335410.py
hello

$  echo hello hello hello hello | zip > hello.zip
  adding: - (deflated 54%)
$
$  cat hello.zip | PYTHONIOENCODING=UTF8:ignore python so73335410.py
UYv>
  -▒
UY  HW@'PKv>

  ▒-PK,-/>PKmPK/>
$ 
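To make the effect of the error handler concrete, here is a small sketch that decodes the same invalid byte sequence with three standard handlers. The sample bytes are an assumption chosen for illustration; they are not taken from the zip file above.

```python
# 0x80 cannot appear on its own in valid UTF-8, so strict decoding would fail here.
raw = b"a\x80b"

ignored = raw.decode("utf-8", errors="ignore")            # drops the bad byte
replaced = raw.decode("utf-8", errors="replace")          # inserts U+FFFD
escaped = raw.decode("utf-8", errors="backslashreplace")  # keeps it as a literal \x80 escape

print(repr(ignored))
print(repr(replaced))
print(repr(escaped))
```

With "ignore" the undecodable bytes are silently lost, so "replace" or "backslashreplace" may be preferable when you need to see where the input was damaged.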
Answered By: snakecharmerb

I found three solutions via the link that Mark Setchell mentioned.

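import sys
import io

def first():
    # Reopen the stdin file descriptor as text, dropping undecodable bytes.
    # Note: leaving the with-block closes file descriptor 0.
    with open(sys.stdin.fileno(), 'r', errors='ignore') as f:
        return f.read()

def second():
    # Wrap the underlying binary buffer in a new text layer.
    sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
    return sys.stdin.read()

def third():
    # Change the error handler in place (requires Python 3.7+).
    sys.stdin.reconfigure(errors='ignore')
    return sys.stdin.read()


# Enable only one call at a time: each one reads stdin to the end.
print(first())
#print(second())
#print(third())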

Usage:

$ echo -e 'a\x80b' | python solution.py
ab
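
Since the question asks for line-by-line reading, the same idea can be sketched as a generator. The helper name iter_lines and its defaults are hypothetical, introduced only for this illustration; it relies on io.TextIOWrapper being iterable line by line.

```python
import io

def iter_lines(binary_stream, encoding="utf-8", errors="ignore"):
    """Yield decoded lines from a binary stream, skipping undecodable bytes."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)
    try:
        for line in wrapper:
            yield line
    finally:
        # Detach so that discarding the wrapper does not close the underlying stream.
        wrapper.detach()

# Stand-in for sys.stdin.buffer so the sketch runs without piped input.
sample = io.BytesIO(b"a\x80b\nc\xffd\n")
lines = list(iter_lines(sample))
print(lines)
```

In a real script you would pass sys.stdin.buffer instead of the BytesIO sample and process each line inside the loop.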
Answered By: g00dds