'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte

Question:

import pandas as pd
df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923")
print(df3)

I am trying to open a CSV file with pandas, but I get a UnicodeDecodeError:

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:544, in pandas._libs.parsers.TextReader.__cinit__()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:633, in pandas._libs.parsers.TextReader._get_header()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:847, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:1952, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte
Asked By: Cory


Answers:

Try passing an explicit encoding:

df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923", encoding='gb18030')

Answered By: khaled koubaa

This error means that your CSV file is not valid UTF-8, so it most likely uses a different encoding. If the source you got it from does not say which encoding it uses, take a look at the answers to this question for several different ways of guessing the encoding. Once you know the encoding, you can specify it with the encoding parameter to read_csv.
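One simple way to guess can be sketched with the standard library alone: try strict decoding with a short list of candidate encodings and keep the first one that succeeds. The helper name `first_working_encoding`, the candidate list, and the sample bytes below are illustrative assumptions, not something taken from the question. A successful decode is only a hint, because permissive codecs such as latin-1 accept any byte sequence, so the decoded text should still be checked by eye.

```python
from typing import Iterable, Optional


def first_working_encoding(raw: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that decodes `raw` without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)  # strict decoding: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample: Traditional Chinese text encoded as Big5,
# which is not valid UTF-8.
sample = "111年".encode("big5")

# Candidate order matters: list the stricter, more likely encodings first.
enc = first_working_encoding(sample, ["utf-8", "big5"])
print(enc)
```

Once a plausible encoding is found, it can be passed on as `pd.read_csv(url, encoding=enc)`.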

You could also use the encoding_errors parameter to read_csv to specify an alternative action to take when pandas encounters an encoding error. The default is to raise an error, but instead you can ignore the undecodable bytes, or replace them with replacement characters. See this answer for details.
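The values accepted by encoding_errors are the names of Python's standard codec error handlers, so their effect can be sketched without pandas. The example below parses a small CSV with the `csv` module while deliberately decoding it as UTF-8; the helper `read_rows` and the sample row are hypothetical and only for illustration.

```python
import csv
import io


def read_rows(data: bytes, errors: str) -> list:
    """Parse CSV bytes as UTF-8, applying the given codec error handler."""
    text = io.TextIOWrapper(
        io.BytesIO(data), encoding="utf-8", errors=errors, newline=""
    )
    return list(csv.reader(text))


# Hypothetical sample: one CSV row encoded as Big5, which is not valid UTF-8.
raw = "年,2022\n".encode("big5")

replaced = read_rows(raw, "replace")  # undecodable bytes become U+FFFD
ignored = read_rows(raw, "ignore")    # undecodable bytes are silently dropped

print(replaced)
print(ignored)
```

With pandas 1.3 or later, the corresponding call would be along the lines of `pd.read_csv(path, encoding_errors="replace")`. Both handlers lose information, so they suit files with a few stray bytes better than files that are entirely in another encoding.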

Answered By: Jack Taylor