'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte

Question

import pandas as pd
df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923")
print(df3)

I am trying to read a CSV file with pandas, but I get a UnicodeDecodeError:

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:544, in pandas._libs.parsers.TextReader.__cinit__()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:633, in pandas._libs.parsers.TextReader._get_header()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:847, in pandas._libs.parsers.TextReader._tokenize_rows()

File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:1952, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte

Asked By: Cory

Answer 1

Try passing an explicit encoding to read_csv:

df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923", encoding='gb18030')

Answered By: khaled koubaa

Answer 2

This error means that your CSV file is not valid UTF-8, so it most likely uses a different encoding. If the source you got it from does not say which encoding that is, take a look at the answers to this question for several different ways of guessing it. Once you know the encoding, you can pass it to read_csv with the encoding parameter.
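
As a rough sketch of the trial-and-error approach, the helper below runs a list of candidate encodings against the raw bytes and returns the first one that decodes without an error. The function name, the candidate list, and the sample bytes are all made up for illustration; Big5 is in the list only as a guess, because the file comes from a Taiwanese site.

```python
def first_working_encoding(raw: bytes, candidates=("utf-8", "big5", "gb18030")):
    """Return the first candidate encoding that decodes `raw` without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
        return enc
    return None


# Hypothetical sample: a short Big5-encoded header, standing in for the real file.
sample = '"111年09月23日"'.encode("big5")
print(first_working_encoding(sample))  # prints: big5
```

The returned name can then be passed to read_csv as encoding=.... A clean decode does not prove the guess is right: a permissive codec may accept the bytes and still produce the wrong characters, so put the most likely candidates first and check the result by eye.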

You could also use the encoding_errors parameter to read_csv to specify an alternative action to take when Pandas encounters an encoding error. The default is to raise an error, but instead you can ignore those characters, or replace them with replacement characters. See this answer for details.
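
A minimal sketch of the encoding_errors option, assuming pandas 1.3 or later (where the parameter was added) and using made-up Big5 bytes in place of the real download:

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: Big5 bytes, which are not valid UTF-8.
raw = "年,value\n111,5\n".encode("big5")

# "replace" swaps each undecodable byte for U+FFFD instead of raising.
df_replaced = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")

# "ignore" silently drops the undecodable bytes.
df_ignored = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="ignore")

print(df_replaced.columns.tolist())
print(df_ignored.columns.tolist())
```

Both options lose information, so they are best kept for files where the damaged text does not matter. When the real encoding is known, passing it with encoding is the better fix.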

Answered By: Jack Taylor
