'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte
Question:
import pandas as pd
df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923")
print(df3)
I am trying to open a CSV file with pandas but get a UnicodeDecodeError:
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:544, in pandas._libs.parsers.TextReader.__cinit__()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:633, in pandas._libs.parsers.TextReader._get_header()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:847, in pandas._libs.parsers.TextReader._tokenize_rows()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:1952, in pandas._libs.parsers.raise_parser_error()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 4: invalid start byte
Answers:
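Try passing an explicit encoding:
df3 = pd.read_csv("https://www.twse.com.tw/exchangeReport/MI_INDEX20?response=csv&date=20220923", encoding='gb18030')
Note that gb18030 is a Simplified Chinese encoding. It may decode these bytes without raising, but the file comes from the Taiwan Stock Exchange, so if the resulting text looks wrong it is more likely Big5; in that case try encoding='big5' or encoding='cp950'.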
This means that your CSV file is not valid UTF-8, so it probably uses a different encoding. If the source you got it from does not say which encoding it uses, take a look at the answers to this question for several different ways of guessing the encoding. Once you know the encoding, you can specify it with the encoding parameter to read_csv.
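One simple way to guess is to try a short list of candidate encodings against the raw bytes and keep the first one that decodes cleanly. This is only a sketch: the helper name guess_encoding, the candidate list, and the sample bytes are assumptions for illustration, and a clean decode does not prove the guess is right, because permissive encodings accept almost any byte sequence.

```python
from typing import Iterable, Optional


def guess_encoding(
    raw: bytes,
    candidates: Iterable[str] = ("utf-8", "big5", "gb18030"),
) -> Optional[str]:
    """Return the first candidate that decodes `raw` without error, else None.

    Order matters: put strict encodings (such as UTF-8) first and
    permissive ones last, since the first clean decode wins.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample (not the real file): Big5-encoded text whose byte
# at position 4 is 0xa6, the same kind of byte the traceback complains about.
sample = '"111年09月23日"'.encode("big5")

encoding = guess_encoding(sample)
print(encoding)
```

The returned name can then be passed as the encoding argument to read_csv. Always check the decoded text by eye, since a wrong but permissive encoding yields garbage characters rather than an exception.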
You could also use the encoding_errors parameter to read_csv to specify what pandas should do when it encounters an encoding error. The default is to raise an error, but you can instead ignore the offending bytes or replace them with replacement characters. See this answer for details.
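As a rough illustration of the difference between the error handlers, the sketch below reads a small in-memory CSV (made-up data, not the real file) that is Big5-encoded but decoded as UTF-8. It assumes pandas 1.3 or later, where encoding_errors was added.

```python
import io

import pandas as pd

# Made-up two-column CSV, encoded as Big5 so that it is NOT valid UTF-8.
raw = "name,value\n年,1\n".encode("big5")

# Default behaviour (encoding_errors="strict"): decoding fails.
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace": each undecodable byte becomes U+FFFD, the replacement character.
replaced = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="replace")

# "ignore": undecodable bytes are silently dropped.
ignored = pd.read_csv(io.BytesIO(raw), encoding="utf-8", encoding_errors="ignore")

print(strict_failed)
print(replaced)
print(ignored)
```

Both handlers lose information: the original character cannot be recovered from the result. They are a fallback for files with a few stray bytes, not a substitute for passing the correct encoding.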