Cannot write Japanese characters to CSV with pandas in Python
Question:
I’m trying to write data containing Japanese characters to a CSV file,
but the Japanese characters in the CSV are not displayed correctly.
import pandas as pd

def write_csv(columns, data):
    # Build the frame from the rows and write it out as UTF-8.
    df = pd.DataFrame(data, columns=columns)
    df.to_csv("..ReportReport.csv", encoding='utf-8')

write_csv(["法人番号", "法人名称", "法人名称カナ"], [])
and the CSV shows this when opened:
æ³•äººç•ªå· æ³•äººå称 法人å称カナ
How can I get the Japanese characters written correctly?
Answers:
Your code is OK; I just tried it. I’m guessing the CSV file is fine but is being opened as cp1252 instead of UTF-8.
What software are you using to open this CSV?
- If you’re using Microsoft Excel, make sure to use “Import” instead of “Open” so that you can choose the encoding.
- With Google Sheets or LibreOffice it should Just Work.
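To illustrate the cp1252 guess, here is a minimal sketch using only the standard library (the variable names are mine, not from the question). It encodes the first header as UTF-8 and then decodes those bytes as cp1252, which seems to reproduce the garbled text from the question. `errors="replace"` is my addition: a few byte values have no cp1252 character, so a strict decode would raise instead.

```python
# Text as it would be written with encoding='utf-8'.
original = "法人番号"
utf8_bytes = original.encode("utf-8")

# A viewer that assumes cp1252 maps each byte to a Western European character.
# errors="replace" is needed because some byte values (0x8F here) have no
# cp1252 character; they come out as U+FFFD instead of raising an error.
garbled = utf8_bytes.decode("cp1252", errors="replace")
print(garbled)

# The bytes themselves are intact, so decoding them as UTF-8 recovers the text.
restored = utf8_bytes.decode("utf-8")
print(restored == original)
```

If the output looks like the text in your CSV viewer, the file is probably fine and only the viewer's encoding choice is off.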
Another possible explanation is that there’s something wrong with your data in the first place. Here’s how you can check that (I just took a few random characters from this generator):
import pandas as pd

df = pd.DataFrame(['勘してろむ説彼ふて惑岐とや尊続セヲ狭題'])
df.to_csv('report.csv', encoding='utf-8')
Try opening that the same way. If it opens correctly but your original file doesn’t, the problem is in your data or code rather than in how the file is opened.
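You can also take the viewer out of the picture and inspect the bytes on disk directly. The helper below is a hypothetical sketch (`describe_encoding` is a name I made up): it reports whether a file decodes as UTF-8 and whether it starts with a byte order mark. It is demonstrated on a throwaway file; in practice you would pass the path of the CSV that pandas wrote.

```python
import codecs
import tempfile
from pathlib import Path


def describe_encoding(path):
    """Report whether the file decodes as UTF-8 and whether it starts with a BOM."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"valid_utf8": valid, "has_bom": raw.startswith(codecs.BOM_UTF8)}


# Demonstration on a temporary file written as plain UTF-8 (no BOM).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("法人番号,法人名称\n", encoding="utf-8")
    result = describe_encoding(sample)

print(result)
```

If `valid_utf8` is true for your real file, the data was written correctly and the remaining question is how the viewer chooses its encoding.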
For me utf_8_sig worked like a charm.
df.to_csv("..ReportReport.csv", encoding='utf_8_sig')
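A likely reason this helps, assuming the file is being opened in Excel: Python's `utf_8_sig` codec writes the same UTF-8 bytes but puts a three-byte byte order mark (BOM) in front, and Excel uses that mark to recognise the file as UTF-8. A small standard-library sketch of the difference (variable names are mine):

```python
import codecs

header = "法人番号"
plain = header.encode("utf-8")
with_bom = header.encode("utf_8_sig")

# utf_8_sig prepends the BOM bytes EF BB BF ...
print(with_bom[:3] == codecs.BOM_UTF8)
# ... and the rest is byte-for-byte the same as plain UTF-8.
print(with_bom[3:] == plain)
```

When reading such a file back in Python, decoding with `utf_8_sig` strips the mark again, while decoding with plain `utf-8` leaves a `\ufeff` character at the start of the first field.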