CSV file with Arabic characters is displayed as symbols in Excel
Question:
I am using Python to extract Arabic tweets from Twitter and save them as a CSV file, but when I open the saved file in Excel the Arabic text displays as symbols. However, in Python, Notepad, or Word it looks fine.
Where is the problem?
Answers:
Excel is known to have an awful CSV import system. Long story short: if you import, on the same system, a CSV file that you have just exported, it will work smoothly. Otherwise, the CSV file is expected to use the Windows system encoding and delimiter.
A rather awkward but robust alternative is LibreOffice or Apache OpenOffice. Both are far behind Excel on most features, but not on CSV import: they let you specify the delimiter and optional quoting character along with the encoding of the CSV file, and you can then save the result as xlsx.
This is a problem I face frequently with Microsoft Excel when opening CSV files that contain Arabic characters. Try the following workaround, which I tested on recent versions of Microsoft Excel on both Windows and macOS:
- Open Excel with a blank workbook
- On the Data tab, click the From Text button (if it is greyed out, make sure an empty cell is selected)
- Browse to and select the CSV file
- In the Text Import Wizard, change File origin to "Unicode (UTF-8)"
- Click Next and, under Delimiters, select the delimiter used in your file, e.g. comma
- Click Finish and choose where to import the data
The Arabic characters should now display correctly.
The only solution I've found to save Arabic into an Excel file from Python is to use pandas and save to the .xlsx format instead of CSV; xlsx handles this much better. Here's the code I put together, which worked for me:
import pandas as pd

def turn_into_csv(data, csver):
    ids = []
    texts = []
    for each in data:
        texts.append(each["full_text"])
        ids.append(str(each["id"]))

    df = pd.DataFrame({'ID': ids, 'FULL_TEXT': texts})
    writer = pd.ExcelWriter(csver + '.xlsx', engine='xlsxwriter')
    # xlsx files are always Unicode, so no encoding argument is needed
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    # Close the Pandas Excel writer and output the Excel file.
    writer.close()
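If you would rather keep the CSV format, pandas can also write UTF-8 with a BOM directly via to_csv; a minimal sketch with made-up sample data (the file name is just an example):

```python
import pandas as pd

# Small sample frame with Arabic text (made-up data)
df = pd.DataFrame({"ID": ["1", "2"], "FULL_TEXT": ["مرحبا", "اردو"]})

# 'utf-8-sig' writes a byte-order mark first, which lets Excel
# detect the encoding when the file is opened by double-clicking
df.to_csv("tweets.csv", index=False, encoding="utf-8-sig")

# Inspect the first three bytes: they should be the UTF-8 BOM
with open("tweets.csv", "rb") as fh:
    head = fh.read(3)
```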
Just use encoding='utf-8-sig' instead of encoding='utf-8', as follows:
import csv

data = "اردو"
# newline='' is recommended by the csv module to avoid extra blank rows
with open('example.csv', 'w', encoding='utf-8-sig', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow([data])
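The only on-disk difference utf-8-sig makes is a three-byte marker (EF BB BF) at the start of the file, which is what Excel keys on to detect UTF-8; you can confirm it like this:

```python
import csv

# Write the same row with the utf-8-sig codec
data = "اردو"
with open('example.csv', 'w', encoding='utf-8-sig', newline='') as fh:
    csv.writer(fh).writerow([data])

# utf-8-sig prepends the UTF-8 byte-order mark: EF BB BF
with open('example.csv', 'rb') as fh:
    bom = fh.read(3)
```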
It worked on my machine.
Although my CSV file was already encoded as UTF-8, explicitly re-saving it as UTF-8 with Notepad resolved the problem.
Steps:
- Open your CSV file in Notepad.
- Click File -> Save As...
- In the "Encoding" drop-down, select UTF-8.
- Rename your file using the .csv extension.
- Click Save.
- Reopen the file with Excel.
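The same re-save can be scripted if you have many files; a sketch (add_bom and data.csv are illustrative names), relying on the fact that the 'utf-8-sig' codec strips an existing BOM on read and writes one on output:

```python
def add_bom(path):
    # Read with 'utf-8-sig' so an existing BOM (if any) is stripped,
    # making this safe to run more than once on the same file.
    with open(path, encoding='utf-8-sig') as fh:
        text = fh.read()
    # Write back with 'utf-8-sig', which prepends the BOM Excel expects.
    with open(path, 'w', encoding='utf-8-sig', newline='') as fh:
        fh.write(text)

# Example: create a plain UTF-8 CSV, then add the BOM
with open('data.csv', 'w', encoding='utf-8') as fh:
    fh.write('ID,FULL_TEXT\n1,اردو\n')
add_bom('data.csv')

with open('data.csv', 'rb') as fh:
    raw = fh.read()
```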
The fastest way, after saving the file to .csv from Python:
- open the .csv file in Notepad++
- from the Encoding menu, choose UTF-8-BOM
- click Save As and save with the same name and .csv extension (e.g. data.csv), keeping the file type as it is
- re-open the file in Microsoft Excel.