Python Pandas read_excel dtype str replace nan by blank ('') when reading or when writing via to_csv
Question:
Python version: Python 2.7.13 :: Anaconda custom (64-bit)
Pandas version: pandas 0.20.2
Hello,
I have a quite simple requirement.
I would like to read an excel file and write a specific sheet to a csv file.
Blank values in the source Excel file should be treated / written as blank when writing the csv file.
However, my blank records are always written as ‘nan’ to the output file. (without the quotes)
I read the Excel file via method
read_excel(xlsx, sheetname='sheet1', dtype=str)
I am specifying dtype because I have some columns that are numbers but should be treated as string. (Otherwise they might lose leading 0s etc)
i.e. I would like to read the exact value from every cell.
Now I write the output .csv file via
to_csv(output_file, index=False, mode='wb', sep=',', encoding='utf-8')
However, my result csv file contains nan for all blank cells from the excel file.
What am I missing? I already tried the .fillna('', inplace=True) function but it seems to be doing nothing to my data.
I also tried to add the parameter na_rep='' to the to_csv method but without success.
Thanks for any help!
Addendum: Please find hereafter a reproducible example.
Please first create a new Excel file with 3 columns and the following content:
COLUMNA  COLUMNB  COLUMNC
01                test
02       test
03                test
(I saved this Excel file to c:\test.xlsx. Please note that the 1st and 3rd rows of column B, as well as the 2nd row of column C, are blank/empty.)
Now here is my code:
import pandas as pd
# Raw strings keep the backslash in the Windows path from being read as a tab ('\t').
xlsx = pd.ExcelFile(r'c:\test.xlsx')
df = pd.read_excel(xlsx, sheetname='Sheet1', dtype=str)
df.fillna('', inplace=True)
df.to_csv(r'c:\test.csv', index=False, mode='wb', sep=',', encoding='utf-8', na_rep='')
My result is:
COLUMNA,COLUMNB,COLUMNC
01,nan,test
02,test,nan
03,nan,test
My desired result would be:
COLUMNA,COLUMNB,COLUMNC
01,,test
02,test,
03,,test
Answers:
Since you are dealing with 'nan' strings, you can use the replace function:
df = pd.DataFrame({'Col1' : ['nan', 'foo', 'bar', 'baz', 'nan', 'test']})
df.replace('nan', '')
   Col1
0
1   foo
2   bar
3   baz
4
5  test
All 'nan' string values will be replaced by the empty string ''. replace is not in-place, so make sure you assign it back:
df = df.replace('nan', '')
You can then write it to your file using to_csv.
If you are actually looking to fill NaN values with blank, use fillna:
df = df.fillna('')
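To make the difference between the two calls concrete, here is a minimal sketch. The frame is hypothetical: it reuses the question's column names and mixes literal 'nan' strings (which is what the question's symptoms suggest read_excel with dtype=str produced on that pandas version) with one real missing value. It is only meant to illustrate that fillna touches real NaN values while replace touches the literal strings.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's sheet.
df = pd.DataFrame({
    "COLUMNA": ["01", "02", "03"],
    "COLUMNB": ["nan", "test", "nan"],    # literal 'nan' strings
    "COLUMNC": ["test", np.nan, "test"],  # a real missing value
})

# fillna only fills real missing values; the 'nan' strings stay as they are.
after_fillna = df.fillna("")

# replace only swaps the literal string; the real NaN stays missing.
after_replace = df.replace("nan", "")

# Chaining both covers either case before writing the CSV.
cleaned = df.replace("nan", "").fillna("")
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Note that replace('nan', '') would also blank a cell whose real content is the text nan, so it is worth checking that the data has no such values.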
I had the same problem and used the option keep_default_na=False in read_excel.
df = pd.read_excel(content, sheet_name=sheet, header=[0], keep_default_na=False)
I had the same issue, and the options for replace or fillna did not work. (df.info() showed that the fields where NaN was displayed were float64). The na_filter = False option worked for me. It would work like this:
df = pd.read_excel("myexcelfile.xlsx", na_filter = False)
You can read more about the na_filter option in the pandas read_excel documentation.
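The effect of the two reader options mentioned in these answers can be sketched without an Excel file. The example below is an assumption-laden stand-in: it uses pd.read_csv on an in-memory string, because read_csv accepts the same dtype, na_filter and keep_default_na parameters as read_excel, and the sample text is made up to match the question's desired output. Behaviour with a real workbook may differ between pandas versions.

```python
import io
import pandas as pd

# Made-up stand-in for the question's sheet, with blank cells in columns B and C.
raw = "COLUMNA,COLUMNB,COLUMNC\n01,,test\n02,test,\n03,,test\n"

# Default parsing: blank cells become missing values (NaN).
default = pd.read_csv(io.StringIO(raw), dtype=str)

# na_filter=False switches off missing-value detection, so blanks stay ''.
no_filter = pd.read_csv(io.StringIO(raw), dtype=str, na_filter=False)

# keep_default_na=False (with no na_values given) also leaves blanks as ''.
keep = pd.read_csv(io.StringIO(raw), dtype=str, keep_default_na=False)

print(default["COLUMNB"].isna().sum())  # number of blanks read as missing
print(no_filter.to_csv(index=False))
```

With either option the leading zeros survive because of dtype=str, and the blanks round-trip to the CSV as empty fields.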