Get pandas.read_csv to read empty values as empty string instead of nan
Question:
I’m using the pandas library to read in some CSV data. In my data, certain columns contain strings. The string "nan" is a possible value, as is an empty string. I managed to get pandas to read "nan" as a string, but I can’t figure out how to get it not to read an empty value as NaN. Here’s sample data and output:
One,Two,Three
a,1,one
b,2,two
,3,three
d,4,nan
e,5,five
nan,6,
g,7,seven
>>> pandas.read_csv('test.csv', na_values={'One': [], "Three": []})
   One  Two  Three
0    a    1    one
1    b    2    two
2  NaN    3  three
3    d    4    nan
4    e    5   five
5  nan    6    NaN
6    g    7  seven
It correctly reads "nan" as the string "nan", but still reads the empty cells as NaN. I tried passing in str in the converters argument to read_csv (with converters={'One': str}), but it still reads the empty cells as NaN.
I realize I can fill the values after reading, with fillna, but is there really no way to tell pandas that an empty cell in a particular CSV column should be read as an empty string instead of NaN?
Answers:
I added a ticket to add an option of some sort here:
https://github.com/pydata/pandas/issues/1450
In the meantime, result.fillna('') should do what you want.
EDIT: in the development version (to be 0.8.0 final), if you specify an empty list of na_values, empty strings will stay empty strings in the result.
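As a minimal sketch of the fillna workaround, the snippet below reads the question's sample data from an in-memory string (CSV_TEXT and io.StringIO are my own stand-ins for test.csv) and fills only the two text columns. Note that with default settings the literal text nan is also parsed as missing, so it ends up as an empty string as well.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv (assumption).
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

filled = pd.read_csv(io.StringIO(CSV_TEXT))

# Replace missing values with '' in the text columns only,
# leaving the numeric column untouched.
text_columns = ["One", "Three"]
filled[text_columns] = filled[text_columns].fillna("")

print(filled)
```

Because the fill happens after parsing, this cannot tell an originally empty cell apart from a cell that contained one of the default NA markers.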
I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.
Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:
pd.read_csv('test.csv', keep_default_na=False)
This issue is more clearly explained in
That was fixed on Aug 19, 2012 for Pandas version 0.9 in
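Here is a small sketch of keep_default_na=False applied to the question's sample data. The data is read from an in-memory string rather than test.csv, and the names CSV_TEXT and df are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv (assumption).
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

# With the default NA markers switched off and no na_values given,
# neither '' nor the text 'nan' is converted to NaN.
df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

print(df["One"].tolist())
print(df["Three"].tolist())
```

The setting applies to the whole file, so a numeric column that contains empty cells would no longer be parsed as numbers.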
We have a simple argument in Pandas read_csv() for this. Use:
df = pd.read_csv('test.csv', na_filter=False)
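To illustrate what na_filter=False does and one side effect worth knowing, here is a sketch on a tiny made-up table (the name and score columns are hypothetical, not from the question). Since no missing-value detection runs at all, a numeric column with a blank cell can no longer be parsed as numbers and comes back as text.

```python
import io

import pandas as pd

# Hypothetical two-column sample with one blank cell in each column.
SAMPLE = (
    "name,score\n"
    "a,1\n"
    ",2\n"
    "c,\n"
)

# na_filter=False disables missing-value detection for every column.
unfiltered = pd.read_csv(io.StringIO(SAMPLE), na_filter=False)

print(unfiltered["name"].tolist())
print(unfiltered["score"].tolist())
```

If you need the numeric columns to stay numeric, a per-column approach is probably a better fit than this global switch.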
What pandas defines by default as a missing value in read_csv() can be found here.
import pandas
default_missing = pandas._libs.parsers.STR_NA_VALUES
print(default_missing)
The output
{'', '<NA>', 'nan', '1.#QNAN', 'NA', 'null', 'n/a', '-nan', '1.#IND', '#N/A N/A', 'N/A', 'NULL', 'NaN', '-1.#IND', '-1.#QNAN', '#NA', '#N/A', '-NaN'}
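With that you can opt out of individual values. Work on a copy of the set, because set.remove() changes the set in place and returns None, and pass keep_default_na=False so that only your reduced set is used.
import pandas
# Copy the defaults so the set inside pandas is left untouched.
default_missing = set(pandas._libs.parsers.STR_NA_VALUES)
# discard() drops a value if it is present and never raises KeyError.
default_missing.discard('')
default_missing.discard('nan')
with open('test.csv', 'r') as csv_file:
    df = pandas.read_csv(csv_file, na_values=default_missing, keep_default_na=False)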
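The same opt-out idea can be sketched without touching pandas internals by spelling out the markers you still want treated as missing. The marker list and the sample rows below are my own assumptions, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: one empty cell, one literal 'nan', one 'N/A'.
SAMPLE = (
    "One,Three\n"
    "a,one\n"
    ",N/A\n"
    "nan,\n"
)

# Markers that should still count as missing; '' and 'nan' are left out on purpose.
custom_missing = ["NA", "N/A", "null", "NULL"]

# keep_default_na=False makes pandas use only the markers listed in na_values.
selective = pd.read_csv(
    io.StringIO(SAMPLE),
    na_values=custom_missing,
    keep_default_na=False,
)

print(selective)
```

Here the empty cells and the text nan survive as strings, while N/A is still read as NaN.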
If you want to keep the empty strings for just one column, define str as the column converter (dtype won’t work):
pd.read_csv('test.csv', converters={'column_name': str})
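A short sketch of the per-column converter on the question's sample data, assuming a reasonably recent pandas and reading from an in-memory string instead of test.csv. Only column One gets the converter, so column Three still goes through the default NA handling.

```python
import io

import pandas as pd

# Sample data from the question, held in memory instead of test.csv (assumption).
CSV_TEXT = (
    "One,Two,Three\n"
    "a,1,one\n"
    "b,2,two\n"
    ",3,three\n"
    "d,4,nan\n"
    "e,5,five\n"
    "nan,6,\n"
    "g,7,seven\n"
)

# The converter receives the raw cell text, so '' and 'nan' pass through unchanged.
per_column = pd.read_csv(io.StringIO(CSV_TEXT), converters={"One": str})

print(per_column["One"].tolist())
print(per_column["Three"].isna().tolist())
```

This keeps the numeric column Two as integers, which the file-wide switches cannot guarantee when numeric columns contain blanks.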
pd.read_csv(sourceObj, dtype='string')
This reads every column with the pandas string dtype. On its own it does not turn off missing-value detection, so empty cells come back as <NA>; add keep_default_na=False if you want them as the empty string ''.
Version: Pandas v1.5