How to proceed with `None` value in pandas fillna
Question:
I have the following dictionary:
fillna(value={'first_name':'Andrii', 'last_name':'Furmanets', 'created_at':None})
When I pass that dictionary to fillna, I see:
raise ValueError('must specify a fill method or value')
ValueError: must specify a fill method or value
It seems to me that it fails on the None value.
I use pandas version 0.20.3.
Answers:
What type of data structure are you using? This works for a pandas Series:
import pandas as pd
d = pd.Series({'first_name': 'Andrii', 'last_name':'Furmanets', 'created_at':None})
d = d.fillna('DATE')
Setup
Consider the sample dataframe df
df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))
df
     A    B     C
0  1.0  NaN  None
1  NaN  2.0     D
I can confirm the error
df.fillna(dict(A=1, B=None, C=4))
ValueError: must specify a fill method or value
This happens because pandas cycles through the keys in the dictionary and executes a fillna for each relevant column. If you look at the signature of the pd.Series.fillna method:
Series.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
You'll see that the default for value is None, so we can replicate this error with
df.A.fillna(None)
Or equivalently
df.A.fillna()
I’ll add that I’m not terribly surprised considering that you are attempting to fill a null value with a null value.
What you need is a workaround.
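The per-column behaviour described above can be illustrated with a small sketch. This is only a conceptual equivalent, not pandas' actual internals. The names fills, filled_by_dict and filled_by_loop are made up for illustration, and the string fill for C is an assumption chosen so the column keeps a single type.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))

# Only non-None fill values here; a None entry is what triggers the error.
fills = dict(A=1, C='missing')

# Dict-based fill, as in the question.
filled_by_dict = df.fillna(fills)

# Conceptually the same thing: one Series.fillna call per dictionary key.
filled_by_loop = df.copy()
for col, val in fills.items():
    filled_by_loop[col] = filled_by_loop[col].fillna(val)

# Compare the two results cell by cell.
same = filled_by_dict.equals(filled_by_loop)
print(same)
```

Column B is not in the dictionary, so it is left untouched in both versions.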
Solution
Use pd.DataFrame.fillna over the columns that you want to fill with non-null values. Then follow that up with a pd.DataFrame.replace on the specific columns where you want to swap one null value for another.
import numpy as np
df.fillna(dict(A=1, C=2)).replace(dict(B={np.nan: None}))
     A     B  C
0  1.0  None  2
1  1.0     2  D
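The same idea can be wrapped in a small helper that accepts the original dictionary, None entries included. This is a minimal sketch under assumptions: fillna_allowing_none is a hypothetical name, not a pandas function, and it rebuilds the None columns in plain Python instead of using replace, because the way replace treats None has varied between pandas versions.

```python
import pandas as pd

def fillna_allowing_none(df, values):
    """Hypothetical helper: like df.fillna(dict), but tolerates None as a fill value."""
    # Columns with a real replacement value go through the normal fillna path.
    real_fills = {col: val for col, val in values.items() if val is not None}
    out = df.fillna(real_fills) if real_fills else df.copy()
    # Columns mapped to None are rebuilt as object dtype so None is stored as-is.
    for col, val in values.items():
        if val is None and col in out.columns:
            out[col] = pd.Series(
                [None if pd.isna(v) else v for v in out[col]],
                index=out.index,
                dtype=object,
            )
    return out

df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))
result = fillna_allowing_none(df, dict(A=1, B=None, C='unknown'))
print(result)
```

The cost is that any column filled with None ends up with object dtype, which is slower than a numeric dtype for later arithmetic.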
An alternative method to fillna with None. I am on pandas 0.24.0 and I am doing this to insert NULL values into a Postgres database.
# Stealing @pIRSquared dataframe
df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))
df
     A    B     C
0  1.0  NaN  None
1  NaN  2.0     D
# fill NaN with None. Basically it says, fill with None whenever you see NULL value.
df['A'] = np.where(df['A'].isnull(), None, df['A'])
df['B'] = np.where(df['B'].isnull(), None, df['B'])
# Result
df
      A     B     C
0   1.0  None  None
1  None   2.0     D
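If the goal is only to hand rows to a database driver, one option is to do the conversion in plain Python at the point where the rows are built, without changing the dataframe at all. A minimal sketch, assuming the driver maps Python's None to NULL; the name records is hypothetical and the actual insert call is left out:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))

# Turn each row into a plain tuple, swapping any missing value for None.
records = [
    tuple(None if pd.isna(value) else value for value in row)
    for row in df.itertuples(index=False, name=None)
]
print(records)
```

Because pd.isna treats None, NaN and NaT alike, this does not depend on which missing-value marker a column happens to use.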
It's a bad idea to try to fill a datetime with None. This is exactly what pandas NaT (Not a Time) is for: missing datetimes.
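A short sketch of what that looks like in practice. The sample dates and the names created_at and filled are made up for illustration:

```python
import pandas as pd

# One valid date and one missing value, as might come from a created_at column.
created_at = pd.to_datetime(pd.Series(['2019-01-15', None]))

# The missing entry is stored as NaT and the column keeps a datetime dtype.
print(created_at.dtype)
print(created_at.isna().tolist())

# NaT can still be filled with a real timestamp when one is wanted.
filled = created_at.fillna(pd.Timestamp('1970-01-01'))
print(filled.tolist())
```

Keeping NaT preserves the datetime dtype, whereas a None stored in the column would force it to object.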
In case you want to normalize all of the nulls to Python's None:
df.fillna(np.nan).replace([np.nan], [None])
The first fillna will replace all of the nulls (None, NaT, np.nan, etc.) with NumPy's NaN; the replace then swaps NumPy's NaN for Python's None.
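How replace handles None has changed across pandas releases, so a different idiom is sometimes used for the same normalization: cast to object, then use where. This is a sketch of that alternative, not the method given above, and the name normalized is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, None], B=[None, 2], C=[None, 'D']))

# Cast to object first so every column is able to hold None,
# then keep the non-null cells and put None everywhere else.
normalized = df.astype(object).where(df.notna(), None)
print(normalized)
```

The cast to object matters: without it, a float column cannot store None and may simply keep NaN.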