Use None instead of np.nan for null values in pandas DataFrame
Question:
I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of default np.nan). For some reason, this appears to be nearly impossible.
In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.
df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]
I can’t do:
>>> df.fillna(None)
ValueError: must specify a fill method or value
nor:
>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
nor:
>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame
I used to have a DataFrame with only string values, so I could do:
>>> df[df == ""] = None
which worked. But now that I have mixed datatypes, it’s a no go.
For various reasons about my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I’m using np.isnan or pd.isnull everywhere?
Answers:
Use pd.DataFrame.where. It keeps the value from df where the condition is met and uses None otherwise:
df.where(df.notnull(), None)
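As a minimal sketch of the call above, here it is applied to a one-row frame shaped like the one in the question. Building the frame with dtype=object is an assumption made here so that every column can hold None; the variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# One-row frame with mixed values, stored as object dtype (assumption for this sketch).
df = pd.DataFrame([[1, "two", np.nan, 3, 4]], dtype=object)

# Keep each value where it is not null; use None everywhere else.
result = df.where(df.notnull(), None)

print(result.iloc[0].tolist())
```

pd.isnull(result) still reports the None cell as null, so existing null checks based on pd.isnull keep working.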
Expanding on the accepted answer: when you also need to catch NaN values within numeric dtype columns, you may need to change the dtype to object first:
df.astype(object).where(df.notna(), None)
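To illustrate why the cast matters, here is a small sketch with a hypothetical two-column frame (the column names x and name are made up for this example). A float column cannot store None, so the frame is cast to object before where is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float column and one text column, each with a missing value.
df = pd.DataFrame({"x": [1.0, np.nan], "name": ["a", None]})

# Cast to object first so every column is able to hold None,
# then keep non-null values and use None for the rest.
result = df.astype(object).where(df.notna(), None)

print(result.dtypes)
print(result.values.tolist())
```

The trade-off is that the numeric column becomes object dtype, so vectorized numeric operations on it will be slower or may need a cast back.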
as per original reply by @BENNY
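Found this helpful for replacing NaN values with None: df.replace({np.nan: None}) (originally written with pd.np.nan, but the pd.np alias was deprecated in pandas 1.0 and removed in 2.0, so import numpy directly)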
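A short sketch of the dict-based replace, assuming a reasonably recent pandas version and using numpy's nan directly. The frame and its column names are hypothetical; behavior of replace with None has changed between pandas releases, so it is worth checking the result on your own version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in a float column and in a text column.
df = pd.DataFrame({"x": [1.0, np.nan], "name": ["a", np.nan]})

# The dict form maps NaN to None explicitly, which avoids the
# "method pad" interpretation that df.replace(np.nan, None) hit in the question.
result = df.replace({np.nan: None})

print(result.values.tolist())
```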