Use None instead of np.nan for null values in pandas DataFrame

Question:

I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of default np.nan). For some reason, this appears to be nearly impossible.

In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.

import numpy as np
import pandas as pd
df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]

I can’t do:

>>> df.fillna(None)
ValueError: must specify a fill method or value

nor:

>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

nor:

>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame

I used to have a DataFrame with only string values, so I could do:

>>> df[df == ""] = None

which worked. But now that I have mixed datatypes, it’s a no go.

For various reasons about my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I’m using np.isnan or pd.isnull everywhere?

Asked By: J Jones

||

Answers:

Use pd.DataFrame.where. It keeps the df value wherever the condition is met and uses None everywhere else:

df.where(df.notnull(), None)
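
A minimal sketch of the call above on a one-row frame like the one in the question. The explicit dtype=object constructor is an assumption used here to guarantee object columns; it is not how the question builds its frame.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame: one row of mixed values, stored as object dtype.
df = pd.DataFrame([[1, 'two', np.nan, 3, 4]], dtype=object)

# Keep each value where it is not null; use None everywhere else.
result = df.where(df.notnull(), None)

print(result.iloc[0].tolist())
# None still counts as null for pd.isnull, so existing null checks keep working.
print(pd.isnull(result).iloc[0].tolist())
```

Because pd.isnull treats None as missing, code that already relies on pd.isnull should not need to change.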


Answered By: piRSquared

Expanding on the accepted answer: when you also need to catch NaN values in numeric-dtype columns, you may need to cast to object first:

df.astype(object).where(df.notna(), None)

as per original reply by @BENNY
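
A short sketch of why the cast matters, using a hypothetical two-column frame (the names df_num, x and label are made up for illustration). A float64 column cannot store None, so the frame is cast to object before the mask is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'x' is inferred as float64, 'label' holds strings.
df_num = pd.DataFrame({'x': [1.5, np.nan], 'label': ['a', None]})

# Cast to object first so every column is able to hold None,
# then keep non-null values and substitute None for the rest.
as_none = df_num.astype(object).where(df_num.notna(), None)

print(as_none.dtypes)
print(as_none)
```

The trade-off is that numeric columns lose their float64 dtype, so vectorised arithmetic on the result is slower and may need a cast back.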

Answered By: KingOtto

I found this helpful for replacing NaN values with None: df.replace({np.nan: None}), with numpy imported as np (the pd.np alias no longer exists in pandas 2.0 and later).
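
A hedged sketch of the dict form on a hypothetical mixed frame (df_mixed and its column names are illustrative). How replace treats None has changed between pandas releases, as the error in the question shows, so it is worth checking the result on your installed version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a float column and a string column, each with one NaN.
df_mixed = pd.DataFrame({'x': [1.5, np.nan], 'label': ['a', np.nan]})

# The dict form maps NaN to None explicitly, rather than passing None as the
# positional value argument (which older pandas read as "no value given").
replaced = df_mixed.replace({np.nan: None})

print(replaced)
```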

Answered By: apinanyogaratnam