Pandas dataframe describe gives nan value for mean & std for float data type
Question:
I have a dataframe with 8M rows. When I use df.describe(), np.mean(df[col]), or df[col].mean(), I get nan as output.
But when I check np.mean(df[col].values), it works and I get the mean value.
There are no nan values in that column. I have tested this using df[col].isna().sum() and df[col].isnull().sum().
I am not sure how to reproduce the bug.
Update:
>>> df.head()
col1 col2
0 2.289062 290
1 2.289062 290
2 2.289062 290
3 2.289062 290
4 2.289062 290
>>> df[col1].dtype
dtype('float16')
Is there a way to debug or resolve this error?
>>> pd.__version__
'1.3.4'
Answers:
I think you need to convert the column to float64 or float32, because there is no hardware support for float16 on a typical processor:
df[col].astype('float64').mean()
df[col].astype('float32').mean()
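A likely reason the float16 column produces nan is overflow: float16 cannot hold finite values above 65504, so both the running sum and a row count of 8M turn into inf if they are kept in float16, and inf / inf is nan. The sketch below illustrates that arithmetic with NumPy alone; it assumes (not verified against the pandas source here) that pandas 1.3.4 keeps the sum and count in the column's own dtype. The 100,000-element array and the names values, count16, total16, df_up are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 8M-row column from the question (hypothetical data).
values = np.full(100_000, 2.289062, dtype=np.float16)
n_rows = 8_000_000  # row count reported in the question

with np.errstate(over="ignore", invalid="ignore"):
    # Anything above the float16 maximum becomes inf when stored as float16.
    count16 = np.float16(n_rows)
    total16 = np.float16(values.astype(np.float64).sum())
    ratio16 = total16 / count16  # inf / inf

print(np.finfo(np.float16).max)   # largest finite float16
print(count16, total16, ratio16)

# np.mean on the raw array uses float32 intermediates for float16 input,
# which is why np.mean(df[col].values) stays finite.
mean_np = np.mean(values)
print(mean_np)

# Workaround: upcast every float16 column before calling describe().
df = pd.DataFrame({"col1": values, "col2": 290})
f16_cols = df.select_dtypes(include="float16").columns
df_up = df.astype({c: "float32" for c in f16_cols})
summary = df_up.describe()
print(summary)
```

Upcasting to float32 doubles the memory used by those columns, and float64 quadruples it, so on 8M rows it may be worth converting only the columns you need to summarize.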