Imputing NaNs using pandas's fillna() changes the dtype from float to object

Question:

I am imputing missing values in some of my columns. The columns have numerical dtypes (float and integer).
As soon as I impute the missing values using fillna() with the mean etc., the column's dtype changes from float to object.
I want it to remain float, and I find it a little inefficient to redo all the dtypes.
Kindly help me with this.

Here is an example.

import numpy as np
import pandas as pd

ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)

ser_imputed = ser_original.fillna(np.mean)
print('After imputation, the dtype is {}'.format(ser_imputed.dtype))

After imputation, the dtype is object

Please note that this is just a small example I created here. I am working with a large dataset and plan to impute multiple columns using different imputation methods, so please suggest a solution that handles multiple columns at once.

P.S. I find deploying for loops to be a little naive. Do comment if I am incorrect here.

Asked By: letdatado


Answers:

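That's because you're passing the function np.mean itself rather than a computed value. fillna() puts the function object into each NaN slot, and a column that mixes floats with a function object can only be stored as dtype object: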

ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)
ser_imputed = ser_original.fillna(np.mean)
print(ser_imputed)
0                                      1.0
1                                      2.0
2    <function mean at 0x000002BCA05020D0>
3                                      4.0
4                                      5.0
dtype: object

Pass the computed mean instead and the dtype stays float64:

ser_imputed = ser_original.fillna(ser_original.mean())
print(ser_imputed)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
print(ser_imputed.dtype)
# dtype('float64')

If you have a DataFrame, you can fill the NaNs in all of its columns at once with fillna():

df.fillna(df.mean())

Here each column's NaNs are replaced by the mean of that column (this assumes all columns are numeric).
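Since the question asks about different imputations per column, here is a minimal sketch of one way to do that without an explicit loop. The frame, the column names (age, income, score) and the choice of statistics are made up for illustration; the idea is to compute one fill value per column with agg() and hand the result to fillna(), which matches the values to columns by label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; column names are invented for this example.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "income": [1000.0, 2000.0, np.nan],
    "score": [np.nan, 5.0, 7.0],
})

# Map each column to the statistic used to impute it (an assumed choice).
strategies = {"age": "mean", "income": "median", "score": "min"}

# agg() with a dict of single function names returns a Series
# holding one scalar per column, indexed by column name.
fill_values = df.agg(strategies)

# fillna() aligns the Series index with the column labels,
# so each column gets its own fill value.
df_imputed = df.fillna(fill_values)

print(df_imputed.dtypes)
```

Because every fill value is a number rather than a function object, the columns should keep their float dtype.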

Answered By: not a robot