Imputing NaNs using pandas's fillna() changes the dtype from float to object
Question:
I am imputing missing values in some of my columns. The columns have numerical dtypes (float and integer).
As soon as I impute the missing values using fillna() with the mean etc., the column’s dtype changes from float to object.
I want it to remain float, and I find it a little inefficient to redo all the dtypes.
Kindly help me with this.
Here is an example.
import numpy as np
import pandas as pd
ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)
ser_imputed = ser_original.fillna(np.mean)
print('After imputation, the dtype is {}'.format(ser_imputed.dtype))
After imputation, the dtype is dtype('O')
Please note that this is just a sample I created here. I am working with a large dataset and plan to impute multiple columns with different imputation methods, so please suggest a solution that handles multiple columns at once.
P.S. I find deploying for loops to be a little naive. Do comment if I am incorrect here.
Answers:
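That’s because you’re passing the function np.mean itself to fillna() rather than a value, so the NaN is replaced by the function object and the column becomes object dtype: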
ser_original = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], dtype=float)
ser_imputed = ser_original.fillna(np.mean)
print(ser_imputed)
0 1.0
1 2.0
2 <function mean at 0x000002BCA05020D0>
3 4.0
4 5.0
dtype: object
Pass the computed mean instead and the dtype stays float:
ser_imputed = ser_original.fillna(ser_original.mean())
print(ser_imputed)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
print(ser_imputed.dtype)
# dtype('float64')
If you have a DataFrame, you can fill in its NaNs with fillna() as
df.fillna(df.mean())
where each column’s NaNs are replaced by the mean of that column.
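Since the question asks about imputing several columns with different methods at once, here is a minimal sketch of one way to do that without a for loop. It relies on DataFrame.fillna() accepting a dict that maps column names to fill values. The DataFrame, its column names, and the choice of mean, median, and constant are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 4.0, 5.0],
    "weight": [10.0, np.nan, 30.0, 40.0, 100.0],
    "count": [1.0, np.nan, 3.0, np.nan, 5.0],
})

# Map each column to the scalar that should replace its NaNs.
# Each entry is a computed value, not a function such as np.mean.
fill_values = {
    "height": df["height"].mean(),    # mean imputation
    "weight": df["weight"].median(),  # median imputation
    "count": 0.0,                     # constant imputation
}

# One call fills every listed column with its own value.
df_imputed = df.fillna(fill_values)

print(df_imputed.dtypes)
```

Because every fill value is a float scalar, the columns should stay float64. Columns left out of the dict are not touched.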