Pandas fillna throws ValueError: fill value must be in categories
Question:
Description: both features have a categorical dtype. I used this code in a different kernel on the same
dataset and it worked fine; the only difference is that the features were float64 there. Later I converted these features to Categorical
because all the features in the dataset represent categories.
Below is the code:
AM_train['product_category_2'].fillna('Unknown', inplace =True)
AM_train['city_development_index'].fillna('Missing', inplace =True)
Answers:
Use Series.cat.add_categories to add the new category first:
# Register the fill value as a category, then fill and assign the result back
AM_train['product_category_2'] = AM_train['product_category_2'].cat.add_categories('Unknown')
AM_train['product_category_2'] = AM_train['product_category_2'].fillna('Unknown')
AM_train['city_development_index'] = AM_train['city_development_index'].cat.add_categories('Missing')
AM_train['city_development_index'] = AM_train['city_development_index'].fillna('Missing')
Sample:
import numpy as np
import pandas as pd

AM_train = pd.DataFrame({'product_category_2': pd.Categorical(['a', 'b', np.nan])})
AM_train['product_category_2'] = AM_train['product_category_2'].cat.add_categories('Unknown')
AM_train['product_category_2'] = AM_train['product_category_2'].fillna('Unknown')
print(AM_train)
   product_category_2
0                   a
1                   b
2             Unknown
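To make the cause visible, here is a minimal sketch on a throwaway Series (the names `s`, `raised` and `filled` are made up for illustration and are not from the question). A categorical column only accepts values that are already among its categories, so the plain fill is rejected. Depending on the pandas version this seems to surface as either a ValueError or a TypeError, so the sketch catches both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a categorical Series with one missing value
s = pd.Series(pd.Categorical(["a", "b", np.nan]))

# Filling with a value that is not a known category is rejected
try:
    s.fillna("Unknown")
    raised = False
except (ValueError, TypeError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Registering the category first makes the fill valid
filled = s.cat.add_categories("Unknown").fillna("Unknown")
print(filled.tolist())
print(list(filled.cat.categories))
```

Both calls return new objects, so the result has to be assigned back to the column if you want to keep it.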
I was getting the same error in a data frame while trying to get rid of all the NaNs.
I did not look into it much, but replacing .fillna() with .replace(np.nan, value) did the trick.
Use with caution, since I am not sure np.nan catches all the values that are interpreted as NaN.
In my case, I was using fillna on a dataframe with many features when I got that error.
I preferred to convert the necessary features to string first, apply fillna, and finally convert them back to category if needed.
# Convert to string, fill, then convert back; each step assigns its result back to the column
AM_train['product_category_2'] = AM_train['product_category_2'].astype('string')
AM_train['product_category_2'] = AM_train['product_category_2'].fillna('Unknown')
AM_train['product_category_2'] = AM_train['product_category_2'].astype('category')
It could also be automated by finding all features with dtype 'category' and converting them using the logic above.
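A rough sketch of that automation, assuming the same string round-trip is acceptable for every categorical column. The helper name `fill_categorical_na` and the toy frame are hypothetical, not part of the original answer. Note that after the round-trip the categories are strings, even if they were numeric before.

```python
import numpy as np
import pandas as pd


def fill_categorical_na(df, fill_value="Unknown"):
    """Return a copy of df with NaNs in categorical columns filled.

    Hypothetical helper: each categorical column goes through the
    string -> fillna -> category round-trip; other columns are left alone.
    """
    out = df.copy()
    for col in out.select_dtypes(include="category").columns:
        out[col] = (
            out[col]
            .astype("string")      # drop the categorical restriction
            .fillna(fill_value)    # any fill value is allowed now
            .astype("category")    # back to a categorical dtype
        )
    return out


# Toy data for illustration only
toy = pd.DataFrame({
    "product_category_2": pd.Categorical(["a", "b", np.nan]),
    "some_number": [1.0, np.nan, 3.0],
})
result = fill_categorical_na(toy)
print(result.dtypes)
print(result)
```

The function works on a copy, so the original frame keeps its NaNs and can be reused.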
Always reload the original dataset before running fillna a second time, and do not use inplace=True.
This problem arises because the code is run twice, so the second fillna cannot be performed.
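The answer does not say which statement fails on a second run. One plausible reading (an assumption, not something stated above) is the add_categories approach: add_categories raises a ValueError when the category already exists, which is what happens if the same cell is executed twice. A small sketch of a re-runnable version, with the hypothetical helper name `fillna_categorical`:

```python
import numpy as np
import pandas as pd


def fillna_categorical(series, fill_value):
    """Fill NaNs in a categorical Series; safe to call repeatedly.

    Hypothetical helper: the category is added only when it is missing,
    so a second run does not trip over add_categories.
    """
    if fill_value not in series.cat.categories:
        series = series.cat.add_categories(fill_value)
    return series.fillna(fill_value)


# Toy data for illustration only
col = pd.Series(pd.Categorical(["a", "b", np.nan]))

first = fillna_categorical(col, "Unknown")
second = fillna_categorical(first, "Unknown")  # simulates re-running the cell
print(second.tolist())
```

Because the helper returns a new Series instead of modifying in place, re-running the cell gives the same result each time.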