Error during Recursive feature elimination using Histogram based GBM
Question:
I am implementing Recursive Feature Elimination using the HistGradientBoostingClassifier, but for some reason I keep getting the following error:
ValueError: when importance_getter=='auto', the underlying estimator HistGradientBoostingClassifier should have coef_ or feature_importances_ attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.datasets import make_classification
X_train, y_train = make_classification(n_samples=1000, n_features=20, n_informative=10,
n_redundant=5, random_state=42)
# Create a HistGradientBoostingClassifier estimator
estimator = HistGradientBoostingClassifier().fit(X_train, y_train)
# Create a recursive feature elimination with cross-validation object
rfecv = RFECV(estimator=estimator, step=1, cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=1),
scoring='roc_auc')
# Fit the recursive feature elimination object to the data
rfecv.fit(X_train, y_train)
# Print the selected features and their ranks
print("Selected Features: ", rfecv.get_support(indices=True))
print("Feature Rankings: ", rfecv.ranking_)
Answers:
As the error message indicates, HistGradientBoostingClassifier doesn't have a coef_ or feature_importances_ attribute (even after fitting). There's a GitHub issue discussing this, but at the moment the core devs are more wary of misleading importance scores than swayed by the convenience they would provide.
RFECV allows passing a callable as importance_getter: the callable receives the fitted model and must return an array of importances. So you could supply permutation importance or some other custom importance that way. You may also be able to copy or recreate the impurity-reduction importance from the older gradient boosting implementation.