How to use GridSearchCV output for a scikit prediction?

Question:

In the following code:

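from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to rank features for the selection step
rf_feature_imp = RandomForestClassifier(100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier, trained on the selected features
clf = RandomForestClassifier(5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)
gs.fit(X, y)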

What should be used for a prediction?

  • gs?
  • gs.best_estimator_?
    or
  • gs.best_estimator_.named_steps['clf']?

What is the difference between these 3?

Asked By: user308827


Answers:

gs.predict(X_test) is equivalent to gs.best_estimator_.predict(X_test). Using either, X_test will be passed through your entire pipeline and it will return the predictions.

gs.best_estimator_.named_steps['clf'].predict(), however, is only the last step of the pipeline. To use it, the feature selection step must already have been applied, so it only works on data that you have first passed through gs.best_estimator_.named_steps['fs'].transform().

Three equivalent methods for generating predictions are shown below:

Using gs directly.

pred = gs.predict(X_test)

Using best_estimator_.

pred = gs.best_estimator_.predict(X_test)

Calling each step in the pipeline individually.

X_test_fs = gs.best_estimator_.named_steps['fs'].transform(X_test)
pred = gs.best_estimator_.named_steps['clf'].predict(X_test_fs)

Answered By: David Maust
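
The equivalence of the three approaches can be checked with a small runnable sketch. This is an illustration rather than the asker's exact setup: the train/test split, the small forests, the fixed random_state, cv=3, and the "mean"/"median" thresholds are all assumptions chosen so the example runs quickly and always keeps at least one feature.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
# Hold out a test set (assumption: the question fits on all of X, y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

model = Pipeline([
    ("fs", SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))),
    ("clf", RandomForestClassifier(n_estimators=20, random_state=0)),
])

# Small illustrative grid (assumed values, not the grid from the question)
params = {
    "fs__threshold": ["mean", "median"],
    "clf__max_features": ["sqrt", "log2"],
}

gs = GridSearchCV(model, params, cv=3)  # refit=True is the default
gs.fit(X_train, y_train)

# 1. Predict through the search object
pred_gs = gs.predict(X_test)

# 2. Predict through the refit best pipeline
pred_best = gs.best_estimator_.predict(X_test)

# 3. Run the pipeline steps by hand: select features, then classify
best = gs.best_estimator_
X_test_fs = best.named_steps["fs"].transform(X_test)
pred_steps = best.named_steps["clf"].predict(X_test_fs)

# All three routes go through the same fitted objects
assert np.array_equal(pred_gs, pred_best)
assert np.array_equal(pred_gs, pred_steps)
```

In practice the first form is usually the most convenient; the step-by-step form is mainly useful when the selected features themselves need to be inspected.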

If you pass True as the refit parameter of GridSearchCV (which is the default value anyway), the estimator with the best parameters is refit on the whole dataset passed to fit, so you can use gs.predict(X_test) for prediction.
If refit is False when the GridSearchCV object is fitted on your training set, no final refit happens: gs.best_estimator_ is not available and gs.predict cannot be used. In that case, build a new estimator from gs.best_params_, fit it on your training set, and predict with that.

Answered By: absurdlyhard
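
A minimal sketch of the refit=False case follows. With refit=False the search object does not expose best_estimator_, but with a single scoring metric best_params_ is still set and can be used to fit a final model by hand. The plain RandomForestClassifier, the small grid, cv=3, and the train/test split are assumptions made for illustration, not part of the original question.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

base = RandomForestClassifier(n_estimators=20, random_state=0)
params = {"max_features": ["sqrt", "log2"]}  # assumed illustrative grid

# refit=False: only the cross-validation runs, no final fit on X_train
gs = GridSearchCV(base, params, cv=3, refit=False)
gs.fit(X_train, y_train)

# No refit estimator is stored on the search object
has_best_estimator = hasattr(gs, "best_estimator_")

# Rebuild the winning configuration and fit it manually
final_model = clone(base).set_params(**gs.best_params_)
final_model.fit(X_train, y_train)
pred = final_model.predict(X_test)
```

This is essentially what refit=True does automatically, so refit=False is mainly worth using when the final fit is expensive or needs to happen on different data.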