How to perform StratifiedShuffleSplit in GridSearchCV?

Question:

Can I run StratifiedShuffleSplit inside GridSearchCV without having to instantiate it first as “ss”, as in my code below?

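from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# clf_us, parameter and num_range are defined elsewhere in my code
ss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
grid_search = GridSearchCV(clf_us, param_grid={parameter: num_range}, cv=ss)
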
Asked By: user9238790


Answers:

If you are building a classifier and are only concerned with keeping the same label balance in each fold as in the complete data set, you can avoid instantiating StratifiedShuffleSplit by specifying the number of folds in GridSearchCV, e.g. cv=5.

According to the documentation: “For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.”
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
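
As a minimal sketch of the integer shortcut, the snippet below passes cv=5 to GridSearchCV and then uses check_cv to confirm which splitter an integer resolves to for a classifier. The synthetic data set, the LogisticRegression estimator and the grid over C are assumptions made for illustration; they are not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, check_cv

# Illustrative imbalanced binary data set (assumption, not from the question)
X, y = make_classification(
    n_samples=200, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Passing an integer: no splitter object is created by the caller
clf = LogisticRegression(max_iter=1000)
grid_search = GridSearchCV(clf, param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
grid_search.fit(X, y)

# check_cv applies the same rule the documentation describes for integer input
resolved_cv = check_cv(5, y, classifier=True)
uses_stratified_kfold = isinstance(resolved_cv, StratifiedKFold)
```

Note that StratifiedKFold partitions the data into non-overlapping test folds, so it is not a drop-in equivalent of StratifiedShuffleSplit, which draws independent random splits of a chosen size.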

However, if you want finer control over how the data is split, you can’t avoid instantiating StratifiedShuffleSplit. Please see the example on this page to understand how the test_size parameter affects the splitting: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html#sklearn.model_selection.ShuffleSplit
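
The splitter does have to be instantiated, but it does not need to be bound to a name such as “ss” first: it can be constructed directly in the cv argument. The sketch below shows this under assumed inputs (a synthetic data set, a LogisticRegression estimator and a grid over C, none of which come from the question) and pulls out one split to show the effect of test_size.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# Illustrative data set (assumption, not from the question)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The splitter is created inline, with the same settings as in the question
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0),
)
grid_search.fit(X, y)

# Inspect one split: test_size=0.5 sends half of the samples to the test set
train_idx, test_idx = next(grid_search.cv.split(X, y))
```

Here each of the 3 splits holds out 100 of the 200 samples; changing test_size changes that proportion, while n_splits controls how many random splits are drawn.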

Answered By: KRKirov