Consistent answer to scikit-learn GridSearchCV

Question:

How do I get a consistent answer using GridSearchCV in scikit-learn? I assume I’m getting different answers because different random numbers make the folds differ on each run, though my understanding is that the code below should prevent this, since KFold has shuffle=False by default.

clf = GridSearchCV(SVC(), param_grid, cv=KFold(n, n_folds=10))

Asked By: user1507844


Answers:

As you identified in the comments, predict_proba is NOT deterministic!

But SVC does accept a random_state (as does KFold). I’ve found before that setting shuffle=False can lead to really poor results if your data were collected in a non-random order, so IMHO you’re better off setting shuffle=True and fixing random_state to some number.

From the docs

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

random_state : int seed, RandomState instance, or None (default)

The seed of the pseudo random number generator to use when shuffling the data for probability estimation.
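
To make that concrete, here is a minimal sketch of the idea, assuming a recent scikit-learn where KFold lives in sklearn.model_selection and takes n_splits rather than n_folds. The synthetic data, the small param_grid and the run_search helper are placeholders of mine, not anything from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Synthetic data, purely for illustration (assumption, not the asker's data).
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

# Hypothetical grid; substitute your own param_grid.
param_grid = {"C": [0.1, 1.0, 10.0]}


def run_search(seed):
    """Run one grid search with every source of randomness seeded."""
    # Shuffle the folds, but fix the seed so the split is the same every run.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    # probability=True enables predict_proba; random_state seeds the
    # internal shuffling used for the probability estimates.
    svc = SVC(probability=True, random_state=seed)
    clf = GridSearchCV(svc, param_grid, cv=cv)
    clf.fit(X, y)
    return clf


first = run_search(0)
second = run_search(0)

# Both runs should agree on the chosen parameters and on the probabilities.
print(first.best_params_ == second.best_params_)
print(np.allclose(first.predict_proba(X), second.predict_proba(X)))
```

Both the splitter and the estimator get a seed here; leaving either one at None reintroduces run-to-run variation.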

Answered By: Alex

I think you’re looking for this parameter: random_state=7

Most estimators and splitters that have a random_state parameter default it to None, which allows results to vary from run to run.

You must set it to some number to get consistent results.

I set it to 7 because I like 7. Pick any number.
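
A quick way to check this for yourself, sketched on made-up data (the dataset and the fit_proba helper are just for illustration): fit the same SVC twice with random_state=7 and compare the probability outputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data, only for demonstration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)


def fit_proba(random_state):
    """Fit a fresh SVC and return its class probabilities on X."""
    model = SVC(probability=True, random_state=random_state)
    return model.fit(X, y).predict_proba(X)


# Two independent fits with the same seed.
same_a = fit_proba(7)
same_b = fit_proba(7)

# With a fixed seed the two probability arrays should match.
print(np.allclose(same_a, same_b))
```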

Answered By: James Madison

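For GridSearchCV, it’s also important to call numpy.random.seed to get reproducible results, because any component whose random_state is left at None draws from NumPy’s global random number generator. This must be done in addition to setting random_state or seed in the estimator.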

numpy.random.seed(42)
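
As a small sketch of why this helps: as far as I know, a scikit-learn object with random_state=None falls back on NumPy's global generator, so seeding it right before the call pins down the result. The toy array and the first_test_fold helper below are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples with two features each.
X = np.arange(20).reshape(10, 2)


def first_test_fold():
    """Seed the global RNG, then return the first test fold's indices."""
    np.random.seed(42)
    # random_state is left at None, so the shuffle uses the global RNG.
    cv = KFold(n_splits=5, shuffle=True)
    _, test_idx = next(cv.split(X))
    return test_idx


a = first_test_fold()
b = first_test_fold()

# Same global seed before each call, so the folds should be identical.
print(np.array_equal(a, b))
```

The seed has to be set before each run; anything that consumes random numbers in between changes what later calls draw.
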
Answered By: Tim