Why does LogisticRegressionCV's .score() differ from cross_val_score?

Question:

I was using LogisticRegressionCV’s .score() method to yield an accuracy score for my model.

I also used cross_val_score to yield an accuracy score with the same cv split (skf), expecting the same score to show up.

But alas, they were different and I’m confused.

I first did a StratifiedKFold:

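from sklearn.model_selection import StratifiedKFold, cross_val_score

# 5 stratified folds, shuffled with a fixed seed so the splits are reproducible
skf = StratifiedKFold(n_splits=5,
                      shuffle=True,
                      random_state=708)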

I then instantiated a LogisticRegressionCV() with skf as the argument for the cv parameter, fitted it, and scored it on the training set.

from sklearn.linear_model import LogisticRegressionCV

logreg = LogisticRegressionCV(cv=skf, solver='liblinear')

logreg.fit(X_train_sc, y_train)
logreg.score(X_train_sc, y_train)

This gave me a score of 0.849507735583685, which is accuracy by default. Since this is LogisticRegressionCV, this score is actually the mean accuracy score across the folds, right?

Then I used cross_val_score:

cross_val_score(logreg, X_train_sc, y_train, cv=skf).mean()

This gave me a mean accuracy score of 0.8227814439082044.

I’m kind of confused as to why the scores differ, since I thought I was basically doing the same thing.

Asked By: Orangecat


Answers:

[.score] is actually the mean accuracy score right?

No. The score method here is the accuracy score of the final classifier (which was retrained on the entire training set, using the optimal value of the regularization strength). By evaluating it on the training set again, you’re getting an optimistically-biased estimate of future performance.

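To recover the cross-validation scores, use the scores_ attribute, which holds the per-fold accuracy for every candidate value of C. Even with the same folds, these can differ from cross_val_score: cross_val_score refits the whole LogisticRegressionCV (including its internal search over C) on each training split, so the inner folds no longer cover the same data. There can also be small differences from randomness in the solver if it doesn’t converge completely.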
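As a rough illustration of the three numbers being compared, here is a minimal sketch. It uses synthetic data from make_classification as a stand-in for X_train_sc and y_train (an assumption, not the asker's data), and the variable names train_acc, best_internal_cv and nested_cv are made up for this example. The layout of scores_ can depend on the scikit-learn version, so the code handles both a dict and an array.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for X_train_sc / y_train (assumption, not the asker's data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=708)
logreg = LogisticRegressionCV(cv=skf, solver='liblinear').fit(X, y)

# 1) Accuracy of the final, refitted model on the same data it was trained on.
train_acc = logreg.score(X, y)

# 2) Internal cross-validation accuracies: one row per fold, one column per C.
#    For a binary problem scores_ has traditionally been a dict keyed by the
#    positive class; an array layout is handled too, in case the version differs.
scores = logreg.scores_
grid = next(iter(scores.values())) if isinstance(scores, dict) else scores
grid = np.asarray(grid).reshape(skf.get_n_splits(), -1)
mean_per_C = grid.mean(axis=0)        # mean accuracy across folds for each C
best_internal_cv = mean_per_C.max()   # mean CV accuracy of the best C

# 3) cross_val_score clones logreg and fits it on each outer training split,
#    so the search over C is repeated inside every split (nested CV).
nested_cv = cross_val_score(logreg, X, y, cv=skf).mean()

print("score() on training data:", train_acc)
print("best mean internal CV   :", best_internal_cv)
print("cross_val_score mean    :", nested_cv)
```

The first number is a training-set accuracy, the second is the cross-validated accuracy for the selected C, and the third is a nested estimate, so there is no reason to expect them to coincide.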

Answered By: Ben Reiniger

To add to the answer: you can skip the final refit on the entire training set by passing refit=False to LogisticRegressionCV(). The coefficients, intercepts, and C from the best-scoring fit in each fold are then averaged instead, e.g.

logreg = LogisticRegressionCV(cv=skf, solver='liblinear', refit=False)

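The rest of the code can stay the same. Note that .score(X_train_sc, y_train) is still an accuracy measured on the training data, not the mean cross-validation score.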
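A small sketch of what refit changes, again on synthetic data standing in for the asker's X_train_sc and y_train (an assumption); the names refit_clf and avg_clf are hypothetical. It only compares the selected C and the training-set accuracy of the two variants.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for X_train_sc / y_train (assumption, not the asker's data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=708)

# refit=True (the default): pick the best C, then retrain on all of X, y.
refit_clf = LogisticRegressionCV(cv=skf, solver='liblinear', refit=True).fit(X, y)

# refit=False: no final retraining; the best per-fold coefficients,
# intercepts and C values are averaged to form the final model.
avg_clf = LogisticRegressionCV(cv=skf, solver='liblinear', refit=False).fit(X, y)

print("C_ with refit=True :", refit_clf.C_)
print("C_ with refit=False:", avg_clf.C_)

# Both calls below score on data that every fold model has partly seen,
# so neither value is the mean cross-validation accuracy.
print("train accuracy, refit=True :", refit_clf.score(X, y))
print("train accuracy, refit=False:", avg_clf.score(X, y))
```

Either way, the per-fold cross-validation accuracies live in scores_, not in the return value of .score().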

Answered By: Marvasti