Setting exact number of iterations for Logistic regression in python

Question:

I’m creating a model to perform Logistic regression on a dataset using Python. This is my code:

from sklearn import linear_model
my_classifier2=linear_model.LogisticRegression(solver='lbfgs',max_iter=10000)

Now, according to the sklearn doc page, max_iter is the maximum number of iterations taken for the solvers to converge. How do I specifically state that I need ‘N’ iterations?

Any kind of help would be really appreciated.

Asked By: Rohan Dsouza


Answers:

I’m not sure, but do you want to know the optimal number of iterations for your model? If so, you are better off using GridSearchCV, which can tune hyperparameters such as max_iter.
Briefly (a minimal sketch follows the steps below):

  1. Split your data into two groups, train/test data, with train_test_split or KFold, which can be imported from sklearn
  2. Set your parameter grid, for instance para = [{'max_iter': [1, 10, 100, 1000]}]
  3. Instantiate the search, for example clf = GridSearchCV(LogisticRegression(), param_grid=para, cv=5, scoring='accuracy')
  4. Fit it on the training data: clf.fit(x_train, y_train)
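
Putting those steps together, a minimal sketch might look like the following; the dataset (load_breast_cancer), the grid values and the accuracy scoring are illustrative assumptions rather than part of the original answer:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Toy dataset standing in for your own data
X, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

para = [{'max_iter': [1, 10, 100, 1000]}]
clf = GridSearchCV(LogisticRegression(solver='lbfgs'),
                   param_grid=para, cv=5, scoring='accuracy')
clf.fit(x_train, y_train)

print(clf.best_params_)           # best max_iter found on the training folds
print(clf.score(x_test, y_test))  # held-out accuracy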

You can also search for the best number of iterations with RandomizedSearchCV or Bayesian optimization.
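
For completeness, a RandomizedSearchCV version could look like this sketch; the sampling range for max_iter and the toy dataset are assumptions:

from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Sample max_iter from 1..1000 instead of enumerating a fixed grid
rs = RandomizedSearchCV(LogisticRegression(solver='lbfgs'),
                        param_distributions={'max_iter': randint(1, 1001)},
                        n_iter=10, cv=5, scoring='accuracy', random_state=0)
rs.fit(X, y)
print(rs.best_params_)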

Answered By: Genzo Ito

Regarding the GridSearchCV of the max_iter parameter, the fitted LogisticRegression models have an attribute n_iter_, so you can discover the exact max_iter needed for a given sample size and feature set:

n_iter_: ndarray of shape (n_classes,) or (1, )

Actual number of iterations for all classes. If binary or multinomial, it
returns only 1 element. For liblinear solver, only the maximum number of
iteration across all classes is given.
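
For example, here is a quick way to read that attribute off a fitted model (the toy dataset is an assumption):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

clf = LogisticRegression(solver='lbfgs', max_iter=10000)
clf.fit(X, y)

# Iterations actually used, usually well below max_iter;
# a single element for a binary problem
print(clf.n_iter_)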

Scanning very short intervals, like step 1, is a waste of resources that could be spent on more important LogisticRegression fit parameters: the solver itself, its regularization penalty, and the inverse of the regularization strength C, whose combination contributes to a faster convergence within a given max_iter.
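
A grid along those lines might look like the sketch below; the values, scoring and dataset are assumptions, and the sub-grids keep each solver paired only with penalties it supports:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Separate sub-grids: lbfgs only supports l2, liblinear supports l1 and l2
param_grid = [
    {'solver': ['lbfgs'],     'penalty': ['l2'],       'C': [0.01, 0.1, 1, 10]},
    {'solver': ['liblinear'], 'penalty': ['l1', 'l2'], 'C': [0.01, 0.1, 1, 10]},
]

search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid=param_grid, cv=5, scoring='accuracy')
search.fit(X, y)
print(search.best_params_)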

Setting a very high max_iter could also be a waste of resources if you haven’t previously done at least minimal feature preprocessing: feature scaling, and perhaps imputation, outlier clipping and dimensionality reduction (e.g. PCA).
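
One way to wire such preprocessing in front of the classifier is a Pipeline; the steps and their settings below are only an illustrative assumption:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ('impute', SimpleImputer(strategy='median')),  # a no-op here, useful with missing values
    ('scale', StandardScaler()),                   # feature scaling
    ('pca', PCA(n_components=0.95)),               # keep 95% of the variance
    ('clf', LogisticRegression(solver='lbfgs', max_iter=1000)),
])
pipe.fit(X, y)
print(pipe.named_steps['clf'].n_iter_)  # far fewer iterations than on raw features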

Things can become worse: a tuned max_iter could be fine for a given sample size but not for a bigger one, for instance if you are building a cross-validated learning curve, which, by the way, is imperative for optimal machine learning.
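
Such a cross-validated learning curve can be produced with learning_curve; the train sizes, scoring and dataset below are assumptions:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)

train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(solver='lbfgs', max_iter=10000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring='accuracy')

# One row per training size: a max_iter adequate for the smallest size
# may not be adequate for the largest one
print(train_sizes)
print(valid_scores.mean(axis=1))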

It becomes even worse if you increase the sample size in a pipeline that generates feature vectors such as n-grams (NLP): more rows will generate more (sparse) features for the LogisticRegression classification.
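
To see that effect, one can vectorize two differently sized text samples and compare the resulting feature counts; the dataset (fetch_20newsgroups, downloaded on first use) and the vectorizer settings are assumptions:

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

texts = fetch_20newsgroups(subset='train',
                           remove=('headers', 'footers', 'quotes')).data

for n_rows in (500, 2000):
    vec = TfidfVectorizer(ngram_range=(1, 2))  # unigrams + bigrams
    X = vec.fit_transform(texts[:n_rows])
    # More documents -> more distinct n-grams -> wider (sparse) feature matrix
    print(n_rows, X.shape)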

I think it’s important to observe whether the different solvers converge or not for a given sample size, set of generated features and max_iter.
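
A small check along those lines, comparing solvers via n_iter_ and convergence warnings (the dataset and the fixed max_iter are assumptions):

import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

for solver in ('lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always', ConvergenceWarning)
        clf = LogisticRegression(solver=solver, max_iter=100).fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    print(solver, clf.n_iter_, 'converged' if converged else 'did NOT converge')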

Methods that help a faster convergence, and may ultimately spare you from increasing max_iter, are listed below (with a small illustration after the list):

  • Feature scaling
  • Dimensionality Reduction (e.g. PCA) of scaled features
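
As a quick illustration of the first point, you can compare n_iter_ with and without scaling on the same (assumed toy) dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

raw = LogisticRegression(solver='lbfgs', max_iter=10000).fit(X, y)
scaled = LogisticRegression(solver='lbfgs', max_iter=10000).fit(X_scaled, y)

# Scaling typically cuts the iteration count dramatically
print('raw:   ', raw.n_iter_)
print('scaled:', scaled.n_iter_)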

There’s a nice sklearn example demonstrating the importance of feature scaling.

Answered By: Maurício Collaça