Does the SVM in sklearn support incremental (online) learning?

Question:

I am currently in the process of designing a recommender system for text articles (a binary case of ‘interesting’ or ‘not interesting’). One of my specifications is that it should continuously update to changing trends.

From what I can tell, the best way to do this is to make use of a machine learning algorithm that supports incremental/online learning.

Algorithms like the Perceptron and Winnow support online learning but I am not completely certain about Support Vector Machines. Does the scikit-learn python library support online learning and if so, is a support vector machine one of the algorithms that can make use of it?

I am obviously not completely tied to using support vector machines, but they are usually the go-to algorithm for binary classification due to their all-round performance. I would be willing to change to whatever fits best in the end.

Asked By: Michael Aquilina


Answers:

Technical aspects

The short answer is no. The sklearn implementation (like most existing ones) does not support online SVM training. It is possible to train an SVM incrementally, but it is not a trivial task.

If you want to limit yourself to the linear case, then the answer is yes, as sklearn provides Stochastic Gradient Descent (SGD), which has an option to minimize the SVM criterion (the hinge loss).
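A minimal sketch of what that looks like in sklearn (the data and labels here are made-up placeholders, not from the question):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# SGDClassifier with hinge loss minimizes the linear SVM objective
clf = SGDClassifier(loss="hinge", alpha=1e-4)

# toy data standing in for article feature vectors (illustrative only)
rng = np.random.RandomState(0)
X_batch = rng.randn(20, 5)
y_batch = rng.randint(0, 2, size=20)

# the first partial_fit call must be told all classes up front
clf.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))

# later batches can then be fed in incrementally
X_next = rng.randn(10, 5)
y_next = rng.randint(0, 2, size=10)
clf.partial_fit(X_next, y_next)
```

Each `partial_fit` call performs one pass of SGD updates, so the model can keep learning as new labeled articles arrive.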

You can also try out the pegasos library instead, which supports online SVM training.

Theoretical aspects

The problem of trend adaptation is currently very popular in the ML community. As @Raff stated, it is called concept drift, and there are numerous approaches to it, often meta-models that analyze “how the trend is behaving” and adapt the underlying ML model (for example, by forcing it to retrain on a subset of the data). So you have two independent problems here:

  • the online training issue, which is purely technical and can be addressed with SGD or with libraries other than sklearn
  • concept drift, which is currently a hot topic and has no “just works” answer. There are many possibilities, hypotheses and proofs of concept, but no single, generally accepted way of dealing with this phenomenon; in fact, many PhD dissertations in ML are currently based on this issue.
Answered By: lejlot

While online algorithms for SVMs do exist, it has become important to specify if you want kernel or linear SVMs, as many efficient algorithms have been developed for the special case of linear SVMs.

For the linear case, if you use the SGD classifier in scikit-learn with the hinge loss and L2 regularization you will get an SVM that can be updated online/incrementally. You can combine this with feature transforms that approximate a kernel to get something similar to an online kernel SVM.
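One way to sketch that combination is with sklearn's RBFSampler kernel approximation (the toy data and parameters below are illustrative, not tuned):

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # a non-linear toy target

# map inputs into an approximate RBF kernel feature space
rbf = RBFSampler(gamma=1.0, n_components=100, random_state=0)
rbf.fit(X)  # only fits the random projection; no labels needed

clf = SGDClassifier(loss="hinge")
clf.partial_fit(rbf.transform(X), y, classes=np.array([0, 1]))

# new data is transformed with the same fixed mapping, then fed incrementally
X_new = rng.randn(10, 4)
y_new = (X_new[:, 0] * X_new[:, 1] > 0).astype(int)
clf.partial_fit(rbf.transform(X_new), y_new)
```

Because the random feature map is fixed after `fit`, the downstream linear SVM stays fully online while behaving roughly like a kernel SVM.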

One of my specifications is that it should continuously update to changing trends.

This is referred to as concept drift, and will not be handled well by a simple online SVM. Using the PassiveAggressive classifier will likely give you better results, as its learning rate does not decrease over time.
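Its sklearn usage mirrors the SGD classifier (data here is a made-up stand-in):

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.RandomState(0)
clf = PassiveAggressiveClassifier(C=1.0)

X1 = rng.randn(20, 5)
y1 = rng.randint(0, 2, size=20)
# first call must declare all classes
clf.partial_fit(X1, y1, classes=np.array([0, 1]))

# unlike plain SGD with a decaying step size, each update can still
# move the model substantially, which helps it track a drifting target
X2 = rng.randn(20, 5)
y2 = rng.randint(0, 2, size=20)
clf.partial_fit(X2, y2)
```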

Assuming you get feedback while training / running, you can attempt to detect decreases in accuracy over time and begin training a new model when the accuracy starts to decrease (and switch to the new one when you believe that it has become more accurate). JSAT has 2 drift detection methods (see jsat.driftdetectors) that can be used to track accuracy and alert you when it has changed.
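JSAT is a Java library, but the underlying idea of its drift detectors — track accuracy and alert when it drops — can be sketched in a few lines of Python (an illustrative monitor, not JSAT's actual algorithms):

```python
from collections import deque

class AccuracyDriftMonitor:
    """Alert when recent accuracy falls well below an initial baseline."""

    def __init__(self, window=100, baseline=100, drop=0.10):
        self.recent = deque(maxlen=window)      # sliding window of recent outcomes
        self.baseline = deque(maxlen=baseline)  # filled once from the first outcomes
        self.drop = drop                        # tolerated accuracy drop before alerting

    def update(self, correct):
        outcome = 1.0 if correct else 0.0
        self.recent.append(outcome)
        if len(self.baseline) < self.baseline.maxlen:
            self.baseline.append(outcome)

    def drifted(self):
        if not self.recent or not self.baseline:
            return False
        recent_acc = sum(self.recent) / len(self.recent)
        base_acc = sum(self.baseline) / len(self.baseline)
        return recent_acc < base_acc - self.drop
```

When `drifted()` returns True you would start training a fresh model on recent data, as described above.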

It also has more online linear and kernel methods.

(bias note: I’m the author of JSAT).

Answered By: Raff.Edward

Maybe it’s me being naive, but I think it is worth mentioning how to actually update the scikit-learn SGD classifier when you present your data incrementally:

import numpy as np
from sklearn import linear_model

clf = linear_model.SGDClassifier()
x1 = some_new_data
y1 = the_labels
# the first call to partial_fit must list every class that will ever appear
clf.partial_fit(x1, y1, classes=np.unique(y1))
x2 = some_newer_data
y2 = the_labels
clf.partial_fit(x2, y2)
Answered By: Jariani

If you are interested in online learning with concept drift, here is some previous work:

  1. Learning under Concept Drift: an Overview
    https://arxiv.org/pdf/1010.4784.pdf

  2. The problem of concept drift: definitions and related work
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.58.9085&rep=rep1&type=pdf

  3. A Survey on Concept Drift Adaptation
    http://www.win.tue.nl/~mpechen/publications/pubs/Gama_ACMCS_AdaptationCD_accepted.pdf

  4. MOA Concept Drift Active Learning Strategies for Streaming Data
    http://videolectures.net/wapa2011_bifet_moa/

  5. A Stream of Algorithms for Concept Drift
    http://people.cs.georgetown.edu/~maloof/pubs/maloof.heilbronn12.handout.pdf

  6. MINING DATA STREAMS WITH CONCEPT DRIFT
    http://www.cs.put.poznan.pl/dbrzezinski/publications/ConceptDrift.pdf

  7. Analyzing time series data with stream processing and machine learning
    http://www.ibmbigdatahub.com/blog/analyzing-time-series-data-stream-processing-and-machine-learning

Answered By: SemanticBeeng

SGD for batch learning tasks normally has a decreasing learning rate and goes over the training set multiple times. So, for purely online learning, make sure learning_rate is set to ‘constant’ in sklearn.linear_model.SGDClassifier() and eta0 to 0.1 or any desired value. The process is then as follows:

import sklearn.linear_model

# max_iter replaces the older n_iter parameter in recent sklearn versions
clf = sklearn.linear_model.SGDClassifier(learning_rate='constant', eta0=0.1,
                                         shuffle=False, max_iter=1)
# get x1, y1 as a new instance
clf.partial_fit(x1, y1, classes=all_classes)  # first call needs the full class list
# get x2, y2
# update accuracy if needed
clf.partial_fit(x2, y2)
Answered By: Alaleh Rz

One way to scale an SVM is to split your large dataset into batches that can be safely consumed by an SVM algorithm, find the support vectors for each batch separately, and then build the resulting SVM model on a dataset consisting of all the support vectors found across the batches.
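A minimal sketch of that batch-and-merge idea with sklearn's SVC (the toy data, batch count, and kernel choice are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(300, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a linearly separable toy target

sv_X, sv_y = [], []
# train an SVM per batch and keep only that batch's support vectors
for X_batch, y_batch in zip(np.array_split(X, 3), np.array_split(y, 3)):
    svm = SVC(kernel="linear").fit(X_batch, y_batch)
    sv_X.append(X_batch[svm.support_])
    sv_y.append(y_batch[svm.support_])

# the final model is trained on the pooled support vectors only
final = SVC(kernel="linear").fit(np.vstack(sv_X), np.concatenate(sv_y))
```

Because support vectors are usually a small fraction of each batch, the final training set stays much smaller than the full dataset.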

Updating to trends can be achieved by maintaining a time window each time you run your training pipeline. For example, if you train once a day and a month of historical data contains enough information, build your training dataset from the historical data obtained in the most recent 30 days.
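A minimal sketch of that sliding-window filter (the record layout of `(timestamp, features, label)` tuples is an assumption for illustration):

```python
from datetime import datetime, timedelta

def recent_training_set(records, days=30, now=None):
    """Keep only (features, label) pairs from the last `days` days."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return [(x, y) for ts, x, y in records if ts >= cutoff]
```

Running this filter before each daily training run keeps the model focused on recent trends while older examples age out automatically.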

Answered By: Sergey Zakharov