Insert or delete a step in scikit-learn Pipeline

Question:

Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object?

I am trying to do a grid search with or without one step in the Pipeline object, and I am wondering whether I can insert or delete a step in the pipeline. I saw in the Pipeline source code that there is a self.steps object holding all the steps, and that we can get the steps by named_steps(). Before modifying it, I want to make sure I do not cause unexpected effects.

Here is some example code:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('svm', SVC())]
clf = Pipeline(estimators)
clf 

Is it possible to do something like steps = clf.named_steps(), then insert into or delete from this list? Would this cause undesired effects on the clf object?

Asked By: Bin

Answers:

Yes, that’s possible, but you must satisfy the same requirements that Pipeline imposes at initialization. A predictor can only be the last step: every earlier step must be a transformer (implement fit or fit_transform, plus transform), and the last step must at least implement fit. You should also call fit again after you update Pipeline.steps, because the change invalidates whatever the steps learned in previous fit calls.
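
To make the "no predictor except in the last position" rule concrete, here is a minimal sketch. The toy data from make_classification and the step name 'extra_svm' are assumptions added for illustration; they are not from the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small synthetic dataset, used only so that fit() has something to work on.
X, y = make_classification(n_samples=40, n_features=6, random_state=0)

clf = Pipeline([('reduce_dim', PCA(n_components=2)), ('svm', SVC())])

# Put a predictor (SVC has no transform method) in front of the final step.
clf.steps.insert(1, ('extra_svm', SVC()))

try:
    clf.fit(X, y)
    error = None
except TypeError as exc:
    # Pipeline rejects intermediate steps that are not transformers.
    error = exc

print(type(error).__name__)
```

The list insert itself succeeds silently; the problem only surfaces when the steps are validated during fit, which is why an edited pipeline is worth refitting and testing right away.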

So yes, it will work with the current codebase, but I don’t think it’s a good solution for your task, because it makes your code more dependent on the current implementation of Pipeline. I think it’s more convenient to create a new Pipeline with the modified steps: Pipeline will at least validate all your steps at initialization, and creating a new Pipeline is not significantly slower than modifying the steps of an existing one. As I said, creating a new Pipeline after each modification is also safer in case someone significantly changes the implementation of Pipeline.
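
As a rough sketch of the "create a new Pipeline" approach: the helper name without_step below is hypothetical (it is not part of scikit-learn), and it assumes every step is an estimator object that clone can copy.

```python
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def without_step(pipeline, name):
    """Return a new, unfitted Pipeline that omits the step called `name`.

    Hypothetical helper for illustration; assumes all steps are estimators.
    """
    # clone() copies each estimator's parameters but not its fitted state.
    kept = [(n, clone(est)) for n, est in pipeline.steps if n != name]
    return Pipeline(kept)


clf = Pipeline([('reduce_dim', PCA()), ('svm', SVC())])
clf_no_pca = without_step(clf, 'reduce_dim')

print([n for n, _ in clf.steps])         # original pipeline is left untouched
print([n for n, _ in clf_no_pca.steps])  # new pipeline without the PCA step
```

Cloning means the two pipelines do not share estimator objects, so fitting one cannot affect the other.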

Answered By: Bad Name

Based on rudimentary testing you can safely remove a step from a scikit-learn pipeline just like you would any list item, with a simple

clf_pipeline.steps.pop(n)

where n is the zero-based index of the step you are trying to remove.
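
A small sketch of what that looks like in practice, reusing the two-step pipeline from the question under the name clf_pipeline:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf_pipeline = Pipeline([('reduce_dim', PCA()), ('svm', SVC())])

# list.pop returns the removed (name, estimator) tuple.
removed_name, removed_estimator = clf_pipeline.steps.pop(0)

remaining = [name for name, _ in clf_pipeline.steps]
print(removed_name, remaining)
```

If the pipeline had already been fitted, it is safest to call fit again after removing a step, since the later steps were trained on the output of the step that is now gone.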

Answered By: labelmaker

I see that everyone mentioned only the delete step. In case you want to also insert a step in the pipeline:

pipe.steps.append(('step name', transformer())) # transformer() stands for any estimator

pipe.steps is an ordinary Python list, so you can also insert an item at a specific position:

pipe.steps.insert(1, ('estimator', transformer())) # insert as second step
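
Here is a runnable sketch of the same idea with a concrete transformer. StandardScaler, the step name 'scale', and the toy data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([('reduce_dim', PCA()), ('svm', SVC())])

# Insert a transformer at index 0 so it runs before PCA.
pipe.steps.insert(0, ('scale', StandardScaler()))

step_names = [name for name, _ in pipe.steps]
print(step_names)

# Fit after editing the steps so that every step is trained consistently.
X, y = make_classification(n_samples=40, n_features=6, random_state=0)
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Inserting before the final step keeps the classifier last, which is what Pipeline expects.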

Answered By: HonzaB

Just chiming in because I feel like the other answers answered the question of adding steps to a pipeline really well, but didn’t really cover how to delete a step from a pipeline.

Watch out with my approach, though: slicing the steps list can be a bit unintuitive.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
clf = Pipeline(estimators)

If you want to create a pipeline with just the PCA and PolynomialFeatures steps, you can slice the steps list by index and pass the result to Pipeline:

clf1 = Pipeline(clf.steps[0:2])

Want to just use steps 2 and 3?
Watch out: slice indices are zero-based and the end index is excluded, so these slices don’t always look the way you would expect.

clf2 = Pipeline(clf.steps[1:3])

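Want to just use steps 1 and 3?
That does not work by adding the indexed items together: clf.steps[0] and clf.steps[2] are each a single (name, estimator) tuple, so the sum is one flat 4-tuple rather than a list of steps.

clf3 = Pipeline(clf.steps[0] + clf.steps[2]) # errors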
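
One way to pick non-adjacent steps, offered as a sketch rather than the only option, is to put the chosen tuples into a new list yourself. The names clf3_alt and keep are introduced here for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

estimators = [('reduce_dim', PCA()), ('poly', PolynomialFeatures()), ('svm', SVC())]
clf = Pipeline(estimators)

# Each clf.steps[i] is one (name, estimator) tuple; wrap the chosen ones in a list.
clf3 = Pipeline([clf.steps[0], clf.steps[2]])

# Equivalent selection by index, which scales better to longer pipelines.
keep = [0, 2]
clf3_alt = Pipeline([clf.steps[i] for i in keep])

print([name for name, _ in clf3.steps])
```

Note that clf3 reuses the same estimator objects as clf, so fitting one also changes the estimators seen by the other; clone the estimators first if the pipelines need to be independent.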

Answered By: plumbus_bouquet