scikit-learn-pipeline

Get features names from scikit pipelines

Question: I am working on an ML regression problem where I defined a pipeline based on an online tutorial. My code looks like this:
pipe1 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.LinearRegression())])
pipe2 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Lasso())])
pipe3 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.Ridge())])
pipe4 = Pipeline([('poly', PolynomialFeatures()), ('fit', linear_model.TweedieRegressor())])
…

Total answers: 1

sklearn pipeline and grid search

Question:
from sklearn.linear_model import LogisticRegression
pipe4 = Pipeline([('ss', StandardScaler()), ('clf', knn)])
grid2 = GridSearchCV(pipe4, {'clf': [knn, LogisticRegression()]})
grid2.fit(X_train, y_train)
pd.DataFrame(grid2.cv_results_).T
I made a KNN classifier and a logistic regression model and wanted to check which model is better using a pipeline. As you can see in the code above, I put the …

Total answers: 1

Unable to load pickled custom estimator sklearn pipeline

Question: I have a sklearn pipeline that uses a custom column transformer, an estimator, and several lambda functions. Because pickle cannot serialize lambda functions, I am using dill. Here is the custom estimator I have:
class customOLS(BaseEstimator):
    def __init__(self, ols):
        self.estimator_ols = ols
    def fit(self, X, y):
        X …

Total answers: 1

return coefficients from Pipeline object in sklearn

Question: I've fit a Pipeline object with RandomizedSearchCV:
pipe_sgd = Pipeline([('scl', StandardScaler()), ('clf', SGDClassifier(n_jobs=-1))])
param_dist_sgd = {'clf__loss': ['log'], 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'], 'clf__alpha': np.linspace(0.15, 0.35), 'clf__n_iter': [3, 5, 7]}
sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd, param_distributions=param_dist_sgd, cv=3, n_iter=30, n_jobs=-1)
sgd_randomized_pipe.fit(X_train, y_train)
I want to access the coef_ attribute …

Total answers: 3
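
A minimal sketch of one way to reach coef_ after a randomized search, assuming the default refit=True so that best_estimator_ is available. The synthetic data and the reduced parameter space (only clf__alpha) are assumptions made to keep the example short and version-independent; they are not the asker's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data, invented for this sketch.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([
    ("scl", StandardScaler()),
    ("clf", SGDClassifier(random_state=0)),
])
# Simplified search space: only the regularization strength is sampled.
param_dist_sgd = {"clf__alpha": np.linspace(0.15, 0.35, 10)}

search = RandomizedSearchCV(
    pipe_sgd, param_distributions=param_dist_sgd, cv=3, n_iter=5, random_state=0
)
search.fit(X_train, y_train)

# best_estimator_ is the refitted Pipeline; its steps are reachable by name.
best_clf = search.best_estimator_.named_steps["clf"]
coef = best_clf.coef_  # one row of weights for a binary problem
```

Note that these coefficients apply to the scaled features produced by the 'scl' step, not to the raw inputs.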

Invalid parameter for sklearn estimator pipeline

Question: I am implementing an example from the O'Reilly book "Introduction to Machine Learning with Python", using Python 2.7 and sklearn 0.16. The code I am using:
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
param_grid = {"logisticregression_C": [0.001, 0.01, 0.1, 1, 10, 100], "tfidfvectorizer_ngram_range": [(1,1), (1,2), (1,3)]}
grid = GridSearchCV(pipe, param_grid, cv=5)
…

Total answers: 4
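
The excerpt is cut off before the error message, so the following is a sketch under the assumption that the "invalid parameter" error comes from the single underscores in the grid keys. Pipeline parameters are addressed as step name, two underscores, then parameter name, and make_pipeline names each step after its lowercased class name. The check below compares the grid keys against pipe.get_params() without fitting anything.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Every name that GridSearchCV is allowed to set on this pipeline.
valid_names = set(pipe.get_params().keys())

# Grid keys written with a double underscore between step and parameter.
param_grid = {
    "logisticregression__C": [0.001, 0.01, 0.1, 1, 10, 100],
    "tfidfvectorizer__ngram_range": [(1, 1), (1, 2), (1, 3)],
}

# Any key listed here would be rejected as an invalid parameter.
unknown = [name for name in param_grid if name not in valid_names]
```

Running the same check on the single-underscore keys from the question would list both of them as unknown.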

How to gridsearch over transform arguments within a pipeline in scikit-learn

Question: My goal is to use one model to select the most important variables and another model to use those variables to make predictions. In the example below I am using two instances of RandomForestClassifier, but the second model could be any other classifier. …

Total answers: 3

Is it possible to toggle a certain step in sklearn pipeline?

Question: I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with AND without a PCA transformation ahead of it. In practice, it might be a pipeline with an …

Total answers: 2
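
One way to make a step optional, sketched under the assumption of scikit-learn 0.21 or later: a step can be replaced by the string "passthrough", and that replacement can itself be a grid search candidate. The synthetic data, the step names "pca" and "clf", and the component count are all illustrative choices, not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, invented for this sketch.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

pipe = Pipeline([
    ("pca", PCA(n_components=3)),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# Candidate 1 skips the step entirely; candidate 2 keeps the PCA.
param_grid = {"pca": ["passthrough", PCA(n_components=3)]}

grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)

# Which kind of 'pca' step each candidate used, in search order.
tried = [type(p["pca"]).__name__ for p in grid.cv_results_["params"]]
```

Comparing grid.cv_results_["mean_test_score"] across the two candidates then shows whether the PCA step helps on the data at hand.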