sklearn GridSearchCV with Pipeline

Question:

I’m new to sklearn’s Pipeline and GridSearchCV features. I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:

import numpy as np
from sklearn.decomposition import RandomizedPCA  # PCA(svd_solver='randomized') in recent versions
from sklearn.grid_search import GridSearchCV     # sklearn.model_selection in recent versions
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()

pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])

parameters = {'ridge__alpha': 10 ** np.linspace(-5, 2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1,
                           scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])

I know about the RidgeCV function but I want to try out Pipeline and GridSearchCV.

I want the grid search to report the RMSE, but that doesn’t seem to be supported in sklearn, so I’m making do with MSE. However, the scores it reports are negative:

In [41]: grid_search.grid_scores_
Out[41]: 
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
 mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
 mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]

Obviously this isn’t possible for mean squared error – what am I doing wrong here?

Asked By: mchangun


Answers:

Those scores are negated MSE values: flip the sign and you get the MSE. The thing is that GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
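For instance, a minimal sketch that recovers the MSE and RMSE from the fitted grid search above (best_score_ is the mean cross-validated score of the best parameter setting):

import numpy as np

# best_score_ holds the negated MSE, so flip the sign before taking the root
best_mse = -grid_search.best_score_
best_rmse = np.sqrt(best_mse)
print('Best MSE: %.5f, best RMSE: %.5f' % (best_mse, best_rmse))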

Answered By: Fred Foo

If you want RMSE as a metric, you can write your own callable/function that takes y_true and y_pred and calculates the RMSE, as sketched below.
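For example, a minimal sketch of such a callable (the rmse function name is just illustrative), wrapped with make_scorer so GridSearchCV can use it:

import numpy as np
from sklearn.metrics import make_scorer

def rmse(y_true, y_pred):
    # root of the mean squared residual
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# RMSE is a loss, so tell sklearn that smaller is better
rmse_scorer = make_scorer(rmse, greater_is_better=False)

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, scoring=rmse_scorer)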


Answered By: mlengg

Suppose I have stored the negative MSE and negative MAE scores obtained from GridSearchCV in lists named model_nmse and model_nmae, respectively.

I would then simply multiply them by -1 to get the desired MSE and MAE scores:

model_mse = list(np.multiply(model_nmse, -1))

model_mae = list(np.multiply(model_nmae, -1))
Answered By: Prateek sahu

An alternative when creating the GridSearchCV is to use make_scorer and set its greater_is_better flag to False.

So, if clf is your estimator and parameters is your hyperparameter grid, you can use make_scorer like this:

from sklearn.metrics import make_scorer, mean_squared_error

# define your own mse scorer and set greater_is_better=False
mse = make_scorer(mean_squared_error, greater_is_better=False)

Now you can call GridSearchCV as before and pass your defined mse scorer:

grid_obj = GridSearchCV(clf, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)

Note that the reported scores will still come out negated: make_scorer with greater_is_better=False multiplies the loss by -1 so that greater is always better.
Answered By: Espanta

You can see the valid scoring strings in the scikit-learn documentation (https://scikit-learn.org/stable/modules/model_evaluation.html).
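If you are unsure which strings your installed version accepts, you can also list them programmatically (get_scorer_names was added in scikit-learn 1.0; older releases expose a SCORERS dict in sklearn.metrics instead):

from sklearn.metrics import get_scorer_names

# prints every valid value for the scoring= parameter,
# e.g. 'neg_mean_squared_error' in current releases
print(sorted(get_scorer_names()))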


Answered By: chaoyu feng