RMSE cross validation using sklearn

Question:

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
import numpy as np

cv = KFold(n_splits=10, random_state=1, shuffle=True)

scores = cross_val_score(regressor, X, y, scoring='neg_mean_absolute_error',
                         cv=cv, n_jobs=-1)
np.mean(np.abs(scores))

regressor is the fitted model, X contains the independent features, and y is the dependent variable. Is the code right? Also, I'm confused: can RMSE be bigger than 100? I'm getting values such as 121 from some regression models. Is RMSE used to tell you how good your model is in general, or only to tell you how good your model is compared to other models?

rmse = 121

Asked By: FjkgB


Answers:

If you want RMSE, why are you using mean absolute error for scoring? Change it to this:

scores = cross_val_score(regressor, X, y, scoring='neg_mean_squared_error',
                         cv=cv, n_jobs=-1)

Since RMSE is the square root of the mean squared error, take the square root of the (absolute) per-fold scores and average them:

np.mean(np.sqrt(np.abs(scores)))
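
For reference, here is a minimal end-to-end sketch that puts this together. The LinearRegression model and the synthetic data below are placeholders for illustration, not part of the original question. Newer scikit-learn releases (0.22 and later) also accept scoring='neg_root_mean_squared_error', which returns the negated per-fold RMSE directly.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data and model, purely for illustration
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=1)
regressor = LinearRegression()

cv = KFold(n_splits=10, random_state=1, shuffle=True)

# Per-fold negated MSE; take the absolute value, square-root it, then average
scores = cross_val_score(regressor, X, y, scoring='neg_mean_squared_error',
                         cv=cv, n_jobs=-1)
rmse_per_fold = np.sqrt(np.abs(scores))
print('Mean RMSE across folds: %.3f' % rmse_per_fold.mean())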

Answered By: Adarsh Wase

The RMSE value can be calculated using sklearn.metrics as follows:

import math
from sklearn.metrics import mean_squared_error

# test: true target values; predictions: model predictions on the test set
mse = mean_squared_error(test, predictions)
rmse = math.sqrt(mse)
print('RMSE: %f' % rmse)
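
A self-contained sketch of this approach, using a hypothetical LinearRegression model and synthetic data as placeholders (they are not from the original question):

import math
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder data and model, purely for illustration
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

regressor = LinearRegression().fit(X_train, y_train)
predictions = regressor.predict(X_test)

mse = mean_squared_error(y_test, predictions)
rmse = math.sqrt(mse)
print('RMSE: %f' % rmse)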

In terms of interpretation, RMSE is measured in the same units as the target variable, so there is no universal "good" range; you need to compare the RMSE to the mean of your test data to judge model accuracy. Standard errors measure how accurately the mean of a given sample is likely to estimate the true population mean.

For instance, an RMSE of 5 compared to a mean of 100 is a good score, as the RMSE size is quite small relative to the mean.

On the other hand, an RMSE of 5 compared to a mean of 2 would not be a good result, since the error is more than twice the size of the mean itself.
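
A quick numeric sketch of this comparison, using the values from the two examples above:

# RMSE relative to the mean of the test data
for rmse, mean in [(5, 100), (5, 2)]:
    print('RMSE %.0f vs mean %.0f -> ratio %.2f' % (rmse, mean, rmse / mean))
# ratio 0.05 (5% of the mean): good; ratio 2.50 (250% of the mean): poor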

Answered By: Michael Grogan