How can I get the history of the different fits when using cross-validation over a KerasRegressor?

Question:

I have a regression problem that I am modeling with a fully connected Keras network. I am using cross_val_score, and my question is: how can I extract the model and the training history for each train/validation combination that cross_val_score runs?

Assuming this example:

from sklearn import datasets
from sklearn.model_selection import cross_val_score, KFold
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
seed = 1

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

def baseline_model():
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=100, verbose=False)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)  # random_state only takes effect with shuffle=True
results = cross_val_score(estimator, X, y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))

My understanding is that this only gives me the final MSE for each fold.
But I want to compare the training MSE to the validation MSE over the epochs of the model for each fold, i.e. for all 10 folds in this case.

With a simple train/validation split instead of k-fold, one can do:

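import matplotlib.pyplot as plt

# fit returns a History object holding the per-epoch metrics
history = model.fit(X_tr, y_tr, validation_data=val_data,
                    epochs=100, batch_size=100,
                    verbose=1)

plt.plot(history.history['loss'])      # training MSE per epoch
plt.plot(history.history['val_loss'])  # validation MSE per epoch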

This plots the evolution of the MSE over the epochs for the training and validation datasets, which makes it possible to spot over- or underfitting.

How can I do this for each fold when using cross-validation?

Asked By: jotNewie

Answers:

You can go for a “manual” CV procedure, and plot the loss (or any other available metric you might want to use) for each fold, i.e. something like this:

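import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
cv_mse = []

for train_index, val_index in kfold.split(X):
    # fit rebuilds the model via build_fn and returns the Keras History object
    history = estimator.fit(X[train_index], y[train_index])
    pred = estimator.predict(X[val_index])
    err = mean_squared_error(y[val_index], pred)
    cv_mse.append(err)  # final validation MSE for this fold
    plt.plot(history.history['loss'])  # training loss per epoch for this fold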

In this case, the cv_mse list will contain the final validation MSE for each fold, and the plot shows how the training loss evolves per epoch in each fold.
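
Since the question also asks for the validation curve and for the fitted model of each fold, the loop above could be extended along the following lines. This is only a sketch under some assumptions: it calls model.fit on the Keras model directly rather than going through the KerasRegressor wrapper, it assumes build_fn returns a freshly compiled Keras model (such as baseline_model from the question), and the helper names fit_with_fold_histories and plot_fold_histories are made up for illustration.

```python
def fit_with_fold_histories(build_fn, X, y, cv, **fit_kwargs):
    """Train a fresh model on every fold and keep its per-epoch losses.

    Assumptions: build_fn returns a compiled Keras model, cv is a
    scikit-learn splitter such as KFold, and fit_kwargs (epochs,
    batch_size, verbose, ...) are passed straight to model.fit.
    """
    models, histories = [], []
    for train_idx, val_idx in cv.split(X):
        model = build_fn()  # new, untrained weights for every fold
        hist = model.fit(
            X[train_idx], y[train_idx],
            # the held-out fold is evaluated after every epoch
            validation_data=(X[val_idx], y[val_idx]),
            **fit_kwargs,
        )
        models.append(model)
        # hist.history maps 'loss' and 'val_loss' to one value per epoch
        histories.append(hist.history)
    return models, histories


def plot_fold_histories(histories):
    """Draw training and validation loss curves, one pair per fold."""
    # imported here so the training loop can run without matplotlib
    import matplotlib.pyplot as plt

    for i, h in enumerate(histories, start=1):
        plt.plot(h["loss"], label=f"fold {i} train")
        plt.plot(h["val_loss"], linestyle="--", label=f"fold {i} validation")
    plt.xlabel("epoch")
    plt.ylabel("loss (MSE)")
    plt.legend()
    plt.show()
```

With the setup from the question, a call might look like models, histories = fit_with_fold_histories(baseline_model, X, y, kfold, epochs=100, batch_size=100, verbose=0), followed by plot_fold_histories(histories). Keeping the models in a list means each fold's network remains available for later inspection, at the cost of holding all of them in memory.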

Answered By: desertnaut