How to correctly plot loss curves for training and validation sets?

Question:

I want to plot loss curves for my training and validation sets the same way Keras does, but using scikit-learn. I have chosen the Concrete Compressive Strength dataset, which is a regression problem; the dataset is available at:

http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/

So, I have converted the data to CSV and the first version of my program is the following:

Model 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("Concrete_Data.csv")
train, validate, test = np.split(df.sample(frac=1), [int(.8*len(df)), int(.90*len(df))])
Xtrain = train.drop(["ConcreteCompStrength"], axis="columns")
ytrain = train["ConcreteCompStrength"]
Xval = validate.drop(["ConcreteCompStrength"], axis="columns")
yval = validate["ConcreteCompStrength"]
mlp = MLPRegressor(activation="relu", max_iter=5000, solver="adam", random_state=2)
mlp.fit(Xtrain, ytrain)

plt.plot(mlp.loss_curve_, label="train")
mlp.fit(Xval, yval)                            # doubt
plt.plot(mlp.loss_curve_, label="validation")  # doubt
plt.legend()

The resulting graph is the following:

[plot: training and validation loss curves for Model 1]

In this model, I doubt whether the marked part is correct because, as far as I know, one should set the validation and test sets aside, so maybe the fit call is wrong there. The score I got is 0.95.

Model 2

In this model I try to use the validation score as follows:

df = pd.read_csv("Concrete_Data.csv")
train, validate, test = np.split(df.sample(frac=1), [int(.8*len(df)), int(.90*len(df))])
Xtrain = train.drop(["ConcreteCompStrength"], axis="columns")
ytrain = train["ConcreteCompStrength"]
Xval = validate.drop(["ConcreteCompStrength"], axis="columns")
yval = validate["ConcreteCompStrength"]
mlp = MLPRegressor(activation="relu", max_iter=5000, solver="adam",
                   random_state=2, early_stopping=True)
mlp.fit(Xtrain, ytrain)

plt.plot(mlp.loss_curve_, label="train")
plt.plot(mlp.validation_scores_, label="validation")   # line changed
plt.legend()

For this model I had to set early_stopping=True and plot validation_scores_ instead, but the resulting graph is a little weird:

[plot: Model 2 training loss and validation_scores_ curves]

The score I get is 0.82, but I read that this occurs when the model finds it easier to predict the data in the validation set than in the training set. I believe that is because I am plotting validation_scores_, but I was not able to find any online reference about that attribute.

What is the correct way to plot these loss curves for tuning my hyperparameters in scikit-learn?

Update
I have programmed the model as advised, like this:

from sklearn.metrics import mean_squared_error

mlp = MLPRegressor(activation="relu", max_iter=1, solver="adam",
                   random_state=2, early_stopping=True)
training_mse = []
validation_mse = []
epochs = 5000
for epoch in range(1, epochs):
    mlp.fit(X_train, Y_train)
    Y_pred = mlp.predict(X_train)
    curr_train_score = mean_squared_error(Y_train, Y_pred)  # training performance
    Y_pred = mlp.predict(X_valid)
    curr_valid_score = mean_squared_error(Y_valid, Y_pred)  # validation performance
    training_mse.append(curr_train_score)                   # list of training perf to plot
    validation_mse.append(curr_valid_score)                  # list of valid perf to plot
plt.plot(training_mse, label="train")
plt.plot(validation_mse, label="validation")
plt.legend()

but the plot obtained shows two flat lines:

[plot: two flat lines for training and validation MSE]

It seems I am missing something here.

Asked By: Little


Answers:

You shouldn’t fit your model on the validation set. The validation set is usually used to decide which hyperparameters to use, not to fit the parameters’ values.

The standard way to do training is to divide your dataset into three parts

  • training
  • validation
  • test

For example, with an 80/10/10 % split.
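A minimal sketch of such a split, assuming you reuse the CSV and column name from the question and apply scikit-learn's train_test_split twice, could look like this:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("Concrete_Data.csv")                  # same file as in the question
X = df.drop(["ConcreteCompStrength"], axis="columns")
y = df["ConcreteCompStrength"]

# 80% train, then split the remaining 20% in half: 10% validation, 10% test
X_train, X_tmp, Y_train, Y_tmp = train_test_split(X, y, test_size=0.2, random_state=2)
X_valid, X_test, Y_valid, Y_test = train_test_split(X_tmp, Y_tmp, test_size=0.5, random_state=2)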

Usually, you would select a neural network (how many layers and nodes, which activation functions), train it only on the training set, check the result on the validation set, and only then evaluate it on the test set.

I’ll show a pseudo algorithm to make it clear:

for model in my_networks:       # hyperparameters selection
    model.fit(X_train, Y_train) # parameters fitting
    model.predict(X_valid)      # no train, only check on performances

Save the model performances on the validation set and pick the best model (the one with the best validation score), then check the results on the test set:

model.predict(X_test) # this will be the estimated performance of your model

If your dataset is big enough, you could also use something like cross-validation.
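To make the pseudo algorithm above concrete, here is a hedged sketch that assumes the split shown earlier and uses hidden_layer_sizes as the hyperparameter being searched (only an illustrative choice):

from sklearn.neural_network import MLPRegressor

best_model, best_score = None, -float("inf")
for hidden in [(50,), (100,), (100, 50)]:            # candidate hyperparameters (illustrative)
    model = MLPRegressor(hidden_layer_sizes=hidden, activation="relu",
                         solver="adam", max_iter=5000, random_state=2)
    model.fit(X_train, Y_train)                      # fit parameters on the training set only
    score = model.score(X_valid, Y_valid)            # R^2 on the validation set, no training here
    if score > best_score:
        best_model, best_score = model, score

print("Estimated performance:", best_model.score(X_test, Y_test))   # final check on the test set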

Anyway, remember:

  • the parameters are the network weights
  • you fit the parameters with the training set
  • the hyperparameters are the ones that define the net architecture (layers, nodes, activation functions)
  • you select the best hyperparameters checking the result of your model on the validation set
  • after this selection (best parameters, best hyperparameters), you estimate the model’s performance by evaluating it on the test set

To obtain the same result as Keras, you should understand that when you call fit() on the model, training stops after a fixed number of epochs (200 by default, or your defined max_iter, 5000 in your case), or earlier if you enable early_stopping.

max_iter: int, default=200

Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.

Check your model definition and arguments on the scikit-learn documentation page.

To obtain the same result as Keras, you could fix the number of training epochs per fit call (e.g. one epoch per call), check the result on the validation set, and then train again until you reach the desired number of epochs.

For example, something like this (if your model uses MSE):

from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

epochs = 5000

mlp = MLPRegressor(activation="relu",
                   max_iter=1,
                   solver="adam",
                   random_state=2,
                   early_stopping=True)
training_mse = []
validation_mse = []
for epoch in range(epochs):
    mlp.fit(X_train, Y_train)
    Y_pred = mlp.predict(X_train)
    curr_train_score = mean_squared_error(Y_train, Y_pred) # training performance
    Y_pred = mlp.predict(X_valid)
    curr_valid_score = mean_squared_error(Y_valid, Y_pred) # validation performance
    training_mse.append(curr_train_score)                  # list of training perf to plot
    validation_mse.append(curr_valid_score)                # list of valid perf to plot
Answered By: Nikaido

I had the same problem: I obtained two flat lines when using the code as advised. I solved it by just adding warm_start=True to the MLPRegressor parameters, as explained in the scikit-learn user guide, MLPRegressor - 1.17.9. More control with warm_start:

mlp = MLPRegressor(activation="relu", max_iter=1, solver="adam",
                   random_state=2, early_stopping=True, warm_start=True)
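For reference, a sketch of the full per-epoch loop from the previous answer with this warm_start=True fix applied (same assumed variable names X_train, Y_train, X_valid, Y_valid):

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

mlp = MLPRegressor(activation="relu", max_iter=1, solver="adam",
                   random_state=2, early_stopping=True, warm_start=True)
training_mse = []
validation_mse = []
for epoch in range(5000):
    mlp.fit(X_train, Y_train)   # continues from the previous weights thanks to warm_start
    training_mse.append(mean_squared_error(Y_train, mlp.predict(X_train)))
    validation_mse.append(mean_squared_error(Y_valid, mlp.predict(X_valid)))

plt.plot(training_mse, label="train")
plt.plot(validation_mse, label="validation")
plt.legend()
plt.show()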

The plot obtained is now correct:
[plot: train and validation loss curves]

Answered By: Anelise Dick