tensorflow:Can save best model only with val_acc available, skipping

Question:

I have an issue with tf.keras.callbacks.ModelCheckpoint. As you can see in my log file, the warning always appears just before the last iteration of the epoch, which is where val_acc is calculated. Therefore, ModelCheckpoint never finds val_acc:

Epoch 1/30
1/8 [==>...........................] - ETA: 19s - loss: 1.4174 - accuracy: 0.3000
2/8 [======>.......................] - ETA: 8s - loss: 1.3363 - accuracy: 0.3500 
3/8 [==========>...................] - ETA: 4s - loss: 1.3994 - accuracy: 0.2667
4/8 [==============>...............] - ETA: 3s - loss: 1.3527 - accuracy: 0.3250
6/8 [=====================>........] - ETA: 1s - loss: 1.3042 - accuracy: 0.3333
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 4s 482ms/step - loss: 1.2846 - accuracy: 0.3375 - val_loss: 1.3512 - val_accuracy: 0.5000

Epoch 2/30
1/8 [==>...........................] - ETA: 0s - loss: 1.0098 - accuracy: 0.5000
3/8 [==========>...................] - ETA: 0s - loss: 0.8916 - accuracy: 0.5333
5/8 [=================>............] - ETA: 0s - loss: 0.9533 - accuracy: 0.5600
6/8 [=====================>........] - ETA: 0s - loss: 0.9523 - accuracy: 0.5667
7/8 [=========================>....] - ETA: 0s - loss: 0.9377 - accuracy: 0.5714
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 1s 98ms/step - loss: 0.9229 - accuracy: 0.5750 - val_loss: 1.2507 - val_accuracy: 0.5000

This is my code for training the CNN.

callbacks = [
        TensorBoard(log_dir=r'C:\Users\reda\Desktop\logs\{}'.format(Name),
                    histogram_freq=1),
        ModelCheckpoint(filepath=r"C:\Users\reda\Desktop\checkpoints\{}".format(Name), monitor='val_acc',
                        verbose=2, save_best_only=True, mode='max')]
history = model.fit_generator(
        train_data_gen, 
        steps_per_epoch=total_train // batch_size,
        epochs=epochs,
        validation_data=val_data_gen,
        validation_steps=total_val // batch_size,
        callbacks=callbacks)
Asked By: Reda El Hail


Answers:

I know how frustrating these things can be sometimes, but TensorFlow requires that you explicitly write out the full name of the metric you want to monitor.

You will need to actually say ‘val_accuracy’

metric = 'val_accuracy'
ModelCheckpoint(filepath=r"C:\Users\reda.elhail\Desktop\checkpoints\{}".format(Name), monitor=metric,
                verbose=2, save_best_only=True, mode='max')]

Hope this helps =)
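For reference, here is the full callbacks list from the question with only the metric name changed (a minimal sketch; the paths and the Name variable are the asker's own):

callbacks = [
        TensorBoard(log_dir=r'C:\Users\reda\Desktop\logs\{}'.format(Name),
                    histogram_freq=1),
        ModelCheckpoint(filepath=r"C:\Users\reda\Desktop\checkpoints\{}".format(Name),
                        monitor='val_accuracy',  # full metric name as it appears in the training log
                        verbose=2, save_best_only=True, mode='max')]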

Answered By: Brian Mark Anderson

To add to the accepted answer, as I just struggled with this: not only do you have to use the full metric name, it must also match across your model.compile, ModelCheckpoint, and EarlyStopping calls. I had one set to accuracy and the other two set to val_accuracy and it did not work.
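A minimal sketch of what consistent naming looks like (the optimizer, loss, and file name here are placeholders, not from the question): compile with 'accuracy', and have both callbacks monitor the corresponding validation key, 'val_accuracy'.

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# compile with the plain metric name ...
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# ... and monitor its validation counterpart in both callbacks
monitor = 'val_accuracy'
callbacks = [
    ModelCheckpoint('best_model.h5', monitor=monitor, save_best_only=True, mode='max'),
    EarlyStopping(monitor=monitor, patience=5, mode='max'),
]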

Answered By: BlueTurtle

I had the same issue: even after setting monitor='val_accuracy' it did not work. So I just changed it to monitor='val_acc' and it worked (which key exists depends on your Keras/TensorFlow version).

Answered By: Vilas

Print the metrics after training for one epoch like below. This will print the metrics defined for your model.

hist = model.fit(...)
for key in hist.history:
    print(key)

Now replace them in your metrics. It will work like charm.

This hack was given by the gentleman in the below link. Thanks to him!!
https://github.com/tensorflow/tensorflow/issues/33163#issuecomment-540451749
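For the model in the question (which, judging by the training log, reports a plain accuracy metric), the loop would print something along these lines:

loss
accuracy
val_loss
val_accuracy

so the callback should monitor 'val_accuracy' rather than 'val_acc'.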

Answered By: Sachin Mohan

If you are using validation_steps or steps_per_epoch in the model.fit() call, try removing those parameters. The validation loss and accuracy should then start appearing. Pass only as few parameters as possible:

model_history = model.fit(x=aug.flow(X_train, y_train, batch_size=16), epochs=EPOCHS,
                          validation_data=(X_val, y_val), callbacks=[callbacks_list])
Answered By: NotAGenius

If you are using ModelCheckpoint and EarlyStopping together, then the "monitor" metric should be the same for both, e.g. ‘accuracy’.

Also, EarlyStopping doesn’t support all metrics in some TensorFlow versions, so you have to choose a metric that is common to both and that best suits your model.

Answered By: rarenicks

I still had the issue even after changing the argument from monitor='val_acc' to monitor='val_accuracy'.

You can check the ModelCheckpoint documentation from Keras and make sure you keep the arguments and values you are passing exactly as documented. I removed the extra arguments I was passing and it worked for me!

Before

checkpoint = ModelCheckpoint("mnist-cnn-keras.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', save_freq=1)

After

checkpoint = ModelCheckpoint("./", monitor='val_accuracy', verbose=2, save_best_only=True, mode='max')
Answered By: ayan-cs

You have to write the name exactly as it appears in the log when you run training. You are probably using a different metric instead of ‘accuracy’ in the metrics section: BinaryAccuracy, SparseCategoricalAccuracy, CategoricalAccuracy, etc. For example, when you use BinaryAccuracy, ‘binary_accuracy’ is written in the training log instead of ‘accuracy’. That is the name you should use in the monitor argument.
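A small sketch of that case (the model, loss, and file name are placeholders): with BinaryAccuracy as the compile metric, the history key becomes binary_accuracy, so the callback has to monitor val_binary_accuracy.

import tensorflow as tf

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.BinaryAccuracy()])  # logged as 'binary_accuracy'

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'best_model.h5',
    monitor='val_binary_accuracy',  # 'val_' + the logged metric name
    save_best_only=True,
    mode='max')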

Answered By: Emrah Harmanci

monitor=’val_loss’ in both the ModelCheckpoint and EarlyStopping callbacks worked for me.
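A minimal sketch of that variant (the file name and patience are placeholders): val_loss is always reported when validation data is passed, and since lower is better here, mode should be 'min' (or left at the default 'auto').

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True, mode='min'),
    EarlyStopping(monitor='val_loss', patience=5, mode='min'),
]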

Answered By: Nick

You may also find that your model metrics have an incrementing number appended to them after the first run, e.g.

for key in history.history:
    print(key)
loss
accuracy
auc_4
precision_4
recall_4
true_positives_4
true_negatives_4
false_positives_4
false_negatives_4
val_loss
val_accuracy
val_auc_4

If that is the case, you can reset the session before each run so that the numbers aren’t appended.

for something in something_else:
    tf.keras.backend.clear_session()  # resets the session
    model = define_model(...)
    history = train_model(...)
Answered By: John Johnson