Why is the accuracy calculated manually from model.predict() different from the accuracy reported by model.evaluate()?

Question:

The accuracy I calculate from the predicted labels and the true labels is particularly low. Is there something wrong with what I've written?

Asked By: Roger.H


Answers:

Because your data is randomly shuffled twice, in a different order each time.

  1. The first shuffle happens when you call modified.predict(test_data): iterating over test_data randomly shuffles the data.

  2. The second shuffle happens when you manually iterate over your dataset:
    for images, labels in test_data:
    The data is randomly shuffled again, so the order of the labels will differ from the order of the predictions you got from model.predict().
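To make the effect concrete, here is a minimal pure-Python sketch. It does not use TensorFlow; ReshufflingDataset and perfect_model are hypothetical stand-ins invented for illustration, assuming only that the dataset yields its samples in a new random order on every pass.

```python
import random


class ReshufflingDataset:
    """Toy stand-in (hypothetical) for a dataset that reshuffles on every pass."""

    def __init__(self, samples, seed=0):
        self._samples = list(samples)
        self._rng = random.Random(seed)

    def __iter__(self):
        order = self._samples[:]
        self._rng.shuffle(order)  # a new random order for each iteration
        return iter(order)


def perfect_model(x):
    """Hypothetical model that always predicts the correct label (x % 10)."""
    return x % 10


# 1000 samples spread evenly over 10 classes.
dataset = ReshufflingDataset([(x, x % 10) for x in range(1000)])

# Two separate passes: predictions and labels come back in different orders.
preds = [perfect_model(x) for x, _ in dataset]
labels = [y for _, y in dataset]
mismatched_acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)

# One pass: every prediction stays paired with its own label.
pairs = [(perfect_model(x), y) for x, y in dataset]
paired_acc = sum(p == y for p, y in pairs) / len(pairs)

print("two passes:", mismatched_acc)
print("one pass:  ", paired_acc)
```

Even though the model is perfect, the two-pass accuracy drops to roughly chance level, while the single-pass accuracy is 1.0.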

This is the default behavior of a dataset created with the image_dataset_from_directory function (shuffle defaults to True).
You can check the documentation and the default values of the other arguments here:

https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory

If you set shuffle=False:

test_data = tf.keras.utils.image_dataset_from_directory(
    data_dir + '/test',
    image_size=(32, 32),
    batch_size=train_batch_size,
    shuffle=False,
)

the manually calculated accuracy and the one reported by model.evaluate() will be identical. This holds at least for the accuracy metric; evaluate() aggregates metrics over batches, which can lead to differences for some other metrics.
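If you would rather not depend on the iteration order at all, another option is to collect predictions and labels in the same pass. The sketch below is only an illustration: accuracy_in_one_pass is a hypothetical helper name, and it assumes integer labels (the default label_mode of image_dataset_from_directory) and a model whose output has one score per class.

```python
import numpy as np


def accuracy_in_one_pass(model, dataset):
    """Iterate once so each batch's predictions stay aligned with its labels.

    Assumes `dataset` yields (images, labels) batches with integer labels and
    that `model.predict_on_batch` returns scores of shape (batch, num_classes).
    """
    y_true, y_pred = [], []
    for images, labels in dataset:
        scores = np.asarray(model.predict_on_batch(images))
        y_pred.append(np.argmax(scores, axis=-1))  # predicted class per sample
        y_true.append(np.asarray(labels))          # labels from the same batch
    y_true = np.concatenate(y_true)
    y_pred = np.concatenate(y_pred)
    return float(np.mean(y_true == y_pred))
```

With your objects this would be called as accuracy_in_one_pass(modified, test_data). Because the labels and predictions come from the same batch, the result does not depend on whether the dataset is shuffled.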

Answered By: u1234x1234