Keras LeNet: High Training & Validation Accuracy but Low Testing Accuracy

Question:

I am trying to train the LeNet architecture on the MNIST dataset.

I downloaded the mnist_png images from GitHub (https://github.com/myleott/mnist_png), which has over 50,000 images. I am trying to build a LeNet model in Keras to predict handwritten digits.

Code for loading the images:

train_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/training/',
  validation_split = 0.2,
  subset = "training",
  seed = 123,
  image_size = (32, 32),
  batch_size = 100)

val_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/training/',
  validation_split = 0.2,
  subset = "validation",
  seed = 123,
  image_size = (32, 32),
  batch_size = 100)

test_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/testing/',
  seed = 123,
  image_size = (32, 32),
  batch_size = 1000)

Output:

Found 40818 files belonging to 7 classes.
Using 32655 files for training.
Found 40818 files belonging to 7 classes.
Using 8163 files for validation.
Found 10000 files belonging to 10 classes.

Input shape = (32, 32, 3)

My model summary:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 28, 28, 6)         456       
                                                                 
 average_pooling2d (AverageP  (None, 14, 14, 6)        0         
 ooling2D)                                                       
                                                                 
 activation (Activation)     (None, 14, 14, 6)         0         
                                                                 
 conv2d_1 (Conv2D)           (None, 10, 10, 16)        2416      
                                                                 
 average_pooling2d_1 (Averag  (None, 5, 5, 16)         0         
 ePooling2D)                                                     
                                                                 
 activation_1 (Activation)   (None, 5, 5, 16)          0         
                                                                 
 conv2d_2 (Conv2D)           (None, 1, 1, 120)         48120     
                                                                 
 flatten (Flatten)           (None, 120)               0         
                                                                 
 dense (Dense)               (None, 84)                10164     
                                                                 
 dense_1 (Dense)             (None, 10)                850       
                                                                 
=================================================================
Total params: 62,006
Trainable params: 62,006
Non-trainable params: 0
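
For reference, a Keras model definition that reproduces this summary might look like the sketch below. The activation functions are assumptions, since the summary does not show which are used (classic LeNet-5 uses tanh):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(6, 5, input_shape=(32, 32, 3)),  # -> (28, 28, 6), 456 params
    layers.AveragePooling2D(),                     # -> (14, 14, 6)
    layers.Activation('tanh'),                     # assumed activation
    layers.Conv2D(16, 5),                          # -> (10, 10, 16), 2,416 params
    layers.AveragePooling2D(),                     # -> (5, 5, 16)
    layers.Activation('tanh'),                     # assumed activation
    layers.Conv2D(120, 5),                         # -> (1, 1, 120), 48,120 params
    layers.Flatten(),                              # -> (120,)
    layers.Dense(84, activation='tanh'),           # 10,164 params; activation assumed
    layers.Dense(10, activation='softmax'),        # 850 params; softmax assumed
])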

The model is compiled with this code:

model.compile(optimizer='adam', loss=losses.sparse_categorical_crossentropy, metrics=['accuracy'])

I trained it for 10 epochs and got this output:

Epoch 1/10
327/327 [==============================] - 31s 79ms/step - loss: 0.9729 - accuracy: 0.6456 - val_loss: 0.3609 - val_accuracy: 0.8951
Epoch 2/10
327/327 [==============================] - 25s 77ms/step - loss: 0.3036 - accuracy: 0.9021 - val_loss: 0.2276 - val_accuracy: 0.9330
Epoch 3/10
327/327 [==============================] - 28s 85ms/step - loss: 0.2170 - accuracy: 0.9307 - val_loss: 0.1862 - val_accuracy: 0.9389
Epoch 4/10
327/327 [==============================] - 29s 89ms/step - loss: 0.1778 - accuracy: 0.9433 - val_loss: 0.1892 - val_accuracy: 0.9401
Epoch 5/10
327/327 [==============================] - 25s 76ms/step - loss: 0.1521 - accuracy: 0.9519 - val_loss: 0.1692 - val_accuracy: 0.9476
Epoch 6/10
327/327 [==============================] - 27s 83ms/step - loss: 0.1392 - accuracy: 0.9553 - val_loss: 0.1340 - val_accuracy: 0.9588
Epoch 7/10
327/327 [==============================] - 26s 79ms/step - loss: 0.1203 - accuracy: 0.9609 - val_loss: 0.1131 - val_accuracy: 0.9632
Epoch 8/10
327/327 [==============================] - 25s 76ms/step - loss: 0.1128 - accuracy: 0.9644 - val_loss: 0.1170 - val_accuracy: 0.9644
Epoch 9/10
327/327 [==============================] - 27s 81ms/step - loss: 0.1061 - accuracy: 0.9663 - val_loss: 0.1051 - val_accuracy: 0.9659
Epoch 10/10
327/327 [==============================] - 29s 89ms/step - loss: 0.0968 - accuracy: 0.9699 - val_loss: 0.0950 - val_accuracy: 0.9705

When I run model.evaluate(test_ds), I get a high loss and a low accuracy.

10/10 [==============================] - 4s 200ms/step - loss: 9.2694 - accuracy: 0.0656

Is there any reason for that?

Answers:

Nothing seems obviously wrong. In test_ds, try setting shuffle=False. To get a clue, try running model.evaluate on val_ds and see if it reproduces the validation accuracy from training. The only other thing I can think of is that something is amiss with the test data: take a look at a few of the images and check whether their associated labels are correct.
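
A minimal sketch of those two checks, assuming the trained model and the directory layout from the question:

import tensorflow as tf

# Rebuild the test dataset with shuffling disabled so evaluation order is
# deterministic.
test_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/testing/',
  shuffle = False,
  image_size = (32, 32),
  batch_size = 1000)

# If this reproduces the ~0.97 val_accuracy seen during training, the model
# itself is fine and the problem lies in the test data.
model.evaluate(val_ds)
model.evaluate(test_ds)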

Answered By: Gerry P

Incomplete training dataset

It looks like you have an incomplete dataset. As you can see in the output after you load the files (quoting the output in your question):

Found 40818 files belonging to 7 classes.
Using 32655 files for training.
Found 40818 files belonging to 7 classes.
Using 8163 files for validation.
Found 10000 files belonging to 10 classes.

Note that the first two are the training and validation datasets, which only see 40,818 files across 7 classes, while the last one (testing) sees all 10 classes. That means you are training on only 7 classes, and your model has never seen the other 3.
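
A quick way to check which class directories actually exist on disk (a sketch assuming the directory layout from the question):

import os

# A complete MNIST download has one subdirectory per digit, '0' through '9';
# an incomplete download will be missing some of them.
print(sorted(os.listdir('mnist_png/training')))
print(sorted(os.listdir('mnist_png/testing')))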

If I run the following code (these are separate cells in my Jupyter notebook, which you can paste into Colab to run easily), it finds all 10 classes:

%%bash

MNIST_PNG="mnist_png.tar.gz"
if ! [ -e "${MNIST_PNG}" ]; then
  curl -sO "https://raw.githubusercontent.com/myleott/mnist_png/master/${MNIST_PNG}"
fi

MNIST_DIR="mnist_png"
if ! [ -d "${MNIST_DIR}" ]; then
  tar zxf "${MNIST_PNG}"
fi

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/training/',
  validation_split = 0.2,
  subset = "training",
  seed = 123,
  image_size = (32, 32),
  batch_size = 100)

val_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/training/',
  validation_split = 0.2,
  subset = "validation",
  seed = 123,
  image_size = (32, 32),
  batch_size = 100)

test_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/testing/',
  seed = 123,
  image_size = (32, 32),
  batch_size = 1000)

Output:

Found 60000 files belonging to 10 classes.
Using 48000 files for training.
Found 60000 files belonging to 10 classes.
Using 12000 files for validation.
Found 10000 files belonging to 10 classes.
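
You can also confirm the classes from the loaded datasets themselves, since image_dataset_from_directory() exposes the inferred class names:

print(train_ds.class_names)  # expect ['0', '1', ..., '9']
print(test_ds.class_names)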

Thus, you should first ensure that you have the complete dataset; you will then likely get a much better result on the test set.

Grayscale vs. RGB images

I would also recommend specifying color_mode='grayscale' when calling image_dataset_from_directory(), as the dataset itself is grayscale (you can verify this with either PIL or matplotlib). The default color_mode for image_dataset_from_directory() is 'rgb', which converts each image to 3 channels by duplicating the single grayscale channel 3 times; this is how you end up with the (32, 32, 3) input shape.
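
For example (note that with color_mode='grayscale' each image is loaded with a single channel, so the model's input shape must change to (32, 32, 1)):

train_ds = tf.keras.utils.image_dataset_from_directory(
  'mnist_png/training/',
  color_mode = 'grayscale',   # 1 channel instead of the default 'rgb' (3 channels)
  validation_split = 0.2,
  subset = "training",
  seed = 123,
  image_size = (32, 32),
  batch_size = 100)

With one input channel, the first convolution's parameter count drops from 456 to (5*5*1 + 1) * 6 = 156, without losing any information.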

Answered By: Misha Brukman