High accuracy during training and validation, low accuracy during prediction with the same dataset

Question:

I’m trying to train a Keras model. Accuracy is high during training and validation (I’m using f1score, but plain accuracy is high as well). However, when I predict on a dataset I get much lower accuracy, even when predicting on the training set itself, so I don’t think this is an overfitting problem. What is the problem, then?

import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
import talos as ta  # assuming "ta" is Talos, which provides the f1score metric used below

skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    X_train, x_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.5, stratify=y_train)
    y_train = encode(y_train)
    y_val = encode(y_val)
    
    model = Sequential()
    model.add(Dense(50,input_dim=X_train.shape[1],activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(25,activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(10,activation='tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))   
    
    opt = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['acc', ta.utils.metrics.f1score])  
    history = model.fit(X_train, y_train, 
                        validation_data=(x_val, y_val),
                        epochs=5000,
                        verbose=0)
    
    plt.plot(history.history['f1score'])
    plt.plot(history.history['val_f1score'])
    plt.title('model f1score')
    plt.ylabel('f1score')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
    break 

The resulting plot is below. As you can see, the scores are high on both the training and the validation set.

And the code for prediction:

from sklearn.metrics import f1_score

y_pred = model.predict(X_train)
y_pred = decode(y_pred)
y_train_t = decode(y_train)
print(f1_score(y_train_t, y_pred))

The result is 0.64, which is much lower than the expected 0.9.

My decode and encode:

def encode(y):
    # one-hot encode binary labels: 1 -> [0, 1], 0 -> [1, 0]
    Y = np.zeros((y.shape[0], 2))
    for i in range(len(y)):
        if y[i] == 1:
            Y[i][1] = 1
        else:
            Y[i][0] = 1
    return Y

def decode(y):
    # convert one-hot / softmax rows back to 0/1 labels via argmax
    Y = np.zeros((y.shape[0]))
    for i in range(len(y)):
        if np.argmax(y[i]) == 1:
            Y[i] = 1
        else:
            Y[i] = 0
    return Y
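
For reference, both helpers are equivalent to standard one-liners; a sketch assuming integer 0/1 labels, where y_pred_probs stands in for the softmax output:

from keras.utils import to_categorical

Y_onehot = to_categorical(y, num_classes=2)       # same result as encode(y)
y_labels = np.argmax(y_pred_probs, axis=1)        # same result as decode(y_pred_probs)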
Asked By: Leerion


Answers:

I think you should change binary_crossentropy to categorical_crossentropy, since you are using one-hot encoded labels.
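
For example (a sketch, keeping the asker’s optimizer and metrics):

model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['acc', ta.utils.metrics.f1score])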

Answered By: abysslover

Since you use a last layer of

model.add(Dense(2, activation='softmax'))

you should not use loss='binary_crossentropy' in model.compile(), but loss='categorical_crossentropy' instead.

Due to this mistake, the results shown during model fitting are probably wrong – the results returned by sklearn’s f1_score are the real ones.
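
Alternatively, since this is a binary problem, you could keep binary_crossentropy by switching to a single sigmoid output unit, which also removes the need for one-hot encoding (a sketch):

model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['acc'])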

Irrelevant to your question (as I guess the follow-up will be how to improve it?): we practically never use activation='tanh' for the hidden layers (try relu instead). Also, dropout should not be used by default, especially with such a high value of 0.5; comment out all the dropout layers and add them back only if your model overfits, since using dropout when it is not needed is known to hurt performance. A sketch of the suggested revision follows.
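
A minimal sketch, assuming the same input shape and the loss fix above:

model = Sequential()
model.add(Dense(50, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=0.001),
              metrics=['acc'])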

Answered By: desertnaut

Somehow, the combination of an image generator with Keras’ predict_generator() or predict() function did not work as expected for me.
Rather than using an image generator for prediction, I loop through all the test images one by one and get the prediction for each image in each iteration. I am using PlaidML Keras as my backend, and to get predictions I am using the following code.

import os
from PIL import Image
import keras
import numpy

###
# Code to load or train the model is not included here
###

print("Prediction result:")
test_dir = "/path/to/test/images"   # renamed from "dir" to avoid shadowing the built-in
files = os.listdir(test_dir)
correct = 0
total = 0
# dictionary mapping class indices to labels
classes = {
    0: 'This is Cat',
    1: 'This is Dog',
}
for file_name in files:
    total += 1
    # load, resize and normalize the image, then add a batch dimension
    image = Image.open(os.path.join(test_dir, file_name)).convert('RGB')
    image = image.resize((100, 100))
    image = numpy.array(image) / 255.0
    image = numpy.expand_dims(image, axis=0)
    pred = model.predict_classes(image)[0]
    sign = classes[pred]
    # compare case-insensitively, since the labels are capitalized ("Cat", "Dog")
    if ("cat" in file_name.lower()) and ("cat" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
    elif ("dog" in file_name.lower()) and ("dog" in sign.lower()):
        correct += 1
        print(correct, ". ", file_name, sign)
print("accuracy: ", (correct / total))
Answered By: Anis Cherid