Keras loss value significant jump

Question:

I am working on a simple neural network in Keras with TensorFlow. There is a significant jump in the loss value from the last mini-batch of epoch L-1 to the first mini-batch of epoch L.
[Plot: loss versus number of iterations]

I am aware that the loss should decrease as the number of iterations increases, but a significant jump in loss after each epoch does look strange. Here is the code snippet:

import tensorflow as tf
from tensorflow.keras import backend as K

initializer = tf.keras.initializers.he_uniform()

def my_loss(y_true, y_pred):
    epsilon = 1e-30  # epsilon is added to avoid inf/nan
    y_pred = K.cast(y_pred, K.floatx())
    y_true = K.cast(y_true, K.floatx())
    # element-wise binary cross-entropy, averaged over outputs, then over the batch
    loss = y_true * K.log(y_pred + epsilon) + (1 - y_true) * K.log(1 - y_pred + epsilon)
    loss = K.mean(loss, axis=-1)
    loss = K.mean(loss)
    return -1 * loss

inputs = tf.keras.Input(shape=(140,))
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(1000, kernel_initializer=initializer)(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dense(100, kernel_initializer=initializer)(x)
outputs = tf.keras.activations.sigmoid(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)


opt = tf.keras.optimizers.Adam()
recall1 = tf.keras.metrics.Recall(top_k=8)
c_entropy = tf.keras.losses.BinaryCrossentropy()

model.compile(loss=c_entropy, optimizer=opt, metrics=[recall1, my_loss], run_eagerly=True)

model.fit(X_train_test, Y_train_test, epochs=epochs, batch_size=batch, shuffle=True, verbose=1)

When I searched online, I found this article, which suggests that Keras calculates a moving average of the loss over the mini-batches. I also found elsewhere that the accumulator for this moving average is reset after each epoch, which is why we obtain a very smooth curve within an epoch but a jump after it.

In order to avoid the moving average, I implemented my own loss function, which should output the loss value of the current mini-batch instead of the average over the batches. Since each mini-batch differs from the others, the corresponding losses should also differ, so I was expecting a distinct loss value for each mini-batch from my implementation. Instead, I obtain exactly the same values as the built-in Keras loss function.

I am unclear about:

  1. Is Keras calculating a moving average over the mini-batches, with the accumulator reset after each epoch, causing the jump? If not, what is causing the jump in the loss value?
  2. Is my per-mini-batch implementation of the loss correct? If not, how can I obtain the loss value of each mini-batch during training?
Asked By: Doc Jazzy


Answers:

Keras does in fact show the moving average instead of the "raw" loss values. The moving-average accumulator is reset after each epoch, which is why we see a huge jump there. To acquire the raw loss values, implement a callback as shown below:

from tensorflow import keras

class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        # initialize a list at the beginning of training
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        # record the raw loss of the batch that just finished
        self.losses.append(logs.get('loss'))

mycallback = LossHistory()

Then pass it to model.fit:

model.fit(X, Y, epochs=epochs, batch_size=batch, shuffle=True, verbose = 0, callbacks=[mycallback])
print(mycallback.losses)
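
To visualize the raw values and confirm that the per-epoch jump disappears, a quick plot helps (a minimal sketch; matplotlib is assumed to be available):

import matplotlib.pyplot as plt

# plot the raw per-batch losses collected by the callback above
plt.plot(mycallback.losses)
plt.xlabel('Iteration (mini-batch)')
plt.ylabel('Raw loss')
plt.show()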

I tested this with the following configuration:

Keras 2.3.1
TensorFlow 2.1.0
Python 3.7.9

For some reason, it didn’t work with the following configuration:

Keras 2.4.3
TensorFlow 2.2.0
Python 3.8.5

To answer the second question: the implementation of the loss function my_loss is correct, and the values it produces are very close to those generated by the built-in tf.keras.losses.BinaryCrossentropy().
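
As a quick sanity check (a standalone sketch with synthetic data, not part of the original answer), the two can be compared on a random batch:

import numpy as np
import tensorflow as tf

# synthetic batch: 32 samples, 100 sigmoid outputs each, as in the model above
y_true = np.random.randint(0, 2, size=(32, 100)).astype('float32')
y_pred = np.random.uniform(0.01, 0.99, size=(32, 100)).astype('float32')

builtin = tf.keras.losses.BinaryCrossentropy()
print(float(builtin(y_true, y_pred)))  # built-in binary cross-entropy
print(float(my_loss(y_true, y_pred)))  # custom implementation, nearly identical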
Answered By: Doc Jazzy

In TensorFlow 2.2 and newer, the loss provided to on_train_batch_end is the average loss over all batches up to and including the current batch within the given epoch. The same applies to additional metrics, and holds for built-in as well as custom losses/metrics.
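
As a concrete illustration with made-up numbers: if the raw losses of the first three batches were 1.0, 2.0 and 3.0, logs['loss'] would report the running means 1.0, 1.5 and 2.0:

raw_losses = [1.0, 2.0, 3.0]  # hypothetical per-batch losses
running_means = [sum(raw_losses[:i + 1]) / (i + 1) for i in range(len(raw_losses))]
print(running_means)  # [1.0, 1.5, 2.0]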

Fortunately, the loss for the current batch can be calculated from the average loss as follows:

from tensorflow.keras.callbacks import Callback

class CustomCallback(Callback):
    ''' This callback converts the average loss (default behavior in TF>=2.2)
        into the loss for only the current batch.
    '''
    def on_epoch_begin(self, epoch, logs={}):
        self.previous_loss_sum = 0

    def on_train_batch_end(self, batch, logs={}):
        # calculate loss of current batch:
        current_loss_sum = (batch + 1) * logs['loss']
        current_loss = current_loss_sum - self.previous_loss_sum
        self.previous_loss_sum = current_loss_sum

        # use current_loss:
        # ...

This code can be added to any custom callback that needs the loss for the current batch instead of the average loss, including the LossHistory callback provided in Doc Jazzy’s answer.
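
For example, merging it into that callback could look like the following sketch for TF >= 2.2 (the class name RawLossHistory is my own):

from tensorflow.keras.callbacks import Callback

class RawLossHistory(Callback):
    '''Combines LossHistory with the running-sum trick above to
       record the raw per-batch loss under TF >= 2.2.'''
    def on_train_begin(self, logs=None):
        self.losses = []

    def on_epoch_begin(self, epoch, logs=None):
        self.previous_loss_sum = 0

    def on_train_batch_end(self, batch, logs=None):
        # undo the running average to recover the current batch's loss
        current_loss_sum = (batch + 1) * logs['loss']
        self.losses.append(current_loss_sum - self.previous_loss_sum)
        self.previous_loss_sum = current_loss_sum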

Also, if you are using TensorFlow 1 or TensorFlow 2 version <= 2.1, do not include this code in your callback, as those versions already provide the current batch loss rather than the average.

Answered By: Leland Hepworth