How to get other metrics in Tensorflow 2.0 (not only accuracy)?

Question:

I’m new to the world of TensorFlow and I’m working on the simple example of MNIST dataset classification. I would like to know how I can obtain other metrics (e.g. precision, recall, etc.) in addition to accuracy and loss (and possibly display them). Here’s my code:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import TensorBoard
import time
import os

#load mnist dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

#create and compile the model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), 
  tf.keras.layers.Dense(128, activation='relu'), 
  tf.keras.layers.Dropout(0.2), 
  tf.keras.layers.Dense(10, activation='softmax') 
])
model.summary()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

#model checkpoint (only if there is an improvement)

checkpoint_path = "logs/weights-improvement-{epoch:02d}-{accuracy:.2f}.hdf5"

cp_callback = ModelCheckpoint(checkpoint_path, monitor='accuracy', save_best_only=True, verbose=1, mode='max')

#Tensorboard
NAME = "tensorboard_{}".format(int(time.time())) #name of the model with timestamp
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))

#train the model
model.fit(x_train, y_train, callbacks = [cp_callback, tensorboard], epochs=5)

#evaluate the model
model.evaluate(x_test,  y_test, verbose=2)

Since I only get accuracy and loss, how can I get other metrics?
Thank you in advance; I’m sorry if this is a simple question or if it was already answered somewhere.

Asked By: Diane.95


Answers:

There is a list of available metrics in the Keras documentation. It includes recall, precision, etc.

For instance, recall:

model.compile('adam', loss='binary_crossentropy', 
    metrics=[tf.keras.metrics.Recall()])
Answered By: Nicolas Gervais

Starting from TensorFlow 2.x, precision and recall are both available as built-in metrics.

Therefore, you do not need to implement them by hand. They were removed in Keras 2.x because they were misleading: since they were computed batch-wise, the global (true) values of precision and recall would actually be different.

You can have a look here: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Recall

Now they have a built-in accumulator, which ensures the correct calculation of those metrics.

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy',tf.keras.metrics.Precision(),tf.keras.metrics.Recall()])
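Note that the compile call above uses binary_crossentropy, while the question is a 10-class MNIST problem. Precision and Recall compare labels and predictions element-wise, so they expect the labels in the same one-hot shape as the softmax output. A minimal sketch of one way to handle that (an assumption, not from the original answer) is to one-hot encode the labels with tf.keras.utils.to_categorical and switch the loss to categorical_crossentropy:

# Sketch (assumption): one-hot encode the integer labels so that
# Precision/Recall receive y_true with the same (batch, 10) shape
# as the softmax predictions.
y_train_onehot = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test_onehot = tf.keras.utils.to_categorical(y_test, num_classes=10)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy',
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

model.fit(x_train, y_train_onehot, epochs=5)
model.evaluate(x_test, y_test_onehot, verbose=2)

Keep in mind that Precision() and Recall() computed this way are micro-averaged over all class entries (with a 0.5 threshold on the softmax output), not per-class values.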
Answered By: Timbus Calin

For a list of supported metrics, see:

tf.keras Metrics

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy',tf.keras.metrics.Precision(),tf.keras.metrics.Recall()])
Answered By: Ram-msft

I could not get Timbus’ answer to work and I found a very interesting explanation here.

It says:
The meaning of 'accuracy' depends on the loss function. The one that corresponds to sparse_categorical_crossentropy is tf.keras.metrics.SparseCategoricalAccuracy(), not tf.metrics.Accuracy().
Which makes a lot of sense.

So which metrics you can use depends on the loss you chose. E.g. the metric ‘TruePositives’ won’t work with sparse_categorical_crossentropy, because that loss means you’re working with more than two classes, and true positives are only defined for binary classification problems.

A metric like tf.keras.metrics.CategoricalCrossentropy() will work because it is designed with multiple classes in mind! Example:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import TensorBoard
import time
import os 

#load mnist dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

#create and compile the model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)), 
  tf.keras.layers.Dense(128, activation='relu'), 
  tf.keras.layers.Dropout(0.2), 
  tf.keras.layers.Dense(10, activation='softmax') 
])
model.summary()

# This will work because it makes sense
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy(),
                       tf.keras.metrics.CategoricalCrossentropy()])

# This will not work because it isn't designed for the multiclass classification problem
# (kept commented out here, otherwise the fit below fails with a shape mismatch):
# model.compile(optimizer='adam',
#               loss='sparse_categorical_crossentropy',
#               metrics=[tf.keras.metrics.SparseCategoricalAccuracy(),
#                        tf.keras.metrics.TruePositives()])

#model checkpoint (only if there is an improvement)

checkpoint_path = "logs/weights-improvement-{epoch:02d}-{accuracy:.2f}.hdf5"

cp_callback = ModelCheckpoint(checkpoint_path,
                              monitor='accuracy',
                              save_best_only=True,
                              verbose=1,
                              mode='max')

#Tensorboard
NAME = "tensorboard_{}".format(int(time.time())) # name of the model with timestamp
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))

#train the model
model.fit(x_train, y_train, epochs=5)

#evaluate the model
model.evaluate(x_test,  y_test, verbose=2)

In my case the other 2 answers gave me shape mismatches.

Answered By: Victor Sonck

I am adding another answer because this is the cleanest way to compute these metrics correctly on your test set (as of the 22nd of March 2020).

The first thing you need to do is to create a custom callback, in which you send your test data:

import tensorflow as tf
from tensorflow.keras.callbacks import Callback
from sklearn.metrics import classification_report 

class MetricsCallback(Callback):
    def __init__(self, test_data, y_true):
        # Should be the label encoding of your classes
        self.y_true = y_true
        self.test_data = test_data
        
    def on_epoch_end(self, epoch, logs=None):
        # Here we get the probabilities
        y_pred = self.model.predict(self.test_data)
        # Here we get the predicted classes
        y_pred = tf.argmax(y_pred, axis=1)
        # Full report as a dictionary
        report_dictionary = classification_report(self.y_true, y_pred, output_dict=True)
        # Only printing the report
        print(classification_report(self.y_true, y_pred, output_dict=False))

In your main, where you load your dataset and add the callbacks:

metrics_callback = MetricsCallback(test_data = my_test_data, y_true = my_y_true)
...
...
#train the model
model.fit(x_train, y_train, callbacks=[cp_callback, metrics_callback, tensorboard], epochs=5)
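For the MNIST code from the question, the placeholders would correspond to the held-out test set; a minimal sketch, assuming the x_test and y_test arrays from mnist.load_data() are the data to monitor:

# Sketch (assumption): use the question's test split as the data
# evaluated by the callback at the end of every epoch.
metrics_callback = MetricsCallback(test_data=x_test, y_true=y_test)

At the end of each epoch, sklearn’s classification_report then prints precision, recall, F1-score and support for each of the 10 digit classes.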

         
Answered By: Timbus Calin