Why can't I get reproducible results in Keras even though I set the random seeds?

Question:

I’m training a MobileNet architecture on dummy data with Keras, on Mac OSX. I set both nump.random and tensorflow.set_random_seed, but for some reasons I can’t get reproducible results: each time I rerun the code, I get different results. Why? This is not due to the GPU, because I’m running on a MacBook Pro 2017 which has a Radeon graphics card, thus Tensorflow doesn’t take advantage of it. The code is run with

python Keras_test.py

so it’s not a problem of state (I’m not using Jupyter or IPython: the environment should be reset each time I run the code).

EDIT: I changed my code by moving the setting of all seeds before importing Keras. The results are still not deterministic, however the variance in results is much smaller than it was before. This is very bizarre.

The current model is very small (as far as deep neural network go) without being trivial, it doesn’t need a GPU to run and it trains in a few minutes on a modern laptop, so repeating my experiments is within anyone’s reach. I invite you to do it: I’d be very interested in learning about the level of variation from a system to another.

import numpy as np
# random seeds must be set before importing keras & tensorflow
my_seed = 512
np.random.seed(my_seed)
import random 
random.seed(my_seed)
import tensorflow as tf
tf.set_random_seed(my_seed)

# now we can import keras
import keras.utils
from keras.applications import MobileNet
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
import os

height = 224
width = 224
channels = 3
epochs = 10
num_classes = 10



# Generate dummy data
batch_size = 32  
n_train = 256
n_test = 64
x_train = np.random.random((n_train, height, width, channels))
y_train = keras.utils.to_categorical(np.random.randint(num_classes, size=(n_train, 1)), num_classes=num_classes)
x_test = np.random.random((n_test, height, width, channels))
y_test = keras.utils.to_categorical(np.random.randint(num_classes, size=(n_test, 1)), num_classes=num_classes)
# Get input shape
input_shape = x_train.shape[1:]

# Instantiate model 
model = MobileNet(weights=None,
                  input_shape=input_shape,
                  classes=num_classes)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# Viewing Model Configuration
model.summary()

# Model file name
filepath = 'model_epoch_{epoch:02d}_loss_{loss:0.2f}_val_{val_loss:.2f}.hdf5'

# Define save_best_only checkpointer
checkpointer = ModelCheckpoint(filepath=filepath,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)

# Let's fit!
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          callbacks=[checkpointer])

As always, here are my Python, Keras & Tensorflow versions:

python -c 'import keras; import tensorflow; import sys; print(sys.version, 'keras.__version__', 'tensorflow.__version__')'
/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
('2.7.15 |Anaconda, Inc.| (default, May  1 2018, 18:37:05) n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]', '2.1.6', '1.8.0')

Here are some results obtained running this code multiple times: as you can see, the code saves the best model (best validation accuracy) out of 10 epochs with a descriptive filename, so comparing filenames across different runs allows to judge the variability in results.

model_epoch_01_loss_2.39_val_3.28.hdf5
model_epoch_01_loss_2.39_val_3.54.hdf5
model_epoch_01_loss_2.40_val_3.47.hdf5
model_epoch_01_loss_2.41_val_3.08.hdf5
Asked By: DeltaIV

||

Answers:

You can find the answer at Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.

In short, to be absolutely sure that you will get reproducible results with your python script on one computer’s/laptop’s CPU then you will have to do the following:

  1. Set PYTHONHASHSEED environment variable at a fixed value
  2. Set python built-in pseudo-random generator at a fixed value
  3. Set numpy pseudo-random generator at a fixed value
  4. Set tensorflow pseudo-random generator at a fixed value
  5. Configure a new global tensorflow session

Following the Keras link at the top, the source code I am using is the following:

# Seed value
# Apparently you may use different seed values at each stage
seed_value= 0

# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set the `tensorflow` pseudo-random generator at a fixed value
import tensorflow as tf
tf.random.set_seed(seed_value)
# for later versions: 
# tf.compat.v1.set_random_seed(seed_value)

# 5. Configure a new global `tensorflow` session
from keras import backend as K
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# for later versions:
# session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
# sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
# tf.compat.v1.keras.backend.set_session(sess)

It is needless to say that you do not have to to specify any seed or random_state at the numpy, scikit-learn or tensorflow/keras functions that you are using in your python script exactly because with the source code above we set globally their pseudo-random generators at a fixed value.

Answered By: Outcast

The key point of making result reproducible is to disable GPU. See my answer at another question (link https://stackoverflow.com/a/57121117/9501391) which has been accepted.

Answered By: guorui
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.