Which seeds have to be set where to realize 100% reproducibility of training results in TensorFlow?
Question:
In a general tensorflow setup like
model = construct_model()
with tf.Session() as sess:
train_model(sess)
where construct_model() contains the model definition, including random initialization of the weights (tf.truncated_normal), and train_model(sess) executes the training of the model –
Which seeds do I have to set where to ensure 100% reproducibility between repeated runs of the code snippet above? The documentation for tf.random.set_random_seed is concise, but it left me a bit confused. I tried:
tf.set_random_seed(1234)
model = construct_model()
with tf.Session() as sess:
train_model(sess)
But got different results each time.
Answers:
One possible reason is that, when constructing the model, some code uses the numpy.random module. So you might try setting the seed for numpy, too.
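As a minimal illustration (independent of TensorFlow), seeding NumPy before drawing the random values makes the draw repeat exactly:

```python
import numpy as np

# Seed NumPy's global RNG before any random weight initialization
np.random.seed(1234)
first = np.random.randn(2, 3)   # e.g. a weight matrix drawn via numpy

np.random.seed(1234)            # reseed with the same value
second = np.random.randn(2, 3)  # the draw repeats exactly

print(np.array_equal(first, second))  # → True
```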
The best solution, which works as of today with GPU, is to install tensorflow-determinism:
pip install tensorflow-determinism
Then add the following code to your script:
import tensorflow as tf
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
What has worked for me is following this answer with a few modifications:
import os
import random
import numpy as np
import tensorflow as tf
# Setting seed value
# from https://stackoverflow.com/a/52897216
# generated randomly by running `random.randint(0, 100)` once
SEED = 75
# 1. Set the `PYTHONHASHSEED` environment variable at a fixed value
os.environ['PYTHONHASHSEED'] = str(SEED)
# 2. Set the `python` built-in pseudo-random generator at a fixed value
random.seed(SEED)
# 3. Set the `numpy` pseudo-random generator at a fixed value
np.random.seed(SEED)
# 4. Set the `tensorflow` pseudo-random generator at a fixed value
tf.random.set_seed(SEED)
I was not able to figure out how to set the session seed (step 5), but it didn’t seem like it was necessary.
I am running Google Colab Pro on a high-RAM TPU, and my training results (the graph of the loss function) have been exactly the same three times in a row with this method.
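One caveat on step 1 above: PYTHONHASHSEED only takes effect when the interpreter starts, so setting it inside an already-running script influences child processes rather than the current one. A small sketch showing that two fresh interpreters with the same hash seed agree on string hashes:

```python
import os
import subprocess
import sys

# Launch fresh interpreters so PYTHONHASHSEED is read at start-up.
env = {**os.environ, "PYTHONHASHSEED": "75"}

def hash_in_fresh_interpreter():
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('reproducibility'))"],
        env=env, capture_output=True, text=True,
    )
    return out.stdout.strip()

# Same hash seed in both child interpreters → same string hash
print(hash_in_fresh_interpreter() == hash_in_fresh_interpreter())  # → True
```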
import os
import random

import numpy as np
import tensorflow as tf
from tensorflow import keras

SEED = 42

os.environ["TF_DETERMINISTIC_OPS"] = "1"
keras.utils.set_random_seed(SEED)
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
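The recipes above can be gathered into one helper. This is a sketch under my own assumptions: the name set_global_seed is mine, and the TensorFlow/Keras calls are guarded so the snippet also runs where TensorFlow is not installed.

```python
import os
import random

import numpy as np

def set_global_seed(seed: int) -> None:
    os.environ["PYTHONHASHSEED"] = str(seed)   # only affects subprocesses
    os.environ["TF_DETERMINISTIC_OPS"] = "1"   # deterministic GPU kernels
    random.seed(seed)                          # python built-in RNG
    np.random.seed(seed)                       # numpy global RNG
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)               # tensorflow global RNG
        # TF >= 2.7 bundles the python/numpy/tf seeding into one call:
        tf.keras.utils.set_random_seed(seed)
    except ImportError:
        pass                                   # TF not available here

set_global_seed(42)
first = np.random.rand(4)
set_global_seed(42)                            # reseed with the same value
second = np.random.rand(4)
print(np.array_equal(first, second))  # → True
```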