Tensorflow: How to train for a length of time instead of epochs?
Question:
Prior research:
Most relevant tensorflow article
How can I calculate the time spent for overall training a model in Tensorflow (for all epochs)?
Show Estimated remaining time to train a model Tensorflow with large epochs
Code:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

y = to_categorical(self.ydata, num_classes=self.vocab_size)
model = Sequential()
model.add(Embedding(self.vocab_size, 10, input_length=1))
model.add(LSTM(1000, return_sequences=True))
model.add(LSTM(1000))
model.add(Dense(1000, activation="relu"))
model.add(Dense(self.vocab_size, activation="softmax"))
keras.utils.plot_model(model, show_layer_names=True)
checkpoint = ModelCheckpoint(modelFilePath, monitor='loss', verbose=1, save_best_only=True, mode='auto')
reduce = ReduceLROnPlateau(monitor='loss', factor=0.2, patience=3, min_lr=0.0001, verbose=1)
tensorboard_Visualization = TensorBoard(log_dir=logdirPath)
model.compile(loss="categorical_crossentropy", optimizer=Adam(learning_rate=0.001))
history = model.fit(self.Xdata, y, epochs=epochs, batch_size=64, callbacks=[checkpoint, reduce, tensorboard_Visualization]).history
Inspiration from:
- https://www.analyticsvidhya.com/blog/2021/08/predict-the-next-word-of-your-text-using-long-short-term-memory-lstm/
- https://towardsdatascience.com/building-a-next-word-predictor-in-tensorflow-e7e681d4f03f
This code trains on a list of one-word "questions" and "answers". Impressive background knowledge if you guessed the model's goal before reading this. Anyway, the code works; at this point I'm only looking to enhance it.
How can I train a model for a set amount of time instead of a fixed number of epochs? Epoch duration varies with the text I feed this AI, generally from about 10 seconds to 4 minutes. I could time an epoch and use that to approximate an epoch count, but if TensorFlow offers a more direct way, I would appreciate a concrete pointer to its resources.
I really want a usable answer, so please include some code with your explanation; links to useful docs would be a plus. I hope you like the question and upvote it!
🙂
Answers:
If it's imperative to enforce a timeout exactly as you formulated the problem, then take a look at this answer.
However, accuracy will vary greatly with the text you feed it, from extremely poor to overfitted, so you'd end up spending extra time just verifying the result. A better fit for your problem would be a custom EarlyStopping.
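The timeout approach itself can be sketched framework-free (so the snippet stays runnable here). In Keras, this logic would live in a subclass of `tf.keras.callbacks.Callback`: record the start time in `on_train_begin` and set `self.model.stop_training = True` from `on_epoch_end` once the budget is spent; TensorFlow Addons also ships a ready-made `tfa.callbacks.TimeStopping`, if I remember its API correctly.

```python
import time

# Framework-free sketch of a time-budget stopper. The class name
# TimeBudget and the 0.05 s / 0.02 s values are illustrative only.
class TimeBudget:
    def __init__(self, max_seconds):
        self.max_seconds = max_seconds
        self.start = None

    def on_train_begin(self):
        self.start = time.monotonic()

    def exceeded(self):
        return time.monotonic() - self.start > self.max_seconds

budget = TimeBudget(max_seconds=0.05)
budget.on_train_begin()
epochs_run = 0
while not budget.exceeded():  # stands in for Keras's epoch loop
    time.sleep(0.02)          # stands in for one training epoch
    epochs_run += 1
```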
from tensorflow.keras.callbacks import EarlyStopping

custom_early_stopping = EarlyStopping(
    monitor='val_accuracy',
    patience=8,
    min_delta=0.001,
    mode='max'
)

history = model.fit(
    # rest of parameters
    callbacks=[custom_early_stopping]  # and the rest of the callbacks
)
In this example, validation accuracy is the monitored metric that decides when to stop training. I don't think you'd have any use for a low-accuracy model, or for continuing past your target just because you allotted the time.
patience=8 means training terminates after 8 consecutive epochs with no improvement. min_delta=0.001 means validation accuracy has to improve by at least 0.001 to count as an improvement. mode='max' means training stops once the monitored quantity has stopped increasing.
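How these settings interact can be traced with a small, framework-free sketch (the val_accuracy numbers are made up for illustration):

```python
# Framework-free trace of the patience / min_delta rule described above.
patience, min_delta = 8, 0.001
val_accuracy = [0.50, 0.60, 0.65] + [0.65] * 12  # plateaus after epoch 2

best, wait, stopped_at = float("-inf"), 0, None
for epoch, acc in enumerate(val_accuracy):
    if acc - best > min_delta:   # improvement of more than min_delta
        best, wait = acc, 0
    else:
        wait += 1                # one more epoch without improvement
        if wait >= patience:
            stopped_at = epoch   # EarlyStopping would halt here
            break
# Training halts at epoch 10: the 8th consecutive non-improving epoch.
```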