How to regroup multiple fit calls into a single epoch with Keras
Question:
I am training a model with Keras on gigabytes of data, to the point where my computer can't handle the RAM needed. So I am trying to implement my training so that one epoch is done with multiple model.fit calls, with something like:
for epoche in range(nbEpoches):
    for index_df in range(len(list_of_dataFrames)):
        dataFrame = load_dataFrame(list_of_dataFrames, index_df)  # load only this DF into RAM
        X_train, Y_train, X_test, Y_test = calc_train_arrays(dataFrame)
        model.fit(
            X_train, Y_train,
            validation_data=(X_test, Y_test),
            # ... what I am asking
            batch_size=batch_size,
        )
X_train and X_test are NumPy arrays of shape (many thousands, 35 to 200, 54+), so using multiple batches is mandatory (for the GPU's VRAM), and so is dynamically loading the DataFrames (for the RAM); this is what forces me to use multiple fit calls for the same epoch.
I am asking how to use the model.fit function in order to do this.
I also wondered whether using a generator yielding arrays of shape (batch_size, 35+, 54+) and specifying steps_per_epoch could be an idea?
I first tried to avoid the problem by training on a single DataFrame of around 20k samples, but the model has generalisation issues. I also tried doing one epoch per DataFrame, but it seems like each DataFrame was un-learning the others.
Answers:
You should use fit_generator instead of fit. It loads the examples as needed instead of loading them all at once. If you're familiar at all with Python 2, it's like the difference between xrange and range: range creates a list and puts it in your RAM, whereas xrange creates a generator, which is much more memory-efficient. range defaults to the xrange behavior in Python 3.
https://faroit.com/keras-docs/1.2.0/models/model/
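The asker's generator-plus-steps_per_epoch idea is in fact the right shape for this: in TF 2.x, model.fit itself accepts a Python generator (fit_generator is deprecated there), and steps_per_epoch tells Keras where one epoch ends. A minimal sketch of such a generator; chunked_batches and load_chunk are hypothetical names, and the small in-memory fake chunks stand in for the question's DataFrames:

```python
import numpy as np

def chunked_batches(chunk_files, batch_size, load_chunk):
    """Yield (X, y) batches forever, keeping only one chunk in RAM at a time."""
    while True:  # loop forever; Keras uses steps_per_epoch to delimit epochs
        for path in chunk_files:
            X, y = load_chunk(path)  # only this chunk is resident in RAM
            for start in range(0, len(X), batch_size):
                yield X[start:start + batch_size], y[start:start + batch_size]

# Fake in-memory "chunks" standing in for the question's DataFrames:
fake_chunks = {f"df{i}": (np.zeros((100, 35, 54)), np.zeros((100, 1)))
               for i in range(3)}
gen = chunked_batches(list(fake_chunks), batch_size=32,
                      load_chunk=fake_chunks.get)

# 3 chunks x ceil(100/32) batches each = 12 steps = one full pass over the data
steps_per_epoch = 3 * -(-100 // 32)
Xb, yb = next(gen)

# model.fit(gen, steps_per_epoch=steps_per_epoch, epochs=nbEpoches)
```

The key point is that steps_per_epoch must equal the total number of batches across all chunks, so that one Keras "epoch" really is one full pass over the whole dataset.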
Also just a PSA, I didn't know this when I first got interested in ML, but Keras is now part of TensorFlow, and Caffe2 was merged into PyTorch. Keras and the original Caffe may be considered older tools nowadays, and may not receive updates as frequently as PyTorch or TensorFlow. Personally I recommend PyTorch of the two, since TensorFlow is owned by Google and PyTorch has a little more of an open-source spirit to it.
I guess you have two options.
- You can try a custom data generator. Here is a tutorial (I think this may be a little difficult): https://medium.com/analytics-vidhya/write-your-own-custom-data-generator-for-tensorflow-keras-1252b64e41c3
- You can also define a custom training loop; here is a tutorial: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch
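For the first option, a sketch of a keras.utils.Sequence-style loader. The Sequence protocol is just __len__ (batches per epoch) and __getitem__ (one batch by global index); Keras then handles epoch boundaries for you. ChunkedSequence and load_chunk are names I'm inventing here, and the tf import is guarded so the sketch stays self-contained:

```python
import numpy as np

try:  # subclass the real base class when TensorFlow is available
    from tensorflow.keras.utils import Sequence
except ImportError:  # the protocol is plain Python, so a stub suffices for illustration
    Sequence = object

class ChunkedSequence(Sequence):
    """One epoch = every batch of every chunk, one chunk in RAM at a time."""

    def __init__(self, chunk_files, batch_size, load_chunk):
        self.chunk_files = chunk_files
        self.batch_size = batch_size
        self.load_chunk = load_chunk  # assumed to return (X, y) arrays for one chunk
        # Precompute how many batches each chunk contributes
        self.batches_per_chunk = [
            -(-len(load_chunk(f)[0]) // batch_size) for f in chunk_files
        ]

    def __len__(self):  # total batches in one epoch
        return sum(self.batches_per_chunk)

    def __getitem__(self, index):  # map a global batch index to (chunk, local slice)
        for f, n in zip(self.chunk_files, self.batches_per_chunk):
            if index < n:
                X, y = self.load_chunk(f)
                s = index * self.batch_size
                return X[s:s + self.batch_size], y[s:s + self.batch_size]
            index -= n
        raise IndexError(index)

# model.fit(ChunkedSequence(files, batch_size, load_chunk), epochs=nbEpoches)
```

In practice you would cache the most recently loaded chunk so consecutive batches don't reload it; this sketch favors clarity over speed.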
I am not sure if this is what you want.
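For the second option, a sketch of only the control flow of a custom training loop; train_step is deliberately a stub here (in real TensorFlow code it would compute gradients under tf.GradientTape and apply them). Note that reshuffling the chunk order every epoch directly addresses the "each DataFrame un-learns the others" problem from the question:

```python
import random

def train(chunk_files, batch_size, load_chunk, train_step, n_epochs, seed=0):
    """One epoch = one shuffled pass over every batch of every chunk."""
    rng = random.Random(seed)
    steps = 0
    for epoch in range(n_epochs):
        order = list(chunk_files)
        rng.shuffle(order)  # reshuffle chunk order each epoch to avoid ordering bias
        for f in order:
            X, y = load_chunk(f)  # only this chunk in RAM
            for s in range(0, len(X), batch_size):
                # real code: with tf.GradientTape() as tape: loss = ...; apply gradients
                train_step(X[s:s + batch_size], y[s:s + batch_size])
                steps += 1
    return steps
```

Shuffling within each chunk (and ideally across chunk boundaries, e.g. by pre-shuffling samples into the chunk files) would further reduce ordering bias, at the cost of more bookkeeping.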