How to group multiple fit calls into a single epoch with Keras


I am training a model with Keras on gigabytes of data, to the point where my computer can’t handle the RAM needed. So I am trying to implement my training so that one epoch is done with multiple calls, with something like:

for epoche in range(nbEpoches):
    for index_df in range(len(list_of_dataFrames)):
        dataFrame = load_dataFrame(list_of_dataFrames, index_df) # load only this DF into RAM
        X_train, Y_train, X_test, Y_test = calc_train_arrays(dataFrame)
        model.fit(
            X_train, Y_train,
            validation_data=(X_test, Y_test),
            # ... what I am asking
        )

Here X_train and X_test are numpy arrays of shape (many thousands, 35 to 200, 54+), so using multiple batches is mandatory (for the GPU’s VRAM), and so is dynamically loading the dataFrames (for the RAM). This is what forces me to use multiple fit calls for the same epoch.
I am asking how to use the fit function in order to do this.
I also wondered whether using a generator that yields arrays of shape (batch_size, 35+, 54+) and specifying steps_per_epoch could be an idea?

I first tried to avoid the problem by just training on a single dataFrame of around 20k samples, but the model has generalisation issues. I also tried doing one epoch per dataFrame, but it seems like each dataFrame was un-learning the others.

Asked By: vuvu 700



You should use fit_generator instead of fit. It loads the examples as needed instead of loading them all at once. (In recent TensorFlow versions fit_generator is deprecated, and fit itself accepts a generator.) If you’re familiar at all with Python 2, it’s like the difference between xrange and range: range creates a list and puts it in your RAM, whereas xrange produces its values lazily, one at a time, which is much more memory efficient. In Python 3, range has the xrange behavior by default.
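As a rough illustration of the generator approach, here is a minimal sketch. It assumes a hypothetical helper load_chunk(i) that loads one dataFrame and returns its (X, Y) training arrays (for example by wrapping load_dataFrame and calc_train_arrays from the question), and a list chunk_sizes holding the number of samples in each dataFrame. Neither name comes from Keras or from the question.

```python
import math
import random


def count_steps(chunk_sizes, batch_size):
    """Batches in one pass over every chunk, counting partial last batches."""
    return sum(math.ceil(n / batch_size) for n in chunk_sizes)


def batch_generator(load_chunk, n_chunks, batch_size, shuffle=True):
    """Yield (X_batch, Y_batch) forever, holding one chunk in RAM at a time.

    load_chunk(i) is a hypothetical helper returning (X, Y) for chunk i.
    """
    # Loop forever so the generator can supply steps_per_epoch * epochs batches.
    while True:
        order = list(range(n_chunks))
        if shuffle:
            # Visit the chunks in a different order on every pass.
            random.shuffle(order)
        for i in order:
            X, Y = load_chunk(i)
            for start in range(0, len(X), batch_size):
                stop = start + batch_size
                yield X[start:stop], Y[start:stop]


# Possible usage, assuming `model` is a compiled Keras model:
# model.fit(
#     batch_generator(load_chunk, n_chunks, batch_size=32),
#     steps_per_epoch=count_steps(chunk_sizes, 32),
#     epochs=nbEpoches,
# )
```

With steps_per_epoch set to the total number of batches across all dataFrames, one Keras epoch corresponds to one pass over every dataFrame, which is what the question asks for. Note that this only shuffles the order of the chunks, not the samples inside each chunk.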

Also just a PSA, I didn’t know this when I first got interested in ML, but Keras now ships as part of TensorFlow (as tf.keras), and Caffe2 was merged into PyTorch. Standalone Keras and Caffe may be considered older tools nowadays, and may not receive updates as frequently as PyTorch or TensorFlow. Personally I recommend PyTorch out of the two, since TensorFlow is developed by Google and PyTorch has a little more of an open-source spirit to it.

Answered By: Brock Brown

I guess you have 2 options.

  1. You can try a custom data generator. Here is a tutorial (I think this may be a little difficult):

  2. You can also define a custom training loop. Here is a tutorial:

I am not sure if this is what you want.
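For the second option, a minimal sketch of a hand-written loop could look like the following. It uses Keras’s train_on_batch rather than a full GradientTape loop, which is a simpler stand-in for what a custom training loop tutorial would usually show. load_chunk(i) is again a hypothetical helper that returns the (X, Y) arrays of one dataFrame, and train_in_chunks is a name made up for this example.

```python
def train_in_chunks(model, load_chunk, n_chunks, n_epochs, batch_size):
    """Train so that one epoch is one pass over every chunk.

    model is assumed to be a compiled Keras model; load_chunk(i) is a
    hypothetical helper returning (X, Y) for chunk i.
    Returns the mean training loss of each epoch.
    """
    epoch_losses = []
    for epoch in range(n_epochs):
        losses = []
        for i in range(n_chunks):
            # Only this chunk is held in RAM.
            X, Y = load_chunk(i)
            for start in range(0, len(X), batch_size):
                stop = start + batch_size
                # One gradient update on a single batch.
                out = model.train_on_batch(X[start:stop], Y[start:stop])
                # With metrics configured, the loss is the first element.
                loss = out[0] if isinstance(out, (list, tuple)) else out
                losses.append(float(loss))
        epoch_losses.append(sum(losses) / len(losses))
    return epoch_losses
```

Because the weights are updated batch by batch across all dataFrames within each epoch, no single dataFrame gets a whole epoch to itself. Validation is left out here; it could be run at the end of each epoch, for example with model.evaluate on a held-out set.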