InternalError: Failed copying input tensor from CPU:0 to GPU:0 in order to run _EagerConst: Dst tensor is not initialized
Question:
I am running TensorFlow cross-validation training with 10 folds. The code runs in a for loop where model.fit is called on each iteration. The first fold trains fine, but after that GPU memory fills up and I get the error above.
Here is my for loop:
acc_per_fold = []
loss_per_fold = []
fold_no = 1
for train, test in kfold.split(x_train, y_train):
    # Define the model architecture
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), input_shape=x_train[0].shape, activation="relu"))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(32, kernel_size=(3, 3), activation="relu"))
    model.add(MaxPooling2D(2, 2))
    model.add(Flatten())
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation="tanh"))
    model.add(Dense(1, activation="sigmoid"))
    # Compile the model
    model.compile(loss="binary_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  metrics=["accuracy"])
    # Generate a print
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    # Fit data to model
    history = model.fit(np.array(x_train)[train], np.array(y_train)[train],
                        batch_size=32,
                        epochs=10,
                        verbose=1)
    # Generate generalization metrics
    scores = model.evaluate(np.array(x_train)[test], np.array(y_train)[test], verbose=0)
    print(f"Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; {model.metrics_names[1]} of {scores[1]*100}%")
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])
    # Increase fold number
    fold_no += 1
Also, I searched and found that the numba library can be used to release GPU memory. It did free the memory, but the kernel in my Jupyter notebook died and I had to restart it, so that solution will not work in my case.
Answers:
I faced this problem a long time ago; even reducing the batch size didn't help. My GPU was an RTX 3060 with 12 GB of memory, and the same code worked on Google Colab Pro.
However, there is one solution that may work: use the gc module, which frees the Python objects holding the old model after each iteration, letting TensorFlow release the memory they kept alive:
import gc
Then put this call at the end of the loop body:
gc.collect()
and hopefully it will free the memory after each fold.
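To make the pattern concrete, here is a minimal sketch (the run_fold helper and its dummy score are placeholders, not from the original code): do all per-fold work inside the loop and collect at the end of each iteration. In a real TensorFlow loop you would also call tf.keras.backend.clear_session() at the marked point so the old model's graph is discarded before collecting.

```python
import gc

def run_fold(fold_no):
    """Placeholder for building, fitting and evaluating one fold's model."""
    big_buffer = [0.0] * 100_000  # stands in for per-fold allocations
    return fold_no * 10           # dummy per-fold score

scores = []
for fold_no in range(1, 4):
    scores.append(run_fold(fold_no))
    # With TensorFlow, call tf.keras.backend.clear_session() here first,
    # then collect; gc.collect() returns the number of objects it found.
    freed = gc.collect()
```

Since the model is rebuilt from scratch every fold anyway, nothing is lost by clearing the session between folds; only the accumulated metrics lists need to survive the loop.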