InternalError: Failed copying input tensor from CPU:0 to GPU:0 in order to run _EagerConst: Dst tensor is not initialized
Question:
I am running TensorFlow cross-validation training with 10 folds. The code runs in a for loop where model.fit is called on each iteration. The first fold trains fine, but after that GPU memory fills up and I get the error above.
Here is my for loop:
acc_per_fold = []
loss_per_fold = []
fold_no = 1
for train, test in kfold.split(x_train, y_train):
    # Define the model architecture
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), input_shape=x_train[0].shape, activation="relu"))
    model.add(MaxPooling2D(2, 2))
    model.add(Conv2D(32, kernel_size=(3, 3), activation="relu"))
    model.add(MaxPooling2D(2, 2))
    model.add(Flatten())
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.1))
    model.add(Dense(32, activation="tanh"))
    model.add(Dense(1, activation="sigmoid"))
    # Compile the model
    model.compile(loss="binary_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  metrics=["accuracy"])
    # Generate a print
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')
    # Fit data to model
    history = model.fit(np.array(x_train)[train], np.array(y_train)[train],
                        batch_size=32,
                        epochs=10,
                        verbose=1)
    # Generate generalization metrics
    scores = model.evaluate(np.array(x_train)[test], np.array(y_train)[test], verbose=0)
    print(f"Score for fold {fold_no}: {model.metrics_names[0]} of {scores[0]}; {model.metrics_names[1]} of {scores[1]*100}%")
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])
    # Increase fold number
    fold_no += 1
Also, I searched and found that the numba library can be used to release GPU memory. It did free the memory, but the kernel in my Jupyter notebook died and I had to restart it, so that solution will not work in my case.
Answers:
I faced this problem a long time ago; even reducing the batch size didn't help. My GPU was an RTX 3060 with 12 GB of memory, and the same code worked on Google Colab Pro.
However, there is one solution that may work: use the gc module, which frees the Python objects holding the old model after each iteration, letting TensorFlow release the memory they kept alive:
import gc
Then put this call at the end of the loop body:
gc.collect()
and hopefully it will free the memory after each fold.
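To make the pattern concrete, here is a minimal sketch (the run_fold helper and its dummy score are placeholders, not from the original code): do all per-fold work inside the loop and collect at the end of each iteration. In a real TensorFlow loop you would also call tf.keras.backend.clear_session() at the marked point so the old model's graph is discarded before collecting.

```python
import gc

def run_fold(fold_no):
    """Placeholder for building, fitting and evaluating one fold's model."""
    big_buffer = [0.0] * 100_000  # stands in for per-fold allocations
    return fold_no * 10           # dummy per-fold score

scores = []
for fold_no in range(1, 4):
    scores.append(run_fold(fold_no))
    # With TensorFlow, call tf.keras.backend.clear_session() here first,
    # then collect; gc.collect() returns the number of objects it found.
    freed = gc.collect()
```

Since the model is rebuilt from scratch every fold anyway, nothing is lost by clearing the session between folds; only the accumulated metrics lists need to survive the loop.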