TensorFlow crashes when I ask it to fit a model

Question:

TensorFlow on a GPU is new to me, so a naive first question: am I correct in assuming that I can use a GPU (NVIDIA GTX 1660 Ti) to run TensorFlow ML operations while it simultaneously drives my monitor? I only have one GPU card in my PC. Can it do both at the same time, or do I need a dedicated GPU for TensorFlow that is not connected to any monitor?

Everything is on Ubuntu 21.10. I have set up the NVIDIA toolkit, cuDNN, tensorflow and tensorflow-gpu in a conda env, and it all appears to work: 1 GPU is visible, built with cudnn 11.6.r11.6, TF version 2.8.0, Python version 3.7.10, all in the conda env running in a Jupyter notebook. Everything runs fine until I try to train a model, and then I get this message:

2022-03-19 04:42:48.005029: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8302

and then the kernel just locks up and crashes. BTW, the code worked before I installed GPU support, when it simply used the CPU. Is this just a version mismatch somewhere between the Python, tensorflow, tensorflow-gpu and cuDNN versions, or something more sinister? Thx. J.

Asked By: Jim Maas


Answers:

am I correct in assuming that I can use a GPU (nv gtx 1660ti) to run
tensorflow ml operations, while it simultaneously runs my monitor?

Yes. On Ubuntu you can run nvidia-smi to see how much GPU memory is free and which processes are using the GPU.
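A minimal sketch of the same memory check from Python, assuming nvidia-smi is on the PATH. It relies on nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, chosen for illustration.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse nvidia-smi CSV rows like '6144, 512, 5632' into (total, used, free) MiB, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip stray blank lines
        total, used, free = (int(field.strip()) for field in line.split(","))
        rows.append((total, used, free))
    return rows


def query_gpu_memory() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi (must be installed) and return per-GPU memory figures in MiB."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.total,memory.used,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

In a notebook cell, calling `query_gpu_memory()` before and during training shows how much of the card the desktop already uses and how much is left for TensorFlow.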

Only have one GPU card in my pc, assume it can do both at the same time?

Yes, it can. Most people do the same; a training process on the GPU is similar to running a game (but more memory-hungry).

About the problem:

Install matching versions based on this version table.

Check your driver version with nvidia-smi, but check the actual CUDA toolkit version with nvcc -V (the CUDA version shown by nvidia-smi is the maximum CUDA version the driver supports, not the installed toolkit).
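To see both numbers side by side, here is a small sketch that extracts them from the command output. The regexes assume the usual formats (a line like `Cuda compilation tools, release 11.6, V11.6.112` from nvcc, and `CUDA Version: 11.6` in the nvidia-smi header); the function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def toolkit_cuda_version(nvcc_output: str) -> Optional[Version]:
    """Installed CUDA toolkit version, from `nvcc -V` output ('... release 11.6, V11.6.112')."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def driver_max_cuda_version(smi_output: str) -> Optional[Version]:
    """Highest CUDA version the driver supports, from the nvidia-smi header ('CUDA Version: 11.6')."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout (raises if the command fails or is missing)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, `toolkit_cuda_version(run(["nvcc", "-V"]))` gives the toolkit version to look up in the version table, while `driver_max_cuda_version(run(["nvidia-smi"]))` only tells you the ceiling the driver allows.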

Then just run pip install tensorflow-gpu; this will also install Keras for you.

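Check whether TensorFlow has access to the GPU as follows:

import tensorflow as tf

# Deprecated in TF 2.x but still available; should print True
print(tf.test.is_gpu_available())
# Preferred check: number of GPUs TensorFlow can see (should be at least 1)
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))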
Answered By: Sadra

install based on this version table.

That was the key for me. I had the same issue: the CPU worked fine, but on the GPU the process would die during model fit with an exit code and no error. The matrix shows that TensorFlow 2.5 – 2.8 work with CUDA 11.2 and cuDNN 8.1, while the 'latest' versions as of 05/2022 are 11.5 and 8.4. I rolled both back and everything is working fine.
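A sketch of that comparison in code, assuming only the rows quoted in this answer (TF 2.5 – 2.8 with CUDA 11.2 and cuDNN 8.1); `TESTED_BUILDS`, `decode_cudnn_version` and `mismatches` are hypothetical names. It also decodes the integer from a log line such as `Loaded cuDNN version 8302`, using the cuDNN 8.x encoding major*1000 + minor*100 + patch, so 8302 means cuDNN 8.3.2.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Only the rows quoted in this answer; consult the official table for other releases.
TESTED_BUILDS: Dict[str, Tuple[Version, Version]] = {
    "2.5": ((11, 2), (8, 1)),
    "2.6": ((11, 2), (8, 1)),
    "2.7": ((11, 2), (8, 1)),
    "2.8": ((11, 2), (8, 1)),
}


def decode_cudnn_version(code: int) -> Tuple[int, int, int]:
    """Decode a cuDNN 8.x version integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def fmt(v: Tuple[int, ...]) -> str:
    return ".".join(str(part) for part in v)


def mismatches(tf_version: str, cuda: Version, cudnn: Version) -> List[str]:
    """List differences from the tested CUDA/cuDNN pair; an empty list means a match."""
    key = ".".join(tf_version.split(".")[:2])  # "2.8.0" -> "2.8"
    if key not in TESTED_BUILDS:
        return [f"TensorFlow {key} is not in TESTED_BUILDS"]
    want_cuda, want_cudnn = TESTED_BUILDS[key]
    problems = []
    if cuda != want_cuda:
        problems.append(f"CUDA {fmt(cuda)} installed, TF {key} tested with {fmt(want_cuda)}")
    if cudnn != want_cudnn:
        problems.append(f"cuDNN {fmt(cudnn)} installed, TF {key} tested with {fmt(want_cudnn)}")
    return problems
```

Assuming the 11.6 in the question is the CUDA toolkit version, `mismatches("2.8.0", (11, 6), decode_cudnn_version(8302)[:2])` flags both CUDA and cuDNN, which fits the rollback fix described in this answer.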

Answered By: cazub

The matrix will show you that tensorflow 2.5 – 2.8 work with CUDA 11.2 and cudnn 8.1

I believe the problem is that CUDA 11.2 is not available for Windows 11.

Answered By: SimoRed