Unknown error/crash – TensorFlow LSTM with GPU (no output after start of 1st epoch)

Question

I’m trying to train a model using LSTM layers. I’m using a GPU and all needed libraries are loaded.

When I’m building the model this way:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()

model.add(layers.LSTM(256, activation="relu", return_sequences=False))  # note the activation function
model.add(layers.Dropout(0.2))

model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))

model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer="adam",
    metrics=["accuracy"]
)

This works. However, because the LSTM layer is given activation="relu", it does not run on the fast cuDNN implementation (the former CuDNNLSTM). If I’m not mistaken, that implementation is selected automatically only when the activation function is tanh, which is the default.
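For reference, the TensorFlow documentation for `tf.keras.layers.LSTM` lists several conditions for the cuDNN kernel, not only the activation. A minimal sketch of that checklist in plain Python is shown here; `meets_cudnn_criteria` is a hypothetical helper, not a TensorFlow function, and it ignores the documented conditions it cannot see from the arguments (masked inputs must be strictly right-padded, and eager execution must be enabled in the outermost context).

```python
def meets_cudnn_criteria(activation="tanh", recurrent_activation="sigmoid",
                         recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Check LSTM constructor arguments against the documented cuDNN conditions.

    Hypothetical helper for illustration only; it is not part of TensorFlow
    and does not inspect masking, padding, or the execution mode.
    """
    return (
        activation == "tanh"
        and recurrent_activation == "sigmoid"
        and recurrent_dropout == 0
        and not unroll
        and use_bias
    )


# First model in the question: activation="relu" rules out the cuDNN kernel.
print(meets_cudnn_criteria(activation="relu"))  # False

# Second model in the question: all defaults, so the criteria are met.
print(meets_cudnn_criteria())  # True
```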

So, it’s painfully slow and I would like to run the faster CuDNNLSTM. My code for that:

model = keras.Sequential()

model.add(layers.LSTM(256, return_sequences=False))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))

model.compile(
    loss=keras.losses.BinaryCrossentropy(),
    optimizer="adam",
    metrics=["accuracy"]
)

It’s the same model, except that no activation is passed to the LSTM layer, so the default tanh is used.
But now it doesn’t train, and the end of the output looks like this:

2021-04-19 22:41:46.046218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 22:41:46.046426: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:41:46.046642: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:41:46.046942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-19 22:41:46.047124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-19 22:41:46.047312: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-19 22:41:46.047489: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-19 22:41:46.047663: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-19 22:41:46.047936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-19 22:41:46.665456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-19 22:41:46.665712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0 
2021-04-19 22:41:46.665876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N 
2021-04-19 22:41:46.666186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2982 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-04-19 22:41:46.667505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-19 22:42:07.374456: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/50
2021-04-19 22:42:08.922891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:42:09.272264: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:42:09.302667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll

Process finished with exit code -1073740791 (0xC0000409)

It just starts the first epoch, then freezes for a minute and exits with this weird exit code.
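The exit code itself carries a little information. PyCharm prints it as a signed 32-bit integer; reinterpreted as unsigned it is the Windows NTSTATUS value 0xC0000409, which Windows names STATUS_STACK_BUFFER_OVERRUN. That status is also used for fail-fast aborts in native code, so it suggests the process was terminated inside a native library rather than by a Python exception, but it does not identify the cause. A small sketch of the conversion (the lookup table is deliberately partial and only for illustration):

```python
exit_code = -1073740791  # value reported in the log above

# Reinterpret the signed 32-bit value as unsigned to obtain the NTSTATUS code.
ntstatus = exit_code & 0xFFFFFFFF
print(hex(ntstatus))  # 0xc0000409

# Partial lookup table containing only the code relevant here.
KNOWN_STATUS = {0xC0000409: "STATUS_STACK_BUFFER_OVERRUN"}
print(KNOWN_STATUS.get(ntstatus, "unknown"))  # STATUS_STACK_BUFFER_OVERRUN
```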

Shape of the input data: tf.Tensor([50985 29 7], shape=(3,), dtype=int32)
My GPU: Nvidia GTX 1050 Ti
CUDA: v11.3
OS: Windows 10
IDE: PyCharm

It’s hard to search for a solution because no error message is printed. Am I doing something wrong? Has anyone encountered a similar issue? What might help?

Edit: I also tried:

running this model with far fewer units (2 instead of 256) and a lower batch_size
downgrading TensorFlow to 2.4.0, CUDA to 11.0 and cuDNN to 8.0.1 with Python 3.7.1 (this should be a valid combination according to this list on the TensorFlow website)
restarting my PC 🙂
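When juggling version combinations like these, it can help to print what the installed TensorFlow build actually reports. The sketch below is one way to do that, under a few assumptions: `describe_tf_build` is a hypothetical helper name, `tf.sysconfig.get_build_info()` is only available in newer TensorFlow releases (hence the `getattr` guard), and the `cuda_version` / `cudnn_version` keys are present only in GPU builds. These values describe what the wheel was built against, not necessarily what is installed on the machine.

```python
def describe_tf_build():
    """Collect TensorFlow, CUDA and cuDNN build versions plus visible GPUs.

    Hypothetical helper for illustration. Returns {"tensorflow": None}
    when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf  # imported lazily so the helper degrades gracefully
    except ImportError:
        return {"tensorflow": None}

    info = {"tensorflow": tf.__version__}

    # get_build_info is missing in older releases, so guard the lookup.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    build = dict(get_build_info()) if get_build_info else {}
    info["cuda"] = build.get("cuda_version")    # None on CPU-only builds
    info["cudnn"] = build.get("cudnn_version")  # None on CPU-only builds

    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return info


print(describe_tf_build())
```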

Asked By: Brunon Blok

Answer 1

I found the solution… kinda.

It works as it should after I downgraded TensorFlow to 2.1.0, CUDA to 10.1 and cuDNN to 7.6.5 (at the time, the fourth combination in this list on the TensorFlow website).

I don’t know why it didn’t work with the newest version, or with the valid combination for TensorFlow 2.4.0.

It’s working well, so my issue is solved. Nonetheless, it would be nice to know why using LSTM with cuDNN on newer versions didn’t work for me, as I haven’t found this issue reported anywhere.

Answered By: Brunon Blok

Answer 2

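Replace

y1 = tf.keras.layers.LSTM(64)(input)

with

y1 = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64))(input)

An LSTMCell wrapped in the generic RNN layer does not use the cuDNN kernel, so this bypasses the cuDNN code path, at the cost of slower training.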
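Applied to the Sequential model from the question, that replacement might look like the sketch below. The function name `build_rnn_model` and its parameters are assumptions for illustration; the input shape (29 timesteps, 7 features) is taken from the shape reported in the question, and the final Dense and Activation layers are folded into one layer. The imports sit inside the function only so that the file can be loaded on a machine without TensorFlow.

```python
def build_rnn_model(timesteps=29, features=7, units=256):
    """Build the question's model with RNN(LSTMCell) in place of LSTM.

    Hypothetical helper for illustration; layer sizes follow the question.
    """
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(timesteps, features)),
        # Generic RNN wrapper around an LSTMCell instead of layers.LSTM.
        layers.RNN(layers.LSTMCell(units)),
        layers.Dropout(0.2),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss=keras.losses.BinaryCrossentropy(),
        optimizer="adam",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_rnn_model().summary()` prints the layer stack. Whether this configuration avoids the crash on a given setup would still need to be confirmed by running it.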

Answered By: R S
