Keras BatchNormalization layer: InternalError: cuDNN launch failure

Question:

The BatchNormalization layer of my Keras model (using the TensorFlow backend) does not work and raises an InternalError exception at training time.

Here is the line defining the BatchNormalization layer in my model:

bn = BatchNormalization(axis=3)(grid)

I create two models (one cut just before the BatchNormalization layer, one just after it) in order to debug the model:

debug = Model(inputs=[question1, question2], outputs=grid)
debug.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

bn = BatchNormalization(axis=3)(grid)

debug2 = Model(inputs=[question1, question2], outputs=bn)
debug2.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

And then I predict on some random data, just to actually run a prediction:

pred = debug.predict([Q1_test_debug, Q2_test_debug], verbose=1, batch_size=1)
print(pred[0].shape)
pred = debug2.predict([Q1_test_debug, Q2_test_debug], verbose=1, batch_size=1)
print(pred[0].shape)

And the result is:

(2, 25)
2/2 [==============================] - 2s 1s/step
(25, 25, 600)
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1291     try:
-> 1292       return fn(*args)
   1293     except errors.OpError as e:

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1276       return self._call_tf_sessionrun(
-> 1277           options, feed_dict, fetch_list, target_list, run_metadata)
   1278 

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1366         self._session, options, feed_dict, fetch_list, target_list,
-> 1367         run_metadata)
   1368 

InternalError: cuDNN launch failure : input shape ([1,600,25,25])
     [[{{node batch_normalization_1/FusedBatchNorm}} = FusedBatchNorm[T=DT_FLOAT, _class=["loc:@batch_normalization_1/cond/Switch_1"], data_format="NCHW", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](batch_normalization_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, batch_normalization_1/gamma/read, batch_normalization_1/beta/read, batch_normalization_1/Const_4, batch_normalization_1/Const_4)]]
     [[{{node batch_normalization_1/cond/Merge/_949}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_133_batch_normalization_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)
<ipython-input-11-748dc132eac2> in <module>()
      4 pred = debug.predict([Q1_test_debug, Q2_test_debug], verbose=1, batch_size=1)
      5 print(pred[0].shape)
----> 6 pred = debug2.predict([Q1_test_debug, Q2_test_debug], verbose=1, batch_size=1)
      7 print(pred[0].shape)

~/.local/lib/python3.5/site-packages/keras/engine/training.py in predict(self, x, batch_size, verbose, steps)
   1833         f = self.predict_function
   1834         return self._predict_loop(f, ins, batch_size=batch_size,
-> 1835                                   verbose=verbose, steps=steps)
   1836 
   1837     def train_on_batch(self, x, y,

~/.local/lib/python3.5/site-packages/keras/engine/training.py in _predict_loop(self, f, ins, batch_size, verbose, steps)
   1329                     ins_batch[i] = ins_batch[i].toarray()
   1330 
-> 1331                 batch_outs = f(ins_batch)
   1332                 if not isinstance(batch_outs, list):
   1333                     batch_outs = [batch_outs]

~/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2480         session = get_session()
   2481         updated = session.run(fetches=fetches, feed_dict=feed_dict,
-> 2482                               **self.session_kwargs)
   2483         return updated[:len(self.outputs)]
   2484 

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    885     try:
    886       result = self._run(None, fetches, feed_dict, options_ptr,
--> 887                          run_metadata_ptr)
    888       if run_metadata:
    889         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1108     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1109       results = self._do_run(handle, final_targets, final_fetches,
-> 1110                              feed_dict_tensor, options, run_metadata)
   1111     else:
   1112       results = []

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1284     if handle is None:
   1285       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1286                            run_metadata)
   1287     else:
   1288       return self._do_call(_prun_fn, handle, feeds, fetches)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1306           self._config.experimental.client_handles_error_formatting):
   1307         message = error_interpolation.interpolate(message, self._graph)
-> 1308       raise type(e)(node_def, op, message)
   1309 
   1310   def _extend_graph(self):

InternalError: cuDNN launch failure : input shape ([1,600,25,25])
     [[{{node batch_normalization_1/FusedBatchNorm}} = FusedBatchNorm[T=DT_FLOAT, _class=["loc:@batch_normalization_1/cond/Switch_1"], data_format="NCHW", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](batch_normalization_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, batch_normalization_1/gamma/read, batch_normalization_1/beta/read, batch_normalization_1/Const_4, batch_normalization_1/Const_4)]]
     [[{{node batch_normalization_1/cond/Merge/_949}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_133_batch_normalization_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'batch_normalization_1/FusedBatchNorm', defined at:
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/remondn/.local/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 497, in start
    self.io_loop.start()
  File "/home/remondn/.local/lib/python3.5/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 345, in run_forever
    self._run_once()
  File "/usr/lib/python3.5/asyncio/base_events.py", line 1312, in _run_once
    handle._run()
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/home/remondn/.local/lib/python3.5/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/home/remondn/.local/lib/python3.5/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/remondn/.local/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/remondn/.local/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/remondn/.local/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/remondn/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/remondn/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2901, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/remondn/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-44a967130b40>", line 87, in <module>
    bn = BatchNormalization(axis=3)(grid)
  File "/home/remondn/.local/lib/python3.5/site-packages/keras/engine/topology.py", line 619, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/keras/layers/normalization.py", line 181, in call
    epsilon=self.epsilon)
  File "/home/remondn/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1831, in normalize_batch_in_training
    epsilon=epsilon)
  File "/home/remondn/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1806, in _fused_normalize_batch_in_training
    data_format=tf_data_format)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 909, in fused_batch_norm
    name=name)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 3466, in _fused_batch_norm
    is_training=is_training, name=name)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/remondn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cuDNN launch failure : input shape ([1,600,25,25])
     [[{{node batch_normalization_1/FusedBatchNorm}} = FusedBatchNorm[T=DT_FLOAT, _class=["loc:@batch_normalization_1/cond/Switch_1"], data_format="NCHW", epsilon=0.001, is_training=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](batch_normalization_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, batch_normalization_1/gamma/read, batch_normalization_1/beta/read, batch_normalization_1/Const_4, batch_normalization_1/Const_4)]]
     [[{{node batch_normalization_1/cond/Merge/_949}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_133_batch_normalization_1/cond/Merge", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

A few things that I don’t understand:

  • As we can see from the printed shape (25, 25, 600), the output of the previous layer (i.e. the input of BatchNormalization) is in the channels_last format. But the error reports an input shape of [1,600,25,25], which is channels_first. Why did it suddenly change?
  • I specified axis=3 in my declaration of the BatchNormalization layer, but in the error we have FusedBatchNorm [...] data_format="NCHW", indicating a channels_first format. No matter which axis I choose (I tried 1, 2, 0, -1), I always get this error with this data_format. Why doesn’t it change when I change the axis of BatchNormalization? (See the short sketch after this list.)
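
For context, axis=3 (or equivalently axis=-1 on a 4D tensor) refers to the channel axis under Keras’s default channels_last image format; the node name in the error (TransposeNHWCToNCHW-LayoutOptimizer) suggests the NCHW shape comes from TensorFlow’s GPU layout optimizer transposing the input for the fused kernel, rather than from the axis argument. A minimal standalone sketch of the axis semantics (hypothetical names, same (25, 25, 600) shape as above):

from keras import backend as K
from keras.layers import Input, BatchNormalization

print(K.image_data_format())         # usually 'channels_last' (NHWC)

x = Input(shape=(25, 25, 600))       # hypothetical input with the same shape as grid
bn = BatchNormalization(axis=-1)(x)  # axis=-1 is equivalent to axis=3 for a 4D NHWC tensor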

Does anyone have any idea how to fix this?

Asked By: Astariul


Answers:

It turns out the versions of the libraries I was using were mismatched.

I don’t know why, but everything else was working (in fact, removing the BatchNormalization layer gave me a working network…).

Anyway, I updated my packages to use CUDA 9.0 with cuDNN 7.0.5 and tensorflow-gpu 1.10.0.

The links I used to get matching versions between all of these:
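
For what it’s worth, a quick way to confirm the resulting environment (a minimal sketch for TF 1.x; not part of the original answer):

import tensorflow as tf
import keras

print(tf.__version__)                # expect 1.10.0 after the upgrade
print(keras.__version__)
print(tf.test.is_built_with_cuda())  # True if this build was compiled against CUDA
print(tf.test.is_gpu_available())    # True if TensorFlow can actually use the GPU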

Answered By: Astariul

I got to this thread because I hit a similar error. It turned out to be linked to my new hardware, which was too new for the libraries.
So with an RTX 2080 Ti, I could get rid of the error with the following configuration:

  • CUDA 10.0 (compatible with the card’s architecture)

  • cuDNN 7.4.1.5

  • TensorFlow 1.13 (a release candidate at the time; I used “pip3 install tf-nightly-gpu”, the version with CUDA 10.0 support)

  • I added the following to the code (see https://github.com/tensorflow/tensorflow/issues/24496):

from keras import backend as K

# Let TensorFlow grow GPU memory as needed instead of grabbing it all up front
config = K.tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(K.tf.Session(config=config))  # apply the config to the Keras session
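
For newer TensorFlow 2.x setups, a roughly equivalent way to enable memory growth (a minimal sketch; not part of the original answer) would be:

import tensorflow as tf

# Enable memory growth on every visible GPU so TensorFlow allocates memory as needed
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)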

Hope it helps someone else.

Answered By: 87VN0

I had the same problem, but it turned out to be due to a memory shortage: my model was too big. When I reduced the batch size, the problem was solved.
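
For example (a minimal sketch with a hypothetical tiny model and random data), the batch size is simply the batch_size argument of fit/predict; a smaller value lowers the peak GPU memory needed per step:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical model and data, just to show where batch_size is set
model = Sequential([Dense(32, activation='relu', input_shape=(100,)),
                    Dense(1, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam')
X = np.random.rand(256, 100)
y = np.random.randint(0, 2, size=(256, 1))

model.fit(X, y, epochs=1, batch_size=8)      # e.g. 8 instead of 64
preds = model.predict(X, batch_size=8)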

Answered By: amirsina torfi

For anyone stumbling onto this in 2023, here’s a possible solution. TensorFlow was working well, except that when I used tf.keras.layers.LayerNormalization I got the same error. Oddly, "FusedBatchNorm" is even mentioned in the error, although that name suggests a different normalization layer.

I’m using Ubuntu, TensorFlow 2.11 from the official Docker image, and the "nvidia-driver-525" driver.

The solution was to downgrade the driver to "nvidia-driver-515", which brings CUDA back down to an 11.x version.
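
To check which CUDA/cuDNN versions a TensorFlow 2.x build expects versus what is actually usable (a minimal sketch; nvidia-smi separately reports the driver-side CUDA version):

import tensorflow as tf

build = tf.sysconfig.get_build_info()          # available in TF 2.3+
print(build.get('cuda_version'), build.get('cudnn_version'))
print(tf.config.list_physical_devices('GPU'))  # an empty list means the GPU is not usable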

Answered By: tjk