Shape rank problem with TensorFlow model as soon as I include BiLSTM layers

Question:

I’m having a problem developing a NN model with TensorFlow 2.3 that appears as soon as I include BiLSTM layers in the model. I’ve tried a custom model, but this one comes straight from the Keras documentation page and it fails too.

  • It cannot be a problem with input shapes, as this happens at model-construction time, before any input data has been fed to the model.
  • I tried it on another machine with the same version of TensorFlow and it works fine there.

The code I’m using is:

from tensorflow import keras
from tensorflow.keras import layers

max_features = 20000  # Only consider the top 20k words
maxlen = 200  # Only consider the first 200 words of each movie review

# Input for variable-length sequences of integers
inputs = keras.Input(shape=(None,), dtype="int32")
# Embed each integer in a 128-dimensional vector
x = layers.Embedding(max_features, 128)(inputs)
# Add 2 bidirectional LSTMs
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
# Add a classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()

The output error is:

InvalidArgumentError: Shape must be at least rank 3 but is rank 2 for '{{node BiasAdd}} = BiasAdd[T=DT_FLOAT, data_format="NCHW"](add, bias)' with input shapes: [?,256], [256].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-7-dd69b7331e68> in <module>
      7 x = layers.Embedding(max_features, 128)(inputs)
      8 # Add 2 bidirectional LSTMs
----> 9 x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
     10 x = layers.Bidirectional(layers.LSTM(64))(x)
     11 # Add a classifier
Asked By: Ed.


Answers:

I found the problem and so I’m answering my own question.

There is a setting in Keras that specifies how image data is laid out (and that supposedly affects only image data):

  • Channels Last. Image data is stored in a three-dimensional array whose last dimension holds the color channels, e.g. [rows][cols][channels].

  • Channels First. Image data is stored in a three-dimensional array whose first dimension holds the color channels, e.g. [channels][rows][cols].
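To make the two conventions concrete, here is a minimal sketch using NumPy and a hypothetical 32×32 RGB image (the variable names are mine):

```python
import numpy as np

# A hypothetical 32x32 RGB image stored in each layout convention:
img_channels_last = np.zeros((32, 32, 3))   # [rows][cols][channels]
img_channels_first = np.zeros((3, 32, 32))  # [channels][rows][cols]

print(img_channels_last.shape)   # (32, 32, 3)
print(img_channels_first.shape)  # (3, 32, 32)
```

Same pixels, different axis order — which is why a layer that trusts this setting can misinterpret the rank or layout of its input.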

Keras keeps this setting per backend, and for TensorFlow it is supposedly set to Channels Last by default, BUT in my environment it turned out to be set to Channels First.
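One way to check where that value comes from is the Keras config file, which (assuming the standard location `~/.keras/keras.json`) stores the global setting on disk; a sketch, with a helper name of my own choosing:

```python
import json
import os

def read_image_data_format(cfg_path):
    """Return the image_data_format stored in a Keras config file,
    falling back to "channels_last" when the file or key is absent."""
    if not os.path.exists(cfg_path):
        return "channels_last"
    with open(cfg_path) as f:
        cfg = json.load(f)
    return cfg.get("image_data_format", "channels_last")

# Keras keeps its global settings in ~/.keras/keras.json:
print(read_image_data_format(os.path.expanduser("~/.keras/keras.json")))
```

If this prints "channels_first" on an affected machine, the config file is the likely culprit.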

Thankfully, this can be set manually and I managed to fix it with:

import tensorflow
tensorflow.keras.backend.set_image_data_format("channels_last")

Applied to the example above, which comes directly from the Keras documentation, it looks like this:

import tensorflow
from tensorflow import keras
from tensorflow.keras import layers

max_features = 20000  # Only consider the top 20k words
maxlen = 200  # Only consider the first 200 words of each movie review

tensorflow.keras.backend.set_image_data_format("channels_last")  # <-- THIS FIXES IT

# Input for variable-length sequences of integers
inputs = keras.Input(shape=(None,), dtype="int32")
# Embed each integer in a 128-dimensional vector
x = layers.Embedding(max_features, 128)(inputs)
# Add 2 bidirectional LSTMs
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
# Add a classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()

I am surprised that this setting can stop an LSTM layer from being built at all, and I’m not sure whether this should be considered a bug.
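The fix can also be made persistent by editing the Keras config file instead of calling the backend setter on every run; a sketch of `~/.keras/keras.json` with the layout corrected (the other keys shown are the usual defaults and are assumed here):

```json
{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
```

With this in place, new sessions pick up channels_last without any code changes.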

More info on this topic

Answered By: Ed.