tensorflow: Input 0 of layer "lstm_1" is incompatible with the layer: expected ndim=3, found ndim=4

Question:

I have a seq2seq model built as so:

from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 256
epochs = 20
batch_size = 64
encoder_inputs = Input(shape=(None,))
x = Embedding(num_encoder_tokens, latent_dim, input_length=max_english_sentence_length)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,))
x = Embedding(num_decoder_tokens, latent_dim, input_length=max_toki_sentence_length)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])
model.summary()

model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
      batch_size=batch_size,
      epochs=epochs,
      validation_split=0.2)

encoder_input_data has shape (2000, 57, 7265) and contains 2000 sentences with at most 57 words, with one-hot encoded tokens.

decoder_input_data and decoder_target_data have shape (2000, 87, 987) and contain 2000 sentences with at most 87 words, with one-hot encoded tokens. decoder_target_data is offset by one timestep from decoder_input_data.

As far as I'm aware, the data is formatted correctly, but when running model.fit I get:

Input 0 of layer "lstm" is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (64, 57, 7265, 256)

What am I doing wrong here?

Answers:

The issue comes from the Embedding layer. You cannot use one-hot encoding with the Keras Embedding layer. Indeed, as the docs state:

Input shape

2D tensor with shape: (batch_size, input_length).

Output shape

3D tensor with shape: (batch_size, input_length, output_dim).

Note that it takes a 2D array and outputs a 3D array. This is why your input went from a 3D array to a 4D array after the embedding, and that is not good for your LSTM, which can only accept 3D arrays.
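For illustration, here is a minimal sketch of that shape change (the vocabulary size, embedding dimension, and batch/sequence lengths below are made up for the example):

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Hypothetical sizes, for illustration only.
vocab_size, embed_dim, batch, seq_len = 100, 8, 4, 10

# Integer token IDs: a 2D input of shape (batch_size, input_length).
token_ids = np.random.randint(0, vocab_size, size=(batch, seq_len))

# The Embedding layer adds one axis: output is (batch_size, input_length, output_dim).
embedded = Embedding(vocab_size, embed_dim)(token_ids)
print(embedded.shape)  # (4, 10, 8) -> 3D, which is what an LSTM expects
```

If you feed it a 3D one-hot array instead, the same layer produces a 4D tensor, which is exactly the error you are seeing.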

You should convert your one-hot encoding to plain integer indices. So instead of an input like [0, 0, 1, 0, 0], you just pass a single value: 2.

This way your inputs will be a 2D array : (batch_size, input_length) and will be converted to a nice 3D array after the embedding layer : (batch_size, input_length, output_dim). And your LSTM layer will be happy 😉
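One quick way to do that conversion, assuming your arrays are already one-hot encoded as described in the question (the variable names below are taken from it), is to take the argmax along the last axis and feed the resulting integer IDs to the model:

```python
import numpy as np

# Collapse the one-hot axis into integer token IDs:
# (2000, 57, 7265) -> (2000, 57) for the encoder,
# (2000, 87, 987)  -> (2000, 87) for the decoder input.
encoder_input_ids = np.argmax(encoder_input_data, axis=-1)
decoder_input_ids = np.argmax(decoder_input_data, axis=-1)

# decoder_target_data can stay one-hot, since the softmax Dense output
# is compared against it with categorical_crossentropy.
model.fit([encoder_input_ids, decoder_input_ids], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
```

Alternatively, you could build the integer-ID arrays directly from your tokenizer and skip the one-hot step for the inputs entirely.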

Answered By: Clément Perroud