Keras SimpleRNNCell appears to fail to distribute learning among all its weights

Question:

This question is about SimpleRNNCell, a TensorFlow class that implements a basic recurrent neural network cell. Unless there is something fundamentally wrong in my code, training appears to update only a subset of the available weights rather than all of them, which makes the recurrent machinery irrelevant.
I have written a minimal Keras program with just one RNN cell and a dense layer. When I print out the learned weights, the state (recurrent) weight does not appear to have changed since its initialization. Here is my code:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import RNN
from tensorflow.keras.layers import SimpleRNN, SimpleRNNCell
from sklearn.preprocessing import MinMaxScaler
from tensorflow import random as rnd

#Fix the seed
rnd.set_seed(0)


#The dataset can be downloaded from https://mantas.info/wp/wp-content/uploads/simple_esn/MackeyGlass_t17.txt
data = np.loadtxt('MackeyGlass_t17.txt')

#Normalize
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1))

#Split Dataset in Train and Test
train, test = scaled[0:-100], scaled[-100:]

#Split into input and output 
train_X, train_y = train[:-1], train[1:]
test_X, test_y = test[:-1], test[1:] 

#Reshape to (samples, time_steps, features); each sample gets a single time step
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

#Batch and epochs
batch_size = 20
epochs = 2

#Design and run the model
model = Sequential()
model.add(RNN(SimpleRNNCell(1)))
#model.add(SimpleRNN(1)) # This generates the same results as the line above
model.add(Dense(train_y.shape[1]))
model.compile(loss='huber', optimizer='adam')
model.fit(train_X, train_y, epochs=epochs, batch_size=batch_size, validation_data=(test_X, test_y), verbose=0, shuffle=False)

#Print the weights of each layer
for layer in model.layers: print(layer.get_weights())

If I run this code with 2 epochs I receive the following output:

[array([[-0.8942287]], dtype=float32), array([[1.]], dtype=float32), array([0.05435111], dtype=float32)]
[array([[-1.272426]], dtype=float32), array([0.04711587], dtype=float32)]

If I run this code with 3 epochs I receive the following output:

[array([[-0.89395165]], dtype=float32), array([[1.]], dtype=float32), array([0.06734365], dtype=float32)]
[array([[-1.2927996]], dtype=float32), array([0.05247825], dtype=float32)]

Note that with a scalar series and just one cell, all matrices and vectors reduce to a single element, so I end up with 5 weights: the input weight, the state weight, the cell bias, the dense layer weight, and the dense layer bias, in that order. All of these weights changed during learning except the state weight, which is stuck at its initialization, i.e. 1.0.
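One way to confirm which array is which is to print each variable with its name instead of the raw arrays (a small sketch, reusing the fitted model above):

#Sketch: print every weight together with its Keras variable name
for layer in model.layers:
    for w in layer.weights:
        print(w.name, w.numpy())

The state weight shows up under a name containing recurrent_kernel.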
Why doesn't the learning process affect the state weight? Is there an obvious mistake in the way I implemented the model?

Note:
Ubuntu 20.04, Python 3.9.15, Tensorflow 2.7.0, no GPU.

After accepting the answer, I modified my code. These lines:

#Split into input and output 
train_X, train_y = train[:-1], train[1:]
test_X, test_y = test[:-1], test[1:] 

#Reshaping 
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

should be replaced by:

def get_XY(dat, time_steps):
    # Indices of the target values: one every `time_steps` samples
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]
    # Prepare X: non-overlapping windows of `time_steps` consecutive samples
    rows_x = len(Y)
    X = dat[range(time_steps*rows_x)]
    X = np.reshape(X, (rows_x, time_steps, 1))
    return X, Y

time_steps = 12
train_X, train_y = get_XY(train, time_steps)
test_X, test_y = get_XY(test, time_steps)

so that the state weight actually gets trained.
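As a toy check of what get_XY returns (the array below is a made-up example, not the Mackey-Glass series; it reuses np and get_XY from above):

dat = np.arange(25).reshape(-1, 1)  #25 scalar samples, shape (25, 1)
X, Y = get_XY(dat, time_steps=12)
print(X.shape, Y.shape)             #prints (2, 12, 1) (2, 1)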
Now the output with 3 epochs is:

[array([[-0.63304466]], dtype=float32), array([[0.9038045]], dtype=float32), array([0.06537308], dtype=float32)]
[array([[-0.97942555]], dtype=float32), array([-0.07855547], dtype=float32)]

Asked By: pfaz


Answers:

Thanks for taking the time to provide complete working code to reproduce the issue.

The problem here isn't the RNN; it's the shape of your input data:

https://colab.research.google.com/drive/1pkFuU_nMyJnVHF3TvCSQ49e-IORFu3PQ?authuser=1#scrollTo=lIYdn1woOS1n

Keras expects RNN inputs to have shape (batch, time, features), but your data has shape (10k, 1, 1).

With time = 1 the RNN only ever sees one time step, so the recurrent (state) weight is only ever multiplied by the all-zero initial state: its gradient is zero, and it never has a chance to learn how to handle the previous time step.
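You can check this directly with a gradient tape (a minimal sketch with toy shapes, not taken from the question's data):

import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(1)
rnn = tf.keras.layers.RNN(cell)
x = tf.random.normal((4, 1, 1))  #(batch, time=1, features)

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(rnn(x)))

grads = tape.gradient(loss, rnn.trainable_weights)
for w, g in zip(rnn.trainable_weights, grads):
    print(w.name, g.numpy())
#kernel and bias receive non-zero gradients; recurrent_kernel's gradient is
#exactly zero because it only ever multiplies the all-zero initial state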

Try this tool for a fix: https://www.tensorflow.org/api_docs/python/tf/keras/utils/timeseries_dataset_from_array
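For instance, a sketch of how it might be wired up for this dataset, reusing the scaled array from the question (the window length and batch size here are my own guesses, not prescribed values):

import tensorflow as tf

time_steps = 12
ds = tf.keras.utils.timeseries_dataset_from_array(
    data=scaled[:-1],             #inputs: sliding windows of time_steps values
    targets=scaled[time_steps:],  #targets[i] is the value right after window i
    sequence_length=time_steps,
    batch_size=20,
)
#Each batch yields x of shape (batch, time_steps, 1) and y of shape (batch, 1),
#so the RNN sees a real sequence per sample; the dataset can be passed
#straight to model.fit(ds, epochs=epochs)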

Answered By: mdaoust