Reshaping a 2D matrix of time series vectors into a 3D matrix of sequences (frames) – overlapping windows

Question:

I have a matrix of shape (m, 51): 51 time series vectors with m samples each. I want to train two autoencoders, one using a CNN and the other using an LSTM network. To do that, I want to reshape the 2D matrix into a 3D matrix that contains m_new sequences for each of the 51 variables, where each sequence is w samples long and consecutive sequences overlap by lap samples.

I managed to pull this off, but without the overlapping part. Is there an efficient way to do it?

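import numpy as np

W = 20  # window size
m_new = m // W  # number of non-overlapping windows (m = number of rows in X_raw)
m_trct = m_new * W  # number of rows kept after truncation
X_raw_trct = X_raw[:m_trct, :]
X = np.reshape(X_raw_trct, (m_new, W, X_raw_trct.shape[1]))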

As demonstrated below, the sequences are generated with an overlap of lap = w-1.

[Image: illustration of the overlapping sequences]

** UPDATE **
Following the answer to "Split Python sequence (time series/array) into subsequences with overlap", I used the function subsequences, which splits a 1D array into subsequences of length w with an overlap of w-1 (a stride of 1), giving a 2D array of shape (m_new, w). As shown in code 2 below, I had to loop over the 51 variables, process each one as a 1D array, and stack the resulting 2D arrays to produce my final 3D array of shape (m_new, w, 51). However, the loop takes a long time to execute.

**code 2**
def subsequences(ts, window):
    # ts is of shape (m,)
    shape = (ts.size - window + 1, window)
    strides = ts.strides * 2
    return np.lib.stride_tricks.as_strided(ts, shape=shape, strides=strides)

# rescaledX_raw.shape is (m,51)
n = rescaledX_raw.shape[1]
# n = 51

a = rescaledX_raw[:,0]
# a.shape is (m,)

Xaa = subsequences(a,W)
X = np.ones(Xaa.shape) * -1  # placeholder slice, removed after the loop
# X.shape is (m_new, W) 


for kk in range(n):
## a is of shape (m,)
    a = rescaledX_raw[:,kk]
    Xaa = subsequences(a,W)
    X = np.dstack((X, Xaa))



X_nn = np.delete(X, 0, axis=2)
# X_nn.shape is (m_new, W, 51)

I also tried to convert the full 2D array of shape (m, 51) directly into the 3D array of shape (m_new, w, 51) using the function in code 3:

**code 3**
def rolling_window(a, window):
## a is of shape (51,m)
    shape = (a.shape[-1] - window + 1,window,a.shape[0])
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

But the resulting 3D matrix is not the correct one; kindly refer to the demonstration below. Also, how can I make the stride a variable that I can change? In the scripts above the stride is 1 (meaning the overlap is w-1).
[Image: demonstration of the output of code 2 and code 3]
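
For reference, here is one way the intended result could be obtained from a single as_strided call. This is a sketch under assumptions: the input is taken to have shape (m, 51) with rows as time steps, and the function name rolling_window_2d and its stride parameter are hypothetical, not part of the original code. The window axis reuses the row stride of the input, while the first axis uses that row stride multiplied by the step between windows, so the overlap becomes w - stride.

```python
import numpy as np

def rolling_window_2d(a, window, stride=1):
    """Return overlapping windows of a 2D array as a read-only 3D view.

    a is assumed to have shape (m, n): m time steps, n variables.
    The result has shape (m_new, window, n), where
    m_new = (m - window) // stride + 1 and the overlap is window - stride.
    """
    m, n = a.shape
    if window > m or stride < 1:
        raise ValueError("need 1 <= window <= m and stride >= 1")
    m_new = (m - window) // stride + 1
    row_stride, col_stride = a.strides  # bytes per step along each axis
    return np.lib.stride_tricks.as_strided(
        a,
        shape=(m_new, window, n),
        # axis 0: jump `stride` rows to the next window
        # axis 1: move one row forward inside a window
        # axis 2: move one variable (column) forward
        strides=(stride * row_stride, row_stride, col_stride),
        writeable=False,  # windows share memory, so block accidental writes
    )

# Small demo: m = 10 time steps, n = 2 variables
x = np.arange(20).reshape(10, 2)
X = rolling_window_2d(x, window=4, stride=2)
print(X.shape)  # (4, 4, 2)
```

The result is a view that shares memory with the input; calling .copy() on it gives an independent, writable array if one is needed.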

Asked By: Ayomi Al-noor

||

Answers:

I found a helpful post that gets this done using TimeseriesGenerator: "Custom Data Generator for Keras LSTM with TimeSeriesGenerator".

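# Assumed import path (tf.keras); adjust it to your Keras version
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

class CustomGenFit(TimeseriesGenerator):
    def __getitem__(self, idx):
        x, y = super().__getitem__(idx)
        # For an autoencoder, the target is the input itself
        return x, x

# s is the stride; batch_size=m puts all sequences into a single batch
Xsequences = CustomGenFit(X, X, length=W, stride=s, sampling_rate=1, batch_size=m)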
Answered By: Ayomi Al-noor

import numpy as np

def lstm_data_transform(x_data, y_data, num_steps=10):
    """Change data to the format for LSTM training
    (sliding-window approach)."""
    # Prepare the list for the transformed data
    X, y = list(), list()
    # Loop over the entire data set
    for i in range(x_data.shape[0]):
        # Compute the end index of the current sliding window
        end_ix = i + num_steps
        # Stop once the end index reaches the size of the data set
        if end_ix >= x_data.shape[0]:
            break
        # Get a sequence of data for x
        seq_X = x_data[i:end_ix]
        # Use the element at the window's first index as y
        # (the original version used y_data[end_ix], which turned out to be wrong)
        seq_y = y_data[i]
        # Append the sequences to the lists
        X.append(seq_X)
        y.append(seq_y)
    # Make final arrays
    x_array = np.array(X)
    y_array = np.array(y)
    return x_array, y_array
Answered By: R S
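
The Python-level loop in the function above can also be replaced by NumPy's sliding_window_view (available from NumPy 1.20). The following is a minimal sketch and not part of either answer: the name make_windows and the stride parameter are illustrative, and y is taken at the first index of each window, as in the loop version. Unlike that loop, this sketch also keeps the last full window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(x_data, y_data, num_steps=10, stride=1):
    """Loop-free sliding windows (illustrative helper, hypothetical name).

    x_data: array of shape (m, n_features)
    y_data: array with m entries along its first axis
    Returns X of shape (n_windows, num_steps, n_features) and the
    y values at the first index of each window.
    """
    # Windowing along axis 0 appends the window axis at the end:
    # shape (m - num_steps + 1, n_features, num_steps)
    windows = sliding_window_view(x_data, num_steps, axis=0)
    # Keep every `stride`-th window and move the window axis to the middle
    X = windows[::stride].transpose(0, 2, 1)
    # Start index of each kept window, used to pick the matching targets
    starts = np.arange(0, x_data.shape[0] - num_steps + 1, stride)
    return X, y_data[starts]

# Small demo: m = 12 time steps, 2 features
x = np.arange(24, dtype=float).reshape(12, 2)
y = np.arange(12)
X, y_out = make_windows(x, y, num_steps=5, stride=3)
print(X.shape)  # (3, 5, 2)
print(y_out)    # [0 3 6]
```

X is a read-only view into x_data, so no window data is copied until a copy is requested explicitly, for example with np.ascontiguousarray(X).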