Vectorized way to construct a block Hankel matrix in numpy (or scipy)

Question:

I want to construct the following matrix:

[v0 v1 v2 v3 .... v(M-d+1)
 v1 .
 v2 .  .
 .        .
 .
 vd .          .   v(M) ]

where each v(k) is a vector (an ndarray), for example a column of a matrix:

X = np.random.randn(100, 8)
M = 7
d = 3
v0 = X[:, 0]
v1 = X[:, 1]
...

Using a for loop, I can do something like this:

import numpy as np

v1 = np.array([1, 2, 3]).reshape((-1, 1))
v2 = np.array([10, 20, 30]).reshape((-1, 1))
v3 = np.array([100, 200, 300]).reshape((-1, 1))
v4 = np.array([100.1, 200.1, 300.1]).reshape((-1, 1))
v5 = np.array([1.1, 2.2, 3.3]).reshape((-1, 1))

X = np.hstack((v1, v2, v3, v4, v5))
d = 2

# Stack d column-shifted copies of X on top of each other.
X_ = np.zeros((d * X.shape[0], X.shape[1] + 1 - d))
for i in range(d):
    X_[i * X.shape[0]:(i + 1) * X.shape[0], :] = X[:, i:i + (X.shape[1] + 1 - d)]

And I get the matrix I want:

X_ = array([[  1. ,  10. , 100. , 100.1],
            [  2. ,  20. , 200. , 200.1],
            [  3. ,  30. , 300. , 300.1],
            [ 10. , 100. , 100.1,   1.1],
            [ 20. , 200. , 200.1,   2.2],
            [ 30. , 300. , 300.1,   3.3]])

Is there a way to construct this matrix in a vectorized way? I imagine that would be faster than a for loop for large matrices.

Thank you.

Asked By: Zed


Answers:

This looks about optimal; you did a good job vectorizing it already. The only improvement I can make is to replace np.zeros with np.empty, which skips initializing the array. I tried using np.vstack and np.lib.stride_tricks.sliding_window_view (after https://stackoverflow.com/a/60581287) and got the same performance as the for loop with np.empty.

# sliding window:
X_ = np.lib.stride_tricks.sliding_window_view(X, (X.shape[0], X.shape[1]+1-d)).reshape(d*X.shape[0], -1)

# np.vstack:
X_ = np.vstack([X[:, i:i+(X.shape[1]+1-d)] for i in range(d)])
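
To check that the three variants really produce the same matrix, here is a minimal self-contained sketch using the example data from the question. The function names (block_hankel_loop, block_hankel_vstack, block_hankel_window) are hypothetical and only used for illustration. Note that sliding_window_view needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def block_hankel_loop(X, d):
    """For-loop version from the question, with np.empty instead of np.zeros."""
    n, m = X.shape
    cols = m + 1 - d
    out = np.empty((d * n, cols), dtype=X.dtype)
    for i in range(d):
        out[i * n:(i + 1) * n, :] = X[:, i:i + cols]
    return out


def block_hankel_vstack(X, d):
    """Stack the d column-shifted slices with np.vstack."""
    cols = X.shape[1] + 1 - d
    return np.vstack([X[:, i:i + cols] for i in range(d)])


def block_hankel_window(X, d):
    """Sliding-window version; the view has shape (1, d, n, cols)."""
    n, m = X.shape
    cols = m + 1 - d
    windows = sliding_window_view(X, (n, cols))
    # The reshape cannot stay a view, so it copies the data.
    return windows.reshape(d * n, cols)


# Example data from the question.
X = np.array([[1, 10, 100, 100.1, 1.1],
              [2, 20, 200, 200.1, 2.2],
              [3, 30, 300, 300.1, 3.3]])
d = 2

expected = block_hankel_loop(X, d)
assert np.array_equal(expected, block_hankel_vstack(X, d))
assert np.array_equal(expected, block_hankel_window(X, d))
print(expected.shape)  # (6, 4)
```

All three variants have to copy d * n * cols elements into the result, which is probably why they end up performing about the same.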
Answered By: yut23

I didn’t fully understand the second part of the question, but I was looking for a fast way to create a Hankel-like (non-square) matrix, as in the first part of the question, and couldn’t find an answer for it. Here is how I did it:

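import numpy as np

N = 7
K = 4
L = N - K + 1
# Broadcasting a column of row offsets against a row of column offsets
# gives idx[i, j] = i + j, and also works when L != K.
idx = np.arange(L)[:, None] + np.arange(K)
print(idx)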

Output:

[[0 1 2 3]
 [1 2 3 4]
 [2 3 4 5]
 [3 4 5 6]]

You can then use the above matrix to index your 1-dimensional array.
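
As a rough sketch of that indexing step: an index matrix with entries i + j can be applied directly to a 1-dimensional array, and the same idea extends to the block case from the question by indexing the columns of a 2-D array. The sample values in x and the variable names (H, Xb, cols) are assumptions made up for this illustration.

```python
import numpy as np

# 1-D case: H[i, j] = x[i + j]
x = np.array([10, 11, 12, 13, 14, 15, 16])  # sample signal, N = 7
K = 4
L = x.size - K + 1
idx = np.arange(L)[:, None] + np.arange(K)  # idx[i, j] = i + j, shape (L, K)
H = x[idx]
print(H)

# Block case from the question: index the columns of X instead.
X = np.array([[1, 10, 100, 100.1, 1.1],
              [2, 20, 200, 200.1, 2.2],
              [3, 30, 300, 300.1, 3.3]])
d = 2
n, m = X.shape
cols = m + 1 - d
block_idx = np.arange(d)[:, None] + np.arange(cols)  # shape (d, cols)
# X[:, block_idx] has shape (n, d, cols); move the block axis to the front
# and merge it with the row axis to get the stacked (d * n, cols) matrix.
Xb = X[:, block_idx].transpose(1, 0, 2).reshape(d * n, cols)
print(Xb)
```

Fancy indexing always returns a copy, so this approach is not expected to be faster than slicing in a short loop; it is mainly convenient when the index matrix is reused.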

Answered By: Ash