NumPy: Efficiently create an (N, 3) matrix from the values of another list, repeating each value

Question:

How can I create the matrix

[[a, 0, 0],
 [0, a, 0],
 [0, 0, a],
 [b, 0, 0],
 [0, b, 0],
 [0, 0, b],
 ...]

from the vector

[a, b, ...]

efficiently?

There must be a better solution than

np.squeeze(np.reshape(np.tile(np.eye(3), (len(foo), 1, 1)) * np.expand_dims(foo, (1, 2)), (1, -1, 3)))

right?

Answers:

You can create a zero array in advance, and then quickly assign values by slicing:

def concated_diagonal(ar, col):
    ar = np.asarray(ar).ravel()
    size = ar.size
    ret = np.zeros((col * size, col), ar.dtype)
    for i in range(col):
        ret[i::col, i] = ar
    return ret

Test:

>>> concated_diagonal([1, 2, 3], 3)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [2, 0, 0],
       [0, 2, 0],
       [0, 0, 2],
       [3, 0, 0],
       [0, 3, 0],
       [0, 0, 3]])
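
To see which cells each pass of the loop writes, here is a minimal sketch on the same input. The helper `cells_written` is hypothetical and exists only for illustration; it is not part of the answer.

```python
import numpy as np

col = 3
ar = np.array([1, 2, 3])
ret = np.zeros((col * ar.size, col), ar.dtype)

# Hypothetical helper: list the (row, column) cells touched by pass i,
# i.e. the cells selected by ret[i::col, i].
def cells_written(i, n_rows, col):
    return [(r, i) for r in range(i, n_rows, col)]

first_pass = cells_written(0, ret.shape[0], col)   # rows 0, 3, 6 of column 0
second_pass = cells_written(1, ret.shape[0], col)  # rows 1, 4, 7 of column 1

# The same cells, written with the slice assignment from the loop body.
ret[0::col, 0] = ar
ret[1::col, 1] = ar
```

Each pass fills one column, taking every `col`-th row starting at row `i`, so `col` passes cover the diagonal of every block.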

Note that because the number of columns you require is small, the cost of the relatively slow Python-level for loop is acceptable:

%timeit concated_diagonal(np.arange(1_000_000), 3)
17.1 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Update:
A solution with better performance! This is done in one step by clever reshaping and slice assignment:

def concated_diagonal(ar, col):
    ar = np.asarray(ar).reshape(-1, 1)
    size = ar.size
    ret = np.zeros((col * size, col), ar.dtype)
    ret.reshape(size, -1)[:, ::col + 1] = ar
    return ret
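
A small sketch of why the strided assignment lands on the diagonals, assuming a two-element input so the intermediate view is easy to read. The names `flat` and `diag_positions` are introduced here for illustration only.

```python
import numpy as np

col = 3
ar = np.array([1, 2]).reshape(-1, 1)      # column vector, shape (2, 1)
ret = np.zeros((col * ar.size, col), ar.dtype)

# Each row of `flat` is one col x col block laid out as col * col numbers.
flat = ret.reshape(ar.size, -1)           # shape (2, 9)

# Positions of the diagonal entries inside one flattened block.
diag_positions = list(range(0, col * col, col + 1))   # [0, 4, 8]

# Writing through `flat` fills the block diagonals of `ret` itself;
# `ar` broadcasts across the selected columns.
flat[:, ::col + 1] = ar
```

This relies on `reshape` returning a view rather than a copy, which holds here because `np.zeros` produces a C-contiguous array.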

Time comparison:

%timeit concated_diagonal(np.arange(1_000_000), 3)
10.7 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Answered By: Mechanic Pig

Here is a solution by indexing:

a = [1,2,3]

N = 3
M = N*len(a)
out = np.zeros((M, N), dtype=int)
idx = np.arange(M)
out[idx, idx%N] = np.repeat(a, N)

output:

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [2, 0, 0],
       [0, 2, 0],
       [0, 0, 2],
       [3, 0, 0],
       [0, 3, 0],
       [0, 0, 3]])

intermediates:

idx
# array([0, 1, 2, 3, 4, 5, 6, 7, 8])

idx%N
# array([0, 1, 2, 0, 1, 2, 0, 1, 2])

np.repeat(a, N)
# array([1, 1, 1, 2, 2, 2, 3, 3, 3])
Answered By: mozway

You can use numpy.tile, numpy.repeat, and numpy.eye.

rep = 3
lst = np.array([1,2,3,4])
res = np.tile(np.eye(rep), (len(lst),1))*np.repeat(lst, rep)[:,None]
print(res)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [2. 0. 0.]
 [0. 2. 0.]
 [0. 0. 2.]
 [3. 0. 0.]
 [0. 3. 0.]
 [0. 0. 3.]
 [4. 0. 0.]
 [0. 4. 0.]
 [0. 0. 4.]]

Explanation:

>>> np.tile(np.eye(3), (2,1))
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

>>> np.repeat([3,4], 3)[:,None]
array([[3],
       [3],
       [3],
       [4],
       [4],
       [4]])

>>> np.tile(np.eye(3), (2,1)) * np.repeat([3,4], 3)[:,None]
array([[3., 0., 0.],
       [0., 3., 0.],
       [0., 0., 3.],
       [4., 0., 0.],
       [0., 4., 0.],
       [0., 0., 4.]])
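
The result is floating point because `np.eye` defaults to float. If an integer result is wanted (an assumption; the question does not say), one possible variation is to pass `dtype` to `np.eye`:

```python
import numpy as np

rep = 3
lst = np.array([1, 2, 3, 4])

# np.eye defaults to float64, so this product is float.
float_res = np.tile(np.eye(rep), (len(lst), 1)) * np.repeat(lst, rep)[:, None]

# Assumption: the input dtype should be preserved.
# An identity matrix of that dtype keeps the product in that dtype.
int_eye = np.eye(rep, dtype=lst.dtype)
int_res = np.tile(int_eye, (len(lst), 1)) * np.repeat(lst, rep)[:, None]
```

Both arrays hold the same values; only the dtype differs.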

Benchmark on Colab (because you want an efficient approach)

The benchmark varies len(arr) and keeps the block fixed at eye(3).

[Figure: benchmark plot, Time against len_arr for each Method]


Code of benchmark:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import time
bench = []
for num in np.power(np.arange(10,1500,5),2):
    arr = np.arange(num)
    
    start = time.time()
    col = 3
    size = arr.size
    ret1 = np.zeros((col * size, col), arr.dtype)
    for i in range(col):
        ret1[i::col, i] = arr
    bench.append({'len_arr':num, 'Method':'Mechanic_Pig', 'Time':time.time() - start})

    start = time.time()
    N = 3
    M = N*len(arr)
    ret2 = np.zeros((M, N), dtype=int)
    idx = np.arange(M)
    ret2[idx, idx%N] = np.repeat(arr, N)
    bench.append({'len_arr':num, 'Method':'mozway', 'Time':time.time() - start})

    start = time.time()
    ret3 = np.tile(np.eye(3), (len(arr),1))*np.repeat(arr, 3)[:,None]
    bench.append({'len_arr':num, 'Method':'Imahdi', 'Time':time.time() - start})

    start = time.time()
    ret4 = np.einsum('j,ik->jki', arr, np.eye(3)).reshape(-1, 3)
    bench.append({'len_arr':num, 'Method':'Michael_Szczesn', 'Time':time.time() - start})


plt.subplots(1,1, figsize=(10,7))
df = pd.DataFrame(bench)
sns.lineplot(data=df, x="len_arr", y="Time", hue="Method", style="Method")
plt.show()

# Check whether the results of the different approaches are all equal
print(all(np.array_equal(ret1, r) for r in (ret2, ret3, ret4)))
# True
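
The `np.einsum` expression timed in the benchmark is not spelled out here, so the following is a rough reading of it on a two-element input. The intermediate name `blocks` is introduced for illustration.

```python
import numpy as np

arr = np.array([1, 2])
eye = np.eye(3)

# blocks[j, k, i] = arr[j] * eye[i, k]:
# one scaled identity block per element of arr.
blocks = np.einsum('j,ik->jki', arr, eye)   # shape (2, 3, 3)

# Stack the blocks vertically into a (2 * 3, 3) array.
res = blocks.reshape(-1, 3)
```

As with the `np.tile` version, the result is float because `np.eye` defaults to float.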
Answered By: I'mahdi

An almost one-line solution:

import numpy as np

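def concated_diagonal(vals, col=3):
    # One col x col diagonal block per value, stacked vertically.
    # The block size is col, not len(vals), so the result always has col columns.
    return np.vstack([np.diag(np.full(col, v)) for v in vals])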

print(concated_diagonal([1, 2, 3]))

Output

[[1 0 0]
 [0 1 0]
 [0 0 1]
 [2 0 0]
 [0 2 0]
 [0 0 2]
 [3 0 0]
 [0 3 0]
 [0 0 3]]
Answered By: Dani Mesejo