Numpy subtracting rows of a 2D array from another 2D array without for loop
Question:
I know that if we subtract a row vector v of shape (1, 3072) from a 2D array A of shape (5000, 3072), v is broadcast as long as A and v have the same number of columns. But subtracting a stack of row vectors V (where each row of V has to be subtracted from the whole of A) cannot be done directly.
I can’t figure out how to subtract V’s rows one by one from A without using a for loop.
def compute_distances_one_loop(V):
    num_test = V.shape[0]
    num_train = A.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        initial = np.sqrt(np.square(A - V[i, :]))
        dists[i, :] = initial.sum(axis=1)
    return dists
The code above is the semi-vectorized form of this problem. How do I get rid of that for loop?
For example:
matrix_e=np.ones((3,3))*6
matrix_f=np.array([[1,2,3],[4,5,6]])
How do I get matrix_g of shape (6,3) without using a for loop?
matrix_g=np.array([[5,4,3],[5,4,3],[5,4,3],[2,1,0],[2,1,0],[2,1,0]])
Answers:
If I understood your question correctly, you can make a big array (of the same size as A) by concatenating occurrences of V:
height_A, height_V = A.shape[0], V.shape[0]
occurrences, remainder = divmod(height_A, height_V)
mask = [V for i in range(occurrences)] + [V[:remainder]]
big_V = np.concatenate(mask)
Now you can safely do A - big_V!
(I separated the steps to make it clearer, but you can easily combine them into a single statement:
big_V = np.concatenate([V for i in range(A.shape[0]//V.shape[0])] + [V[:A.shape[0]%V.shape[0]]])
)
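As a side note, the same tiled array can probably be obtained with np.resize, which repeats the flattened input until the requested shape is filled. The sketch below is only an illustration with small made-up arrays (A of shape (5, 3) is an assumption, chosen so that the height of V does not divide the height of A); it checks that both approaches agree.

```python
import numpy as np

# Small stand-in arrays (assumed for illustration only).
A = np.ones((5, 3)) * 6
V = np.array([[1, 2, 3], [4, 5, 6]])

# Concatenation approach from the answer above.
occurrences, remainder = divmod(A.shape[0], V.shape[0])
big_V = np.concatenate([V] * occurrences + [V[:remainder]])

# np.resize repeats the flattened input until the new shape is filled;
# because A and V have the same number of columns, this cycles through V's rows.
big_V_resized = np.resize(V, A.shape)

print(np.array_equal(big_V, big_V_resized))
print((A - big_V).shape)
```

This only works as a row-wise tiling because both arrays share the same number of columns; with different column counts np.resize would scramble the rows.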
Edit: I understand better what you need now, which is to subtract EACH row of V from the whole of A. You can do that with broadcasting by adding a third dimension to both arrays, like in the following picture, where A2 - V2 is represented by the array of green panes.
A2 = np.expand_dims(A, axis=0)  # from shape (5000, 3072) to (1, 5000, 3072)
V2 = np.expand_dims(V, axis=1)  # from shape (500, 3072) to (500, 1, 3072)
print(A2 - V2)  # broadcasting makes the resulting shape (500, 5000, 3072)
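Before running this on the full-size arrays, it may be worth estimating how large the broadcast result is. The sketch below assumes float64 data (the question does not state the dtype) and uses np.broadcast_shapes, available in NumPy 1.20 and later, to get the result shape without allocating anything.

```python
import math
import numpy as np

num_test, num_train, dim = 500, 5000, 3072

# Shape that broadcasting (1, 5000, 3072) against (500, 1, 3072) would produce.
result_shape = np.broadcast_shapes((1, num_train, dim), (num_test, 1, dim))

# Assumption: float64 elements, 8 bytes each.
n_bytes = math.prod(result_shape) * np.dtype(np.float64).itemsize

print(result_shape)
print(n_bytes / 1e9, "GB")
```

Under that assumption the intermediate array needs roughly 61 GB, so on most machines the subtraction would have to be done in chunks of V or replaced by a formulation that avoids the 3D intermediate.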
Example, with:
A = np.ones((3,3))*6
V = np.array([[1,2,3],[4,5,6]])
A2 = np.expand_dims(A, axis=0)  # shape (1, 3, 3)
V2 = np.expand_dims(V, axis=1)  # shape (2, 1, 3)
print(A2 - V2)
# array([[[5., 4., 3.],
#         [5., 4., 3.],
#         [5., 4., 3.]],
#
#        [[2., 1., 0.],
#         [2., 1., 0.],
#         [2., 1., 0.]]])
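If you want the (6, 3) layout of matrix_g from the question rather than the (2, 3, 3) stack, one way is to collapse the first two axes with reshape. This is a minimal sketch using the example arrays above; indexing with np.newaxis is used here as an equivalent of np.expand_dims.

```python
import numpy as np

A = np.ones((3, 3)) * 6
V = np.array([[1, 2, 3], [4, 5, 6]])

# Same broadcasting as A2 - V2, written with np.newaxis.
diff = A[np.newaxis, :, :] - V[:, np.newaxis, :]   # shape (2, 3, 3)

# Collapse the first two axes: one block of A's rows per row of V.
matrix_g = diff.reshape(-1, A.shape[1])            # shape (6, 3)
print(matrix_g)
```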
And you can calculate the array of distances between rows of A and V:
D = np.sqrt(np.square(A2 - V2).sum(axis = 2))
# array([[7.07106781, 7.07106781, 7.07106781],
#        [2.23606798, 2.23606798, 2.23606798]])
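One thing to watch: the loop in the question takes the square root before summing, so it computes the sum of absolute differences, while D above is the Euclidean distance. The sketch below, on small random stand-in arrays (the sizes and seed are arbitrary assumptions), checks both against broadcasting and also shows a commonly used expansion of the squared distance that avoids the 3D intermediate. The expansion is a swapped-in technique, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(7, 4))   # stands in for the (5000, 3072) array
V = rng.normal(size=(3, 4))   # stands in for the (500, 3072) array

# Loop from the question: sqrt(square(x)) is |x|, so this sums absolute differences.
loop = np.zeros((V.shape[0], A.shape[0]))
for i in range(V.shape[0]):
    loop[i, :] = np.sqrt(np.square(A - V[i, :])).sum(axis=1)

diff = A[np.newaxis, :, :] - V[:, np.newaxis, :]   # shape (3, 7, 4)
l1 = np.abs(diff).sum(axis=2)                      # same values as the loop
l2 = np.sqrt(np.square(diff).sum(axis=2))          # Euclidean, like D above

# Euclidean distances without the 3D intermediate, using
# ||a - v||^2 = ||a||^2 + ||v||^2 - 2 * (a . v)
sq = (
    (V ** 2).sum(axis=1)[:, np.newaxis]
    + (A ** 2).sum(axis=1)[np.newaxis, :]
    - 2 * V @ A.T
)
l2_expanded = np.sqrt(np.maximum(sq, 0))           # clip tiny negatives from rounding

print(np.allclose(loop, l1), np.allclose(l2, l2_expanded))
```

The expanded form only ever holds arrays of shape (num_test, num_train), at the cost of some floating-point rounding when two rows are nearly identical.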