Triple for-loop in a vector
Question:
I have a simple numpy array (Nx3) like:
v = np.array([[-3.33829, -3.42467, -3.53332],
              [-2.67681, -2.6082 , -3.49502],
              [-3.49497, -2.73177, -2.61499],
              [-2.76056, -3.57753, -2.67334],
              [-1.96801, -3.47521, -3.51974],
              [-1.25571, -2.69451, -3.45554],
              [-1.94568, -2.59504, -2.72568],
              [-1.28991, -3.47927, -2.73176],
              [-0.51201, -3.50684, -3.40448],
              [ 0.22398, -2.70244, -3.43421]])
Here N = 10, but in my real case it is much larger (500+). Each row is a point given in Euclidean coordinates.
I would like to carry out:
where i, j and k indicate different rows from v.
How can I implement this efficiently in Python?
Answers:
You can do this using numpy broadcasting operations:
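import numpy as np
# diffs[a, b] is the squared Euclidean distance between rows a and b, shape (N, N)
diffs = ((v[:, None] - v) ** 2).sum(-1)
# broadcast to shape (N, N, N), exponentiate, then sum out the two leading axes
d = np.exp(diffs + diffs[:, None]).sum((0, 1))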
print(d)
# [3.08316899e+11 2.37020625e+07 4.05357364e+12 8.22697743e+08
# 8.85209202e+04 2.55340202e+05 7.33879459e+04 1.88175133e+05
# 8.10134295e+08 6.62122925e+12]
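As a sanity check, the broadcast expression can be compared against an explicit triple loop on a small array. This is only a sketch: it assumes the intended quantity is the one the code above evaluates, i.e. for each row i the sum over all j and k of exp(|v_i - v_j|^2 + |v_i - v_k|^2). The function name triple_loop and the random test array are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random((6, 3))  # small stand-in array; any (N, 3) array works


def triple_loop(v):
    """Hypothetical reference: explicit loops over i, j, k (O(N^3), slow)."""
    n = len(v)
    d = np.zeros(n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                dij = np.sum((v[i] - v[j]) ** 2)  # squared distance i-j
                dik = np.sum((v[i] - v[k]) ** 2)  # squared distance i-k
                d[i] += np.exp(dij + dik)
    return d


# Broadcast version from the answer above
diffs = ((v[:, None] - v) ** 2).sum(-1)
d_broadcast = np.exp(diffs + diffs[:, None]).sum((0, 1))

d_loop = triple_loop(v)
print(np.allclose(d_loop, d_broadcast))
```

If the two disagree for your own formula, the loop version is the easier one to adapt first.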
Even for an array of size 500, the result is computed in just a few seconds:
%%time
v = np.random.rand(500, 3)
diffs = np.sum((v[:, None] - v) ** 2, -1)
d = np.exp(diffs + diffs[:, None]).sum((0, 1))
# CPU times: user 2.74 s, sys: 5.5 ms, total: 2.75 s
# Wall time: 2.75 s
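Since exp(a + b) = exp(a) * exp(b), the double sum over j and k separates into the square of a single sum whenever j and k range independently over all rows, which is what the broadcast code above does. Under that assumption the (N, N, N) intermediate is not needed. A minimal sketch (the names d_full and d_fast are chosen here, not taken from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random((50, 3))

# Squared pairwise distances, shape (N, N)
diffs = ((v[:, None] - v) ** 2).sum(-1)

# Original approach: builds an (N, N, N) intermediate
d_full = np.exp(diffs + diffs[:, None]).sum((0, 1))

# sum_j sum_k exp(D_ij + D_ik) == (sum_j exp(D_ij)) ** 2
d_fast = np.exp(diffs).sum(axis=1) ** 2

print(np.allclose(d_full, d_fast))
```

This needs memory and time proportional to N^2 rather than N^3. If j and k are meant to exclude i or each other, the excluded terms would have to be subtracted separately, which this sketch does not do.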
IIUC, the equation suggests pairwise vector differences, not squared distances between vectors.
The pairwise differences between N vectors give N*N vectors.
Finally, since you are only reducing over the j and k axes, I would assume the output has shape (10,3) and not (10,). Do correct me if I am wrong.
import numpy as np
d = np.exp(((v[:,None]-v)**2)[:,None] + ((v[:,None]-v)**2)).sum((0,1))
print(d)
#### Stepwise breakdown
#v #i,3 -> 10,3
#diff = (v[:,None]-v)**2 #j,i,3 -> 10,10,3
#power = diff[:,None]+diff #k,j,i,3 -> 10,10,10,3
#exp = np.exp(power) #k,j,i,3 -> 10,10,10,3
#d = np.sum(exp,(0,1)) #i,3 -> 10,3
array([[4.38558108e+11, 2.11224470e+02, 2.08153285e+02],
[6.10332697e+09, 2.42309774e+02, 2.00079357e+02],
[1.37237360e+12, 2.11552094e+02, 2.32739462e+02],
[9.98934092e+09, 2.51158071e+02, 2.16562340e+02],
[1.77827910e+08, 2.22151678e+02, 2.05163797e+02],
[1.91234145e+08, 2.19457894e+02, 1.92858561e+02],
[1.63391357e+08, 2.46419838e+02, 2.04498335e+02],
[1.67512751e+08, 2.23119070e+02, 2.03232700e+02],
[8.45322705e+09, 2.30065042e+02, 1.85024981e+02],
[1.14468558e+12, 2.17683864e+02, 1.89388595e+02]])
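The commented breakdown can be run step by step to confirm the shapes. This sketch uses a random 10x3 array as a stand-in for v and reduces over the two leading axes, matching the one-liner.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.random((10, 3))             # (10, 3)

diff = (v[:, None] - v) ** 2        # (10, 10, 3): squared per-coordinate differences
power = diff[:, None] + diff        # (10, 10, 10, 3)
expd = np.exp(power)                # (10, 10, 10, 3)
d = expd.sum((0, 1))                # sum out the two leading axes -> (10, 3)

# Same computation as the one-liner in the answer
one_liner = np.exp(((v[:, None] - v) ** 2)[:, None] + ((v[:, None] - v) ** 2)).sum((0, 1))

print(diff.shape, power.shape, d.shape)
print(np.allclose(d, one_liner))
```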
Benchmark –
%%timeit
np.exp(((v[:,None]-v)**2)[:,None] + ((v[:,None]-v)**2)).sum((0,1))
# 21.2 s ± 3.27 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
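Note that %%time and %%timeit are IPython/Jupyter cell magics. In a plain Python script, the standard timeit module can be used to compare the two approaches instead. A rough sketch, with function names chosen here for illustration and a deliberately small N so that it finishes quickly; the numbers it prints depend on the machine:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
v = rng.random((40, 3))  # small N so the sketch runs quickly


def distance_version(v):
    """Squared-distance interpretation, result shape (N,)."""
    diffs = ((v[:, None] - v) ** 2).sum(-1)
    return np.exp(diffs + diffs[:, None]).sum((0, 1))


def elementwise_version(v):
    """Per-coordinate interpretation, result shape (N, 3)."""
    diff = (v[:, None] - v) ** 2
    return np.exp(diff[:, None] + diff).sum((0, 1))


calls = 5
t_dist = min(timeit.repeat(lambda: distance_version(v), number=calls, repeat=3))
t_elem = min(timeit.repeat(lambda: elementwise_version(v), number=calls, repeat=3))

print(f"distance version:    {t_dist / calls:.6f} s per call")
print(f"elementwise version: {t_elem / calls:.6f} s per call")
```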