Fast scalar triple product in numpy
Question:
I have a large number of vector triples, and I would like to compute the scalar triple product for them. I can do
import numpy
n = 871
a = numpy.random.rand(n, 3)
b = numpy.random.rand(n, 3)
c = numpy.random.rand(n, 3)
# <a, b x c>
omega = numpy.einsum('ij, ij->i', a, numpy.cross(b, c))
but numpy.cross is fairly slow. The symmetry of the problem (its Levi-Civita expression is eps_{ijk} a_i b_j c_k) suggests that there might be a better (faster) way to compute it, but I can't seem to figure it out.
Any hints?
Answers:
It's just the determinant:
omega = numpy.linalg.det(numpy.dstack([a, b, c]))
But it is slower.
Another equivalent solution is omega = (a * numpy.cross(b, c)).sum(1).
But I think you only have to compute about 9 (for the cross) + 3 (for the products) + 2 (for the sum) = 14 operations for each det, so the einsum version seems to be near optimal. At best you will gain a factor of two in numpy.
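As a quick sanity check (a self-contained sketch, with sizes as in the question), the determinant and elementwise-product-plus-sum formulations agree numerically with the einsum-cross one:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.random((871, 3)) for _ in range(3))

omega_einsum = np.einsum('ij,ij->i', a, np.cross(b, c))

# dstack gives an (n, 3, 3) array whose columns are a, b, c;
# its batched determinant is exactly a . (b x c)
omega_det = np.linalg.det(np.dstack([a, b, c]))

# elementwise product with the cross product, then a row sum
omega_sum = (a * np.cross(b, c)).sum(axis=1)

print(np.allclose(omega_einsum, omega_det),
      np.allclose(omega_einsum, omega_sum))  # True True
```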
EDIT:
If speed is critical, you must go to a lower level. numba is an easy way to do that, for roughly a 15x speedup here:

import numpy as np
from numba import njit

@njit
def multidet(a, b, c):
    n = a.shape[0]
    d = np.empty(n)
    for i in range(n):
        u, v, w = a[i], b[i], c[i]
        d[i] = (u[0]*(v[1]*w[2] - v[2]*w[1])
              + u[1]*(v[2]*w[0] - v[0]*w[2])
              + u[2]*(v[0]*w[1] - v[1]*w[0]))  # 14 operations / det
    return d
some tests:
In [155]: %timeit multidet(a,b,c)
100000 loops, best of 3: 7.79 µs per loop
In [156]: %timeit numpy.einsum('ij, ij->i', a, numpy.cross(b, c))
10000 loops, best of 3: 114 µs per loop
In [159]: allclose(multidet(a,b,c),omega)
Out[159]: True
Here's one approach making use of slicing and summing:

def slicing_summing(a, b, c):
    c0 = b[:,1]*c[:,2] - b[:,2]*c[:,1]
    c1 = b[:,2]*c[:,0] - b[:,0]*c[:,2]
    c2 = b[:,0]*c[:,1] - b[:,1]*c[:,0]
    return a[:,0]*c0 + a[:,1]*c1 + a[:,2]*c2
We could replace the first three steps that compute c0, c1, c2 and their stacked version with a one-liner, like so:
b[:,[1,2,0]]*c[:,[2,0,1]] - b[:,[2,0,1]]*c[:,[1,2,0]]
This would create another (n,3) array, which then has to be multiplied with a and sum-reduced to produce the (n,)-shaped result. With the proposed slicing_summing method, we get to that (n,)-shaped array directly by summing the three slices, thus avoiding that intermediate (n,3) array.
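Spelled out, the fancy-indexing variant described above would look like this (a sketch for comparison; note that it does allocate the intermediate (n, 3) array that slicing_summing avoids):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = (rng.random((871, 3)) for _ in range(3))

# cross product via fancy indexing -> intermediate (n, 3) array
bxc = b[:, [1, 2, 0]] * c[:, [2, 0, 1]] - b[:, [2, 0, 1]] * c[:, [1, 2, 0]]

# multiply with a and sum-reduce along the last axis -> (n,) result
omega = (a * bxc).sum(axis=1)

print(np.allclose(omega, np.einsum('ij,ij->i', a, np.cross(b, c))))  # True
```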
Sample run –
In [86]: # Setup inputs
...: n = 871
...: a = np.random.rand(n, 3)
...: b = np.random.rand(n, 3)
...: c = np.random.rand(n, 3)
...:
In [87]: # Original approach
...: omega = np.einsum('ij, ij->i', a, np.cross(b, c))
In [88]: np.allclose(omega, slicing_summing(a,b,c))
Out[88]: True
Runtime test –
In [90]: %timeit np.einsum('ij, ij->i', a, np.cross(b, c))
10000 loops, best of 3: 84.6 µs per loop
In [91]: %timeit slicing_summing(a,b,c)
1000 loops, best of 3: 63 µs per loop
I've done a comparison of the methods mentioned in the answers.
The results: @Divakar's slicing-and-summing approach beats the einsum-cross one by a bit.
For the sake of completeness, let me note that there is another method relying solely on dot products and a sqrt, see here. That method is slightly slower than both einsum-cross and slice-sum.
The plot was created with perfplot:
import numpy as np
import perfplot

def einsum_cross(a, b, c):
    return np.einsum("ij, ij->i", a, np.cross(b, c))

def det(a, b, c):
    return np.linalg.det(np.dstack([a, b, c]))

def slice_sum(a, b, c):
    c0 = b[:, 1] * c[:, 2] - b[:, 2] * c[:, 1]
    c1 = b[:, 2] * c[:, 0] - b[:, 0] * c[:, 2]
    c2 = b[:, 0] * c[:, 1] - b[:, 1] * c[:, 0]
    return a[:, 0] * c0 + a[:, 1] * c1 + a[:, 2] * c2

b = perfplot.bench(
    setup=lambda n: (
        np.random.rand(n, 3),
        np.random.rand(n, 3),
        np.random.rand(n, 3),
    ),
    n_range=[2**k for k in range(1, 20)],
    kernels=[einsum_cross, det, slice_sum],
)
b.save("out.png")
b.show()