Python: fast way to compute a matrix whose element (i, j) is the distance between the i-th and j-th points

Question:

I’d like to compute a matrix whose element (i, j) is the distance between the i-th and j-th points in Python. A naive way is the following, which takes 0.43 sec to build a single matrix. Do you have any ideas for speeding up this code?

I’m fine with using widely used packages such as SciPy or scikit-learn.

import numpy as np
import time


def compute_distance_matrix(points: np.ndarray):
    assert points.ndim == 2
    n_point, n_dim = points.shape
    squared_dist_matrix = np.zeros((n_point, n_point))
    for i, p in enumerate(points):
        squared_dist_matrix[:, i] = np.sum((points - p) ** 2, axis=1)
    dist_matrix = np.sqrt(squared_dist_matrix)
    return dist_matrix


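a = np.random.randn(1000, 4)
n_runs = 10
ts = time.time()
for _ in range(n_runs):
    compute_distance_matrix(a)
# Divide by the number of runs so the printed value really is an average.
print("average time {} sec".format((time.time() - ts) / n_runs))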
Asked By: orematasaburo

||

Answers:

You can use SciPy’s cdist or scikit-learn’s pairwise_distances.

Both are pretty fast, e.g.

import numpy as np
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cdist, pdist

a = np.random.randn(1000, 4)
D = cdist(a, a)

-or-

D = pairwise_distances(a)

Both are about 10x faster than the custom code in the question. For me, cdist() was the fastest, but I don’t know the implementation details or how much different hardware affects the result.
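
Since the speed-up depends on the machine, here is a minimal sketch for checking it yourself. It is not the benchmark behind the numbers above. It assumes NumPy and SciPy are installed; loop_distance_matrix is a condensed copy of the question’s function, and the seed, the repetition count and the candidates dict are arbitrary names and values chosen for illustration. SciPy’s pdist combined with squareform is included as a third option, because pdist only computes each pair once and squareform expands the result into the full symmetric matrix.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform


def loop_distance_matrix(points: np.ndarray) -> np.ndarray:
    """Column-by-column baseline, equivalent to the question's function."""
    n_point = points.shape[0]
    squared = np.zeros((n_point, n_point))
    for i, p in enumerate(points):
        squared[:, i] = np.sum((points - p) ** 2, axis=1)
    return np.sqrt(squared)


rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
a = rng.standard_normal((1000, 4))

# Hypothetical registry of the approaches being compared.
candidates = {
    "loop": lambda: loop_distance_matrix(a),
    "cdist": lambda: cdist(a, a),
    "pdist + squareform": lambda: squareform(pdist(a)),
}

n_runs = 10  # arbitrary repetition count
reference = cdist(a, a)
timings = {}
for name, fn in candidates.items():
    # Check that every approach produces the same matrix before timing it.
    matches = np.allclose(fn(), reference)
    timings[name] = timeit.timeit(fn, number=n_runs) / n_runs
    print(f"{name}: {timings[name]:.4f} sec per call, matches cdist: {matches}")
```

sklearn’s pairwise_distances can be added to the dict in the same way if scikit-learn is installed. The timings will differ between machines, so treat the output as a relative comparison only.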

Answered By: Willem Hendriks