Fastest way to calculate the centroid of a set of coordinate tuples in python without numpy

Question:

I’ve been working on a project that is incredibly time sensitive (that unfortunately has to be in python) and one of the functions that is used extensively is a function that calculates the centroid of a list of (x, y) tuples. To illustrate:

def centroid(*points):
    x_coords = [p[0] for p in points]
    y_coords = [p[1] for p in points]
    _len = len(points)
    centroid_x = sum(x_coords)/_len
    centroid_y = sum(y_coords)/_len
    return [centroid_x, centroid_y]

where

>>> centroid((0, 0), (10, 0), (10, 10), (0, 10))
[5, 5]

This function runs fairly quickly, the above example completing in an average of 1.49e-05 seconds on my system but I’m looking for the fastest way to calculate the centroid. Do you have any ideas?

One of the other solutions I had was to do the following (where l is the list of tuples):

map(len(l).__rtruediv__, map(sum, zip(*l)))

Which runs in between 1.01e-05 and 9.6e-06 seconds, but unfortunately converting to a list (by surrounding the whole statement in list( ... )) nearly doubles computation time.

EDIT: Suggestions are welcome in pure python BUT NOT numpy.

EDIT2: Just found out that if a separate variable is kept for the length of the list of tuples, then my above implementation with map runs reliably under 9.2e-06 seconds, but there’s still the problem of converting back to a list.

EDIT3:

Now I’m only accepting answers in pure python, NOT in numpy (sorry to those that already answered in numpy!)

Asked By: user3002473

||

Answers:

This is a naive numpy implementation, I can’t time here so I wonder how it does:

import numpy as np

arr = np.asarray(points)
length = arr.shape[0]
sum_x = np.sum(arr[:, 0])
sum_y = np.sum(arr[:, 1])
return sum_x / length, sum_y / length

You pass the points to centroid() as separate parameters, that are then put into a single tuple with *points. It would be faster to just pass in a list or iterator with points.

Answered By: RemcoGerlich
import numpy as np

data = np.random.randint(0, 10, size=(100000, 2))

this here is fast

def centeroidnp(arr):
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x/length, sum_y/length

%timeit centeroidnp(data)
10000 loops, best of 3: 181 µs per loop

surprisingly, this is much slower:

%timeit data.mean(axis=0)
1000 loops, best of 3: 1.75 ms per loop

numpy seems very quick to me…

For completeness:

def centeroidpython(data):
    x, y = zip(*data)
    l = len(x)
    return sum(x) / l, sum(y) / l
#take the data conversion out to be fair!
data = list(tuple(i) for i in data)

%timeit centeroidpython(data)
10 loops, best of 3: 57 ms per loop
Answered By: Retozi

Just for completeness, I modified Retozi’s function so it accepts a vector of any dimension:

def centeroidnp(arr):
    length, dim = arr.shape
    return np.array([np.sum(arr[:, i])/length for i in range(dim)])
Answered By: Pau

In Cartesian coordinates, the centroid is just the mean of the components:

data = ((0,0), (1,1), (2,2))
np.mean(data, axis=0)
>>> array([1., 1.])
Answered By: CasualScience
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.