How to modify a NumPy array with arbitrary indices in a vectorized way?

Question:

Simplified story

Suppose I have an array arr and indices idx.
For each i occurring in idx, I want to increase arr[i] by one.

A non-vectorized approach would look like this:

import numpy as np

arr = np.zeros(5)
idx = [0, 1, 1, 2, 0]

for i in idx:
    arr[i] += 1

Is there any way to vectorize this?

Note that arr[idx] += 1 runs without error but gives the wrong result when idx contains duplicate indices.

arr = np.zeros(1)
idx = [0, 0]
arr[idx] += 1  # arr becomes array([1.]), not array([2.])
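
A minimal sketch of why the duplicate is lost, assuming standard NumPy fancy-indexing semantics: the augmented assignment reads a copy, adds to it, and writes it back, so a repeated index is simply assigned the same value more than once. The name tmp is used only for illustration.

```python
import numpy as np

arr = np.zeros(1)
idx = [0, 0]

# arr[idx] += 1 behaves like the following three steps:
tmp = arr[idx]      # fancy indexing returns a copy: array([0., 0.])
tmp = tmp + 1       # array([1., 1.])
arr[idx] = tmp      # position 0 is assigned 1.0 twice, so it ends up as 1.0

print(arr)          # [1.]
```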

Of course, np.unique() can achieve the same goal in this 1D example. But I am actually dealing with a 2D array, and I doubt that counting the elements would be the best solution.

Edit

np.unique indeed works, but it seems to add an unnecessary slowdown. I would like a faster approach (if one exists).

Here is an example of 2D indices for 10,000 points without duplicates.

arr = np.zeros((10000, 10000))
idx = np.stack([np.arange(10000), np.arange(10000)])

%timeit np.unique(idx, axis=1, return_counts=True)  # takes 1.93 ms

%timeit arr[idx[0], idx[1]] += 1  # takes 235 μs

Apparently, plain indexing is about 8 times faster than np.unique here.

Edit2

@PaulS’s answer was faster than np.unique.

%timeit np.add.at(arr, (idx[0], idx[1]), 1) # takes 925 μs

Edit3

Here is an example with random indices to test duplicate indices.

arr = np.zeros((10000, 10000))
ran = (np.random.rand(10000)*10).astype(int)
idx = np.stack([ran, ran])

%timeit np.unique(idx, axis=1, return_counts=True)  # takes 3.24 ms

%timeit np.add.at(arr, (idx[0], idx[1]), 1) # takes 859 μs


Detailed story

I am trying to implement a Hough line transform using NumPy. (The reason I’m not using cv2.HoughLines() is that I want the result directly from the coordinates of the points, not from a binary array.)

Getting the curves in the (r, θ) plane was easy, but I am having trouble implementing the accumulator in a vectorized way. Currently I am relying on flattening the 2D data into 1D. Is there a nicer and faster way to perform the accumulation?
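
For reference, the flattening approach mentioned above could look like the following sketch. The code actually used in the question is not shown, so the helper accumulate_2d and its arguments are hypothetical; it assumes non-negative integer indices that fit inside shape. No timing is claimed here, so it would need to be measured against the other approaches.

```python
import numpy as np

def accumulate_2d(shape, rows, cols):
    """Count how often each (row, col) pair occurs; returns an int array of `shape`."""
    # Map each (row, col) pair to its position in the flattened (C-order) array.
    flat = np.ravel_multi_index((rows, cols), shape)
    # Count occurrences of every flat index; minlength pads to the full array size.
    counts = np.bincount(flat, minlength=shape[0] * shape[1])
    return counts.reshape(shape)

rows = np.array([0, 0, 2, 0])
cols = np.array([1, 1, 3, 1])
acc = accumulate_2d((3, 4), rows, cols)
print(acc)
```

Here the pair (0, 1) occurs three times and (2, 3) once, so acc[0, 1] is 3 and acc[2, 3] is 1.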

Thank you in advance!

Asked By: JS S


Answers:

Use numpy.unique to get the unique indices and their count:

idx2, cnt = np.unique(idx, return_counts=True)

arr[idx2] += cnt

Updates arr:

array([2., 2., 1., 0., 0.])

With N-dimensional arrays (example in 2D):

arr = np.zeros([3, 4], dtype=int)
idx = [[0, 0, 2, 0],
       [1, 1, 3, 1]]

idx2, cnt = np.unique(idx, axis=1, return_counts=True)
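# tuple(idx2) also works on Python < 3.11, where arr[*idx2] is a syntax error;
# += accumulates onto existing values instead of overwriting them
arr[tuple(idx2)] += cnt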

Output:

array([[0, 3, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])

If the indices are transposed:

arr = np.zeros([3, 4], dtype=int)
idx = [[0, 1], [0, 1], [2, 3], [0, 1]]

idx2, cnt = np.unique(idx, axis=0, return_counts=True)
arr[tuple(idx2.T)] += cnt
Answered By: mozway

1D arrays

Another possible solution:

np.add.at(arr, idx, 1)

Output:

[2. 2. 1. 0. 0.]

2D arrays

(Thanks, @mozway, for your example, that I am now using here.)

arr = np.zeros([3, 4], dtype=int)
idx = [[0, 0, 2, 0],
       [1, 1, 3, 1]]

np.add.at(arr, (idx[0], idx[1]), 1)

Output:

array([[0, 3, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
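
As a hedged sanity check (not part of the original answer), the sketch below compares np.add.at against the explicit loop on random indices that contain many duplicates. The seed, shape, and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (3, 4)
rows = rng.integers(0, shape[0], size=100)
cols = rng.integers(0, shape[1], size=100)

# Reference result from the explicit Python loop.
expected = np.zeros(shape, dtype=int)
for r, c in zip(rows, cols):
    expected[r, c] += 1

# Unbuffered in-place addition: every occurrence of an index is counted.
result = np.zeros(shape, dtype=int)
np.add.at(result, (rows, cols), 1)

print(np.array_equal(result, expected))
```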
Answered By: PaulS