How to modify a NumPy array at arbitrary indices in a vectorized way?
Question:
Simplified story
Suppose I have an array arr and indices idx. For each i occurring in idx, I want to increase arr[i] by one.
A non-vectorized approach would look like this:
import numpy as np
arr = np.zeros(5)
idx = [0, 1, 1, 2, 0]
for i in idx:
arr[i] += 1
Is there any way to vectorize this?
Note that arr[idx] += 1 does not work because of the duplicate indices: a repeated index is incremented only once.
arr = np.zeros(1)
idx = [0, 0]
arr[idx] += 1 # arr becomes array([1.]), not array([2.])
Of course, np.unique() can achieve the same goal in this 1D example. But I am actually dealing with a 2D array, and I doubt that counting the elements would be the best solution.
Edit
np.unique indeed works, but it seems to add an unnecessary slowdown. I would like a faster approach (if one exists).
Here is an example of 2D indices for 10,000 points without duplicates.
arr = np.zeros((10000, 10000))
idx = np.stack([np.arange(10000), np.arange(10000)])
%timeit np.unique(idx, axis=1, return_counts=True) # takes 1.93 ms
%timeit arr[idx[0], idx[1]] += 1 # takes 235 μs
Apparently, plain indexing is about 8 times faster (235 μs vs. 1.93 ms).
Edit2
@PaulS’s answer was faster than np.unique.
%timeit np.add.at(arr, (idx[0], idx[1]), 1) # takes 925 μs
Edit3
Here is an example with random indices, to test the case with duplicates.
arr = np.zeros((10000, 10000))
ran = (np.random.rand(10000)*10).astype(int)
idx = np.stack([ran, ran])
%timeit np.unique(idx, axis=1, return_counts=True) # takes 3.24 ms
%timeit np.add.at(arr, (idx[0], idx[1]), 1) # takes 859 μs
(edit: typo)
Detailed story
I am trying to implement a Hough line transform using NumPy. (I am not using cv2.HoughLines() because I want the result directly from the coordinates of the points, not from a binary array.)
Getting the curves in the (r, θ) plane was easy, but I am having trouble implementing the accumulator in a vectorized way. Currently I am relying on flattening the 2D data into 1D. Is there a nicer and faster way to perform the accumulation?
Thank you in advance!
Answers:
Use numpy.unique to get the unique indices and their counts:
idx2, cnt = np.unique(idx, return_counts=True)
arr[idx2] += cnt
Updated arr:
array([2., 2., 1., 0., 0.])
With n-dimensional arrays (example in 2D):
arr = np.zeros([3, 4], dtype=int)
idx = [[0, 0, 2, 0],
[1, 1, 3, 1]]
idx2, cnt = np.unique(idx, axis=1, return_counts=True)
arr[tuple(idx2)] += cnt  # tuple(...) also works on Python versions before 3.11
Output:
array([[0, 3, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 1]])
If the indices are transposed:
arr = np.zeros([3, 4], dtype=int)
idx = [[0, 1], [0, 1], [2, 3], [0, 1]]
idx2, cnt = np.unique(idx, axis=0, return_counts=True)
arr[tuple(idx2.T)] += cnt  # tuple(...) also works on Python versions before 3.11
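As a quick sanity check, here is a minimal sketch that compares the np.unique approach against the plain loop from the question. The helper names accumulate_loop and accumulate_unique are made up for illustration, and the sketch assumes idx is laid out as (2, n), i.e. one row of row indices and one row of column indices. It indexes with tuple(idx2) instead of star-unpacking inside the subscript, since the latter needs Python 3.11 or newer.

```python
import numpy as np

def accumulate_loop(shape, idx):
    # Reference implementation: one increment per (row, col) pair.
    arr = np.zeros(shape, dtype=int)
    for r, c in zip(idx[0], idx[1]):
        arr[r, c] += 1
    return arr

def accumulate_unique(shape, idx):
    # Hypothetical helper: count each distinct column of idx, then add the counts.
    arr = np.zeros(shape, dtype=int)
    idx2, cnt = np.unique(idx, axis=1, return_counts=True)
    arr[tuple(idx2)] += cnt  # indices are unique here, so += is safe
    return arr

idx = np.array([[0, 0, 2, 0],
                [1, 1, 3, 1]])
expected = accumulate_loop((3, 4), idx)
result = accumulate_unique((3, 4), idx)
```

If the indices come transposed, as (n, 2), passing idx.T to the same helper should give the same accumulator.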
1D arrays
Another possible solution:
np.add.at(arr, idx, 1)
Output:
[2. 2. 1. 0. 0.]
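To make the difference from plain fancy indexing concrete, here is a small side-by-side sketch using the idx from the question. The variable names buffered and unbuffered are only illustrative. np.add.at is documented as an unbuffered in-place operation, which is why repeated indices accumulate.

```python
import numpy as np

idx = [0, 1, 1, 2, 0]

# Fancy-indexed +=: each distinct index ends up incremented only once.
buffered = np.zeros(5)
buffered[idx] += 1            # values: 1, 1, 1, 0, 0

# np.add.at: every occurrence of an index contributes.
unbuffered = np.zeros(5)
np.add.at(unbuffered, idx, 1)  # values: 2, 2, 1, 0, 0
```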
2D arrays
(Thanks, @mozway, for your example, which I am now using here.)
arr = np.zeros([3, 4], dtype=int)
idx = [[0, 0, 2, 0],
[1, 1, 3, 1]]
np.add.at(arr, (idx[0], idx[1]), 1)
Output:
array([[0, 3, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 1]])
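Since the question mentions flattening the 2D data into 1D, one more option that may be worth timing is np.bincount on flattened indices. This is only a sketch and not something benchmarked in this thread: the helper name accumulate_bincount is made up, and it assumes idx is laid out as (2, n) with non-negative, in-bounds indices.

```python
import numpy as np

def accumulate_bincount(shape, idx):
    # Hypothetical helper: map each (row, col) pair to its flat position,
    # count occurrences of every flat position, then restore the 2D shape.
    idx = np.asarray(idx)
    flat = np.ravel_multi_index((idx[0], idx[1]), shape)
    counts = np.bincount(flat, minlength=int(np.prod(shape)))
    return counts.reshape(shape)

idx = [[0, 0, 2, 0],
       [1, 1, 3, 1]]
acc = accumulate_bincount((3, 4), idx)
```

Note that this builds a fresh count array of the full accumulator size rather than updating arr in place, so it may be a poor fit for very large, sparse accumulators such as the 10000 × 10000 case in the question. To add into an existing array, arr += acc would be needed.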