Find the values in a second array based on a condition on the first array. What's the most efficient way?

Question:

Suppose we have two arrays with the same shape. For each unique element in arr_1, I want to find the corresponding values in arr_2 and build a dictionary as output.

(arr_1 is sorted. arr_2 is NOT sorted).

Here is an example.

import numpy as np

arr_1 = np.array([1,1,1,2,2,3]) # the index array, sorted
arr_2 = np.array([16,11,12,13,14,15]) # find the values, not sorted

target_dict = {1:[16,11,12], 2:[13,14], 3:[15]}

My solution with dict comprehension:

I wrote the following code:

target_dict = {i: arr_2[np.where(arr_1 == i)].tolist() for i in np.unique(arr_1)}

However, both arr_1 and arr_2 have more than 4 billion elements in my case, so the code above can take more than 100 hours to finish.

Is there a more efficient way to accomplish this? Thank you so much in advance!

Asked By: Nick Nick Nick


Answers:

Not sure why you're using a where clause: the comprehension rescans all of arr_1 once for every unique value. You can instead do a single pass over both lists simultaneously to map the elements. I'm using a defaultdict here for simplicity.

from collections import defaultdict

arr_1 = [1,1,1,2,2,3]
arr_2 = [16,11,12,13,14,15]
target = defaultdict(list)
# Walk both lists in lockstep and append each value under its key
for key, value in zip(arr_1, arr_2):
    target[key].append(value)

print(dict(target))
# prints {1: [16, 11, 12], 2: [13, 14], 3: [15]}

Since this approach visits each element exactly once, it has linear complexity, whereas the comprehension does a full scan of arr_1 per unique value. It should scale much better for your use case.
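If the Python-level loop is still too slow at that size, one possible alternative is to lean on the fact that arr_1 is sorted, so each key occupies one contiguous run. The sketch below is only an illustration under that assumption (and assumes non-empty arrays); the helper name group_sorted is made up for this example and is not from the question. It finds the positions where the key changes and cuts arr_2 at those positions with np.split.

```python
import numpy as np

def group_sorted(keys_arr, values_arr):
    """Group values_arr by keys_arr, assuming keys_arr is sorted and non-empty."""
    # Positions where a new key starts (excluding position 0)
    boundaries = np.flatnonzero(keys_arr[1:] != keys_arr[:-1]) + 1
    # The key of each run is the element at the start of that run
    starts = np.concatenate(([0], boundaries))
    keys = keys_arr[starts].tolist()
    # Cut the values at the same positions; each piece is one key's run
    groups = np.split(values_arr, boundaries)
    return {k: g.tolist() for k, g in zip(keys, groups)}

arr_1 = np.array([1, 1, 1, 2, 2, 3])        # keys, sorted
arr_2 = np.array([16, 11, 12, 13, 14, 15])  # values, any order

target_dict = group_sorted(arr_1, arr_2)
print(target_dict)
# prints {1: [16, 11, 12], 2: [13, 14], 3: [15]}
```

The boundary detection and the split are done inside NumPy, so the only Python-level loop runs once per unique key rather than once per element. Whether the final dictionary of lists fits in memory for billions of elements is a separate question; keeping the groups as array slices instead of calling tolist() may be worth considering.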

Answered By: Abhinav Mathur