Find the values in a second array based on a condition on the first array. What's the most efficient way?

Question

Suppose we have two arrays with the same shape. For each unique element in arr_1, I want to find the corresponding values in arr_2 and build a dictionary as output.

(arr_1 is sorted. arr_2 is NOT sorted).

Here is an example.

arr_1 = np.array([1,1,1,2,2,3]) # the index array, sorted
arr_2 = np.array([16,11,12,13,14,15]) # find the values, not sorted

target_dict = {1:[16,11,12], 2:[13,14], 3:[15]}

My solution, using a dict comprehension:

target_dict = {i: arr_2[np.where(arr_1 == i)].tolist() for i in np.unique(arr_1)}

However, both arr_1 and arr_2 have more than 4 billion elements in my case, so the code above can take more than 100 hours to finish.

Is there a more efficient way to accomplish this? Thank you so much in advance!

Asked By: Nick Nick Nick

Answer 1

Not sure why you’re using a where clause. You can do a single pass over both lists simultaneously to map the elements. Using a defaultdict here for simplicity.

from collections import defaultdict

arr_1 = [1,1,1,2,2,3]
arr_2 = [16,11,12,13,14,15]
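target = defaultdict(list)  # missing keys start out as an empty list
# Walk both sequences in lockstep and append each value under its key
for key, value in zip(arr_1, arr_2):
    target[key].append(value)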

print(dict(target))
# prints {1: [16, 11, 12], 2: [13, 14], 3: [15]}

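This approach makes a single pass over the data, so it runs in linear time, whereas the dict comprehension rescans arr_1 once for every unique value. It should therefore scale much better for your use case.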
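If the data is already in NumPy arrays, the fact that arr_1 is sorted can also be used to avoid a Python-level loop over every element. The following is only a sketch, not something from the original post: it assumes arr_1 is sorted and non-empty, and the helper name group_sorted is made up for illustration. It finds the positions where arr_1 changes value and splits arr_2 at those positions.

```python
import numpy as np

def group_sorted(keys_sorted, values):
    """Group `values` by the runs of equal entries in `keys_sorted`.

    Hypothetical helper: assumes `keys_sorted` is sorted, non-empty,
    and has the same length as `values`.
    """
    # Positions where the key differs from the previous one, i.e. where a new run starts
    boundaries = np.flatnonzero(np.diff(keys_sorted)) + 1
    # First key of each run (position 0 plus every boundary)
    run_keys = keys_sorted[np.r_[0, boundaries]]
    # Cut the value array at the same positions
    groups = np.split(values, boundaries)
    return {k: g.tolist() for k, g in zip(run_keys.tolist(), groups)}

arr_1 = np.array([1, 1, 1, 2, 2, 3])         # sorted keys
arr_2 = np.array([16, 11, 12, 13, 14, 15])   # values, not sorted

target_dict = group_sorted(arr_1, arr_2)
print(target_dict)
# prints {1: [16, 11, 12], 2: [13, 14], 3: [15]}
```

Here the Python loop runs once per unique key rather than once per element. With billions of elements, converting every group to a list may itself be expensive in time and memory, so it may be worth keeping the groups as array slices instead of calling tolist().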

Answered By: Abhinav Mathur
