Find indices of unique common values inside two unsorted 2d arrays of unique numbers

Question:

In other words, I need to define the index correspondence between equal unique numbers inside two unsorted 2d arrays. Similar questions:

  1. how to find indices of a 2d numpy array occuring in another 2d array – not about single values, but rows/columns

  2. test for membership in a 2d numpy array – not about single values, but rows/columns

  3. Pythonic way of finding indexes of unique elements in two arrays, 1d sorted arrays

  4. Finding the indexed location of values in a unsorted numpy array from data in another unsorted numpy array is about 1d unsorted arrays

There are two 2d arrays with unique numbers, say: x = [[45, 67], [32, 52], [94, 64], [21, 90]], and y = [[67, 103, 12], [2, 61, 77], [70, 94, 18]]. The numbers 67, 94 are common for these two lists.

Is there a faster solution to get the index correspondence like: [[[0, 1], [0, 0]], [[2, 0], [2, 1]]] than the proposed bellow, if each array is of thousands elements?

Asked By: Yerbol Sapar

||

Answers:

1st, get the mask of shape x that shows common values in both x&y(y then is flattened, as described at numpy.isin) for unique values:

a = np.isin(x, y, assume_unique=True)
a
array([[False,  True],
       [False, False],
       [ True, False],
       [False, False]])

2nd, apply np.argwhere to the mask with term > 0, it returns the indices of True in the mask, that is, address of common values 67 & 94 inside array x:

np.argwhere(a > 0)    
array([[0, 1],
       [2, 0]]) 

3rd, points 1&2 above applied to array y returns address of the same common values 67 & 94, but inside array y:

b = np.isin(y, x, assume_unique=True) 
np.argwhere(b > 0)    
array([[0, 0],
       [2, 1]])

4th, use np.stack((np.argwhere(a > 0), np.argwhere(b > 0)), axis=1) for convenient reading:

array([[[0, 1],
        [0, 0]],
       [[2, 0],
        [2, 1]]])

which means, the 1st common element 67 is in x at index [0, 1] and in y at [0, 0]; the second 94 in x: [2, 0], in y: [2, 1].

5th, to see the common values in both arrays use numpy ‘fancy index’, converting x&y to numpy array beforehand:

xi = np.array(x)[a]    
xi
array([67, 94])    
yi = np.array(y)[b]
yi
array([67, 94])

Here is might be a problem, if the order of common values is not the same. E.g., in case y = [[94, 103, 12], [2, 61, 77], [70, 67, 18]], np.array(y)[np.isin(y, x, assume_unique=True)] will give:
yi = array([94, 67]) vs. xi = array([67, 94]). The use of np.stack((a, b), axis=1) makes sense only for mutually ordered indices of common values. Therefore, after the point 3 of the solution, we must do 5. (i.e., get the flat array of common values per list), and, by argsort() get the sorted index array in xi&yi. For the new y and old x that index arrays look like:

xi, yi = np.argsort(xi), np.argsort(yi)
yi
array([1, 0])
xi
array([0, 1])

And now, it is OK to use np.stack with ‘fancy index’:

np.stack((np.argwhere(a > 0)[xi], np.argwhere(b > 0)[yi]), axis=1)
array([[[0, 1],
        [2, 1]],
       [[2, 0],
        [0, 0]]])

If put together, the final proposed solution is:

def indx_correspnd(x, y):
    a = np.isin(x, y, assume_unique=True) 
    b = np.isin(y, x, assume_unique=True)
    xi = np.array(x)[a]
    yi = np.array(y)[b]
    xi, yi = np.argsort(xi), np.argsort(yi)
    return np.stack((np.argwhere(a > 0)[xi], np.argwhere(b > 0)[yi]), axis=1)

Use case1:

import numpy as np

x = [[45, 67], [32, 52], [94, 64], [21, 90]]
y = [[94, 103, 12], [2, 61, 77], [70, 67, 18]]

indx_correspnd(x, y)
array([[[0, 1],
        [2, 1]],
       [[2, 0],
        [0, 0]]])

Use case2, application to 2x2d lists: 4000 elements placed in 80 sublists by 50 & 4200 elements placed in 105 sublists by 40:

f=random.sample(range(1, 5000), 4000)
g=random.sample(range(1, 5000), 4200)
f=np.array(f).reshape(-1, 50)
g=np.array(g).reshape(-1, 40)

indx_correspnd(g, f)
array([[[52, 43],
    [11,  2]],
   [[38, 17],
    [29, 31]],
   [[74, 27],
    [45,  8]],
   ...,
   [[66, 38],
    [47,  7]],
   [[ 8,  3],
    [11,  6]],
   [[20, 39],
    [47, 26]]])
Answered By: Yerbol Sapar

We can use a dict to store values in the first 2d array, then go through the second 2d array to see if the dict contains the key.

x = [[45, 67], [32, 52], [94, 64], [21, 90]]
y = [[67, 103, 12], [2, 61, 77], [70, 94, 18]]

cor_indices = []

m1 = len(x)
n1 = len(x[0])

x_values = dict()

for i in range(m1):
    for j in range(n1):
        x_values[x[i][j]] = [i, j]

m2 = len(y)
n2 = len(y[0])
for i in range(m2):
    for j in range(n2):
        if y[i][j] in x_values:
            cor_indices.append([x_values[y[i][j]], [i, j]])

print(cor_indices)

Not a very pythonic way, but I think it’s more efficient, with the time complexity of O(m1 * n1 + m2 * n2) and space complexity as O(m1 * n1).

Answered By: Haoliang
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.