Find indices of rows of numpy 2d array with float data in another 2D array
Question:
This post helped me achieve what I wanted, but that implementation takes too long for some of the large datasets I work on. I have two NumPy arrays (fairly large):
p[:24]=array([[ 0.18264738, -0.00326727, 0.01799096],
[ 0.18198644, -0.00051316, 0.01800063],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18215604, 0.00157497, 0.01799999],
[ 0.18286349, 0.0036474 , 0.01799824],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18399446, 0.00528562, 0.01799998],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18956973, 0.00801727, 0.01800126],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.19157426, 0.0078435 , 0.018 ],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.19701494, 0.00384344, 0.01800058],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.18999948, 0. , 0.0226188 ]])
v[:24]=array([[ 0.18264738, -0.00326727, 0.01799096],
[ 0.18198644, -0.00051316, 0.01800063],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18215604, 0.00157497, 0.01799999],
[ 0.18286349, 0.0036474 , 0.01799824],
[ 0.18399446, 0.00528562, 0.01799998],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18956973, 0.00801727, 0.01800126],
[ 0.19157426, 0.0078435 , 0.018 ],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.19701494, 0.00384344, 0.01800058],
[ 0.19775054, 0.0019907 , 0.01800372],
[ 0.19800517, -0.00065405, 0.01800135],
[ 0.19731225, -0.00330035, 0.01799999],
[ 0.19596213, -0.00537427, 0.01800001],
[ 0.18937038, -0.00797523, 0.018 ],
[ 0.18739267, -0.00759293, 0.01799974],
[ 0.18565072, -0.00671446, 0.018 ],
[ 0.18411626, -0.00545196, 0.01800367],
[ 0.19136006, -0.00791202, 0.01799961],
[ 0.1938769 , -0.00702934, 0.01799973],
[ 0.1314003 , -0.06724723, 0.0645 ]])
The v array is generated from the p array using:
p_uniques, p_indices, p_inverse, p_counts = np.unique(
    p, return_index=True,
    return_inverse=True,
    return_counts=True,
    axis=0)
v = p[np.sort(p_indices, axis=None)]
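On a small toy array (hypothetical data, not the arrays above), the same recipe shows why sorting p_indices restores first-occurrence order:

```python
import numpy as np

# Toy analogue of p: four rows, one duplicate.
p = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.1, 0.2],
              [0.5, 0.6]])

p_uniques, p_indices, p_inverse, p_counts = np.unique(
    p, return_index=True,
    return_inverse=True,
    return_counts=True,
    axis=0)

# p_indices holds the first occurrence of each unique row; sorting it
# yields the unique rows in their original order of appearance in p.
v = p[np.sort(p_indices, axis=None)]
print(v)  # [[0.1 0.2] [0.3 0.4] [0.5 0.6]]
```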
Now, the goal is to generate an array that, for each row of the p array, gives the index of that row in the v array (so duplicate rows map to the same index). Therefore, the desired output would be:
indices[:24]=array([ 0, 1, 2, 3, 4, 2, 5, 6, 2, 6, 7, 2,
7, 8, 2, 9, 10, 2, 2, 11, 12, 10, 11, 2])
I only posted the first 24 entries of the indices array to save space.
I tried various approaches using np.where, np.isin, and others, but I could not produce the desired result with better performance than the solution shared in the linked post.
I'd greatly appreciate your help.
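For context, the brute-force route the question alludes to (comparing every row of p against every row of v) is O(len(p) · len(v)) in time and memory, which is why it slows down on large datasets. A minimal sketch of that baseline on toy data (not the linked post's actual code):

```python
import numpy as np

# Toy stand-ins for p and v (hypothetical data, not the arrays above).
p = np.array([[0.3, 0.4], [0.1, 0.2], [0.3, 0.4]])
v = np.array([[0.3, 0.4], [0.1, 0.2]])

# Broadcast-compare every p row against every v row: shape (len(p), len(v)).
matches = (p[:, None, :] == v[None, :, :]).all(axis=2)
# For each p row, take the index of the first matching v row.
indices = matches.argmax(axis=1)
print(indices)  # [0 1 0]
```

The intermediate boolean array is what makes this approach infeasible at scale, motivating the permutation-based answer below.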
Answers:
The key insight here is that v is a permutation of p_uniques, and np.argsort(p_indices) provides this permutation. Inverting this permutation gives us the mapping that we have to apply to p_inverse to get what we want.
To invert the permutation, we use the code from How to invert a permutation array in numpy:
import numpy as np

# p_indices: len(v), range(0, len(p)). Maps v indices to p indices
# p_inverse: len(p), range(0, len(v)). Maps p indices to p_unique indices
p_uniques, p_indices, p_inverse = np.unique(
    p, return_index=True, return_inverse=True, axis=0)
# len(v), range(0, len(v)). Maps v indices to p_unique indices
sort_permut = np.argsort(p_indices)
v = p_uniques[sort_permut]
# len(v), range(0, len(v)). Maps p_unique indices to v indices
inv_sort = np.empty_like(sort_permut)
inv_sort[sort_permut] = np.arange(len(inv_sort))
# len(p), range(0, len(v)). Maps p indices to v indices
indices = inv_sort[p_inverse]
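Running the whole pipeline on a small toy array (hypothetical data) confirms the property we want: indexing v with the resulting indices reconstructs p exactly, duplicates included.

```python
import numpy as np

# Toy p with a duplicate row (row 0 repeats at row 2).
p = np.array([[0.3, 0.4],
              [0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])

p_uniques, p_indices, p_inverse = np.unique(
    p, return_index=True, return_inverse=True, axis=0)

# Permutation taking p_uniques (lexicographic order) to v (appearance order).
sort_permut = np.argsort(p_indices)
v = p_uniques[sort_permut]

# Invert the permutation: inv_sort[sort_permut[i]] = i.
inv_sort = np.empty_like(sort_permut)
inv_sort[sort_permut] = np.arange(len(inv_sort))

# Map each p row through p_inverse, then into v's ordering.
indices = inv_sort[p_inverse]
print(indices)  # [0 1 0 2]

# Sanity check: indexing v with `indices` reconstructs p exactly.
assert np.array_equal(v[indices], p)
```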