Finding indices of matches of one array in another array

Question:

I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B’s values within A.

For example:

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9
Asked By: farhawa

||

Answers:

You can use np.in1d with np.nonzero

np.nonzero(np.in1d(A,B))[0]

You can also use np.searchsorted, if you care about maintaining the order –

np.searchsorted(A,B)

For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so –

sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]

I would add in my favorite broadcasting too in the mix to solve a generic case –

np.nonzero(B[:,None] == A)[1]

Sample run –

In [125]: A
Out[125]: array([ 7,  5,  1,  6, 10,  9,  8])

In [126]: B
Out[126]: array([ 1, 10,  7])

In [127]: sort_idx = A.argsort()

In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])

In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
Answered By: Divakar

Have you tried searchsorted?

A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])

A.searchsorted(B)
# array([0, 6, 9])
Answered By: Bi Rico

Just for completeness: If the values in A are non negative and reasonably small:

lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices  = lookup[B]
Answered By: Paul Panzer

I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.

As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.

I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.

In [1]: import numpy as np
In [2]: from numba import njit

In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])

In [5]: np.where(np.in1d(a, b))[0]
   ...: array([0, 6, 9])

In [6]: np.nonzero(np.in1d(a, b))[0]
   ...: array([0, 6, 9])   

In [7]: np.searchsorted(a, b)
   ...: array([0, 6, 9])


In [8]: np.searchsorted(a, np.intersect1d(a, b))
   ...: array([0, 6, 9])

In [9]: [i for i, x in enumerate(a) if x in b]
   ...: [0, 6, 9]

In [10]: @njit
    ...: def func(a, b):
    ...:     return [i for i, x in enumerate(a) if x in b]

In [11]: func(a, b)
   ...: [0, 6, 9]

Now, let’s compare the timing performance of these solutions.

In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Answered By: Mohammadreza Riahi
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.