Check if each element in a numpy array is in another array

Question:

This problem seems easy but I cannot quite get a nice-looking solution. I have two numpy arrays (A and B), and I want to get the indices of A where the elements of A are in B and also get the indices of A where the elements are not in B.

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

Currently I am using

C = np.searchsorted(A,B)

which takes advantage of the fact that A is in order, and gives me [1, 3, 5], the indices of the elements of A that are in B. This is great, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?

Asked By: DanHickstein


Answers:

import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)

# len(A) replaces np.alen, which recent NumPy releases no longer provide
D = np.delete(np.arange(len(A)), C)

D
#array([0, 2, 4, 6])
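
The same complement can also be built with a boolean mask instead of np.delete. This is only a minimal sketch, assuming (as in the question) that A is sorted and that every element of B occurs in A; the name keep is introduced here for illustration:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])
C = np.searchsorted(A, B)    # positions in A of the elements of B

keep = np.ones(A.size, dtype=bool)   # start with every index of A marked
keep[C] = False                      # unmark the indices that matched B
D = np.flatnonzero(keep)             # indices of A whose elements are not in B
print(D)  # [0 2 4 6]
```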
Answered By: askewchan

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 4, 6])
c = np.searchsorted(a, b)
d = np.searchsorted(a, np.setdiff1d(a, b))

d
#array([0, 2, 4, 6])
Answered By: alexhb

searchsorted may give you a wrong answer if not every element of B is in A. You can use numpy.in1d instead (newer NumPy releases offer numpy.isin for the same purpose):

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)
print(np.where(mask)[0])   # indices of A whose elements are in B
print(np.where(~mask)[0])  # indices of A whose elements are not in B

The output is:

[1 3 5]
[0 2 4 6]
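
To make the searchsorted caveat concrete, here is a small sketch. The B used here, with 0 and 8 added, is an illustrative assumption and not the B from the question. searchsorted returns insertion points, so a value missing from A still produces an index, and that index may point at an unrelated element or past the end of A:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([0, 2, 4, 6, 8])   # hypothetical B: 0 and 8 do not occur in A

# Insertion points, not membership: 0 maps to index 0 (which holds 1),
# and 8 maps to index 7, which is out of bounds for A.
pos = np.searchsorted(A, B)
print(pos)  # [0 1 3 5 7]

# A membership mask is unaffected by the extra values in B.
mask = np.isin(A, B)
in_idx = np.flatnonzero(mask)     # indices of A whose elements are in B
out_idx = np.flatnonzero(~mask)   # indices of A whose elements are not in B
print(in_idx)   # [1 3 5]
print(out_idx)  # [0 2 4 6]
```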

However, in1d() relies on sorting, which is slow for large datasets. If your dataset is large, you can use pandas instead:

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

Here is the time comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)

%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop
Answered By: HYRY

The elements of A that are also in B:

set(A) & set(B)

The elements of A that are not in B:

set(A) - set(B)
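
Note that these set expressions return the values themselves, unordered and with duplicates removed, while the question asks for indices into A. One possible way to get indices with plain Python sets is sketched below; the names b_values, in_idx and out_idx are introduced here for illustration:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])

b_values = set(B.tolist())   # set lookup is O(1) on average

# Walk A once and record each position by membership in B.
in_idx = [i for i, v in enumerate(A.tolist()) if v in b_values]
out_idx = [i for i, v in enumerate(A.tolist()) if v not in b_values]
print(in_idx)   # [1, 3, 5]
print(out_idx)  # [0, 2, 4, 6]
```

Unlike the searchsorted approaches, this does not need A to be sorted, but it loops in Python and will likely be slower than a vectorized mask on large arrays.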

Answered By: Ben Zweig

all_vals = np.arange(1000)  # `A` in the question
seen_vals = np.unique(np.random.randint(0, 1000, 100))  # `B` in the question
# indices of unseen values
mask = np.isin(all_vals, seen_vals, invert=True)  # `D` in the original question
unseen_vals = all_vals[mask]
Answered By: crypdick