Delete all numbers from an array which are in another array

Question:

I have an array "removable" containing a few numbers from another array "All" containing all numbers from 0 to k.

I want to remove all numbers in All which are listed in removable.

All = np.arange(k)
removable = np.array([1, 3, 4, 7, 9, ..., 200])

for i in removable:
    if i in All:
        All.remove(i)

ndarray has no remove attribute. I’m sure there is an easy way to do this in numpy, but I can’t find it in the documentation.

Asked By: Tim4497

Answers:

You could use the function setdiff1d from NumPy:

>>> a = np.array([1, 2, 3, 2, 4, 1])
>>> b = np.array([3, 4, 5, 6])
>>> np.setdiff1d(a, b)
array([1, 2])

Answered By: f.wue

NumPy arrays have a fixed size, so you cannot remove elements from an ndarray in place.
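
In practice, "removing" values means building a new, smaller array and rebinding the name to it. A minimal sketch of that idea, with made-up values standing in for the question's arrays (arr and filtered are illustrative names, not from the question):

```python
import numpy as np

arr = np.arange(10)                     # stand-in for the question's All array
removable = np.array([1, 3, 4, 7, 9])   # illustrative values only

# ndarray has no list-style remove method.
print(hasattr(arr, "remove"))           # False

# Filtering builds a new, shorter array; the original keeps its size.
filtered = arr[~np.isin(arr, removable)]
print(arr.size, filtered.size)          # 10 5
print(filtered)                         # [0 2 5 6 8]
```

Assigning the result back to the same name (arr = filtered) gives the effect the question asks for, even though no array is modified in place.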

Answered By: Mihai Andrei

You should do this with sets instead of lists/arrays, which is easy enough:

remaining = np.array(list(set(arr).difference(removable)))

where arr is your All array above (the lowercase name “all” is a Python built-in function, not a keyword, but it still should not be overwritten).

Granted, using sets will get rid of repeated elements if you have those in your arr, but it sounds like arr is just a sequence of unique values. Sets have much more efficient membership checking (constant-time vs. order N), so you get to go a lot faster. By comparison, I made a list version that builds a list if a value is not in removable:

def remove_list(arr, rem):
    result = []
    for i in arr:
        if i not in rem:
            result.append(i)
    return result

and made my set version a function as well:

def remove_set(arr, rem):
    return np.array(list(set(arr).difference(rem)))

Timing comparison with arr = np.arange(10000) and removable = np.random.randint(0, 10000, 1000):

remove_list(arr, removable)
# 55.5 ms ± 664 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

remove_set(arr, removable)
# 947 µs ± 3.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The set version is more than 50 times faster.
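
If the order of the values or repeated elements matter, one possible variation keeps the fast set membership check but loops over the original values. This is only a sketch: remove_keep_order is a hypothetical helper name, and it assumes both inputs are one-dimensional NumPy arrays.

```python
import numpy as np

def remove_keep_order(arr, rem):
    """Hypothetical helper: drop values found in rem, keeping order and repeats."""
    rem_set = set(rem.tolist())  # set gives constant-time membership checks
    kept = [x for x in arr.tolist() if x not in rem_set]
    return np.array(kept, dtype=arr.dtype)

arr = np.array([5, 1, 3, 1, 8, 3])
removable = np.array([3, 8])
print(remove_keep_order(arr, removable))  # [5 1 1]
```

It still goes through a Python-level loop, so it is unlikely to match a vectorized NumPy approach on large arrays.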

Answered By: Engineero

A solution that is fast for big arrays and needs no conversion to a list (which would slow down the computation):

orig = np.arange(15)
to_remove = np.array([1, 2, 3, 4])
mask = np.isin(orig, to_remove)
orig = orig[np.invert(mask)]

>>> orig
array([ 0,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
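
As a small variation, np.isin also accepts an invert=True argument, which yields the "keep" mask directly and makes the separate np.invert step unnecessary. A sketch using the same example values (keep and result are illustrative names):

```python
import numpy as np

orig = np.arange(15)
to_remove = np.array([1, 2, 3, 4])

# invert=True marks the elements that are NOT in to_remove.
keep = np.isin(orig, to_remove, invert=True)
result = orig[keep]
print(result)  # [ 0  5  6  7  8  9 10 11 12 13 14]
```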

Answered By: Martin

np.setdiff1d() will de-duplicate the original entries, and will also return the result sorted.

That’s fine in some cases, but if you want to avoid one or both of these aspects, have a look at np.in1d() with an (inverted) boolean mask:

>>> a = np.array([1, 2, 3, 2, 4, 1])
>>> b = np.array([3, 4, 5, 6])
>>> a[~np.in1d(a, b)]
array([1, 2, 2, 1])

The ~ operator does inversion on the boolean mask:

>>> np.in1d(a, b)
array([False, False,  True, False,  True, False])

>>> ~np.in1d(a, b)
array([ True,  True, False,  True, False,  True])

Disclaimer:

Note that this is not truly removal, as you indicated in your question; boolean-mask indexing returns a new array (a copy) holding the filtered elements, and the original array a is left unchanged. The same goes for np.delete(); there’s no concept of in-place element deletion for NumPy arrays.
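
One way to check how the filtered result relates to the original is np.shares_memory. The sketch below reuses the example arrays and uses np.isin in place of np.in1d, which newer NumPy releases deprecate; filtered and deleted are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 4, 1])
b = np.array([3, 4, 5, 6])

# Boolean-mask indexing allocates a new array.
filtered = a[~np.isin(a, b)]
print(np.shares_memory(a, filtered))  # False

# Writing to the result does not touch the original.
filtered[0] = 99
print(a)                              # [1 2 3 2 4 1]

# np.delete takes positions (indices), not values, and also returns a new array.
deleted = np.delete(a, np.flatnonzero(np.isin(a, b)))
print(deleted)                        # [1 2 2 1]
print(a.size, deleted.size)           # 6 4
```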

Answered By: Brad Solomon