Filter nested numpy array

Question:

Hi, I'm trying to filter the following numpy array but running into issues. I'd like to keep all rows equal to [('a','b','c'), 1]. The problem is that I can't figure out how to either combine the elements in each row, so that instead of [('a','b','c'), 1] I would have [('a','b','c',1)], or simply filter given the original structure. I tried a combination of np.concatenate() and np.ravel(), but the result wasn't what I expected.

import numpy as np
a = np.array([[('a','b','c'), 1], [('b','c','a'), 1], [('a','b','c'), 2], [('a','b','c'), 1]], dtype=object)

Method:
Filter if 1st element = 'a', 2nd element = 'b', 3rd element ='c' and 4th element = 1

Desired Output:
output = np.array([[('a','b','c'), 1], [('a','b','c'), 1]], dtype=object)

EDIT: I was able to get this to work with pandas, but only by converting the array to a DataFrame, which is too expensive. That is why I'm trying to achieve a more optimized solution with numpy.

Asked By: chicagobeast12

Answers:

You could do:

a[(a == [('a', 'b', 'c'), 1]).all(1)]

Output:

array([[('a', 'b', 'c'), 1],
       [('a', 'b', 'c'), 1]], dtype=object)

Output of a == [('a', 'b', 'c'), 1] :

array([[ True,  True],
       [False,  True],
       [ True, False],
       [ True,  True]])
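
As a hedged sketch of the same idea: on recent NumPy releases a bare mixed list such as [('a', 'b', 'c'), 1] may not convert to an array implicitly, so the example below builds both the data and the row to match as explicit object arrays. The names `rows`, `target`, `mask` and `filtered` are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Fill the (4, 2) object array cell by cell so each tuple stays a single cell.
rows = [(('a', 'b', 'c'), 1), (('b', 'c', 'a'), 1),
        (('a', 'b', 'c'), 2), (('a', 'b', 'c'), 1)]
a = np.empty((len(rows), 2), dtype=object)
for i, (key, count) in enumerate(rows):
    a[i, 0] = key
    a[i, 1] = count

# The row to match, stored as a length-2 object array (hypothetical name).
target = np.empty(2, dtype=object)
target[0] = ('a', 'b', 'c')
target[1] = 1

# Broadcasting compares every row with target cell by cell;
# .all(axis=1) keeps only the rows where both cells match.
mask = (a == target).all(axis=1)
filtered = a[mask]
print(filtered)
```

The comparison still runs Python-level equality on every cell, because the array has dtype=object, so this is mainly a more explicit spelling of the one-liner above rather than a faster one.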

Different comparators for different elements:

N.B. You said 4th element in your comment, but it is not really the 4th element. Given the way you constructed your numpy array, it is the second element of each row.

Now, if you want a condition like <= for the second element (the integers) and == for the tuple elements (like ('a', 'b', 'c')), there is no short, easy way to do it because of how the array is constructed.

But a workaround is to create an auxiliary array like:

abc = np.array([[('a','b','c'), 0]], dtype='O').repeat(len(a), axis=0)

and then use it to filter with a different comparator for each column, like:

a[(a[:, 0] == abc[:, 0]) & (a[:, 1] <= 2)]
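
One possible alternative to the auxiliary array, offered only as a sketch: build one boolean mask per column and combine them. The tuple column is compared with a generator fed to np.fromiter, and the integer column is cast to int before applying <=. The helper `make_array` and the names `wanted_key`, `key_mask`, `count_mask` and `selected` are hypothetical.

```python
import numpy as np

def make_array(rows):
    """Fill an (n, 2) object array cell by cell so each tuple stays one cell."""
    out = np.empty((len(rows), 2), dtype=object)
    for i, (key, count) in enumerate(rows):
        out[i, 0] = key
        out[i, 1] = count
    return out

a = make_array([(('a', 'b', 'c'), 1), (('b', 'c', 'a'), 1),
                (('a', 'b', 'c'), 2), (('a', 'b', 'c'), 1)])

wanted_key = ('a', 'b', 'c')  # hypothetical name for the tuple to match

# Column 0: plain Python equality on each tuple, collected into a bool array.
key_mask = np.fromiter((k == wanted_key for k in a[:, 0]),
                       dtype=bool, count=len(a))

# Column 1: cast the object column to int so <= runs as a numeric comparison.
count_mask = a[:, 1].astype(int) <= 2

selected = a[key_mask & count_mask]
print(selected)
```

Each column gets its own comparator here, so swapping <= for another operator only touches the `count_mask` line.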
Answered By: SomeDude

In your case, a faster way is to compare the 1st and 2nd items explicitly:

[r for r in a if r[0] == ('a','b','c') and r[1] == 1]

In [400]: %timeit [r for r in a if r[0] == ('a','b','c') and r[1] == 1]
1.2 µs ± 6.69 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [402]: %timeit a[(a == [('a', 'b', 'c'), 1]).all(1)]
7.21 µs ± 85.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
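
The timings above were taken on the 4-row array, so they may not carry over to larger inputs. The following is a rough sketch for repeating the comparison at another size with the standard timeit module. The helpers `make_array`, `with_loop` and `with_mask`, and the size of 10,000 rows, are assumptions chosen for illustration, and no particular result is implied.

```python
import timeit
import numpy as np

PATTERN = [(('a', 'b', 'c'), 1), (('b', 'c', 'a'), 1),
           (('a', 'b', 'c'), 2), (('a', 'b', 'c'), 1)]

def make_array(n):
    """Repeat the four sample rows until the object array has n rows."""
    a = np.empty((n, 2), dtype=object)
    for i in range(n):
        a[i, 0], a[i, 1] = PATTERN[i % 4]
    return a

# Row to match, stored as an object array so the tuple stays one cell.
target = np.empty(2, dtype=object)
target[0], target[1] = ('a', 'b', 'c'), 1

def with_loop(a):
    # List comprehension from the answer above; returns a list of rows.
    return [r for r in a if r[0] == ('a', 'b', 'c') and r[1] == 1]

def with_mask(a):
    # Boolean-mask approach; returns a 2-D object array.
    return a[(a == target).all(axis=1)]

big = make_array(10_000)
for fn in (with_loop, with_mask):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Note that the two functions return different container types (a list of rows versus an array), which matters if the result is processed further with numpy.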
Answered By: RomanPerekhrest