Numpy array loses shape after applying mask across axis

Question:

Problem

I have a NumPy array and a mask of the same shape. Once I apply the mask, the array loses its shape and is flattened to 1D.

Question

I want to reduce my array along one axis, based on a 1D mask whose length matches that axis.

How can I apply a mask, but keep dimensionality of the array?

Example

A small example in code:

# data ...
>>> data = np.ones((4, 4))
>>> data.shape
(4, 4)

# mask ...
>>> mask = np.ones((4, 4), dtype=bool)
>>> mask.shape
(4, 4)

# apply mask ...
>>> data[mask].shape
(16,)

My ideal shape would be (4, 4).

An example with array dimension reduction across an axis:

# data, mask ...
>>> data = np.ones((4, 4))
>>> mask = np.ones((4, 4), dtype=bool)

# remove last column from data ...
>>> mask[:, 3] = False 
>>> mask
array([[ True,  True,  True, False],
       [ True,  True,  True, False],
       [ True,  True,  True, False],
       [ True,  True,  True, False]])

# equivalent mask in 1D ...
>>> mask[0]
array([ True,  True,  True, False])

# apply mask ...
>>> data[mask].shape 
(12,)

The ideal dimensions of the array would be (4, 3) without reshape.

Help is appreciated, thanks!

Asked By: Max Collier

Answers:

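I believe what you want can be done by calling reshape on the masked array with the original number of rows, e.g. data[mask].reshape(4, -1) for your (4, 4) example. This works as long as the mask keeps the same number of elements in every row. Here’s a brief example: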

import numpy as np

arr = np.arange(8*6).reshape(8, 6)
# 1D mask for one row, broadcast to the full 2D shape
maskpiece = np.array([True, False]*3)
mask = np.broadcast_to(maskpiece, (8, 6))

print('the original array\n%s\n' % arr)
print('the flat masked array\n%s\n' % arr[mask])
print('the masked array reshaped into 2D\n%s\n' % arr[mask].reshape(8, -1))

Output:

the original array
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]
 [36 37 38 39 40 41]
 [42 43 44 45 46 47]]

the flat masked array
[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46]

the masked array reshaped into 2D
[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]
 [24 26 28]
 [30 32 34]
 [36 38 40]
 [42 44 46]]
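
The reshape only works if every row keeps the same number of elements. A minimal sketch of how one might check that first, reusing the arrays from the example above (the names kept_per_row and result are just for illustration):

```python
import numpy as np

arr = np.arange(8 * 6).reshape(8, 6)
mask = np.broadcast_to(np.array([True, False] * 3), arr.shape)

# Number of elements the mask keeps in each row
kept_per_row = mask.sum(axis=1)

# Reshaping is only meaningful if every row keeps the same count;
# otherwise the result would be ragged.
if (kept_per_row == kept_per_row[0]).all():
    result = arr[mask].reshape(arr.shape[0], kept_per_row[0])
else:
    raise ValueError("rows keep different numbers of elements")

print(result.shape)
```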
Answered By: tel

The ‘correct’ way of achieving your goal is to not expand the mask to 2D. Instead, index with [:, mask] using the 1D mask. This tells NumPy that you want axis 0 unchanged and the mask applied along axis 1.

import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.array([1, 0, 1, 0], dtype=bool)
a
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
b
# array([ True, False,  True, False])
a[:, b]
# array([[ 0,  2],
#        [ 4,  6],
#        [ 8, 10]])
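
If the axis is only known at run time, np.compress should give the same selection with the axis passed as a parameter. A small sketch, using the same a and b as above (by_index and by_compress are illustrative names):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.array([True, False, True, False])

# Boolean indexing along axis 1 ...
by_index = a[:, b]

# ... and the same selection with np.compress, where the axis is an argument
by_compress = np.compress(b, a, axis=1)

print(by_compress)
print(np.array_equal(by_index, by_compress))
```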

If your mask is already 2D, NumPy won’t check whether all its rows are the same, because that would be inefficient. But if you know they are, you can use [:, mask[0]] in that case.
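
Applied to the data from the question, that might look like the sketch below. The explicit check that all mask rows are identical is my own addition, not something NumPy does for you:

```python
import numpy as np

data = np.ones((4, 4))
mask = np.ones((4, 4), dtype=bool)
mask[:, 3] = False  # drop the last column, as in the question

# Optional sanity check: every row of the mask equals the first row
assert (mask == mask[0]).all()

# Use one row of the 2D mask as a 1D mask along axis 1
reduced = data[:, mask[0]]
print(reduced.shape)
```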

If your mask is 2D and just happens to have the same number of Trues in each row, then either use @tel’s answer or create an index array:

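# build a 2D mask with two Trues per row, but in different columns
B = b ^ b[:3, None]
B
# array([[False,  True, False,  True],
#        [ True, False,  True, False],
#        [False,  True, False,  True]])
# column indices of the True entries, one row of indices per mask row
J = np.where(B)[1].reshape(len(B), -1)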

And now either

np.take_along_axis(a, J, 1)
# array([[ 1,  3],
#        [ 4,  6],
#        [ 9, 11]])

or

I = np.arange(len(J))[:, None]
IJ = I, J
a[IJ]
# array([[ 1,  3],
#        [ 4,  6],
#        [ 9, 11]])
Answered By: Paul Panzer