Docs about boolean scalars in indexing of numpy array

Question:

NumPy’s array indexing documentation seems to contain no mention of indexing of the form x[True] or x[False].

Empirically, using True inserts a new dimension of size 1, while using False inserts a new dimension of size 0.

However, x[True, True] inserts only one new axis of size 1 instead of two.

This behavior is not consistent with boolean indexing, and it’s not consistent with treating boolean scalars as integers.

I’m looking for an explanation of the observed behavior, and hopefully a rationale. Thanks much!

Asked By: Sasha


Answers:

The main indexing documentation page has a section on boolean array indexing:

https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing

As described there, the boolean indexing array should match the shape of the indexed array, or the length of one of its axes.
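
As a rough illustration of those two documented cases, here is a minimal sketch. The array a and the mask names are made up for this example and are not from the question:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Mask with the same shape as the array: selects individual elements.
full_mask = a % 2 == 0
evens = a[full_mask]          # 1-D array of the selected elements

# Mask whose length matches axis 0 only: selects whole rows.
row_mask = np.array([True, False, True])
rows = a[row_mask]            # 2-D array holding the selected rows

print(evens.shape, rows.shape)
```

Neither case covers a bare True or False, which has no shape to match against.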

I don’t think indexing with boolean scalars is documented, or at least I’m not aware of such a page. I suspect the behavior you see is the result of historical features that haven’t been cleaned up.

In [137]: x=np.arange(4)

Proper indexing with a ‘mask’ that matches in shape:

In [138]: x[np.array([True,False,False,True])]
Out[138]: array([0, 3])

An incomplete mask raises an error:

In [139]: x[np.array([True,False,False])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[139], line 1
----> 1 x[np.array([True,False,False])]
        x = array([0, 1, 2, 3])
        np = <module 'numpy' from 'C:\Users\paul\miniconda3\lib\site-packages\numpy\__init__.py'>

IndexError: boolean index did not match indexed array along dimension 0; dimension is 4 but corresponding boolean dimension is 3

I think this used to return a result with a warning.
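
If code has to cope with a mask of the wrong length, the error can be caught explicitly. This is only a sketch, assuming a NumPy version that raises IndexError as in the traceback above; short_mask and raised are names made up for the example:

```python
import numpy as np

x = np.arange(4)
short_mask = np.array([True, False, False])  # length 3, but x has length 4

try:
    x[short_mask]
    raised = False
except IndexError as exc:
    # Recent NumPy versions refuse a mask that does not match the axis length.
    raised = True
    print("IndexError:", exc)
```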

What you observed is that a boolean scalar adds a dimension, with size 1 or 0:

In [140]: x[True].shape
Out[140]: (1, 4)
In [141]: x[False].shape
Out[141]: (0, 4)
In [142]: x[:,False].shape
Out[142]: (4, 0)

Additional boolean scalars apparently do not add further dimensions.
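
A quick way to probe this is to print the shapes for a few combinations. The sketch below only records what NumPy returns; as far as I can tell the results look as if the scalars were combined, roughly like a logical AND, but that is an observation rather than documented behavior:

```python
import numpy as np

x = np.arange(4)

# Collect the result shape for several boolean-scalar index combinations.
shapes = {
    "x[True]": x[True].shape,
    "x[True, True]": x[True, True].shape,
    "x[True, False]": x[True, False].shape,
    "x[False, False]": x[False, False].shape,
}
for expr, shape in shapes.items():
    print(expr, shape)
```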

I was going to say that [True] looks like [np.newaxis] (aka [None]), but the resulting strides are more like what a reshape produces.
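
One way to compare the three is to print shape, strides, and whether the result shares memory with x. This is a sketch for inspection only: the exact stride values depend on the dtype and platform, so none are claimed here.

```python
import numpy as np

x = np.arange(4)

candidates = [
    ("x[None]", x[None]),
    ("x[True]", x[True]),
    ("x.reshape(1, 4)", x.reshape(1, 4)),
]
for label, y in candidates:
    # shares_memory tells us whether y is a view of x or a separate copy.
    print(label, y.shape, y.strides, np.shares_memory(x, y))

newaxis_shares = np.shares_memory(x, x[None])
bool_shares = np.shares_memory(x, x[True])
```

In my understanding x[None] is a view of x, while x[True] goes through the advanced-indexing path and returns a copy, so writing into x[True] does not change x.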

I think this is a legacy behavior, and not something that you should count on using.

The linked duplicate question quotes a note from the indexing docs:

the nonzero equivalence for Boolean arrays does not hold for zero dimensional boolean arrays.
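
To make the note concrete, here is a small sketch contrasting a 1-D mask with a zero-dimensional boolean array. The names mask and zero_d are made up for the example:

```python
import numpy as np

x = np.arange(4)

# 1-D mask: x[mask] selects the same elements as x[mask.nonzero()].
mask = np.array([True, False, False, True])
same_for_1d = np.array_equal(x[mask], x[mask.nonzero()])

# 0-d boolean array: nothing is selected along an existing axis;
# a new leading axis appears instead, as with the Python scalar True.
zero_d = np.array(True)
print(same_for_1d, x[zero_d].shape, x[True].shape)
```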

Answered By: hpaulj