Unexpected results from Numpy r_

Question:

When I use ":n" or "m:" as arguments to np.r_, I get unexpected results that I don’t understand.

Here’s my code:

import numpy as np  
B = np.arange(180).reshape(6,30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3,8:11]]

Now all of that worked as expected. C prints as:

array([[ 10,  11,  12,  13,  14,  20,  21,  22,  23,  24,  25],
       [ 40,  41,  42,  43,  44,  50,  51,  52,  53,  54,  55],
       [ 70,  71,  72,  73,  74,  80,  81,  82,  83,  84,  85],
       [100, 101, 102, 103, 104, 110, 111, 112, 113, 114, 115],
       [130, 131, 132, 133, 134, 140, 141, 142, 143, 144, 145],
       [160, 161, 162, 163, 164, 170, 171, 172, 173, 174, 175]])

and D is:

array([[ 10,  11,  12,  23,  24,  25],
       [ 40,  41,  42,  53,  54,  55],
       [ 70,  71,  72,  83,  84,  85],
       [100, 101, 102, 113, 114, 115],
       [130, 131, 132, 143, 144, 145],
       [160, 161, 162, 173, 174, 175]])

However, when I remove the "0" and the "11," I don’t understand what happens, and I haven’t been able to find any explanation in any Numpy indexing or r_ documentation. Here’s the new line of code:

E = C[:, np.r_[:3, 8:]]

It’s just the same expression that defined the D array with "unnecessary" indices removed. However, the results are mystifying:

array([[ 10,  11,  12,  10,  11,  12,  13,  14,  20,  21,  22],
       [ 40,  41,  42,  40,  41,  42,  43,  44,  50,  51,  52],
       [ 70,  71,  72,  70,  71,  72,  73,  74,  80,  81,  82],
       [100, 101, 102, 100, 101, 102, 103, 104, 110, 111, 112],
       [130, 131, 132, 130, 131, 132, 133, 134, 140, 141, 142],
       [160, 161, 162, 160, 161, 162, 163, 164, 170, 171, 172]])

I expected E to be identical to D, with just six columns. What’s going on? Is this behavior documented somewhere, or is this a bug?

Asked By: user2983936

Answers:

The answer is that np.r_ indexing does not work like Python indexing. An open-ended slice such as n: is not resolved against the array being indexed, so one has to know what the last index is to get the items from n to last, and write np.r_[n:last] instead of np.r_[n:]. IMHO, this defeats one of the better features of Python: not having to call some sort of shape or size function to get your indices correct.

Answered By: user2983936

To understand the difference between D and E, we have to look at what np.r_ produces. As with function calls, the contents of an indexing expression are evaluated first, so np.r_[...] is fully evaluated before C ever sees it.

In [112]: D = C[:, np.r_[0:3,8:11]]; D.shape
Out[112]: (6, 6)
In [113]: E = C[:, np.r_[:3,8:]]; E.shape
Out[113]: (6, 11)

The two r_ expressions evaluate to:

In [115]: np.r_[0:3,8:11]
Out[115]: array([ 0,  1,  2,  8,  9, 10])    
In [116]: np.r_[:3,8:]
Out[116]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])

r_ is an instance of a class defined in np.lib.index_tricks. That class has its own __getitem__ method, allowing us to use indexing notation, but the actual work is done by a call to np.concatenate.

We can see what r_ receives by using another index_tricks object, np.s_:

In [117]: np.s_[0:3, 8:11]
Out[117]: (slice(0, 3, None), slice(8, 11, None))    
In [118]: np.s_[:3, 8:]
Out[118]: (slice(None, 3, None), slice(8, None, None))

If we define a simple function:

def foo(aslice):
    return np.arange(aslice.start, aslice.stop, aslice.step)

we can test the different slices:

In [124]: foo(np.s_[8:11])            # np.arange(8,11)
Out[124]: array([ 8,  9, 10])

In [125]: foo(np.s_[8:])              # np.arange(8)
Out[125]: array([0, 1, 2, 3, 4, 5, 6, 7])

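Remember that when we give arange just one number, it’s understood to be the ‘stop’, with an implicit start of 0. That’s the same as with Python’s built-in range. Passing None as the stop has the same effect as leaving it out, which is why foo(np.s_[8:]) behaves like np.arange(8).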
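
A short check of that arange behavior (a sketch; the variable names are only for illustration):

```python
import numpy as np

# A single argument is read as the stop, with start defaulting to 0.
one_arg = np.arange(8)
explicit = np.arange(0, 8)

# With stop=None, the first argument is again read as the stop,
# so this is not "from 8 to the end" of anything.
none_stop = np.arange(8, None)

print(one_arg)
print(none_stop)
print(list(range(8)))
```

All three lines print the integers 0 through 7, which is where the unexpected columns in E come from.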

np.r_ actually uses:

In [105]: def foo1(item):
     ...:     step = item.step
     ...:     start = item.start
     ...:     stop = item.stop
     ...:     if start is None:
     ...:         start = 0
     ...:     if step is None:
     ...:         step = 1
     ...:     return np.arange(start, stop, step)

but this just lets us use np.r_[:3] instead of np.r_[0:3]. It doesn’t change the [8:] case.

In case it isn’t clear: A[i,j] is translated by the interpreter into A.__getitem__((i,j)), a method call. The interpreter also converts any ‘:’ slice notation inside the brackets into a slice(...) object, as illustrated by s_.
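
A minimal sketch of that translation, using a hypothetical ShowKey class (not part of NumPy) whose __getitem__ simply returns whatever key the interpreter hands it. This is roughly what s_ does:

```python
import numpy as np

class ShowKey:
    """Hypothetical helper for illustration: echo back the indexing key."""
    def __getitem__(self, key):
        return key

show = ShowKey()
key = show[:3, 8:]
print(key)  # (slice(None, 3, None), slice(8, None, None))

# Bracket indexing and an explicit __getitem__ call with the same
# tuple of slices select the same elements.
C = np.arange(66).reshape(6, 11)
same = np.array_equal(C[:, 8:], C.__getitem__((slice(None), slice(8, None))))
print(same)  # True
```

Note that the key carries no information about the length of any array. An ndarray fills in the missing stop from its own shape; ShowKey, s_ and r_ have nothing to fill it in from.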

After converting the slices into arrays with np.arange (or np.linspace, for ‘complex’ steps), r_ concatenates them.
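
For the ‘complex’ step case, an imaginary step is read as a number of points rather than a step size. A small sketch (the variable names are only for illustration):

```python
import numpy as np

# An imaginary step such as 5j asks for 5 evenly spaced points,
# endpoints included, which is what np.linspace produces.
via_r = np.r_[0:1:5j]
via_linspace = np.linspace(0, 1, 5)
print(via_r)

# Several pieces are joined end to end, as with np.concatenate.
mixed = np.r_[0:3, 0:1:3j]
print(mixed)
```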

So your two r_ expressions are really:

In [128]: np.concatenate([np.arange(0,3), np.arange(8,11)])    # [115]
Out[128]: array([ 0,  1,  2,  8,  9, 10])

In [129]: np.concatenate([np.arange(0,3), np.arange(8,None)])   # [116]
Out[129]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])

I suppose one could argue that np.r_[8:] should raise an error, since it provides a start without stop, and thus can’t be evaluated as it would in a real indexing case. As coded it works because of the default behavior of np.arange.

edit

When I use ‘8:’ directly, C can deduce the correct stop from its own shape:

In [140]: C.shape
Out[140]: (6, 11)

In [141]: C[:,8:].shape
Out[141]: (6, 3)

But an np.r_ object does not have a shape, nor can it deduce the shape from C:

In [142]: np.r_.shape
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 np.r_.shape

AttributeError: 'RClass' object has no attribute 'shape'

If you want to avoid the explicit 11, you have to use:

In [143]: C[:, np.r_[8:C.shape[1]]].shape
Out[143]: (6, 3)
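
If open-ended slices are wanted anyway, one possible workaround is a small helper that resolves each slice against a known length before concatenating. This is only a sketch: r_for_length is a hypothetical name, not a NumPy function, and it relies on the built-in slice.indices method to fill in the missing start and stop.

```python
import numpy as np

def r_for_length(n, *slices):
    """Hypothetical helper: concatenate index ranges for an axis of length n.

    slice.indices(n) returns (start, stop, step) with None values
    resolved the way ordinary indexing of a length-n sequence would.
    """
    return np.concatenate([np.arange(*s.indices(n)) for s in slices])

B = np.arange(180).reshape(6, 30)
C = B[:, np.r_[10:15, 20:26]]

idx = r_for_length(C.shape[1], np.s_[:3], np.s_[8:])
print(idx)

E = C[:, idx]
print(E.shape)
```

Here idx comes out the same as np.r_[0:3, 8:11], so E matches D. The length still has to be passed in explicitly; the helper only moves the C.shape[1] lookup to one place.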

Answered By: hpaulj