What is the purpose of numpy.where returning a tuple?

Question:

When I run this code:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
print(np.where(a > 2))

I would naturally expect to get an array of the indices where a > 2, i.e. [2, 3, 4, 5], but instead we get:

(array([2, 3, 4, 5], dtype=int64),)

i.e. a one-element tuple (the trailing comma is how Python writes a tuple with a single member).

Then, to get the “natural” answer from numpy.where, we have to do:

np.where(a > 2)[0]

What’s the point of this tuple? In which situations is it useful?

Note: I’m only asking about the one-argument form numpy.where(cond), not numpy.where(cond, x, y), which also exists (see the documentation).

Asked By: Basj


Answers:

numpy.where returns a tuple because each element of the tuple refers to a dimension.

Consider this example in 2 dimensions:

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

print(np.where(a > 2))

(array([0, 0, 0, 0, 1, 1, 1], dtype=int64),
 array([2, 3, 4, 5, 3, 4, 5], dtype=int64))

As you can see, the first array in the tuple holds the indices of the matching elements along the first dimension (rows), and the second array holds their indices along the second dimension (columns).
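To see how the two arrays pair up, here is a small sketch reusing the 2-D array above. It zips them into (row, column) coordinates, compares the result with np.argwhere (which stacks the same indices as one row per match), and indexes the array with the tuple to get the matching values:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

rows, cols = np.where(a > 2)  # one index array per dimension

# Pair the k-th row index with the k-th column index
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)
# [(0, 2), (0, 3), (0, 4), (0, 5), (1, 3), (1, 4), (1, 5)]

# np.argwhere gives the same information as an (N, ndim) array
print(np.argwhere(a > 2).tolist())
# [[0, 2], [0, 3], [0, 4], [0, 5], [1, 3], [1, 4], [1, 5]]

# Indexing with the whole tuple picks out the matching values
print(a[np.where(a > 2)])
# [3 4 5 6 3 4 5]
```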

This is a convention numpy often uses. You will also see it when you ask for the shape of an array: the shape of a 1-dimensional array is a tuple with 1 element:

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

print(a.shape, a.ndim)  # (2, 6) 2

b = np.array([1, 2, 3, 4, 5, 6])

print(b.shape, b.ndim)  # (6,) 1
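
The same one-entry-per-dimension convention shows up elsewhere in numpy. As an illustration (not part of the original answer), np.unravel_index converts flat row-major positions into a tuple with one index array per axis of the given shape, so it reproduces exactly what np.where returns for the 2-D array above:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [-2, 1, 2, 3, 4, 5]])

# Flat (row-major) positions of the matching elements
flat = np.flatnonzero(a > 2)

# unravel_index turns them into one index array per dimension of a.shape
per_dim = np.unravel_index(flat, a.shape)

print(len(per_dim) == a.ndim)  # True
print([np.array_equal(x, y) for x, y in zip(per_dim, np.where(a > 2))])  # [True, True]
```
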
Answered By: jpp

For consistency: the length of the tuple matches the number of dimensions of the input array.

>>> np.where(np.ones((1,)) > 0)
(array([0]),)
>>> np.where(np.ones((1,1)) > 0)
(array([0]), array([0]))
>>> np.where(np.ones((1,1,1)) > 0)
(array([0]), array([0]), array([0]))

Making the 1-d case return an array instead of a tuple would give inhomogeneous return types. If the calling code deals with input data of arbitrary shape, the programmer would then have to special-case 1-d inputs when handling the return value.
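To make the point about arbitrary shapes concrete, here is a sketch of a helper (nonzero_coords is a hypothetical name, used only for illustration) that lists the coordinates of every True element. Because np.where always returns one array per dimension, the same code handles 1-D, 2-D and 3-D input without any special case:

```python
import numpy as np

def nonzero_coords(cond):
    """Hypothetical helper: list the coordinate tuples where cond is True."""
    idx = np.where(cond)  # always a tuple of length cond.ndim
    # zip(*idx) yields one tuple per match, with one entry per dimension
    return [tuple(int(i) for i in t) for t in zip(*idx)]

print(nonzero_coords(np.array([0, 1, 1]) > 0))   # [(1,), (2,)]
print(nonzero_coords(np.eye(2) > 0))             # [(0, 0), (1, 1)]
print(nonzero_coords(np.ones((1, 1, 1)) > 0))    # [(0, 0, 0)]
```

When the input is known to be 1-D, tuple unpacking such as `(idx,) = np.where(a > 2)` is an alternative to indexing with `[0]`.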

Answered By: wim

From the documentation of np.where:

If only condition is given, return the tuple condition.nonzero(), the indices where condition is True

So we look at the documentation of ‘np.nonzero’:

Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The values in a are always tested and returned in row-major, C-style order. The corresponding non-zero values can be obtained with:

a[nonzero(a)]
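A quick sketch checking the two documented statements on a small array: one-argument np.where(cond) gives the same tuple as cond.nonzero(), and indexing with that tuple returns the non-zero values in row-major order:

```python
import numpy as np

a = np.array([[3, 0, 0],
              [0, 4, 0],
              [5, 6, 0]])

# One-argument np.where is documented as equivalent to nonzero()
w = np.where(a != 0)
n = (a != 0).nonzero()
print(all(np.array_equal(x, y) for x, y in zip(w, n)))  # True

# The documented way to recover the non-zero values themselves
print(a[np.nonzero(a)])  # [3 4 5 6]  (row-major order)
```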

So why is it useful for np.where/np.nonzero to return a tuple of arrays? I think it is related to indexing multi-dimensional arrays.

From the example in the documentation, if we have

y = np.arange(35).reshape(5,7)

We can do

y[np.array([0,2,4]), np.array([0,1,2])]

to select y[0, 0], y[2, 1], y[4, 2].

In this case, if the index arrays have a matching shape, and there is an index array for each dimension of the array being indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each position in the index arrays. In this example, the first index value is 0 for both index arrays, and thus the first value of the resultant array is y[0,0]. The next value is y[2,1], and the last is y[4,2].
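Running that example (a sketch using the same y) shows the selected values, and that passing the index arrays packed in a tuple works the same as passing them separately. A tuple of index arrays is exactly the form np.where produces:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

rows = np.array([0, 2, 4])
cols = np.array([0, 1, 2])

print(y[rows, cols])    # [ 0 15 30]
print(y[(rows, cols)])  # [ 0 15 30]  (a tuple of index arrays behaves the same)
```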

I hope this way of indexing multi-dimensional arrays explains why np.nonzero/np.where return a tuple of arrays: the result can be used directly to select the matching elements later on.
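Putting the pieces together in a minimal sketch: the tuple returned by np.where can be passed straight back as an index, both to read the matching elements and to assign to them, without the code needing to know how many dimensions the array has:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

idx = np.where(y % 10 == 0)   # tuple of (row indices, column indices)
print(y[idx])                 # [ 0 10 20 30]

y[idx] = -1                   # the same tuple also works for assignment
print((y == -1).sum())        # 4
```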

Answered By: Tai