Difference between nonzero(a), where(a) and argwhere(a). When to use which?
Question:
In NumPy, nonzero(a), where(a) and argwhere(a), with a being a NumPy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
- On argwhere the documentation says: np.argwhere(a) is the same as np.transpose(np.nonzero(a)). Why have a whole function that just transposes the output of nonzero? When would that be so useful that it deserves a separate function?
- What about the difference between where(a) and nonzero(a)? Wouldn’t they return the exact same result?
Answers:
I can’t comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In its simplest use case, where is indeed the same as nonzero.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))
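To make the equivalence above easier to check, here is a minimal sketch that compares the two calls directly. The variable names (cond, idx_where, idx_nonzero) are only illustrative; the point is that both calls return a tuple with one index array per dimension of the input.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
cond = a == 3  # boolean array, True only at row 1, column 0

idx_where = np.where(cond)
idx_nonzero = np.nonzero(cond)

# Both results are tuples holding one index array per dimension (here: 2).
assert len(idx_where) == len(idx_nonzero) == cond.ndim

# The index arrays agree dimension by dimension.
assert all(np.array_equal(w, n) for w, n in zip(idx_where, idx_nonzero))

# Either tuple can be used directly to pull out the matching values.
print(a[idx_where])  # [3]
```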
where differs from nonzero when you wish to pick elements from array a where some condition is True and from array b where that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[  6,   4],
       [300, 400]])
Again, I can’t explain why they added the nonzero functionality to where, but this at least explains how the two are different.
EDIT: Fixed the first example… my logic was incorrect previously
nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:
np.where(mask,a,b)
which can be roughly thought of as a numpy “ufunc” version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of a and b).
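As a rough illustration of the broadcasting remark, the sketch below passes a 1-D array and a scalar to the three-argument form and compares the result with an explicit element-wise loop. The specific values and the helper name a_full are assumptions made up for this example.

```python
import numpy as np

mask = np.array([[True, False],
                 [False, True]])
a = np.array([10, 20])  # shape (2,), broadcast across the rows of mask
b = -1                  # scalar, broadcast to every position

result = np.where(mask, a, b)

# Spell out the same selection element by element.
a_full = np.broadcast_to(a, mask.shape)
looped = np.array([[a_full[i, j] if mask[i, j] else b
                    for j in range(mask.shape[1])]
                   for i in range(mask.shape[0])])

assert np.array_equal(result, looped)
print(result)
# [[10 -1]
#  [-1 20]]
```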
As far as having both nonzero and argwhere, they’re conceptually different. nonzero is structured to return an object which can be used for indexing. This can be lighter-weight than creating an entire boolean mask if the non-zero elements are sparse:
mask = a != 0 # entire array of bools
mask = np.nonzero(a) # only the indices of the non-zero elements
Now you can use that mask to index other arrays, etc. However, as it is, the tuple of per-dimension index arrays doesn’t make it easy to see which coordinates correspond to each non-zero element. That’s where argwhere comes in: it returns one row of coordinates per non-zero element.
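A small sketch of the difference in practice, assuming a 2-D input; the array other is a made-up second array added only to show indexing. The output of nonzero plugs straight into indexing, while argwhere gives coordinates grouped per element, which is more convenient to iterate over.

```python
import numpy as np

a = np.array([[0, 4],
              [4, 0]])
other = np.array([[10, 20],
                  [30, 40]])  # hypothetical second array of the same shape

idx = np.nonzero(a)      # tuple of index arrays, one per dimension
coords = np.argwhere(a)  # 2-D array, one row of coordinates per non-zero element

# argwhere is the transpose of the nonzero output, as the documentation says.
assert np.array_equal(coords, np.transpose(idx))

# The nonzero tuple can index any array of a compatible shape.
print(other[idx])  # [20 30]

# The argwhere rows read naturally as (row, column) pairs.
for i, j in coords:
    print(i, j, a[i, j])
```

Note that the rows of coords cannot be passed directly as an index in the same way as idx; indexing with coords would be interpreted as fancy indexing along the first axis only.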