in operator, float("NaN") and np.nan
Question:
I used to believe that the in operator in Python checks for the presence of an element in a collection using equality checking (==), so element in some_list is roughly equivalent to any(x == element for x in some_list). For example:
True in [1, 2, 3]
# True because True == 1
or
1 in [1., 2., 3.]
# also True because 1 == 1.
However, it is well known that NaN is not equal to itself. So I expected float("NaN") in [float("NaN")] to be False. And it is indeed False.
However, if we use numpy.nan instead of float("NaN"), the situation is quite different:
import numpy as np
np.nan in [np.nan, 1, 2]
# True
But np.nan == np.nan still gives False!
How is this possible? What’s the difference between np.nan and float("NaN")? How does in deal with np.nan?
Answers:
To check whether an item is in a list, Python tests for object identity first, and tests for equality only if the objects are different. [1]
float("NaN") in [float("NaN")] is False because two different NaN objects are involved in the comparison. The test for identity therefore returns False, and then the test for equality also returns False, since NaN != NaN.
np.nan in [np.nan, 1, 2], however, is True because the same NaN object is involved in the comparison. The test for object identity returns True, so Python immediately recognises the item as being in the list.
The __contains__ method (invoked using in) for many of Python’s other built-in container types, such as tuples and sets, is implemented using the same check.
[1] At least this is true in CPython. Object identity here means that the objects are found at the same memory address: the contains method for lists is performed using PyObject_RichCompareBool, which quickly compares object pointers before a potentially more complicated object comparison. Other Python implementations may differ.
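The identity-then-equality rule described above can be modelled in a few lines of pure Python. This is only a rough sketch: contains is a hypothetical helper written for illustration, not the actual CPython code, and it uses plain float NaNs so that it runs without NumPy.

```python
def contains(container, item):
    """Rough pure-Python model of `item in container` for lists and tuples.

    Hypothetical helper for illustration; CPython does this in C.
    """
    # Identity is checked first; == is only consulted when the objects differ.
    return any(x is item or x == item for x in container)


nan = float("nan")  # one NaN object, reused below

# Same object on both sides: the identity check succeeds.
shared = contains([nan, 1, 2], nan)

# Two distinct NaN objects: identity fails, and NaN != NaN, so equality fails too.
fresh = contains([float("nan")], float("nan"))

print(shared, nan in [nan, 1, 2])             # both True
print(fresh, float("nan") in [float("nan")])  # both False
```

Seen this way, np.nan is not special: it is a single float object that gets reused, and any NaN stored in a variable behaves the same way.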
One thing worth mentioning is that numpy arrays do behave as expected:
a = np.array((np.nan,))
a[0] in a
# False
Variations of the theme:
[np.nan]==[np.nan]
# True
[float('nan')]==[float('nan')]
# False
{np.nan: 0}[np.nan]
# 0
{float('nan'): 0}[float('nan')]
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# KeyError: nan
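The same variations can be reproduced without NumPy, which suggests that the difference comes from reusing one object rather than from anything specific to np.nan. A minimal sketch, assuming CPython and using a plain float NaN bound to a variable:

```python
nan = float("nan")  # a single NaN object, reused in every expression below

same_list = [nan] == [nan]  # list comparison also takes the identity shortcut
lookup = {nan: 0}[nan]      # dict lookup finds the key by identity
in_set = nan in {nan}       # sets behave the same way

# With two distinct NaN objects the lookup fails, as in the traceback above.
try:
    {float("nan"): 0}[float("nan")]
    missing = False
except KeyError:
    missing = True

print(same_list, lookup, in_set, missing)  # True 0 True True
```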
Everything else is covered in @AlexRiley’s excellent answer.