Why does Python's max() return different results when a float('NaN') value is moved within a dictionary, even though the key with the maximum value stays the same?
Question:
Let’s pretend I have the following simple dictionary:
dictionary = {'a':3, 'b':4, 'c':float('NaN')}
If I use max() to return the key with the maximum value…
key_maxvalue = max(dictionary, key=dictionary.get)
print(key_maxvalue)
…Python outputs this:
b
However, when I swap the values of keys ‘a’ and ‘c’…
dictionary = {'a':float('NaN'), 'b':4, 'c':3}
key_maxvalue = max(dictionary, key=dictionary.get)
print(key_maxvalue)
…I get this unexpected result:
a
I expected Python to output ‘b’, as that key still has the maximum value in the dictionary. Why does changing the order of the values alter the output of max()? And how can I prevent this unexpected behavior?
Answers:
The answer is, "don’t use NaN". The point of a NaN is that it is not a number, and it cannot be relied on to act like a number in any rational way. What you’re seeing is that NaN is unordered: it is neither less than nor greater than any other value, so it has no fixed place in a comparison-based search.
Notice this:
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = float('NaN')
>>> 1 < x
False
>>> x < 1
False
>>>
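Every ordering comparison with a NaN (<, >, <=, >=) is False, and so is ==; only != returns True. Because NaN has no place in the ordering, the result of sorting or taking the max of data that contains it depends on the order of the input.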
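To make the order dependence concrete, here is a minimal sketch that uses plain lists instead of the dictionary from the question. The variable names are only illustrative, and the behavior shown is what CPython's built-in max() does when it keeps the first item as the initial candidate.

```python
import math

nan = float('NaN')

# NaN fails every ordering comparison and is not even equal to itself.
print(nan < 1, nan > 1, nan == nan)   # False False False
print(nan != nan)                      # True

# Same values, different order, different result from max().
nan_last = max([3, 4, nan])    # 4 is found before the NaN is reached
nan_first = max([nan, 4, 3])   # NaN is the first candidate and is never replaced
print(nan_last)    # 4
print(nan_first)   # nan
```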
If you wrote your own function, it might look like this:
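def my_max(nums):
    # Start with the first item as the current candidate.
    largest = nums[0]
    for item in nums:
        # Replace the candidate only if the new item compares greater.
        if item > largest:
            largest = item
    return largest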
The problem is the comparison item > largest. Look what happens when you compare a number with np.nan (NumPy's float NaN, which behaves like float('NaN')):
Input: np.nan > 4
Output: False
Input: 4 > np.nan
Output: False
Any ordering comparison with a NaN will be False. If the built-in max works like the function above, that explains both of your cases. In the first case, the NaN is not larger than 4, so b is still the max. In the second case, the NaN under a becomes the initial candidate, and no other number compares larger than NaN, so a remains the max.
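As for preventing this, one option is to drop the NaN values before calling max(). The sketch below assumes all values are numeric (math.isnan raises TypeError otherwise); the helper name key_of_max_ignoring_nan is hypothetical and not part of any library.

```python
import math

def key_of_max_ignoring_nan(mapping):
    """Return the key with the largest non-NaN value (hypothetical helper)."""
    # Keep only the entries whose value is a real number.
    candidates = {k: v for k, v in mapping.items() if not math.isnan(v)}
    if not candidates:
        raise ValueError("no non-NaN values to compare")
    return max(candidates, key=candidates.get)

print(key_of_max_ignoring_nan({'a': 3, 'b': 4, 'c': float('NaN')}))  # b
print(key_of_max_ignoring_nan({'a': float('NaN'), 'b': 4, 'c': 3}))  # b
```

Filtering first means the result no longer depends on where the NaN sits in the dictionary. Whether a dictionary with only NaN values should raise an error or return a default is a design choice; raising is just the assumption made here.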