Comparing numpy arrays containing NaN
Question:
For my unittest, I want to check if two arrays are identical. Reduced example:
import numpy as np
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
if np.all(a == b):
    print('arrays are equal')
This does not work because nan != nan.
What is the best way to proceed?
Answers:
You could use numpy masked arrays: mask the NaN values and then use numpy.ma.all or numpy.ma.allclose.
For example:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True
For versions of numpy prior to 1.19, this is probably the best approach in situations that don’t specifically involve unit tests:
>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True
However, modern versions provide the array_equal function with a new keyword argument, equal_nan, which fits the bill exactly.
This was first pointed out by flyingdutchman; see his answer below for details.
Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:
In : import numpy as np
In : def nan_equal(a, b):
...:     try:
...:         np.testing.assert_equal(a, b)
...:     except AssertionError:
...:         return False
...:     return True
In : a = np.array([1, 2, np.nan])
In : b = np.array([1, 2, np.nan])
In : nan_equal(a, b)
Out: True
In : a = np.array([1, 2, np.nan])
In : b = np.array([3, 2, np.nan])
In : nan_equal(a, b)
Out: False
Edit
Since you are using this for unit testing, a bare assert (instead of wrapping it to get True/False) might be more natural.
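As a rough sketch of what the bare-assert style could look like inside a test, assuming the standard unittest module (the class and method names here are made up for illustration):

```python
import unittest

import numpy as np


class NanArrayTest(unittest.TestCase):  # hypothetical test case name
    def test_arrays_match(self):
        a = np.array([1, 2, np.nan])
        b = np.array([1, 2, np.nan])
        # Passes silently; NaNs in matching positions are treated as equal.
        np.testing.assert_array_equal(a, b)

    def test_mismatch_is_reported(self):
        a = np.array([1, 2, np.nan])
        b = np.array([3, 2, np.nan])
        # A differing element makes the numpy assertion raise AssertionError.
        with self.assertRaises(AssertionError):
            np.testing.assert_array_equal(a, b)
```

Saved to a file, this can be run with python -m unittest. When the assertion fails, the error message lists the mismatching elements, which is usually more helpful in a test log than a plain False.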
When I used the above answer:
((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
It gave me some errors when evaluating lists of strings.
This is more type generic:
def EQUAL(a, b):
    # x != x is True only where x is NaN, so no isnan() call is needed
    return (a == b) | ((a != a) & (b != b))
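To illustrate why the self-comparison trick is more type generic, here is a small sketch that applies the same EQUAL function to a float array and to a string array. The sample arrays are invented for the demonstration; the point is that a != a never needs numpy.isnan, which is not defined for string dtypes.

```python
import numpy as np


def EQUAL(a, b):
    # x != x is True only where x is NaN, so this also works for non-float dtypes.
    return (a == b) | ((a != a) & (b != b))


floats_a = np.array([1.0, 2.0, np.nan])
floats_b = np.array([1.0, 2.0, np.nan])
words_a = np.array(['x', 'y', 'z'])
words_b = np.array(['x', 'y', 'w'])

float_result = EQUAL(floats_a, floats_b)  # elementwise boolean array
word_result = EQUAL(words_a, words_b)     # no TypeError for strings

print(float_result.all())  # True: the NaNs line up
print(word_result)         # last element differs
```

Note that EQUAL returns an elementwise result, so call .all() on it when a single True/False is wanted.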
The easiest way is to use the numpy.allclose() function, which lets you specify the behaviour when NaN values are present. Your example would then look like the following:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
if np.allclose(a, b, equal_nan=True):
    print('arrays are equal')
Then 'arrays are equal' will be printed.
The equal_nan parameter is described in the numpy.allclose documentation.
If you are doing this for things like unit tests, where performance and “correct” behaviour for every type matter less, you can use the following, which works with all types of arrays, not just numeric ones:
a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)
Casting ndarrays to lists can sometimes be useful to get the behaviour you want in a test. (But don’t use this in production code, or with larger arrays!)
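One caveat worth checking before relying on the list cast: as far as I can tell, it only treats NaNs as equal for object arrays, not for float arrays. The sketch below (sample data invented) contrasts the two cases. It relies on CPython's list comparison accepting identical objects without calling ==.

```python
import numpy as np

float_a = np.array([1.0, np.nan])
float_b = np.array([1.0, np.nan])
# Iterating a float array creates fresh np.float64 scalars,
# so the two NaNs are compared with == and come out different.
float_lists_equal = list(float_a) == list(float_b)

obj_a = np.array(['a', np.nan], dtype=object)
obj_b = np.array(['a', np.nan], dtype=object)
# Object arrays hand back the very same np.nan object,
# which list comparison accepts by identity.
obj_lists_equal = list(obj_a) == list(obj_b)

print(float_lists_equal)  # False
print(obj_lists_equal)    # True
```

So for purely numeric arrays containing NaN, one of the numpy-based comparisons is the safer choice.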
Just to complete @Luis Albert Centeno’s answer, you may rather use:
np.allclose(a, b, rtol=0, atol=0, equal_nan=True)
rtol and atol control the tolerance of the equality test. In short, allclose() returns:
all(abs(a - b) <= atol + rtol * abs(b))
By default they are not set to 0 (the defaults are rtol=1e-05 and atol=1e-08), so the function could return True if your numbers are close but not exactly equal.
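A small sketch of the difference the tolerances make, using invented values where one element is off by 1e-9:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])
b = np.array([1.0, 2.0 + 1e-9, np.nan])  # second element differs slightly

# Default tolerances (rtol=1e-05, atol=1e-08) absorb the tiny difference.
loose = np.allclose(a, b, equal_nan=True)
# Zero tolerances turn allclose into an exact comparison.
strict = np.allclose(a, b, rtol=0, atol=0, equal_nan=True)

print(loose)   # True
print(strict)  # False
```

Which of the two is wanted depends on the test: exact equality for data that should pass through untouched, default tolerances for results of floating-point arithmetic.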
PS: “I want to check if two arrays are identical”
Actually, you are looking for equality rather than identity. They are not the same in Python and I think it’s better for everyone to understand the difference so as to share the same lexicon. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)
You’d test identity via the keyword is:
a is b
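A short illustration of the distinction, with made-up variable names: an alias is identical to the original array, while a copy is only equal to it.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a        # a second name for the same object
copy = a.copy()  # same values, different object

print(alias is a)               # True: identity
print(copy is a)                # False: not the same object
print(np.array_equal(copy, a))  # True: equal contents
```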
The numpy function array_equal fits the question’s requirements perfectly with the equal_nan parameter added in 1.19.
The example would look as follows:
a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
assert np.array_equal(a, b, equal_nan=True)
But be aware that this won’t work if the arrays are of dtype object. Not sure if this is a bug or not.
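For object arrays, one possible workaround is to compare element by element in plain Python. The helper below is a hypothetical sketch, not part of numpy. It assumes the elements are scalars (numbers, strings, None) and treats two float NaNs in the same position as equal.

```python
import math

import numpy as np


def object_array_equal_nan(a, b):
    """Hypothetical helper: equality for object arrays, with NaN == NaN."""
    a = np.asarray(a, dtype=object)
    b = np.asarray(b, dtype=object)
    if a.shape != b.shape:
        return False

    def is_nan(x):
        # Only floats (including np.float64) can be NaN here.
        return isinstance(x, float) and math.isnan(x)

    def same(x, y):
        return (is_nan(x) and is_nan(y)) or bool(x == y)

    return all(same(x, y) for x, y in zip(a.ravel(), b.ravel()))


a = np.array([1, 'two', None, np.nan], dtype=object)
b = np.array([1, 'two', None, np.nan], dtype=object)
c = np.array([1, 'two', None, 3.0], dtype=object)

print(object_array_equal_nan(a, b))  # True
print(object_array_equal_nan(a, c))  # False
```

This runs a Python-level loop, so it is only reasonable for small arrays such as test fixtures.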
As of v1.19, numpy’s array_equal function supports an equal_nan argument:
assert np.array_equal(a, b, equal_nan=True)
For me this worked fine:
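import numpy
a = numpy.array([float('nan'), 1, 2])
b = numpy.array([2, float('nan'), 2])
# Compare only where neither array holds a NaN
valid = numpy.logical_not(numpy.logical_or(numpy.isnan(a), numpy.isnan(b)))
# out= pre-fills the skipped positions with True; without it they would be uninitialized
numpy.equal(a, b, out=numpy.ones(a.shape, dtype=bool), where=valid).all()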
PS: this skips the comparison at any position where either array has a NaN.