How to compare two numpy arrays with some NaN values?

Question:

I need to compare some numpy arrays which should have the same elements in the same order, except for some NaN values in the second one.

I need a function more or less like this:

def func( array1, array2 ):
    if ???:
        return True
    else:
        return False

Example:

x = np.array( [ 1, 2, 3, 4, 5 ] )
y = np.array( [ 11, 2, 3, 4, 5 ] )
z = np.array( [ 1, 2, np.nan, 4, 5] )

func( x, z ) # returns True
func( y, z ) # returns False

The arrays always have the same length, and the NaN values are always in the third array, z (x and y always contain only numbers). I imagine there is already a function for this, but I can't find it.

Any ideas?

Asked By: Luis

||

Answers:

What about:

from math import isnan

def fun(array1,array2):
    return all(isnan(x) or isnan(y) or x == y for x,y in zip(array1,array2))

This function works in both directions (if there are NaNs in the first list, these are also ignored). If you do not want that (which is a bit odd, since equality is usually symmetric), you can define:

from math import isnan

def fun(array1,array2):
    return all(isnan(y) or x == y for x,y in zip(array1,array2))

The code works as follows: we use zip to emit tuples of elements of both arrays. Next we check if either the element of the first list is NaN, or the second, or they are equal.
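As a quick check, the one-directional version can be run against the arrays from the question. This is only a sketch; the variable names are taken from the question and the loop variables are renamed to a and b to avoid clashing with them.

```python
from math import isnan
import numpy as np

def fun(array1, array2):
    # Pairs whose second element is NaN are skipped; all other pairs must be equal.
    return all(isnan(b) or a == b for a, b in zip(array1, array2))

x = np.array([1, 2, 3, 4, 5])
y = np.array([11, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])

print(fun(x, z))  # True: the only mismatch is at the NaN position
print(fun(y, z))  # False: 11 != 1 at index 0
```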

If you want a more robust function, you should also perform a length check, since zip silently stops at the end of the shorter input:

from math import isnan

def fun(array1,array2):
    return len(array1) == len(array2) and all(isnan(y) or x == y for x,y in zip(array1,array2))
Answered By: Willem Van Onsem

You can use masked arrays, which have the behaviour you’re asking for when combined with np.all:

zm = np.ma.masked_where(np.isnan(z), z)

np.all(x == zm) # returns True
np.all(y == zm) # returns False
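Wrapped up as a function, the masked-array idea might look like the sketch below. The helper name equal_where_valid is made up for illustration, and it assumes NaNs only need to be ignored in the second argument.

```python
import numpy as np

def equal_where_valid(a, b):
    # Hypothetical helper: mask the NaN positions of b, then compare.
    # np.all on a masked array only considers the unmasked positions.
    b_masked = np.ma.masked_where(np.isnan(b), b)
    return bool(np.all(a == b_masked))

x = np.array([1, 2, 3, 4, 5])
y = np.array([11, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])

print(equal_where_valid(x, z))  # True
print(equal_where_valid(y, z))  # False
```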

Or you could just write out your logic explicitly, noting that numpy has to use | instead of or, and the difference in operator precedence that results:

def func(a, b):
    return np.all((a == b) | np.isnan(a) | np.isnan(b))
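
To illustrate the precedence point, here is a small sketch using float arrays like those in the question. In Python, | binds more tightly than ==, so dropping the parentheses changes what gets compared.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# With parentheses: compare first, then OR with the NaN mask.
ok = bool(np.all((x == z) | np.isnan(z)))

# Without parentheses, `x == z | np.isnan(z)` parses as `x == (z | np.isnan(z))`.
# Bitwise OR is not defined for float arrays, so NumPy raises a TypeError.
try:
    x == z | np.isnan(z)
    raised = False
except TypeError:
    raised = True

print(ok)      # True
print(raised)  # True
```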
Answered By: Eric

You could use isclose to check for equality (or closeness to within a given tolerance — this is particularly useful when comparing floats) and use isnan to check for NaNs in the second array.
Combine the two with bitwise-or (|), and use all to demand every pair is either close or contains a NaN to obtain the desired result:

In [62]: np.isclose(x,z)
Out[62]: array([ True,  True, False,  True,  True], dtype=bool)

In [63]: np.isnan(z)
Out[63]: array([False, False,  True, False, False], dtype=bool)

So you could use:

def func(a, b):
    return (np.isclose(a, b) | np.isnan(b)).all()


In [67]: func(x, z)
Out[67]: True

In [68]: func(y, z)
Out[68]: False
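
The benefit of a tolerance shows up once the values come from floating-point arithmetic. A minimal sketch, with made-up arrays a and b, comparing exact equality against isclose with its default tolerances:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, np.nan])

# Exact equality fails because 0.1 + 0.2 is not exactly 0.3 in floating point.
exact = bool(((a == b) | np.isnan(b)).all())

# isclose accepts the tiny rounding difference; the NaN position is ignored.
close = bool((np.isclose(a, b) | np.isnan(b)).all())

print(exact)  # False
print(close)  # True
```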
Answered By: unutbu

numpy.isclose() now provides an equal_nan argument, which treats NaNs that appear at the same position in both arrays as equal:

>>> import numpy as np
>>> np.isclose([1.0, np.nan], [1.0, np.nan])
array([ True, False])
>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
array([ True,  True])

docs https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
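
One caveat worth checking against the arrays from the question: equal_nan only matches a NaN with another NaN, so a NaN paired with a number still compares as not close. A short sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])

# NaN in z versus 3 in x: still False at index 2, even with equal_nan=True.
mixed = np.isclose(x, z, equal_nan=True)

# NaN versus NaN at the same position: treated as equal.
same = np.isclose(z, z, equal_nan=True)

print(mixed.all())  # False
print(same.all())   # True
```

So for the case where only one array contains NaNs, equal_nan would still need to be combined with an explicit np.isnan mask.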

Answered By: ti7