Get mean value avoiding nan using numpy in python
Question:
How do I calculate the mean value of an array (A) while ignoring nan values?
import numpy as np
A = np.array([5, np.nan, np.nan, np.nan, np.nan, 10])
M = np.mean(A[A != np.nan])  # does not work: M comes out as nan
Any idea?
Answers:
Use numpy.isnan:
>>> import numpy as np
>>> A = np.array([5, np.nan, np.nan, np.nan, np.nan, 10])
>>> np.isnan(A)
array([False, True, True, True, True, False], dtype=bool)
>>> ~np.isnan(A)
array([ True, False, False, False, False, True], dtype=bool)
>>> A[~np.isnan(A)]
array([ 5., 10.])
>>> A[~np.isnan(A)].mean()
7.5
This is needed because nan never compares equal to anything, not even to itself:
>>> np.nan == np.nan
False
>>> np.nan != np.nan
True
>>> np.isnan(np.nan)
True
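To see why the code in the question fails, here is a small sketch using the same array A. Because nan is unequal to everything, the "mask" A != np.nan is True at every position, so nothing gets filtered out and the mean is still nan:

```python
import numpy as np

A = np.array([5, np.nan, np.nan, np.nan, np.nan, 10])

# NaN compares unequal to everything, including itself, so this "mask"
# is True everywhere and selects the whole array, NaNs included.
bad_mask = A != np.nan
print(bad_mask.all())          # True
print(np.mean(A[bad_mask]))    # nan: the NaNs were never removed

# The same self-inequality is why `x != x` flags exactly the NaN entries;
# np.isnan states that intent more clearly.
print(np.array_equal(A != A, np.isnan(A)))  # True
```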
Another possibility is the following:
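import numpy
# Older SciPy only: nanmean (and nanmedian, if you need it) were later removed
# from scipy.stats; on newer versions use numpy.nanmean instead.
from scipy.stats import nanmean
A = numpy.array([5, numpy.nan, numpy.nan, numpy.nan, numpy.nan, 10])
print(nanmean(A))  # gives 7.5 as expected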
I think this is more elegant (and readable) than the other solution already given.
Edit: as @Jaime points out, this functionality also exists directly in NumPy as of version 1.8, so there is no need to import scipy.stats if you have that version or later:
import numpy
A = numpy.array([5, numpy.nan, numpy.nan, numpy.nan, numpy.nan, 10])
print(numpy.nanmean(A))  # 7.5
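numpy.nanmean is one of several nan-aware reductions in NumPy, and like numpy.mean it accepts an axis argument. A short sketch on a made-up 2-D array (the values are arbitrary, chosen only for illustration):

```python
import numpy as np

# Arbitrary example data with NaNs scattered through it.
B = np.array([[5.0,    np.nan, 1.0],
              [np.nan, np.nan, 3.0],
              [10.0,   2.0,    np.nan]])

print(np.nanmean(B))            # mean of all non-NaN values: 4.2
print(np.nanmean(B, axis=0))    # per-column means, NaNs skipped
print(np.nanmedian(B, axis=1))  # per-row medians, NaNs skipped
print(np.nanstd(B))             # standard deviation ignoring NaNs
```

If a row or column consists entirely of nan, these functions return nan for it and emit a RuntimeWarning rather than raising an error.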
The first solution also works for people who don't have the latest version of NumPy (like me).
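If code has to run on both old and new NumPy versions, the two answers can be combined. The helper below, mean_ignoring_nan, is a hypothetical name made up for this sketch: it uses numpy.nanmean when it exists and otherwise falls back to the isnan mask.

```python
import numpy as np

def mean_ignoring_nan(a):
    """Mean of the non-NaN entries of `a`, taken over the flattened array.

    Hypothetical helper: prefers numpy.nanmean (NumPy >= 1.8) and falls back
    to boolean masking with numpy.isnan on older versions.
    """
    a = np.asarray(a, dtype=float)
    if hasattr(np, "nanmean"):
        return np.nanmean(a)
    # Fallback: keep only the entries that are not NaN, then average them.
    return a[~np.isnan(a)].mean()

print(mean_ignoring_nan([5, np.nan, np.nan, np.nan, np.nan, 10]))  # 7.5
```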