Median absolute deviation from numpy ndarray
Question:
I work with a 4D numpy array where I compute the statistics mean, median, and std
along the 3rd dimension of the array, like so:
import numpy as np
input_shape = (1, 10, 4)
n_sample = 20
X = np.random.uniform(0, 1, (n_sample,) + input_shape)
X.shape
(20, 1, 10, 4)
Then I compute the mean, median, and standard deviation this way:
sta_fuc = (np.mean, np.median, np.std)
stat = np.concatenate([func(X, axis=2, keepdims=True) for func in sta_fuc], axis=2)
So that:
stat.shape
(20, 1, 3, 4)
where the three entries along axis 2 hold the mean, median, and std computed along that dimension.
But then I would like to add the column’s mean absolute deviation (mad), so that the statistics are (mean, median, std, mad), but it appears numpy doesn’t provide a function for that. How do I add mad to my statistics?
EDIT
As for the first answer, I tried the function defined there, i.e.:
def mad(arr, axis=None, keepdims=True):
    median = np.median(arr, axis=axis, keepdims=True)
    mad = np.median(np.abs(arr-median, axis=axis, keepdims=keepdims),
                    axis=axis, keepdims=keepdims)
    return mad
Adding mad to the statistics then generates an error:
sta_fuc = (np.mean, np.median, np.std, mad)
stat = np.concatenate([func(X, axis=2, keepdims=True) for func in sta_fuc], axis=2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-dab51665f952> in <module>()
1 sta_fuc = (np.mean, np.median, np.std, mad)
----> 2 stat = np.concatenate([func(X, axis=2, keepdims=True) for func in sta_fuc], axis=2)
1 frames
<ipython-input-21-84d735c8c516> in mad(arr, axis, keepdims)
1 def mad(arr, axis=None, keepdims=True):
2 median = np.median(arr, axis=axis, keepdims=True)
----> 3 mad = np.median(np.abs(arr-median, axis=axis, keepdims=keepdims),
4 axis=axis, keepdims=keepdims)
5 return mad
TypeError: 'axis' is an invalid keyword to ufunc 'absolute'
EDIT-2
Using the scipy function suggested by @Jussi also generates an error, as shown below:
from scipy.stats import median_absolute_deviation as mad
sta_fuc = (np.mean, np.median, np.std, mad)
stat = np.concatenate([func(X, axis=2, keepdims=True) for func in sta_fuc], axis=2)
TypeError: median_absolute_deviation() got an unexpected keyword argument 'keepdims'
Answers:
Usually, I’ve seen MAD refer to median absolute deviation. If that’s what you want, it’s available in the SciPy library as scipy.stats.median_absolute_deviation().
It’s also pretty easy to write a suitable function yourself.
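If you really do mean the mean absolute deviation, as the question text says, the same pattern applies with np.mean in place of np.median. The following is only a minimal sketch: the name mean_abs_dev is hypothetical (it is not a numpy or SciPy function), and it assumes the deviation is taken around the mean.

```python
import numpy as np


def mean_abs_dev(arr, axis=None, keepdims=False):
    """Mean absolute deviation around the mean (hypothetical helper)."""
    arr = np.asarray(arr)
    # keepdims=True here so the mean broadcasts against arr
    center = np.mean(arr, axis=axis, keepdims=True)
    return np.mean(np.abs(arr - center), axis=axis, keepdims=keepdims)


a = np.array([[1.0, 2.0, 3.0, 6.0]])
# mean is 3, absolute deviations are [2, 1, 0, 3], so their mean is 1.5
result = mean_abs_dev(a, axis=1, keepdims=True)
print(result)  # [[1.5]]
```

Because it accepts axis and keepdims, it can be dropped into the sta_fuc tuple the same way as the median-based version.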
Edit: here’s a MAD function that takes a keepdims argument:
def mad(data, axis=None, scale=1.4826, keepdims=False):
    """Median absolute deviation (MAD).

    Defined as the median absolute deviation from the median of the data. A
    robust alternative to stddev. Results should be identical to
    scipy.stats.median_absolute_deviation(), which does not take a keepdims
    argument.

    Parameters
    ----------
    data : array_like
        The data.
    axis : numpy axis spec, optional
        Axis or axes along which to compute MAD.
    scale : float, optional
        Scaling of the result. By default, it is scaled to give a consistent
        estimate of the standard deviation of values from a normal
        distribution.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left in the
        result as dimensions with size one.

    Returns
    -------
    ndarray
        The MAD.
    """
    # keep dims here so that broadcasting works
    med = np.median(data, axis=axis, keepdims=True)
    abs_devs = np.abs(data - med)
    return scale * np.median(abs_devs, axis=axis, keepdims=keepdims)
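As a quick check, here is how a function with this signature might slot into the statistics from the question. This is a sketch under a few assumptions: the function body is repeated without its docstring so the snippet runs on its own, and the seeded generator and the name stat_funcs are illustrative choices, not part of the original code.

```python
import numpy as np


def mad(data, axis=None, scale=1.4826, keepdims=False):
    # keepdims=True on the inner median so it broadcasts against data
    med = np.median(data, axis=axis, keepdims=True)
    return scale * np.median(np.abs(data - med), axis=axis, keepdims=keepdims)


rng = np.random.default_rng(0)  # seeded only to make the example repeatable
X = rng.uniform(0, 1, (20, 1, 10, 4))

stat_funcs = (np.mean, np.median, np.std, mad)
stat = np.concatenate(
    [func(X, axis=2, keepdims=True) for func in stat_funcs], axis=2
)
print(stat.shape)  # (20, 1, 4, 4)
```

The fourth entry along axis 2 now holds the scaled MAD; pass scale=1.0 through a small wrapper if you want the raw value instead.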
I’m not aware of a built-in solution in numpy, but you can implement it fairly easily with numpy functions, using mad = median(abs(a - median(a))).
def mad(arr, axis=None, keepdims=True):
    median = np.median(arr, axis=axis, keepdims=True)
    mad = np.median(np.abs(arr-median), axis=axis, keepdims=keepdims)
    return mad
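The TypeError in the question comes from a misplaced parenthesis: np.abs is an element-wise ufunc and does not accept axis, so axis and keepdims belong to the outer np.median call only. A small worked example, with values picked purely for illustration, shows the unscaled result and how it relates to the 1.4826 scale factor used in the other answer:

```python
import numpy as np


def mad(arr, axis=None, keepdims=True):
    median = np.median(arr, axis=axis, keepdims=True)
    # np.abs only takes the array; the reduction arguments go to np.median
    return np.median(np.abs(arr - median), axis=axis, keepdims=keepdims)


a = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# median(a) is 3, absolute deviations are [2, 1, 0, 1, 97], whose median is 1
raw = mad(a, keepdims=False)
scaled = 1.4826 * raw  # comparable to a standard deviation for normal data

print(raw)        # 1.0
print(np.std(a))  # much larger, dominated by the outlier 100
```

Note that this version returns the raw MAD, so its values will differ from the scaled version by that constant factor.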