Mean of non-diagonal elements of each row numpy
Question:
I essentially have a confusion matrix of size n x n with all my diagonal elements being 1.
For every row, I wish to calculate its mean, excluding the 1, i.e. excluding the diagonal value. Is there a simple way to do it in numpy?
This is my current solution:
mask = np.zeros(cs.shape, dtype=bool)
np.fill_diagonal(mask, 1)
print(np.ma.masked_array(cs, mask).mean(axis=1))
where cs is my n x n matrix.
The code seems convoluted, and I certainly feel that there’s a much more elegant solution.
Answers:
A concise one using summation:
(cs.sum(1)-1)/(cs.shape[1]-1)
For the general case of ignoring diagonal elements, use np.diag(cs) in place of the 1 offset:
(cs.sum(1)-np.diag(cs))/(cs.shape[1]-1)
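As a quick sanity check, the sketch below wraps the general summation form in a small function and compares it with the masked-array version from the question. The function name offdiag_row_mean and the sample matrix are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def offdiag_row_mean(cs):
    """Row means of a square matrix, ignoring the diagonal (hypothetical helper)."""
    cs = np.asarray(cs, dtype=float)
    n = cs.shape[1]
    # Subtract each row's diagonal entry from its row sum, then divide by
    # the number of remaining entries in the row.
    return (cs.sum(axis=1) - np.diag(cs)) / (n - 1)

# Made-up 3 x 3 matrix with ones on the diagonal
cs = np.array([[1.0, 0.2, 0.4],
               [0.1, 1.0, 0.3],
               [0.6, 0.8, 1.0]])

# Reference result using the masked-array approach from the question
mask = np.eye(cs.shape[0], dtype=bool)
reference = np.ma.masked_array(cs, mask).mean(axis=1)

result = offdiag_row_mean(cs)
print(result)
print(np.allclose(result, reference))
```

Because the diagonal is read with np.diag, this form does not rely on the diagonal entries being exactly 1.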
Another with mean:
n = cs.shape[1]
(cs.mean(1)-1./n)*(n/(n-1))
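The mean form follows from the summation form: cs.mean(1) equals cs.sum(1)/n, so multiplying (cs.mean(1) - 1/n) by n/(n-1) gives (cs.sum(1) - 1)/(n-1). A minimal numerical check of that equivalence, using a made-up random matrix and assuming Python 3, where n/(n-1) is true division:

```python
import numpy as np

# Made-up 4 x 4 matrix with ones on the diagonal
rng = np.random.default_rng(0)
cs = rng.random((4, 4))
np.fill_diagonal(cs, 1.0)

n = cs.shape[1]
via_sum = (cs.sum(axis=1) - 1) / (n - 1)
via_mean = (cs.mean(axis=1) - 1.0 / n) * (n / (n - 1))

# Both expressions should agree up to floating-point rounding
print(np.allclose(via_sum, via_mean))
```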
Another straightforward idea is to use the built-in numpy.average() API, supplying weights that control how much each element of the confusion matrix contributes to the average. This gives us the flexibility to exclude any element of the matrix by setting its weight to zero. Below is a complete example:
# input array to work with
In [51]: arr
Out[51]:
array([[ 1,  2,  3,  4],
       [ 5,  1,  7,  8],
       [ 9, 10,  1, 12],
       [13, 14, 15,  1]])
# weights for each element of the input matrix: 0 on the diagonal, 1 elsewhere
In [52]: weights = 1 - np.eye(arr.shape[0], dtype=int) # boolean mask would also work
# so, we give 0 weight to all the diagonal elements
In [53]: weights
Out[53]:
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])
# finally, use the weights when computing average over axis 1
In [54]: np.average(arr, axis=1, weights=weights)
Out[54]: array([ 3.        ,  6.66666667, 10.33333333, 14.        ])
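One caveat worth checking: weights built by comparing values (for example arr != 1) also zero out any off-diagonal entry that happens to equal 1, whereas weights built from positions do not. The sketch below contrasts the two on a made-up matrix; the variable names are assumptions for illustration.

```python
import numpy as np

# Made-up matrix with an off-diagonal entry equal to 1 (row 0, column 1)
arr = np.array([[1, 1, 3],
                [4, 1, 6],
                [7, 8, 1]])

# Value-based weights: also excludes the off-diagonal 1 in row 0
value_weights = (arr != 1).astype(int)

# Position-based weights: 0 on the diagonal, 1 everywhere else
position_weights = 1 - np.eye(arr.shape[0], dtype=int)

by_value = np.average(arr, axis=1, weights=value_weights)
by_position = np.average(arr, axis=1, weights=position_weights)

print(by_value)     # row 0 averages only the 3
print(by_position)  # row 0 averages the off-diagonal 1 and the 3
```

The two results differ only in row 0, which is the row containing the off-diagonal 1.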
You can also replace the diagonal elements with np.nan and use np.nanmean() as follows:
# Make sure you use dtype float
cs = cs.astype('float64')
# Fill diagonal elements with np.nan
np.fill_diagonal(cs, np.nan)
np.nanmean(cs, axis=1)
Here you can also easily get the standard deviation with:
np.nanstd(cs, axis=1)
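Since np.fill_diagonal modifies its argument in place, it can be convenient to do the NaN replacement on a float copy so the original matrix stays intact. A minimal sketch, using a hypothetical helper name and the sample array from the weights example:

```python
import numpy as np

def offdiag_row_stats(cs):
    """Return (mean, std) per row, ignoring the diagonal (hypothetical helper)."""
    # astype returns a new float array, so the caller's matrix is not modified
    tmp = np.asarray(cs).astype('float64')
    np.fill_diagonal(tmp, np.nan)
    return np.nanmean(tmp, axis=1), np.nanstd(tmp, axis=1)

arr = np.array([[ 1,  2,  3,  4],
                [ 5,  1,  7,  8],
                [ 9, 10,  1, 12],
                [13, 14, 15,  1]])

means, stds = offdiag_row_stats(arr)
print(means)
print(stds)
```

np.nanstd uses the population standard deviation (ddof=0) by default; pass ddof=1 if the sample standard deviation is wanted.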