Mean of non-diagonal elements of each row in NumPy

Question:

I essentially have a confusion matrix of size n x n with all my diagonal elements being 1.

For every row, I wish to calculate its mean, excluding the 1, i.e. excluding the diagonal value. Is there a simple way to do it in numpy?

This is my current solution:

mask = np.zeros(cs.shape, dtype=bool)
np.fill_diagonal(mask, 1)
print(np.ma.masked_array(cs, mask).mean(axis=1))

where cs is my n x n matrix.

The code seems convoluted, and I certainly feel that there’s a much more elegant solution.

Asked By: OlorinIstari


Answers:

A concise option using summation:

(cs.sum(1)-1)/(cs.shape[1]-1)

For the general case, where the diagonal values are not all 1, subtract np.diag(cs) instead of the constant 1:

(cs.sum(1)-np.diag(cs))/(cs.shape[1]-1)

Another option, using mean:

n = cs.shape[1]
(cs.mean(1)-1./n)*(n/(n-1))
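
The three expressions above compute the same quantity. The mean form is just the summation form rearranged: (sum/n - 1/n) * n/(n-1) = (sum - 1)/(n-1). A minimal sketch that checks them against the masked-array approach from the question, assuming a small made-up matrix as a stand-in for cs:

```python
import numpy as np

# Hypothetical 4 x 4 matrix with ones on the diagonal (stand-in for `cs`)
cs = np.array([[1.0, 0.2, 0.4, 0.6],
               [0.1, 1.0, 0.3, 0.5],
               [0.8, 0.6, 1.0, 0.4],
               [0.2, 0.2, 0.2, 1.0]])
n = cs.shape[1]

# Summation form, valid only when every diagonal value is exactly 1
by_sum = (cs.sum(axis=1) - 1) / (n - 1)

# General form: subtract whatever actually sits on the diagonal
by_diag = (cs.sum(axis=1) - np.diag(cs)) / (n - 1)

# Mean form, again assuming a diagonal of ones
by_mean = (cs.mean(axis=1) - 1.0 / n) * (n / (n - 1))

# Reference result: mask the diagonal and average what is left
mask = np.eye(n, dtype=bool)
reference = np.ma.masked_array(cs, mask).mean(axis=1).filled(np.nan)

print(np.allclose(by_sum, reference),
      np.allclose(by_diag, reference),
      np.allclose(by_mean, reference))
```

Note that under Python 2 the factor n/(n-1) would be integer division; the sketch assumes Python 3.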
Answered By: Divakar

Another straightforward idea is to use numpy.average(), which accepts a weight for each element of the matrix. Setting a weight to zero excludes that element from the average, so any subset of elements can be ignored. Below is a complete example:

# input array to work with
In [51]: arr 
Out[51]: 
array([[ 1,  2,  3,  4],
       [ 5,  1,  7,  8],
       [ 9, 10,  1, 12],
       [13, 14, 15,  1]])

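# weights for each of the elements in the input matrix, built from positions
# rather than values so that an off-diagonal 1 is not excluded by accident
In [52]: weights = 1 - np.eye(arr.shape[0], dtype=int)  # boolean mask would also work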

# so, we give 0 weight to all the diagonal elements
In [53]: weights 
Out[53]: 
array([[0, 1, 1, 1],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [1, 1, 1, 0]])

# finally, use the weights when computing average over axis 1
In [54]: np.average(arr, axis=1, weights=weights) 
Out[54]: array([ 3.        ,  6.66666667, 10.33333333, 14.        ])
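
Building the weights from positions rather than values matters when an off-diagonal entry happens to equal 1: a value test such as arr != 1 would drop that entry too. A small sketch of the difference, assuming a made-up matrix with an extra 1 at position [0, 1]:

```python
import numpy as np

# Hypothetical matrix: ones on the diagonal plus one off-diagonal 1 at [0, 1]
arr = np.array([[1, 1, 3, 4],
                [5, 1, 7, 8],
                [9, 10, 1, 12],
                [13, 14, 15, 1]])

# Position-based weights: 0 on the diagonal, 1 everywhere else
weights = 1 - np.eye(arr.shape[0], dtype=int)
row_means = np.average(arr, axis=1, weights=weights)

# Value-based weights also zero out the off-diagonal 1 in row 0
value_weights = (arr != 1).astype(int)
value_means = np.average(arr, axis=1, weights=value_weights)

print(row_means[0])    # averages 1, 3 and 4
print(value_means[0])  # averages only 3 and 4
```

Only the first row differs here, because it is the only row with an off-diagonal 1.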
Answered By: kmario23

You can also replace the diagonal elements with np.nan and use np.nanmean() as follows:

# np.nan can only be stored in a float array, so convert first
cs = cs.astype('float64')

# Fill diagonal elements with np.nan
np.fill_diagonal(cs, np.nan)

np.nanmean(cs, axis=1)

Here you can also easily get the standard deviation with:

np.nanstd(cs, axis=1)
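
Put together as a self-contained sketch, assuming a made-up integer matrix. The name work is introduced here only for illustration; it keeps the original array untouched, since astype returns a copy by default:

```python
import numpy as np

# Hypothetical integer matrix with ones on the diagonal
cs = np.array([[1, 2, 3, 4],
               [5, 1, 7, 8],
               [9, 10, 1, 12],
               [13, 14, 15, 1]])

# Float copy, because np.nan cannot be stored in an integer array
work = cs.astype(np.float64)
np.fill_diagonal(work, np.nan)  # modifies `work` in place, not `cs`

row_means = np.nanmean(work, axis=1)
row_stds = np.nanstd(work, axis=1)  # population std (ddof=0) by default

print(row_means)
print(row_stds)
```

Pass ddof=1 to np.nanstd if the sample standard deviation is wanted instead.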
Answered By: coreehi