Calculate the mean across columns, including only values larger than zero

Question:

I’d like to calculate the mean across all columns (one result per row), but only consider values larger than zero.

For example:

import pandas as pd

df_dict = {'A': [0, 2, 10, 0], 'B': [10, 0, 30, 40], 'C': [0, 5, 10, 10]}
df = pd.DataFrame(df_dict)
df
    A   B   C
0   0  10   0
1   2   0   5
2  10  30  10
3   0  40  10

Normally, if we just use df.mean(axis=1), it produces:

df.mean(axis=1)
0     3.333333
1     2.333333
2    16.666667
3    16.666667
dtype: float64

What I want is to ignore any value less than or equal to zero when calculating the mean. The result I’d like to have for each row is:
10/1, (2+5)/2, (10+30+10)/3, (40+10)/2
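For reference, the target computation can be spelled out in plain Python on the example data. This is a sketch that assumes every row has at least one positive value (true here); otherwise len(positives) would be zero and the division would fail.

```python
# Example data from the question, keyed by column name.
df_dict = {'A': [0, 2, 10, 0], 'B': [10, 0, 30, 40], 'C': [0, 5, 10, 10]}

# zip(*columns) pairs up the i-th value of every column, i.e. yields one tuple per row.
rows = list(zip(*df_dict.values()))

# For each row, average only the strictly positive values.
expected = []
for row in rows:
    positives = [v for v in row if v > 0]
    expected.append(sum(positives) / len(positives))

print(expected)
```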

How can I do this? Thanks.

Asked By: roudan


Answers:

First mask the unwanted values (turn them into NaN). NaNs are then ignored when computing the mean.
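The reason masking works is that pandas reductions such as mean skip NaN by default (skipna=True). A small illustration on row 1 of the example after its zero has been masked:

```python
import pandas as pd

# Row 1 of the question's data after masking: the 0 in column B became NaN.
row = pd.Series([2.0, float('nan'), 5.0], index=['A', 'B', 'C'])

# mean skips NaN by default (skipna=True): (2 + 5) / 2
print(row.mean())                # 3.5

# With skipna=False, a single NaN makes the whole result NaN.
print(row.mean(skipna=False))    # nan
```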

With where:

df.where(df.gt(0)).mean(axis=1)

Or with mask:

df.mask(df.le(0)).mean(axis=1)

Output:

0    10.000000
1     3.500000
2    16.666667
3    25.000000
dtype: float64
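An equivalent formulation, offered here as an alternative sketch rather than part of the original answer, divides the sum of the positive values in each row by the number of positive values. Row 4 below is a hypothetical addition (all values zero or negative) to show that both this and the masking approach return NaN when a row has nothing to average.

```python
import pandas as pd

# Example data from the question plus a hypothetical row 4 with no positive values.
df = pd.DataFrame({'A': [0, 2, 10, 0, 0],
                   'B': [10, 0, 30, 40, 0],
                   'C': [0, 5, 10, 10, -1]})

positive = df.gt(0)

# Sum of the positive values (others replaced by 0) divided by how many there are.
# For row 4 this is 0 / 0, which pandas turns into NaN.
ratio = df.where(positive, 0).sum(axis=1) / positive.sum(axis=1)

# The masking approach from the answer, for comparison.
masked_mean = df.where(positive).mean(axis=1)

print(ratio)
print(masked_mean)
```

The masking version is shorter; the sum/count version makes the divisor explicit, which may be handy if you also need the per-row count of positive values.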
Answered By: mozway