Calculate mean of column for each row excluding the row for which mean is calculated

Question:

I need to calculate the mean of a certain column in a DataFrame so that, for each row, the mean is calculated excluding that row's own value.

I know I can iterate over the rows by index, dropping each row by index in every iteration and then calculating the mean. I wonder if there's a more efficient way of doing it.

Asked By: DmytroSytro


Answers:

You can use dataframe["ColumnName"].mean() for a single column, or dataframe.describe() for all columns.

Answered By: dejanmarich

The mean is sum/size, so you can subtract each row's value from the sum of the column and divide by the length of the DataFrame minus 1:

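import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# slow, and correct only when the values in 'a' are unique
df['b'] = df['a'].apply(lambda x: df.loc[df.a != x, 'a'].mean())
# faster: subtract each row's value from the column sum, divide by n - 1
df['b1'] = (df['a'].sum() - df['a']) / (len(df) - 1)
print(df)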
   a         b        b1
0  1  3.000000  3.000000
1  2  2.666667  2.666667
2  3  2.333333  2.333333
3  4  2.000000  2.000000
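
As a sanity check, the vectorized formula can be compared against the loop described in the question, which drops each row by index. This is only a minimal sketch on the same sample data; the names loop_means and fast_means are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# Baseline from the question: drop each row by index, then take the mean.
loop_means = pd.Series(
    [df['a'].drop(idx).mean() for idx in df.index],
    index=df.index,
)

# Vectorized leave-one-out mean: (column sum - own value) / (n - 1).
fast_means = (df['a'].sum() - df['a']) / (len(df) - 1)

# Both approaches should agree up to floating-point rounding.
pd.testing.assert_series_equal(loop_means, fast_means, check_names=False)
```

If the assertion passes silently, the two results match. Note that this sketch assumes the column has no missing values and at least two rows.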
Answered By: jezrael

The approach described does not work when column 'a' of df contains duplicate values.
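
To see where duplicates matter, here is a small sketch with a repeated value. The sample data and the names by_value and by_sum are assumptions for illustration. The value-based filter (df['a'] != x) removes every row equal to x, while the sum-based formula removes only the current row's contribution.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 5]})

# Value-based filter: drops every row equal to x, so both 2s are excluded.
by_value = df['a'].apply(lambda x: df.loc[df['a'] != x, 'a'].mean())

# Sum-based formula: removes only the current row's own value.
by_sum = (df['a'].sum() - df['a']) / (len(df) - 1)

print(pd.DataFrame({'a': df['a'], 'by_value': by_value, 'by_sum': by_sum}))
```

For the rows where a is 2, by_value gives 3.0 (the mean of 1 and 5), whereas by_sum gives 8/3 (the mean of 1, 2 and 5), which is the leave-one-out result. For the rows with unique values the two columns agree.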

Answered By: Abhinav Khandelwal