Calculate the mean of a column for each row, excluding the row for which the mean is calculated
Question:
I need to calculate the mean of a certain column in a DataFrame so that, for each row, the mean is calculated excluding that row's own value.
I know I can iterate over the rows by index, drop the current row in each iteration, and then calculate the mean. I wonder if there's a more efficient way of doing it.
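The row-by-row loop described in the question can be sketched as follows, as a baseline to compare against. The helper name leave_one_out_mean_loop is hypothetical, and the sketch assumes the DataFrame has a unique index, because Series.drop removes every row that carries the given label.

```python
import pandas as pd

def leave_one_out_mean_loop(s: pd.Series) -> pd.Series:
    """Hypothetical helper: mean of s excluding each row in turn (assumes a unique index)."""
    result = {}
    for idx in s.index:
        # drop() removes the row with this index label, then mean() averages the rest
        result[idx] = s.drop(idx).mean()
    return pd.Series(result)

df = pd.DataFrame({'a': [1, 2, 3, 4]})
df['loo'] = leave_one_out_mean_loop(df['a'])
print(df)
```

This does one drop and one mean per row, so it is quadratic in the number of rows, which is why a vectorized alternative is worth looking for.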
Answers:
You can use dataframe["ColumnName"].mean() to get the mean of a single column, or dataframe.describe() to get it for all columns.
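For contrast, here is a quick sketch of what these two calls return. Both give the ordinary mean over all rows, not a per-row mean that leaves the current row out. The second column 'c' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'c': [10, 20, 30, 40]})

overall_a = df['a'].mean()              # a single scalar for the whole column
per_column = df.describe().loc['mean']  # a Series with one mean per numeric column

print(overall_a)   # 2.5
print(per_column)
```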
The mean is sum / size, so for each row you can subtract that row's value from the column sum and divide by the length of the DataFrame minus 1:
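import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
# slow, and correct only when the values in 'a' are unique
df['b'] = df['a'].apply(lambda x: df.loc[df.a != x, 'a'].mean())
# faster and vectorized: subtract each row's value from the column sum, divide by n - 1
df['b1'] = (df['a'].sum() - df['a']) / (len(df) - 1)
print(df)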
   a         b        b1
0  1  3.000000  3.000000
1  2  2.666667  2.666667
2  3  2.333333  2.333333
3  4  2.000000  2.000000
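Written out, for a column with values x_1, ..., x_n (n ≥ 2) and sum S, the mean that leaves out row i is:

```latex
\bar{x}_{-i} = \frac{1}{n-1}\sum_{j \neq i} x_j = \frac{S - x_i}{n-1},
\qquad S = \sum_{j=1}^{n} x_j
```

For example, with a = [1, 2, 3, 4] we have S = 10 and n = 4, so for the row with value 2 the result is (10 - 2) / 3 = 8/3 ≈ 2.666667, matching the b1 column above.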
The apply-based approach (column b) does not work when column 'a' contains duplicate values: df.a != x excludes every row whose value equals x, not only the current row.
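A small sketch of the duplicate-value case, using made-up data with a repeated value. The apply-based column b removes every row equal to x, while the sum-based column b1 subtracts only the current row's own value, so b1 still gives the leave-one-out mean.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 4]})  # made-up data: the value 2 appears twice

# apply-based: the mask df.a != x drops BOTH rows with value 2 when x == 2
df['b'] = df['a'].apply(lambda x: df.loc[df.a != x, 'a'].mean())

# sum-based: subtracts only this row's value, so duplicates are handled correctly
df['b1'] = (df['a'].sum() - df['a']) / (len(df) - 1)

print(df)
# rows 1 and 2 differ: b is 2.5 there (mean of [1, 4]), b1 is 2.333333 (mean of [1, 2, 4])
```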