Replacing the first element of each group by its aggregation function

Question:

Suppose I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})

which looks like this:

    X   Y
0   a   2
1   a   4
2   b   8
3   a   10
4   b   5

How can I replace the first element of each group (grouped by X) with the mean of Y for that group?

The expected output:

    X   Y
0   a   5.33
1   a   4.00
2   b   6.50
3   a   10.00
4   b   5.00

Sorry if this question is too basic; I am new to Python and just starting to learn it.

Asked By: PaulS

Answers:

You can do:

g = df.groupby('X', as_index=False)
df.iloc[g.head(1).index] = g.mean()

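This takes the index of the first row in each group and overwrites those rows with the group means. Note that iloc assigns positionally, so this relies on the default integer index and on the groups first appearing in the same order as the sorted keys, as they do here.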

print(df):

   X          Y
0  a   5.333333
1  a   4.000000
2  b   6.500000
3  a  10.000000
4  b   5.000000
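
A label-based variant of the same idea can be sketched as follows. It is an illustration under a few assumptions: the index is unique, each first row's mean is looked up by its X value so the result does not depend on group order, and Y is cast to float beforehand as a precaution against dtype warnings in recent pandas versions. The names first_idx and means are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})

# Cast up front so float means can be stored in the column (precaution).
df['Y'] = df['Y'].astype(float)

# Index labels of the first row in each group, in original row order.
first_idx = df.groupby('X', sort=False).head(1).index

# One mean per group, indexed by the group key.
means = df.groupby('X', sort=False)['Y'].mean()

# Map each first row's key to its group mean and assign by label.
df.loc[first_idx, 'Y'] = df.loc[first_idx, 'X'].map(means)

print(df)
```

Because the assignment goes through loc and a key lookup, it should also work when the groups do not first appear in sorted order.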
Answered By: SomeDude

Use GroupBy.transform to compute the mean of each group, then use numpy.where with a mask from Series.duplicated so that only the first value of each group is replaced:

import numpy as np
# keep Y where X was already seen; use the group mean on the first occurrence
df['Y'] = np.where(df.X.duplicated(), df.Y, df.groupby("X")['Y'].transform('mean'))
print(df)
   X          Y
0  a   5.333333
1  a   4.000000
2  b   6.500000
3  a  10.000000
4  b   5.000000
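
To see why this works, it may help to look at the two intermediate pieces separately. The sketch below rebuilds the question's DataFrame and uses the illustrative names is_repeat, group_mean and result, which are not part of the one-liner above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})

# True for every row whose X value has already appeared earlier,
# so the first row of each group is False.
is_repeat = df['X'].duplicated()

# The group mean broadcast back to every row of the original frame.
group_mean = df.groupby('X')['Y'].transform('mean')

# Keep the original Y on repeated rows, take the mean on first rows.
result = np.where(is_repeat, df['Y'], group_mean)

print(is_repeat.tolist())   # [False, True, False, True, True]
print(result)
```

transform returns a Series aligned with the original rows, which is what lets numpy.where combine it element by element with df['Y'].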
    

Another solution with DataFrame.loc:

df.loc[~df.X.duplicated(), 'Y'] = df.groupby("X")['Y'].transform('mean')
Answered By: jezrael
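
Since the title asks about aggregation functions in general, the same mask-and-transform pattern could be wrapped in a small helper. This is only a sketch: replace_first_with_agg is a hypothetical name, and it assumes a numeric column, a unique index, and an aggregation name that GroupBy.transform accepts (such as 'mean', 'max' or 'sum').

```python
import pandas as pd


def replace_first_with_agg(df, key, col, func='mean'):
    """Return a copy of df where the first row of each `key` group
    holds the group-wise aggregate of `col` (hypothetical helper)."""
    out = df.copy()
    # Cast first so float aggregates fit in the column (assumes numeric data).
    out[col] = out[col].astype(float)
    # True only on the first occurrence of each key.
    is_first = ~out[key].duplicated()
    # The right-hand side is aligned on the index; only masked rows are written.
    out.loc[is_first, col] = out.groupby(key)[col].transform(func)
    return out


df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})

by_mean = replace_first_with_agg(df, 'X', 'Y', 'mean')
by_max = replace_first_with_agg(df, 'X', 'Y', 'max')
print(by_mean)
print(by_max)
```

Returning a copy leaves the original DataFrame untouched, which makes it easier to compare different aggregations side by side.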