How to attach a groupby aggregate to the original dataframe where the aggregate is placed in a new column at the bottom of each group

Question:

I’ve got a dataframe df = pd.DataFrame({'A':[1,1,2,2],'values':np.arange(10,30,5)})

How can I group by A to get the sum of values, with the sum placed in a new column sum_values_A but only once, at the bottom of each group? For example:

    A   values  sum_values_A
0   1   10      NaN
1   1   15      25
2   2   20      NaN
3   2   25      45

I tried

df['sum_values_A'] = df.groupby('A')['values'].transform('sum')

df['sum_values_A'] = df.groupby('A')['sum_values_A'].unique()

But I couldn’t find a way to place each unique sum at the bottom of its group.

Asked By: meg hidey


Answers:

You can use:

df.loc[~df['A'].duplicated(keep='last'),
       'sum_values_A'
      ] = df.groupby('A')['values'].transform('sum')

print(df)

Or:

m = ~df['A'].duplicated(keep='last')

df.loc[m, 'sum_values_A'] = df.loc[m, 'A'].map(df.groupby('A')['values'].sum())

Output:

   A  values  sum_values_A
0  1      10           NaN
1  1      15          25.0
2  2      20           NaN
3  2      25          45.0
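
To see why the mask picks the right rows, here is a minimal sketch on the question's data. The name is_last and the use of where() are my own choices for illustration, not part of the answer above; where() is just another way to blank out the rows the mask rejects.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'values': np.arange(10, 30, 5)})

# duplicated(keep='last') flags every row except the last occurrence of each A,
# so negating it leaves True only on the final row of each group.
is_last = ~df['A'].duplicated(keep='last')
print(is_last.tolist())

# transform('sum') broadcasts the group sum to every row;
# where() keeps it on the flagged rows and puts NaN everywhere else.
df['sum_values_A'] = df.groupby('A')['values'].transform('sum').where(is_last)
print(df)
```

Because the column contains NaN, pandas stores the sums as floats (25.0 rather than 25).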
Answered By: mozway
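import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[1,1,2,2],'values':np.arange(10,30,5)})

# Running total within each group, kept only on rows that repeat an earlier A.
# With two rows per group, as here, that is the last row of each group.
df = df.assign(sum_values_A=df.groupby('A')['values'].cumsum()
               [df['A'].duplicated(keep='first')])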
>>> df
   A  values  sum_values_A
0  1      10           NaN
1  1      15          25.0
2  2      20           NaN
3  2      25          45.0
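
If a group can have more than two rows, a mask that selects every non-first row would leave intermediate running totals in the column. A hedged sketch that keeps only the final row of each group follows. The frame df3 and its values are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data with a three-row group (not from the question).
df3 = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'values': [10, 15, 20, 25, 30]})

# Running total within each group, in row order.
running = df3.groupby('A')['values'].cumsum()

# True only on the last occurrence of each A.
is_last = ~df3['A'].duplicated(keep='last')

# assign() aligns on the index, so rows missing from running[is_last] become NaN.
df3 = df3.assign(sum_values_A=running[is_last])
print(df3)
```

The running total on the last row of a group equals the group sum, so this gives the same result as transform('sum') restricted to those rows.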
Answered By: Laurent B.