How to attach a groupby aggregate to the original dataframe where the aggregate is placed in a new column at the bottom of each group
Question:
I’ve got a dataframe:
df = pd.DataFrame({'A':[1,1,2,2],'values':np.arange(10,30,5)})
How can I group by A to get the sum of values, with the sum placed in a new column sum_values_A, but only once, at the bottom of each group? For example:
A values sum_values_A
0 1 10 NaN
1 1 15 25
2 2 20 NaN
3 2 25 45
I tried:
df['sum_values_A'] = df.groupby('A')['values'].transform('sum')
df['sum_values_A'] = df.groupby('A')['sum_values_A'].unique()
but couldn’t find a way to get each unique sum placed at the bottom of its group.
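To see why the second line of the attempt doesn't help, here is a sketch that runs both steps and inspects the result. It relies only on pandas index alignment; the intermediate name uniques is just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'values': np.arange(10, 30, 5)})

# Step 1 of the attempt: every row gets its group total -> [25, 25, 45, 45]
df['sum_values_A'] = df.groupby('A')['values'].transform('sum')

# Step 2 of the attempt: unique() returns one entry per group, indexed by the
# group keys (1 and 2), each entry holding an array such as array([25]).
uniques = df.groupby('A')['sum_values_A'].unique()
print(uniques.index.tolist())  # [1, 2]

# Assigning it back aligns on the row index, so the arrays land on the rows
# labelled 1 and 2 (not on the last row of each group); rows 0 and 3 get NaN.
df['sum_values_A'] = uniques
print(df)
```

The group keys happen to overlap with the row labels here, which makes the result look half-right; with other keys every row would simply be NaN. The fix is to build a mask for the last row of each group instead.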
Answers:
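You can use:
# ~duplicated(keep='last') is True only on the last row of each A-group;
# .loc writes the group totals there and leaves the other rows NaN
df.loc[~df['A'].duplicated(keep='last'),
       'sum_values_A'] = df.groupby('A')['values'].transform('sum')
print(df)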
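Or:
# m is True on the last row of each A-group
m = ~df['A'].duplicated(keep='last')
# look up each masked row's group total from the per-group sums
df.loc[m, 'sum_values_A'] = df.loc[m, 'A'].map(df.groupby('A')['values'].sum())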
Output:
A values sum_values_A
0 1 10 NaN
1 1 15 25.0
2 2 20 NaN
3 2 25 45.0
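Both versions rely on the same mask, ~df['A'].duplicated(keep='last'), which is True only on the last row of each group, so they should also handle groups of different sizes. A minimal sketch checking that on a hypothetical frame (three rows for A=1, one row for A=2), together with a one-step variant that uses Series.where instead of a partial .loc assignment (a swapped-in alternative, not part of the answer):

```python
import pandas as pd

# Hypothetical frame with uneven group sizes.
df = pd.DataFrame({'A': [1, 1, 1, 2], 'values': [10, 15, 20, 25]})

# True only on the last row of each A-group: [False, False, True, True]
last_in_group = ~df['A'].duplicated(keep='last')
# Group totals broadcast to every row: [45, 45, 45, 25]
totals = df.groupby('A')['values'].transform('sum')

# Variant: build the whole column at once; where() keeps values where the
# condition is True and puts NaN on every other row.
df['sum_values_A'] = totals.where(last_in_group)

# Cross-check against the .loc assignment used in the answer.
check = df[['A', 'values']].copy()
check.loc[last_in_group, 'sum_values_A'] = totals
print(df)
```

Series.where returns a full-length Series, so it can be assigned as a normal column without creating it first; the .loc form does the same job by filling unmasked rows with NaN when it creates the column.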
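Another option, using a grouped cumulative sum:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[1,1,2,2],'values':np.arange(10,30,5)})
# The running total on the last row of each group equals the group sum,
# so keep cumsum() only on those rows; assign() aligns on the index and
# leaves NaN on every other row.
df = df.assign(sum_values_A=df.groupby('A')['values'].cumsum()
                            [~df['A'].duplicated(keep='last')])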
>>> df
A values sum_values_A
0 1 10 NaN
1 1 15 25.0
2 2 20 NaN
3 2 25 45.0
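The cumulative-sum approach only works if the mask picks exactly the last row of each group, because that is the only row where the running total equals the group sum. With two rows per group, duplicated(keep='first') and ~duplicated(keep='last') select the same rows; a sketch on a hypothetical frame with a three-row group shows where they diverge:

```python
import pandas as pd

# Hypothetical frame: A=1 has three rows, A=2 has two.
df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'values': [10, 15, 20, 25, 30]})

# Running total within each group: [10, 25, 45, 25, 55]
running = df.groupby('A')['values'].cumsum()

# keep='first' flags every row after the first, so intermediate rows leak in.
first_mask = df['A'].duplicated(keep='first')
# Negated keep='last' flags exactly the last row of each group.
last_mask = ~df['A'].duplicated(keep='last')

print(running[first_mask].tolist())  # [25, 45, 55]  (25 is a partial total)
print(running[last_mask].tolist())   # [45, 55]
```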