Proportion of total in pd.NamedAgg
Question:
I have a df like this:

Foo | Bar |
---|---|
A | 54 |
A | 95 |
B | 49 |
A | 25 |
B | 25 |
B | 32 |
B | 55 |
A | 98 |
A | 83 |
B | 90 |

I am computing some statistics using [`aggfunc`][1]:
```python
import pandas as pd

temp = pd.DataFrame.from_dict({'Foo': {0: 'A', 1: 'A', 2: 'B', 3: 'A', 4: 'B',
                                       5: 'B', 6: 'B', 7: 'A', 8: 'A', 9: 'B'},
                               'Bar': {0: 54, 1: 95, 2: 49, 3: 25, 4: 25,
                                       5: 32, 6: 55, 7: 98, 8: 83, 9: 90}})

res_df = temp.groupby('Foo').agg(
    max_val=pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val=pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val=pd.NamedAgg(column='Bar', aggfunc='std')
)
```
Which results (correctly) in:

Foo | max_val | min_val | std_val |
---|---|---|---|
A | 98 | 25 | 31.0403 |
B | 90 | 25 | 25.37124 |

However, I want to add another column, `pct_total`, holding each group's proportion of the total of `Bar`, for example:

Foo | max_val | min_val | std_val | pct_total |
---|---|---|---|---|
A | 98 | 25 | 31.0403 | 0.59 |
B | 90 | 25 | 25.37124 | 0.41 |

How can I accomplish this using `aggfunc`?
[1]: https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
Answers:
You could try as follows:
```python
res_df = temp.groupby('Foo').agg(
    max_val=pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val=pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val=pd.NamedAgg(column='Bar', aggfunc='std'),
    # each group's sum of Bar divided by the overall sum of Bar
    pct_total=pd.NamedAgg(column='Bar', aggfunc=lambda x: x.sum() / temp.Bar.sum())
)
print(res_df)
```
```
     max_val  min_val    std_val  pct_total
Foo
A         98       25  31.040296   0.585809
B         90       25  25.371244   0.414191
```
Or use `round(x.sum()/temp.Bar.sum(), 2)` if you want them rounded: 0.59, 0.41.
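The lambda above reads the outer `temp` frame. If you want to avoid that, you can aggregate each group's sum instead and divide by the total after the `groupby`. That way the denominator always comes from the data that was actually grouped, for example when the frame is filtered first. This is a sketch and not part of the original answer. The helper column name `sum_val` is just an illustrative choice.

```python
import pandas as pd

# Same data as in the question
temp = pd.DataFrame({
    'Foo': ['A', 'A', 'B', 'A', 'B', 'B', 'B', 'A', 'A', 'B'],
    'Bar': [54, 95, 49, 25, 25, 32, 55, 98, 83, 90],
})

res_df = temp.groupby('Foo').agg(
    max_val=pd.NamedAgg(column='Bar', aggfunc='max'),
    min_val=pd.NamedAgg(column='Bar', aggfunc='min'),
    std_val=pd.NamedAgg(column='Bar', aggfunc='std'),
    # hypothetical helper column: plain per-group sum of Bar
    sum_val=pd.NamedAgg(column='Bar', aggfunc='sum'),
)

# Share of each group in the sum of all group sums
res_df['pct_total'] = res_df['sum_val'] / res_df['sum_val'].sum()
res_df = res_df.drop(columns='sum_val')

# Optional two-decimal version, matching the desired output in the question
rounded = res_df['pct_total'].round(2)
print(res_df)
print(rounded)
```

With the question's data the group sums are 355 (A) and 251 (B) out of 606. That gives about 0.586 and 0.414, which `.round(2)` turns into 0.59 and 0.41. The denominator is the sum of the group sums. It equals `temp['Bar'].sum()` here because no `Foo` value is missing: `groupby` drops missing keys by default.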