How to do aggregation in Pandas with different aggregation functions within groups?

Question

I want to do a groupby followed by an aggregation, but for each group I want to use a function chosen by the value in a special column that stores which function should be used. It is easier to show with an example:

id	group	val	func
0	0	0	"avg"
1	0	2	"avg"
2	0	2	"avg"
3	1	0	"med"
4	1	2	"med"

So in this example the expected behaviour is "avg" aggregation for group 0 and "median" for group 1. How can I make agg choose the function based on the values in the "func" column? I know that I can calculate every aggregation function for every group and then use func as a mask to pick the right values, but that does a lot of unnecessary computation. There should be a better approach…

P.S. It’s guaranteed that func is the same within each group so I don’t have to worry about that.

I’ve written my own solution for my specific case and am adding it to the question, but the answer below is fine too.
My approach was:

1. Use a dict to translate the function names given in the table into proper pandas names, as suggested in the answer:

func_dict = {"avg": "mean", "med": "median", "min": "min", "max": "max", "rnk": "first"}

2. Write a custom function to pass to apply later:

    import pandas as pd

    def pick_price(subframe: pd.DataFrame) -> float:
        # Read the table's function name from the first row of the group
        # and translate it to the pandas name via func_dict.
        func_name = func_dict[subframe["agg"].iloc[0]]
        if func_name != "first":
            # Ordinary aggregation (mean, median, min, max) over the prices.
            return float(subframe["comp_price"].agg(func_name))
        # "rnk": take the price from the row with the smallest rank.
        idx = subframe["rank"].idxmin()
        return float(subframe["comp_price"].loc[idx])

The function takes a subframe holding a single group, in which every row names the same function, and applies that function.

3. Finally, use the function: group by the column that defines the groups, then call it through the apply() method:

grouped = X.groupby("sku")

grouped.apply(pick_price)
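
For reference, the three steps can be put together into one runnable sketch. The sample rows below are made up for illustration; only the column names (sku, agg, comp_price, rank) come from the snippets above. Selecting the columns explicitly before apply() is an extra choice made here so that the grouping column is not passed into the function.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
X = pd.DataFrame({
    "sku": [1, 1, 1, 2, 2, 3, 3],
    "agg": ["avg", "avg", "avg", "med", "med", "rnk", "rnk"],
    "comp_price": [10.0, 20.0, 30.0, 5.0, 7.0, 100.0, 50.0],
    "rank": [1, 2, 3, 1, 2, 2, 1],
})

func_dict = {"avg": "mean", "med": "median", "min": "min", "max": "max", "rnk": "first"}

def pick_price(subframe: pd.DataFrame) -> float:
    # Translate the group's function name into the pandas name.
    func_name = func_dict[subframe["agg"].iloc[0]]
    if func_name != "first":
        return float(subframe["comp_price"].agg(func_name))
    # "rnk": price of the row with the smallest rank.
    return float(subframe.loc[subframe["rank"].idxmin(), "comp_price"])

# One value per sku; each group only computes the function it asks for.
result = X.groupby("sku")[["agg", "comp_price", "rank"]].apply(pick_price)
print(result)
```

With this data, sku 1 gets the mean of its prices, sku 2 the median, and sku 3 the price of its best-ranked row.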

Asked By: Игорь Агафонов

Answer 1

I would use a dictionary of group: function:

f = {0: 'mean', 1: 'median'}
df['out'] = df.groupby('group')['val'].transform(lambda s: s.agg(f.get(s.name)))

Output:

   id  group  val       out
0   0      0    0  1.333333
1   1      0    2  1.333333
2   2      0    2  1.333333
3   3      1    0  1.000000
4   4      1    2  1.000000
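
If one row per group is wanted instead of a value broadcast back to every row, the same dictionary can drive a plain loop over the groups. This is only a sketch on the example data; the name per_group is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(5),
    "group": [0, 0, 0, 1, 1],
    "val": [0, 2, 2, 0, 2],
})

f = {0: "mean", 1: "median"}

# Iterating a groupby yields (key, sub-Series) pairs, so each group
# is aggregated exactly once with its own function.
per_group = pd.Series(
    {key: s.agg(f[key]) for key, s in df.groupby("group")["val"]},
    name="out",
)
per_group.index.name = "group"
print(per_group)
```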

variant using a column as source

NB. It’s a bit hacky; I prefer the dictionary. It extracts the function name from the first row of each group. The names must be valid pandas names, like mean/median, not avg/med.

df['out'] = (df.groupby('group')['val']
               .transform(lambda s: s.agg(df.loc[s.index[0], 'func']))
             )

Output:

   id  group  val    func       out
0   0      0    0    mean  1.333333
1   1      0    2    mean  1.333333
2   2      0    2    mean  1.333333
3   3      1    0  median  1.000000
4   4      1    2  median  1.000000
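
When the column holds the original avg/med labels, one possible way to combine both ideas is to build the group: function dictionary from the column itself and translate the labels on the way. The mapping name_map below is an assumption that covers only the two labels in the example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(5),
    "group": [0, 0, 0, 1, 1],
    "val": [0, 2, 2, 0, 2],
    "func": ["avg", "avg", "avg", "med", "med"],
})

# Assumed translation from the table's labels to pandas function names.
name_map = {"avg": "mean", "med": "median"}

# One function name per group (func is constant within a group).
group_funcs = df.groupby("group")["func"].first().map(name_map)

# Aggregate each group once with its own function, then broadcast back.
per_group = pd.Series(
    {key: s.agg(group_funcs.loc[key]) for key, s in df.groupby("group")["val"]}
)
df["out"] = df["group"].map(per_group)
print(df)
```

A label missing from name_map would turn into NaN in group_funcs and make the aggregation fail, so the mapping should cover every label that can appear in the column.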

Answered By: mozway
