Pandas group by x, sort by y, select z, aggregating in case of multiple maximum values

Question:

Suppose I have a dataframe df:

df = pd.DataFrame({'group_id' : [1,1,1,2,2,3,3,3,3],
                   'amount'  : [2,4,5,1,2,3,5,5,5],
                   'x':[2,5,8,3,6,9,3,1,0]})

   group_id  amount  x
0         1       2  2
1         1       4  5
2         1       5  8
3         2       1  3
4         2       2  6
5         3       3  9
6         3       5  3
7         3       5  1
8         3       5  0

I want to group it by group_id, then pick x, corresponding to the largest amount. The part which I cannot figure out is how to deal with cases when there are multiple rows with maximum amount. For example, 3 last rows in the df above.
In such cases, I would like to aggregate values of x using mean, median or mode of x. I am trying to get the solution, in which I can implement every one of these 3 aggregation methods.

I saw many questions here, which solve the problem without dealing with multiple maximum values. For example, I could do something like this:

df.sort_values('amount', ascending=False).groupby('group_id').first().x

But I do not know how to implement different aggregation approaches.

EDIT: The second part of this question is here.

Asked By: Moysey Abramowitz

||

Answers:

If I understand you question right, you can use custom function with GroupBy.apply:

out = df.groupby("group_id").apply(
    lambda x: pd.Series(
        {
            "mean": (d := x.loc[x["amount"] == x["amount"].max(), "x"]).mean(),
            "median": d.median(),
            "mode": d.mode()[0],
        }
    )
)
print(out)

Prints:

              mean  median  mode
group_id                        
1         8.000000     8.0   8.0
2         6.000000     6.0   6.0
3         1.333333     1.0   0.0

Or .describe():

out = df.groupby("group_id").apply(
    lambda x: x.loc[x["amount"] == x["amount"].max(), "x"].describe()
)
print(out)

Prints:

x         count      mean       std  min  25%  50%  75%  max
group_id                                                    
1           1.0  8.000000       NaN  8.0  8.0  8.0  8.0  8.0
2           1.0  6.000000       NaN  6.0  6.0  6.0  6.0  6.0
3           3.0  1.333333  1.527525  0.0  0.5  1.0  2.0  3.0
Answered By: Andrej Kesely
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.