keep dataframe rows meeting a condition into each group of the same dataframe grouped by

Question:

I have the following dataframe.

    c1  c2  v1  v2
0   a   a   1   2
1   a   a   2   3
2   b   a   3   1
3   b   a   4   5
5   c   d   5   0

I wish to have the following output.

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0

The rule. First group dataframe by c1, c2. Then into each group, keep the row with the maximun value in column v2.
Finally, output the original dataframe with all the rows not satisfying the previous rule dropped.

What is the better way to obtain this result? Thanks.

Going around, I have found also this solution based on apply method

Asked By: Marcus

||

Answers:

You could use groupby-transform to generate a boolean selection mask:

grouped = df.groupby(['c1', 'c2'])
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)

yields

  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
Answered By: unutbu

If you want to make sure that you get one single row per group, you can sort the values by "v2" before grouping and then just take the last row (the one with the highest v2-value).

df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})

df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()

result:

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0
Answered By: Nacho
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.