Keep the rows meeting a condition within each group of a grouped DataFrame
Question:
I have the following dataframe.
c1 c2 v1 v2
0 a a 1 2
1 a a 2 3
2 b a 3 1
3 b a 4 5
5 c d 5 0
I wish to have the following output.
c1 c2 v1 v2
0 a a 2 3
1 b a 4 5
2 c d 5 0
The rule: first group the dataframe by c1 and c2. Then, within each group, keep the row with the maximum value in column v2.
Finally, output the original dataframe with all rows that do not satisfy this rule dropped.
What is the best way to obtain this result? Thanks.
While searching, I also found a solution based on the apply method.
Answers:
You could use groupby-transform to generate a boolean selection mask:
grouped = df.groupby(['c1', 'c2'])
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)
yields
c1 c2 v1 v2
0 a a 2 3
1 b a 4 5
2 c d 5 0
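A minimal sketch of the same groupby-transform idea, assuming pandas is installed: passing the string "max" to transform broadcasts each group's maximum back onto the original rows, so the mask becomes a plain vectorized comparison with no lambda. The last row of the sample data is hypothetical (not from the question); it ties with the maximum v2 of group (a, a) to show that this kind of mask keeps every tied row.

```python
import pandas as pd

# Question data plus one hypothetical extra row (v1=6) that ties with
# the maximum v2 (3) of group ("a", "a").
df = pd.DataFrame({
    "c1": ["a", "a", "b", "b", "c", "a"],
    "c2": ["a", "a", "a", "a", "d", "a"],
    "v1": [1, 2, 3, 4, 5, 6],
    "v2": [2, 3, 1, 5, 0, 3],
})

# transform("max") returns a Series aligned with df's index:
# each row receives the maximum v2 of its (c1, c2) group.
group_max = df.groupby(["c1", "c2"])["v2"].transform("max")

# Keep every row whose v2 equals its group's maximum; tied rows are all kept,
# and the original row order is preserved.
result = df[df["v2"] == group_max].reset_index(drop=True)
print(result)
```

Because of the tie, group (a, a) contributes two rows here (v1 = 2 and v1 = 6), so the result has four rows instead of three.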
If you want to make sure that you get exactly one row per group (the mask above keeps every row that ties for the maximum), you can sort the values by "v2" before grouping and then take the last row of each group (the one with the highest v2 value).
import pandas as pd
df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})
df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()
result:
c1 c2 v1 v2
0 a a 2 3
1 b a 4 5
2 c d 5 0
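Another way to get exactly one row per group, sketched here under the assumption that pandas is available, is to ask each group for the index label of its maximum with idxmax and then select those rows from the original frame. One difference from the sort-then-last approach: GroupBy.last takes the last non-null value of each column separately, so if some columns contain missing values it can combine values from different rows, whereas idxmax selects whole original rows. When several rows tie for the maximum, idxmax returns the first one.

```python
import pandas as pd

# Same data as in the answer above.
df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"],
                   "c2": ["a", "a", "a", "a", "d"],
                   "v1": [1, 2, 3, 4, 5],
                   "v2": [2, 3, 1, 5, 0]})

# For each (c1, c2) group, idxmax gives the index label of the first row
# holding the largest v2, i.e. exactly one label per group.
idx = df.groupby(["c1", "c2"])["v2"].idxmax()

# Pull those complete rows out of the original frame.
result = df.loc[idx].reset_index(drop=True)
print(result)
```

For the question's data this selects the original rows 1, 3 and 4, matching the expected output.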