Select rows in pandas datafame that satisfy a condition in a column that depends on a subset of rows
Question:
give the following dataframe I would like to select the rows for each user that has the highest amount:
Name Amount ID
--------------------
Alice 100 1
Bob 50 2
Charlie 10 3
Alice 30 4
Charlie 50 5
the result should be:
Name Amount ID
--------------------
Alice 100 1
Bob 50 2
Charlie 50 5
how can i do this efficiently?
Answers:
Use groupby.apply
:
df.groupby('Name', as_index=False).apply(lambda x: x.loc[x['Amount'].eq(x['Amount'].max())]).reset_index(drop=True)
You can use idxmax
:
df.loc[df.groupby('Name')['Amount'].idxmax()]
output:
Name Amount ID
0 Alice 100 1
1 Bob 50 2
4 Charlie 50 5
If you want to reset_index(), then just add it at the end like:
df.loc[df.groupby('Name')['Amount'].idxmax()].reset_index(drop=True)
output:
Name Amount ID
0 Alice 100 1
1 Bob 50 2
2 Charlie 50 5
give the following dataframe I would like to select the rows for each user that has the highest amount:
Name Amount ID
--------------------
Alice 100 1
Bob 50 2
Charlie 10 3
Alice 30 4
Charlie 50 5
the result should be:
Name Amount ID
--------------------
Alice 100 1
Bob 50 2
Charlie 50 5
how can i do this efficiently?
Use groupby.apply
:
df.groupby('Name', as_index=False).apply(lambda x: x.loc[x['Amount'].eq(x['Amount'].max())]).reset_index(drop=True)
You can use idxmax
:
df.loc[df.groupby('Name')['Amount'].idxmax()]
output:
Name Amount ID
0 Alice 100 1
1 Bob 50 2
4 Charlie 50 5
If you want to reset_index(), then just add it at the end like:
df.loc[df.groupby('Name')['Amount'].idxmax()].reset_index(drop=True)
output:
Name Amount ID
0 Alice 100 1
1 Bob 50 2
2 Charlie 50 5