Filter and apply condition between multiple rows

Question:

I have the following dataframe:

client_id   location_id      region_name    location_name
1                123          Florida        location_ABC
6                123          Florida(P)     location_ABC
6                845          Miami(P)       location_THE
1                386          Boston         location_WOP
6                386          Boston(P)      location_WOP

What I’m trying to do is:

  • If a location_id has more than one client_id, I’ll pick the row with client_id == 1.
  • If a location_id has only one client_id, I’ll pick that row, whatever it is.

If I were implementing only one of these rules, it would be as simple as df[df['client_id'] == 1]. But I cannot figure out how to perform this kind of filtering, which requires looking at several rows at the same time (for example, checking whether a given location_id has more than one client_id).
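One way to express that check directly is to count the distinct client_ids per location_id with groupby(...).transform("nunique"), which broadcasts each group's count back to every row, and then combine it with the client_id == 1 condition in a single boolean mask. This is a sketch of a different technique from the one in the answer, and it rebuilds the example frame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "client_id": [1, 6, 6, 1, 6],
    "location_id": [123, 123, 845, 386, 386],
    "region_name": ["Florida", "Florida(P)", "Miami(P)", "Boston", "Boston(P)"],
    "location_name": ["location_ABC", "location_ABC", "location_THE",
                      "location_WOP", "location_WOP"],
})

# Number of distinct client_ids in each row's location, aligned to df's index
n_clients = df.groupby("location_id")["client_id"].transform("nunique")

# Keep a row if its location has a single client (rule 2),
# or if it is the client_id == 1 row (rule 1)
out = df[(n_clients == 1) | (df["client_id"] == 1)]
print(out)
```

With this mask, a location that has several client_ids keeps every row with client_id == 1, and it is dropped entirely if none of its rows has client_id == 1; adjust the condition if that case can occur in your data.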

So, in this scenario, the resulting dataframe would be:

client_id   location_id      region_name    location_name
1                123          Florida        location_ABC
6                845          Miami(P)       location_THE
1                386          Boston         location_WOP

Any ideas?

Asked By: bellotto


Answers:

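You can use idxmax on a custom groupby: build a boolean Series that is True where client_id equals your preferred id, group it by location_id, and take idxmax, which returns the index label of the first True row in each group (or the group's first row when it has no True). Then slice with loc: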

out = df.loc[df['client_id'].eq(1).groupby(df['location_id'], sort=False).idxmax()]

output:

   client_id  location_id region_name location_name
0          1          123     Florida  location_ABC
2          6          845    Miami(P)  location_THE
3          1          386      Boston  location_WOP
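A self-contained sketch of the same idxmax approach, with the intermediate result kept in a variable so the mechanics are visible. The frame is rebuilt from the question's table, and the variable names is_preferred and picked are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "client_id": [1, 6, 6, 1, 6],
    "location_id": [123, 123, 845, 386, 386],
    "region_name": ["Florida", "Florida(P)", "Miami(P)", "Boston", "Boston(P)"],
    "location_name": ["location_ABC", "location_ABC", "location_THE",
                      "location_WOP", "location_WOP"],
})

# True on rows that belong to the preferred client
is_preferred = df["client_id"].eq(1)

# Per location_id, idxmax gives the index label of the first maximum.
# Since True > False, that is the first client_id == 1 row if one exists,
# otherwise the group's first row. sort=False keeps first-appearance order.
picked = is_preferred.groupby(df["location_id"], sort=False).idxmax()

# picked holds row labels of df, so it can be passed straight to .loc
out = df.loc[picked]
print(out)
```

Because idxmax returns the first occurrence of the maximum, every location_id yields exactly one row: the first client_id == 1 row if there is one, otherwise the first row of that location. sort=False only affects the output order; with the default sort=True the groups would come out in ascending location_id order (123, 386, 845).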
Answered By: mozway