Filter and apply multiple conditions between multiple rows

Question:

I have the following dataframe:

id  location     method
1      456        Phone
1      456        OS
6      456        OS
6      943        Specialist

What I’m trying to do is implement the following logic:

  • If a location + method combination appears only once, I leave the row as it is. That’s the case for the first and last rows.
  • If a location + method combination appears more than once, I want to keep only the rows where id == 1.

So, the resulting dataframe would be:

id  location     method
1      456        Phone
1      456        OS
6      943        Specialist

When grouping by the location column alone, I have this solution: df.loc[df['id'].eq(1).groupby(df['location'], sort=False).idxmax()] (Reference: Filter and apply condition between multiple rows)

But I cannot figure out how to combine this filter with the "method" column. Any ideas?

Asked By: bellotto


Answers:

A possible solution:

(df.sort_values(by='id')
 .groupby(['location', 'method']).first()
 .reset_index().sort_index(axis=1))

Output:

   id  location      method
0   1       456          OS
1   1       456       Phone
2   6       943  Specialist
Answered By: PaulS
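
As a side note, the one-liner quoted in the question also seems to extend to two keys by passing both columns to groupby. The following is only a minimal sketch under a few assumptions: the sample data is rebuilt by hand, the names is_one and keep are made up for illustration, and the flags are cast to integers as a precaution. Note that idxmax keeps exactly one row per group, and if a duplicated group contains no row with id == 1 it falls back to that group's first row.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    {
        "id": [1, 1, 6, 6],
        "location": [456, 456, 456, 943],
        "method": ["Phone", "OS", "OS", "Specialist"],
    }
)

# 1 where id == 1, 0 elsewhere (hypothetical helper name)
is_one = df["id"].eq(1).astype(int)

# For each (location, method) pair, idxmax gives the index label of the
# first row flagged 1, or of the group's first row if none is flagged.
keep = is_one.groupby([df["location"], df["method"]], sort=False).idxmax()

# Select the chosen rows in their original order
result = df.loc[sorted(keep)]
print(result)
```

On the sample data this selects the rows with index labels 0, 1 and 3, which is the frame the question asks for, with the original row order and index preserved.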

First, group by both columns (location and method). Then, within each group, select the row you need (here I sort by "id" and take the first one).

import pandas as pd

# generate the dataframe
df = pd.DataFrame(
    [
        [1, 456, "Phone"],
        [1, 456, "OS"],
        [6, 456, "OS"],
        [6, 943, "Specialist"],
    ],
    columns=["id", "location", "method"],
)

# groupby, sort, and select the first row
n_df = df.groupby(["location", "method"]).apply(lambda x: x.sort_values("id").iloc[0])

# remove the index that groupby generates
n_df.reset_index(drop=True, inplace=True)

n_df
   id  location      method
0   1       456          OS
1   1       456       Phone
2   6       943  Specialist
Answered By: Luce
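
Both answers above keep the row with the smallest id in each group, which matches the expected output here because 1 is the smallest id. If the rule from the question has to be applied literally (leave unique location + method pairs alone, otherwise keep only id == 1), a boolean mask built from the group sizes may be closer to the intent. This is a sketch, not a definitive solution; the name group_size is hypothetical and the sample data is rebuilt by hand.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    {
        "id": [1, 1, 6, 6],
        "location": [456, 456, 456, 943],
        "method": ["Phone", "OS", "OS", "Specialist"],
    }
)

# Number of rows sharing each (location, method) pair,
# broadcast back onto every row of the original frame
group_size = df.groupby(["location", "method"])["id"].transform("size")

# Keep a row if its pair is unique, or if its id is 1
result = df[(group_size == 1) | (df["id"] == 1)]
print(result)
```

Unlike the sort-and-take-first approaches, this keeps the original row order and index, keeps every id == 1 row in a duplicated group, and drops a duplicated group entirely when none of its rows has id == 1. Whether that last behaviour is wanted depends on the data.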