Multiple conditions Pandas groupby, keeping other column values

Question:

I have a dataframe like this:

Launch  Article Sequence    Machine     Quantity    Date        …
68033   F2500   10          lathe 1     200         01/02/2022  …
68033   F2500   20          lathe 1     190         01/02/2022  …
68033   F2500   30          borer 3     175         02/02/2022  …
68033   F2500   40          milling 1   175         03/03/2022  …
71562   F2500   10          lathe 3     632         12/12/2022  …
71562   F2500   20          lathe 4     593         15/12/2022  …
71562   F2500   30          borer 3     560         16/12/2022  …
71562   F2500   40          milling 2   555         16/12/2022  …
69872   F302    10          lathe 2     5463        04/06/2022  …
69872   F302    30          lathe 3     5102        11/06/2022  …
70444   F302    20          lathe 1     3125        27/07/2022  …
70444   F302    30          lathe 3     2965        31/07/2022  …
…       …       …           …           …           …           …

124.531 rows x 12 columns

What I need is some kind of group by where, for each article, I select the maximum launch number and then, within that launch, the minimum sequence number together with its corresponding machine.

The end result should look like this:

Article Launch  Sequence    Machine
F2500   71562   10          lathe 3
F302    70444   20          lathe 1
…       …       …           …

I’ve tried to do it with pandas groupby with .agg, but it doesn’t work. The following code, for example, gives me the max launch and min sequence overall and not the min sequence related to the max launch. I’ve tried some other approaches with sort_values and such, but with no success.

Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
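
A small sketch of why this happens, using only the sample rows shown above (the frame name df and the reduced column set are assumptions for illustration). Each entry in the .agg dict is reduced independently over the whole Article group, so the three values in a result row can come from three different source rows:

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Each column is aggregated on its own, ignoring the other columns.
naive = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(naive)
# For F302 this yields Launch 70444, Sequence 10 and Machine 'lathe 2',
# although Sequence 10 and 'lathe 2' belong to launch 69872, not 70444.
```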
Asked By: Piazza

||

Answers:

I would use:

# get max Launch per Article and filter rows
m = df.groupby('Article')['Launch'].max()
df2 = df.loc[df['Launch'].isin(m)]

# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]

Output:

    Launch Article  Sequence  Machine  Quantity        Date
4    71562   F2500        10  lathe 3       632  12/12/2022
10   70444    F302        20  lathe 1      3125  27/07/2022
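
One caveat: isin(m) tests each row against the maxima of all articles, so it relies on a launch number never appearing under more than one article. A hedged sketch of a variant that compares each row only with its own article's maximum, via transform('max'), run on the sample rows from the question (the names is_max_launch, latest and result are made up for this example):

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# transform('max') broadcasts each Article's max Launch back onto its rows,
# so every row is compared with the maximum of its own group only.
is_max_launch = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
latest = df.loc[is_max_launch]

# idxmin returns the index label of the smallest Sequence per Article.
result = latest.loc[
    latest.groupby("Article")["Sequence"].idxmin(),
    ["Article", "Launch", "Sequence", "Machine"],
]
print(result)
```

On the sample data this selects the same two rows (index 4 and 10) as the code above.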
Answered By: mozway

In a straightforward way:

df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
                            .sort_values(by=['Sequence']).head(1)).reset_index(drop=True)

   Launch Article  Sequence  Machine  Quantity        Date
0   71562   F2500        10  lathe 3       632  12/12/2022
1   70444    F302        20  lathe 1      3125  27/07/2022
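
A possible alternative, not part of the answer above: the same selection can be sketched without groupby.apply (whose handling of the grouping column has changed across pandas versions) by sorting once and keeping the first row per article. The sample frame below is rebuilt from the question's rows; the name result is an assumption for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 68033, 68033, 71562, 71562, 71562, 71562,
                 69872, 69872, 70444, 70444],
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Sort so that, within each Article, the highest Launch comes first and,
# within that Launch, the lowest Sequence comes first.
result = (
    df.sort_values(["Article", "Launch", "Sequence"],
                   ascending=[True, False, True])
      .drop_duplicates(subset="Article")  # keeps the first row per Article
      .reset_index(drop=True)
)
print(result)
```

The sort costs O(n log n) over the whole frame, but it stays vectorized, which tends to matter on a frame with over a hundred thousand rows.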
Answered By: RomanPerekhrest