Multiple conditions Pandas groupby, keeping other column values
Question:
I have a dataframe like this:
Launch  Article  Sequence  Machine    Quantity  Date        …
68033   F2500    10        lathe 1    200       01/02/2022  …
68033   F2500    20        lathe 1    190       01/02/2022  …
68033   F2500    30        borer 3    175       02/02/2022  …
68033   F2500    40        milling 1  175       03/03/2022  …
71562   F2500    10        lathe 3    632       12/12/2022  …
71562   F2500    20        lathe 4    593       15/12/2022  …
71562   F2500    30        borer 3    560       16/12/2022  …
71562   F2500    40        milling 2  555       16/12/2022  …
69872   F302     10        lathe 2    5463      04/06/2022  …
69872   F302     30        lathe 3    5102      11/06/2022  …
70444   F302     20        lathe 1    3125      27/07/2022  …
70444   F302     30        lathe 3    2965      31/07/2022  …
…       …        …         …          …         …           …
124.531 rows x 12 columns
What I need is a kind of group-by where, for each article, I select the maximum launch number and then, within that launch, the minimum sequence number together with its corresponding machine.
The end result should look like this:
Article  Launch  Sequence  Machine
F2500    71562   10        lathe 3
F302     70444   20        lathe 1
…        …       …         …
I’ve tried to do it with pandas groupby and .agg, but it doesn’t work. The following code, for example, gives me the max launch and the min sequence per article independently of each other, not the min sequence within the max launch. I’ve tried some other approaches with sort_values and such, but with no success.
Last_Lathe_df = Last_Lathe_df.groupby(['Article'], as_index=False).agg({'Launch': 'max', 'Sequence': 'min', 'Machine': 'first'})
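To make the symptom concrete, here is a minimal sketch on a cut-down sample modelled on the rows above (only four of the twelve columns, and fewer rows; the variable names are illustrative). Each entry in the `.agg` dict is reduced on its own column, so the three values in a result row can come from three different source rows.

```python
import pandas as pd

# Cut-down sample based on the question's rows (assumed subset of columns).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 71562, 71562, 69872, 69872, 70444, 70444],
    "Article":  ["F2500", "F2500", "F2500", "F2500", "F302", "F302", "F302", "F302"],
    "Sequence": [10, 20, 10, 20, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "lathe 3", "lathe 4",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Each column is aggregated independently of the others.
agg = df.groupby("Article", as_index=False).agg(
    {"Launch": "max", "Sequence": "min", "Machine": "first"}
)
print(agg)
```

For F302 this pairs launch 70444 with sequence 10 and machine "lathe 2", both of which belong to launch 69872, which is the mismatch described above.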
Answers:
I would use:
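# get max Launch per Article (broadcast to every row) and keep only matching rows;
# comparing per row avoids matching a Launch that is the max of a different Article
m = df.groupby('Article')['Launch'].transform('max')
df2 = df.loc[df['Launch'].eq(m)]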
# get rows with min sequence
Last_Lathe_df = df2.loc[df2.groupby('Article')['Sequence'].idxmin()]
Output:
    Launch  Article  Sequence  Machine  Quantity  Date
4   71562   F2500    10        lathe 3  632       12/12/2022
10  70444   F302     20        lathe 1  3125      27/07/2022
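For anyone who wants to run the two-step idea end to end, here is a self-contained sketch on the twelve rows shown in the question (a subset of the columns; `is_latest`, `latest` and `result` are illustrative names). It compares each row's Launch with its own article's maximum via `transform('max')`, so a launch number cannot be matched against another article's maximum.

```python
import pandas as pd

# The twelve sample rows from the question (assumed subset of columns).
df = pd.DataFrame({
    "Launch":   [68033] * 4 + [71562] * 4 + [69872] * 2 + [70444] * 2,
    "Article":  ["F2500"] * 8 + ["F302"] * 4,
    "Sequence": [10, 20, 30, 40, 10, 20, 30, 40, 10, 30, 20, 30],
    "Machine":  ["lathe 1", "lathe 1", "borer 3", "milling 1",
                 "lathe 3", "lathe 4", "borer 3", "milling 2",
                 "lathe 2", "lathe 3", "lathe 1", "lathe 3"],
})

# Step 1: keep only the rows belonging to each article's latest launch.
is_latest = df["Launch"].eq(df.groupby("Article")["Launch"].transform("max"))
latest = df.loc[is_latest]

# Step 2: within those rows, pick the row with the smallest Sequence per article.
result = latest.loc[
    latest.groupby("Article")["Sequence"].idxmin(),
    ["Article", "Launch", "Sequence", "Machine"],
]
print(result)
```

Because `idxmin` returns index labels, the selected rows keep their original index (4 and 10 here); append `.reset_index(drop=True)` if a fresh index is preferred.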
A more straightforward way:
df.groupby('Article').apply(lambda x: x[x['Launch'].eq(x['Launch'].max())]
.sort_values(by=['Sequence']).head(1)).reset_index(drop=True)
Output:
   Launch  Article  Sequence  Machine  Quantity  Date
0  71562   F2500    10        lathe 3  632       12/12/2022
1  70444   F302     20        lathe 1  3125      27/07/2022
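Since the question mentions attempts with sort_values, here is one possible sort-based sketch, offered as an alternative rather than as either answerer's method. It assumes that sorting Launch descending and Sequence ascending within each Article puts the wanted row first, and then keeps that first row with `drop_duplicates`. The sample is a cut-down version of the question's data.

```python
import pandas as pd

# Cut-down sample based on the question's rows (assumed subset of columns).
df = pd.DataFrame({
    "Launch":   [68033, 68033, 71562, 71562, 69872, 69872, 70444, 70444],
    "Article":  ["F2500", "F2500", "F2500", "F2500", "F302", "F302", "F302", "F302"],
    "Sequence": [10, 20, 20, 10, 10, 30, 30, 20],
    "Machine":  ["lathe 1", "lathe 1", "lathe 4", "lathe 3",
                 "lathe 2", "lathe 3", "lathe 3", "lathe 1"],
})

result = (
    # Highest Launch first, then lowest Sequence, within each Article.
    df.sort_values(["Article", "Launch", "Sequence"], ascending=[True, False, True])
      # drop_duplicates keeps the first row per Article by default.
      .drop_duplicates("Article")
      .reset_index(drop=True)
)
print(result)
```

This avoids `groupby` altogether and keeps all other columns of the selected row.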