A loop to find min values based on another column value, and merge into 1 dataframe?

Question:

Imagine a dataframe like this:

import pandas as pd
data = {'Security ID': ['3e09ax', 'we9lkl', 'as42we', 'as5322', 'ot24tas', 'c34ci46a8'],
        'Industry': ['Airplanes', 'Airplanes', 'Oil', 'Oil', 'Housing', 'Trucking'],
        'Amount outstanding': [33, 31, 39, 21, 29, 29]}
df = pd.DataFrame(data)


The end goal is to return, for each Industry, the row with the lowest (min) 'Amount outstanding', collected into a single "Min Value" dataframe for a daily report.

Essentially this, but for each industry:

df[df['Amount outstanding'] == df['Amount outstanding'].min()]

I assume the first step is to get the unique values of 'Industry' into a list and then write a loop that does this for each one.

I'm not sure exactly how to do this. In reality the dataframe has 100,000 rows and 30 industries that change daily.

Asked By: Sumit


Answers:

Concise and performant; transform and apply can be too slow, especially on large dataframes:

df.loc[df.groupby('Industry')['Amount outstanding'].idxmin()]

Broken down into parts:

       df.groupby('Industry')                                # for each industry
                             ['Amount outstanding'].idxmin() # get index of minimum 'Amount outstanding' 
df.loc[                                                     ]# return those rows of the original dataframe
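For anyone who wants to run this end to end, here is a minimal sketch on the question's sample data. The names min_idx and min_rows are only illustrative. Note that idxmin returns the first index label when a group has tied minimums, so each industry yields exactly one row.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Security ID': ['3e09ax', 'we9lkl', 'as42we', 'as5322', 'ot24tas', 'c34ci46a8'],
    'Industry': ['Airplanes', 'Airplanes', 'Oil', 'Oil', 'Housing', 'Trucking'],
    'Amount outstanding': [33, 31, 39, 21, 29, 29],
})

# One index label per industry: the row holding that industry's minimum.
min_idx = df.groupby('Industry')['Amount outstanding'].idxmin()

# Select those rows from the original dataframe (all columns are kept).
min_rows = df.loc[min_idx]
print(min_rows)
```

The rows come back ordered by industry name, because groupby sorts its keys by default; call min_rows.sort_index() if the original row order matters.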

Note well: make sure the dtype of the 'Amount outstanding' column is numeric. If it is object, the operation will be orders of magnitude slower than necessary. I ran into something similar with a resample on a dataframe of 1.9M rows and 30 columns: code that ran for minutes without completing finished in seconds after I converted the columns from object to float.
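A rough sketch of that conversion, assuming the amounts arrived as strings (for example from a CSV read without type inference). pd.to_numeric with errors='coerce' turns anything unparseable into NaN, so check for unexpected NaNs afterwards; the small dataframe here is made up for illustration.

```python
import pandas as pd

# Hypothetical input where the amounts were loaded as strings.
df = pd.DataFrame({
    'Industry': ['Airplanes', 'Airplanes', 'Oil', 'Oil'],
    'Amount outstanding': ['33', '31', '39', '21'],
})
print(df['Amount outstanding'].dtype)  # non-numeric before conversion

# Convert to a numeric dtype; unparseable values would become NaN.
df['Amount outstanding'] = pd.to_numeric(df['Amount outstanding'], errors='coerce')
print(df['Amount outstanding'].dtype)  # numeric after conversion

# The groupby/idxmin lookup now runs on numeric data.
min_rows = df.loc[df.groupby('Industry')['Amount outstanding'].idxmin()]
print(min_rows)
```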

Answered By: Joshua Voskamp

IIUC, you want groupby and transform:

output = df[df['Amount outstanding'] == df.groupby('Industry')['Amount outstanding'].transform('min')]

>>> output

  Security ID   Industry  Amount outstanding
1      we9lkl  Airplanes                  31
3      as5322        Oil                  21
4     ot24tas    Housing                  29
5   c34ci46a8   Trucking                  29
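One behavioural difference worth knowing: the transform comparison keeps every row that ties for the minimum within an industry, whereas idxmin keeps only the first one. A small sketch with made-up data (two Oil rows tied at 21) shows this; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data with a tie for the minimum within 'Oil'.
df = pd.DataFrame({
    'Security ID': ['a1', 'a2', 'o1', 'o2'],
    'Industry': ['Airplanes', 'Airplanes', 'Oil', 'Oil'],
    'Amount outstanding': [33, 31, 21, 21],
})

# Broadcast each industry's minimum back onto every row, then compare.
group_min = df.groupby('Industry')['Amount outstanding'].transform('min')
via_transform = df[df['Amount outstanding'] == group_min]

# idxmin yields a single index label per industry (first occurrence on ties).
via_idxmin = df.loc[df.groupby('Industry')['Amount outstanding'].idxmin()]

print(len(via_transform), len(via_idxmin))
```

Which one is right depends on whether the report should list all tied securities or just one per industry.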

Answered By: not_speshal

df['B'] = df['Amount outstanding']
df.groupby('Industry', group_keys=False).apply(lambda x: x.loc[x.B.idxmin()])

This gives you a new dataframe containing only the row with the minimum 'Amount outstanding' for each industry. If you want, you can now drop column 'B'.

Answered By: Paulo Barbosa