A loop to find min values based on another column value, and merge into 1 dataframe?
Question:
Imagine a dataframe like this:
import pandas as pd
list ={'Security ID':['3e09ax', 'we9lkl', 'as42we','as5322', 'ot24tas', 'c34ci46a8'],
'Industry':['Airplanes', 'Airplanes', 'Oil', 'Oil', 'Housing', 'Trucking'],
'Amount outstanding':[33, 31, 39, 21, 29, 29]}
df = pd.DataFrame(list)
”’
The end goal is to return rows for each Industry for the lowest (min) Amount Outstanding into a "Min Value" dataframe for a daily report
essentially this but for each industry:
df[df['Amount outstanding'] == df['Amount outstanding'].min()]
First step is get Unique Values for [Industry] into a list again and then generate a loop function that does this.
Not sure exactly how to this. This dataframe in reality is 100,000 rows with 30 industries that change daily.
Answers:
Concise and performant; transform
and apply
can be unsuitably slow especially for large dataframes.
df.loc[df.groupby('Industry')['Amount outstanding'].idxmin()]
Broken down into parts:
df.groupby('Industry') # for each industry
['Amount outstanding'].idxmin() # get index of minimum 'Amount outstanding'
df.loc[ ]# return those rows of the original dataframe
Note well: make sure dtype
of 'Amount outstanding'
column is a numeric; if it is object
, it will be orders-of-magnitude slower than necessary. I ran into something similar dealing with a resample
on a dataframe of 1.9M rows, 30 cols; code that ran for minutes without completing finished in seconds after converting the columns from object
to float
.
IIUC, you want groupby
and transform
:
output = df[df['Amount outstanding']==df.groupby('Industry')['Amount outstanding'].transform(min)]
>>> output
Security ID Industry Amount outstanding
1 we9lkl Airplanes 31
3 as5322 Oil 21
4 ot24tas Housing 29
5 c34ci46a8 Trucking 29
df['B']=df['Amount outstanding']
df.groupby('Industry', group_keys=False).apply(lambda x: x.loc[x.B.idxmin()])
this gives you a new dataframe only with min value in column Amount outstanding. If you want you can drop now the column ‘B’
Imagine a dataframe like this:
import pandas as pd
list ={'Security ID':['3e09ax', 'we9lkl', 'as42we','as5322', 'ot24tas', 'c34ci46a8'],
'Industry':['Airplanes', 'Airplanes', 'Oil', 'Oil', 'Housing', 'Trucking'],
'Amount outstanding':[33, 31, 39, 21, 29, 29]}
df = pd.DataFrame(list)
”’
The end goal is to return rows for each Industry for the lowest (min) Amount Outstanding into a "Min Value" dataframe for a daily report
essentially this but for each industry:
df[df['Amount outstanding'] == df['Amount outstanding'].min()]
First step is get Unique Values for [Industry] into a list again and then generate a loop function that does this.
Not sure exactly how to this. This dataframe in reality is 100,000 rows with 30 industries that change daily.
Concise and performant; transform
and apply
can be unsuitably slow especially for large dataframes.
df.loc[df.groupby('Industry')['Amount outstanding'].idxmin()]
Broken down into parts:
df.groupby('Industry') # for each industry
['Amount outstanding'].idxmin() # get index of minimum 'Amount outstanding'
df.loc[ ]# return those rows of the original dataframe
Note well: make sure dtype
of 'Amount outstanding'
column is a numeric; if it is object
, it will be orders-of-magnitude slower than necessary. I ran into something similar dealing with a resample
on a dataframe of 1.9M rows, 30 cols; code that ran for minutes without completing finished in seconds after converting the columns from object
to float
.
IIUC, you want groupby
and transform
:
output = df[df['Amount outstanding']==df.groupby('Industry')['Amount outstanding'].transform(min)]
>>> output
Security ID Industry Amount outstanding
1 we9lkl Airplanes 31
3 as5322 Oil 21
4 ot24tas Housing 29
5 c34ci46a8 Trucking 29
df['B']=df['Amount outstanding']
df.groupby('Industry', group_keys=False).apply(lambda x: x.loc[x.B.idxmin()])
this gives you a new dataframe only with min value in column Amount outstanding. If you want you can drop now the column ‘B’