Pandas DF – Efficient way to loop through DF to find minimum values of one column from rows with common values in another column

Question:

I have a dataframe that looks something like:

matter work_date
1 01/01/2020
2 01/02/2020
1 01/04/2020
2 01/05/2020

I want a new column that holds the minimum work_date across all rows with the same matter number, so that I can do some time delta calculations. The final result would look like this:

matter work_date first_date
1 01/01/2020 01/01/2020
2 01/02/2020 01/02/2020
1 01/04/2020 01/01/2020
2 01/05/2020 01/02/2020

Right now, I’m using the following code, but it is taking quite a while to run (the dataframe has approx 300k rows and I’m on an ancient computer).

min_dict = {}
def check_dict(val):
    return min_dict.setdefault(val,min(df[df['tmatter']==val]['tworkdt']))

df['first_day'] = df.apply(lambda row: check_dict(row.tmatter), axis=1)

Is there a better way to approach this?

Asked By: zac

Answers:

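groupby followed by transform does what you want and should be fast, because each group's minimum is computed once instead of filtering the whole dataframe for every row.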

The steps are: (1) group together the rows that share the same matter, (2) calculate the minimum work_date for each group, and (3) save these values as a new column.

import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
matter  work_date
1   01/01/2020
2   01/02/2020
1   01/04/2020
2   01/05/2020
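"""), sep=r"\s+")  # replaces delim_whitespace=True, which is deprecated in recent pandas

# Group by matter and broadcast each group's minimum work_date back to every row
df['first_date'] = df.groupby('matter')['work_date'].transform('min')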
print(df)
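
Since the goal is time delta calculations, it may help to parse work_date into real datetimes before taking the minimum. The sketch below assumes the dates are in month/day/year format (the sample data does not make this certain), and the column name days_since_first is only a placeholder for illustration.

```python
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""
matter  work_date
1   01/01/2020
2   01/02/2020
1   01/04/2020
2   01/05/2020
"""), sep=r"\s+")

# Assumption: dates are month/day/year. Parsing them makes 'min'
# compare chronologically rather than as plain strings.
df["work_date"] = pd.to_datetime(df["work_date"], format="%m/%d/%Y")

# Same groupby/transform step as above, now on datetime values
df["first_date"] = df.groupby("matter")["work_date"].transform("min")

# Hypothetical follow-up column: whole days elapsed since the first work date
df["days_since_first"] = (df["work_date"] - df["first_date"]).dt.days
print(df)
```

Without the conversion, the minimum is taken on strings, which happens to work for this sample but can go wrong once the data spans more than one year (as strings, "01/04/2020" sorts before "12/31/2019").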


Answered By: mitoRibo