Pandas DF – Efficient way to loop through DF to find minimum values of one column from rows with common values in another column
Question:
I have a dataframe that looks something like:
matter | work_date |
---|---|
1 | 01/01/2020 |
2 | 01/02/2020 |
1 | 01/04/2020 |
2 | 01/05/2020 |
I want a new column which finds the minimum work_date of all rows with the same matter number so that I can do some time delta calculations. The final result would look like this:
matter | work_date | first_date |
---|---|---|
1 | 01/01/2020 | 01/01/2020 |
2 | 01/02/2020 | 01/02/2020 |
1 | 01/04/2020 | 01/01/2020 |
2 | 01/05/2020 | 01/02/2020 |
Right now, I’m using the following code, but it is taking quite a while to run (the dataframe has approx 300k rows and I’m on an ancient computer).
min_dict = {}
def check_dict(val):
    return min_dict.setdefault(val, min(df[df['tmatter'] == val]['tworkdt']))
df['first_day'] = df.apply(lambda row: check_dict(row.tmatter), axis=1)
Is there a better way to approach this?
Answers:
groupby followed by transform does what you want and should be fast. The steps are: (1) group the rows that share the same matter, (2) for each group calculate the minimum work_date, and (3) broadcast those group minimums back onto every row as a new column.
import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
matter work_date
1 01/01/2020
2 01/02/2020
1 01/04/2020
2 01/05/2020
"""), sep=r'\s+')  # delim_whitespace=True is deprecated in recent pandas

# For each matter, broadcast the group minimum back to every row
df['first_date'] = df.groupby('matter')['work_date'].transform('min')
print(df)
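Since the goal is time delta calculations, it may also help to parse work_date into real datetimes first. String minimums happen to work on this sample, but MM/DD/YYYY strings compare lexicographically, so "12/31/2019" would sort after "01/01/2020". A minimal sketch under that assumption (column names taken from the question's sample; the days_since_first column is illustrative):

```python
import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
matter work_date
1 01/01/2020
2 01/02/2020
1 01/04/2020
2 01/05/2020
"""), sep=r'\s+')

# Parse MM/DD/YYYY strings into datetimes so comparison and
# subtraction behave correctly across month/year boundaries
df['work_date'] = pd.to_datetime(df['work_date'], format='%m/%d/%Y')

# Group minimum broadcast back to each row, then the elapsed time
df['first_date'] = df.groupby('matter')['work_date'].transform('min')
df['days_since_first'] = (df['work_date'] - df['first_date']).dt.days

print(df)
```

The subtraction of two datetime columns yields a Timedelta column, and `.dt.days` extracts whole days for the delta calculation.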