From specific row in a df_a, to count its occurrences in the past a year in df_b

Question:

I have two dataframe as below and I want to return how many Success (Yes) in a year (for a specific person) 1 year prior to his/her specific date, i.e. each entry in to check to define the range in history.

For example, in to_check, Mike 20200602, I want to know how many Success (Yes) in Mike’s history (1 year before, until 20200602).

enter image description here

By using the "to_check" as a list, I came up with a clumsy way:

import pandas as pd
import datetime
import numpy as np
from io import StringIO
import time
from datetime import datetime, date, time, timedelta


csvfile = StringIO("""
Name Check
Mike 20200602
David 20210415
Kate 20201109""")

csvfile_1 = StringIO("""
Name History Success
David 20180312 Yes
David 20180811 Yes
David 20191223 Yes
David 20210311 Yes
Kate 20180906 Yes
Kate 20180912 Yes
Kate 20191204 Yes
Kate 20200505 Yes
Mike 20180912 Yes
Mike 20190312 Yes
Mike 20190806 Yes
Mike 20191204 Yes""")


df_check = pd.read_csv(csvfile, sep = ' ', engine='python')
df_history = pd.read_csv(csvfile_1, sep = ' ', engine='python')

df_history['Date'] = pd.to_datetime(df_history['History'], format='%Y%m%d')

to_check = ["Mike 20200602","David 20210415","Kate 20201109"]

for t in to_check:
    name, d = t.split(" ")
    date_obj = datetime.strptime(d, '%Y%m%d')
    delta = timedelta(days = 365)
    day_before = date_obj - delta
    m1 = df_history['Name'] == name
    m2 = df_history['Date'] >= day_before

    df_history['OP'] = np.where(m1 & m2, "fit", '')

    how_many = df_history['OP'].value_counts().tolist()[1]

    print (t, how_many)

Output:

Mike 20200602 2
David 20210415 1
Kate 20201109 2

What’s the better and smarter way to achieve it?

Asked By: Mark K

||

Answers:

merge and query, but I would suggest leaving the dates as number for easy offset:

# both `Check` and `History` are numbers, not dates
(df_check.merge(df_history, on='Name', how='left')
    .query('History<=Check<History+10000')
    .groupby('Name').agg({'History':'first', 'Success':'size'})
)

Output:

        History  Success
Name                    
David  20210311        1
Kate   20191204        2
Mike   20190806        2
Answered By: Quang Hoang
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.