Find minimum date that's greater than dates within a column

Question:

I have a dataframe of stating date and another list of dates. For each date within start_date df, I want the find the minimum date that’s greater than it from the date_list df.

Desired output shown below.

start_date=pd.DataFrame( ['2014-04-05',
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09'],columns=['start_date'])

start_date['start_date']=pd.to_datetime(start_date['start_date'])

date_list=pd.DataFrame( [
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09','2014-04-20'],columns=['date_list'])
date_list['date_list']=pd.to_datetime(date_list['date_list'])

desired_output= pd.DataFrame( {'start_date':['2014-04-05',
        '2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09'],
        'end_date':['2014-04-06',
        '2014-04-07',
        '2014-04-08',
        '2014-04-09','2014-04-20']})
Asked By: Felton Wang

||

Answers:

One way is to just take all the dates that are larger than start date, then take the minimum of those:

out=start_date.copy()
out['end_date']=out['start_date'].apply(lambda x: date_list[date_list['date_list'].gt(x), 'date_list'].min())

OUTPUT

  start_date   end_date
0 2014-04-05 2014-04-06
1 2014-04-06 2014-04-07
2 2014-04-07 2014-04-08
3 2014-04-08 2014-04-09
4 2014-04-09 2014-04-20

For convenience, you can just use the underlying numpy array or create numpy array for the date list, then apply masking + minimum:

arr = date_list['date_list'].values
out['end_date']=out['start_date'].apply(lambda x: arr[arr>x].min())
Answered By: ThePyGuy
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.