pandas get delta from corresponding date in a seperate list of dates

Question:

I have a dataframe

df a b
   7 2019-05-01 00:00:01
   6 2019-05-02 00:15:01 
   1 2019-05-06 00:10:01
   3 2019-05-09 01:00:01
   8 2019-05-09 04:20:01
   9 2019-05-12 01:10:01
   4 2019-05-16 03:30:01

And

l = [datetime.datetime(2019,05,02), datetime.datetime(2019,05,10), datetime.datetime(2019,05,22) ]

I want to add a column with the following:
for each row, find the last date from l that is before it, and add number of days between them.
If none of the date is smaller – add the delta from the smallest one.
So the new column will be:

df a b.                 delta            date
   7 2019-05-01 00:00:01 -1     datetime.datetime(2019,05,02)
   6 2019-05-02 00:15:01  0     datetime.datetime(2019,05,02)
   1 2019-05-06 00:10:01  4     datetime.datetime(2019,05,02)
   3 2019-05-09 01:00:01  7     datetime.datetime(2019,05,02)
   8 2019-05-09 04:20:01  7     datetime.datetime(2019,05,02)
   9 2019-05-12 01:10:01  2     datetime.datetime(2019,05,10)
   4 2019-05-16 03:30:01  6     datetime.datetime(2019,05,10)

How can I do it?

Thanks

Asked By: okuoub

||

Answers:

Using merge_asof to align df['b'] and the list (as Series), then computing the difference:

# ensure datetime
df['b'] = pd.to_datetime(df['b'])

# craft Series for merging (could be combined with line below)
s = pd.Series(l, name='l')

# merge and fillna with minimum date
ref = pd.merge_asof(df['b'], s, left_on='b', right_on='l')['l'].fillna(s.min())

# compute the delta as days
df['delta'] =(df['b']-ref).dt.days

output:

   a                   b  delta
0  7 2019-05-01 00:00:01     -1
1  6 2019-05-02 00:15:01      0
2  1 2019-05-06 00:10:01      4
3  3 2019-05-09 01:00:01      7
4  8 2019-05-09 04:20:01      7
5  9 2019-05-12 01:10:01      2
6  4 2019-05-16 03:30:01      6
Answered By: mozway

Here’s a one line solution if you your b column has datetime object. Otherwise convert it to datetime object.

df['delta'] = df.apply(lambda x: sorted([x.b - i for i in l], key= lambda y: y.seconds)[0].days, axis=1) 

Explanation : To each row you apply a function that :

  • Compute the deltatime between your row’s datetime and every datetime present in l, then store it in a list
  • Sort this list by the numbers of seconds of each deltatime
  • Get the first value (with the smallest deltatime) and return its days
Answered By: ArrowRise

this code is seperate this dataset on

  • weekday Friday
  • year 2014
  • day 01
  • hour 00
  • minute 03
rides['weekday'] = rides.timestamp.dt.strftime("%A")
rides['year'] = rides.timestamp.dt.strftime("%Y")
rides['day'] = rides.timestamp.dt.strftime("%d")
rides['hour'] = rides.timestamp.dt.strftime("%H")
rides["minute"] = rides.timestamp.dt.strftime("%M")
Answered By: shubham koli