Troubleshooting a pd.to_timedelta calculation failure

Question:

I have previously used the following conditional statement successfully:

for RxDat in df2:
    condition = (df['Tdate'] > RxDat - pd.to_timedelta(46, unit="D")) & (df['Tdate'] < RxDat)

Now I am getting the following error:

TypeError: unsupported operand type(s) for -: 'str' and 'Timedelta'

I have extracted the following data to illustrate the error

df['Tdate'] contains

[Timestamp('2004-08-25 00:00:00'), Timestamp('2004-10-13 00:00:00'), Timestamp('2004-12-13 00:00:00'), Timestamp('2005-02-21 00:00:00'), Timestamp('2005-04-28 00:00:00'), Timestamp('2005-08-24 00:00:00')]

df2['RxDate'] contains

[Timestamp('2004-08-20 00:00:00'), Timestamp('2004-08-23 00:00:00'), Timestamp('2004-08-18 00:00:00'), Timestamp('2004-08-15 00:00:00'), Timestamp('2004-08-12 00:00:00'), Timestamp('2004-08-13 00:00:00')]

I have looked at this a few ways and cannot see why I get the error.

Asked By: JohnH

||

Answers:

Looping over df2 itself iterates over its column names, so each RxDat is a string (a column label) rather than a Timestamp. Instead of:

for RxDat in df2:

Use:

for RxDat in df2['RxDate']:
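The difference between the two loops can be checked with a small sketch. It reuses the first three dates from the question; the names labels, window, and masks are introduced here only for illustration.

```python
import pandas as pd

# First three values from the question's data.
df = pd.DataFrame({"Tdate": pd.to_datetime(["2004-08-25", "2004-10-13", "2004-12-13"])})
df2 = pd.DataFrame({"RxDate": pd.to_datetime(["2004-08-20", "2004-08-23", "2004-08-18"])})

# Iterating over the DataFrame yields column labels (strings),
# which is why "RxDat - Timedelta" raised the TypeError.
labels = [col for col in df2]

# Iterating over the column yields Timestamp values, so the subtraction works.
window = pd.to_timedelta(46, unit="D")  # hypothetical helper name
masks = []
for rx_date in df2["RxDate"]:
    condition = (df["Tdate"] > rx_date - window) & (df["Tdate"] < rx_date)
    masks.append(condition)
```

With this sample every Tdate falls after every RxDate, so each mask is entirely False; the point is only that the loop now runs without the TypeError.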

For a non-loop solution, use NumPy broadcasting. The output is a 2D boolean NumPy array with one row per Tdate value and one column per RxDate value:

a = df['Tdate'].to_numpy()[:, None]
b = df2['RxDate'].sub(pd.to_timedelta(46, unit="D")).to_numpy()
c = df2['RxDate'].to_numpy()
              
condition = (a > b) & (a < c)
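To see the shape of the broadcast result, here is a minimal sketch on made-up dates (the values below are hypothetical, chosen so that some comparisons come out True; they are not from the question).

```python
import pandas as pd

# Hypothetical sample data, not the question's data.
df = pd.DataFrame({"Tdate": pd.to_datetime(["2004-08-10", "2004-08-25", "2004-06-01"])})
df2 = pd.DataFrame({"RxDate": pd.to_datetime(["2004-08-20", "2004-08-12"])})

a = df["Tdate"].to_numpy()[:, None]                                  # shape (3, 1)
b = df2["RxDate"].sub(pd.to_timedelta(46, unit="D")).to_numpy()      # shape (2,)
c = df2["RxDate"].to_numpy()                                         # shape (2,)

# Broadcasting compares every Tdate (rows) against every RxDate window (columns).
condition = (a > b) & (a < c)                                        # shape (3, 2)

# One possible follow-up: rows of df that fall inside at least one window.
in_any_window = condition.any(axis=1)
```

Here condition[i, j] is True when the i-th Tdate lies within the 46 days before the j-th RxDate. Only 2004-08-10 satisfies that for both RxDate values in this sample.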
Answered By: jezrael