How can I check if a timestamp entry is within a time range and with a person filter in two different dataframe?

Question:

I need to check if an entry is within a person’s shift:

The data looks like this:

timestamp = pd.DataFrame({
    'Timestamp': ['01/02/2022 16:08:56','01/02/2022 16:23:31','01/02/2022 16:41:35','02/02/2022 16:57:41','02/02/2022 17:38:22','02/02/2022 17:50:56'],
    'Person': ['A','B','A','B','B','A']
})

shift = pd.DataFrame({
    'Date': ['01/02/2022','02/02/2022','01/02/2022','02/02/2022'],
    'in':['13:00:00','13:00:00','14:00:00','14:00:00'],
    'out': ['21:00:00','21:00:00','22:00:00','22:00:00'],
    'Person': ['A','A','B','B']
})
Asked By: Malace

||

Answers:

I assume that there is only one shift per person per day.

First I split the day and time from the timestamp dataframe. Then merge this with the shift dataframe on columns Person and Date. Then we only need to check whether the time from timestamp is between in and out.

timestamp[['Date', 'Time']] = timestamp.Timestamp.str.split(' ', 1, expand=True)

df_merge = timestamp.merge(shift, on=['Date', 'Person'])

df_merge['Timestamp_in_shift'] = (df_merge.Time <= df_merge.out) & (df_merge.Time >= df_merge['in'])

df_merge.drop(columns=['Date', 'Time'])

Output:

             Timestamp Person        in       out  Timestamp_in_shift
0  01/02/2022 16:08:56      A  13:00:00  21:00:00                True
1  01/02/2022 16:41:35      A  13:00:00  21:00:00                True
2  01/02/2022 16:23:31      B  14:00:00  22:00:00                True
3  02/02/2022 16:57:41      B  14:00:00  22:00:00                True
4  02/02/2022 17:38:22      B  14:00:00  22:00:00                True
5  02/02/2022 17:50:56      A  13:00:00  21:00:00                True
Answered By: Tobias Molenaar

For this kind of merge, an efficient method is to use merge_asof:

timestamp['Timestamp'] = pd.to_datetime(timestamp['Timestamp'])

(pd.merge_asof(timestamp.sort_values(by='Timestamp'),
               shift.assign(Timestamp=pd.to_datetime(shift['Date']+' '+shift['in']),
                            ts_out=pd.to_datetime(shift['Date']+' '+shift['out']),
                            ).sort_values(by='Timestamp')
                    [['Person', 'Timestamp', 'ts_out']],
               on='Timestamp', by='Person'
              )
   .assign(in_shift=lambda d: d['ts_out'].ge(d['Timestamp']))
   .drop(columns=['ts_out']) 
)

output:

            Timestamp Person  in_shift
0 2022-01-02 16:08:56      A      True
1 2022-01-02 16:23:31      B      True
2 2022-01-02 16:41:35      A      True
3 2022-02-02 16:57:41      B      True
4 2022-02-02 17:38:22      B      True
5 2022-02-02 17:50:56      A      True
Answered By: mozway
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.