Get specific ids after a groupby, based on a timestamp difference condition

Question:

I am trying to compute a time difference within each group (id) and check, for each group, whether ref1's timestamp is later than ref2's timestamp (ref1 > ref2). If it is, I want to store that id in a list.

In the example DataFrame below, for id a2, ref1 > ref2, so the output list should be ['a2'].

Please help me achieve this result.

DataFrame:

    id  reference  timestamp
    a1  ref1       2022-11-12 08:58:21
    a1  ref2       2022-11-12 08:58:26
    a1  ref3       2022-11-12 08:58:45
    a2  ref2       2022-11-12 08:58:21
    a2  ref2       2022-11-12 08:58:40
    a2  ref1       2022-11-12 08:58:45
    a3  ref2       2022-11-12 08:58:21
    a2  ref3       2022-11-12 08:58:45

Dataframe code:

import pandas as pd

# initialize list of lists: one [id, reference, timestamp] row per record
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]

# Create the pandas DataFrame and parse the timestamp strings
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df
Asked By: adss4


Answers:

You can use pandas indexing:

tmp = pd.to_datetime(df.set_index(['reference', 'id'])['timestamp'])

out = tmp.loc['ref1'] - tmp.loc['ref2']

out = set(out.index[out.gt(pd.Timedelta(0))])

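NB. If an id has several timestamps for the same reference (here a2 has two ref2 rows), the subtraction computes all ref1/ref2 combinations for that id. The set then acts as an "any": an id is kept if at least one combination has ref1 later than ref2.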
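To see those combinations, here is a minimal sketch that rebuilds the question's DataFrame and prints the intermediate Series (the names `ref1`, `ref2` and `diff` are illustrative, not from the answer). Selecting a first-level label with `.loc` drops that level, so both Series end up indexed by `id`; because a2 has two ref2 rows, `diff` contains two a2 entries, and a3, which has no ref1, gets `NaT`:

```python
import pandas as pd

# Same data as in the question
data = [['a1', 'ref1', '2022-11-12 08:58:21'], ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'], ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'], ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'], ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

tmp = df.set_index(['reference', 'id'])['timestamp']

# .loc on a first-level label drops that level: both Series are indexed by id only
ref1 = tmp.loc['ref1']  # ids: a1, a2
ref2 = tmp.loc['ref2']  # ids: a1, a2, a2, a3 (a2 appears twice)

# Aligning on a non-unique index pairs each ref1 row with every ref2 row of the same id;
# ids present on only one side (a3 has no ref1) get NaT
diff = ref1 - ref2
print(diff)

# NaT > 0 is False, so a3 is dropped; the set collapses a2's two positive differences
print(set(diff.index[diff > pd.Timedelta(0)]))
```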

Output:

{'a2'}

Or:

out = (tmp.loc['ref1'] - tmp.loc['ref2']
      ).loc[lambda x: x.gt(pd.Timedelta(0))].index.unique().tolist()

Output: ['a2']

Answered By: mozway
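An alternative sketch that follows the question's "group by id" framing more literally (this is not part of the answer above; the names `ref1_max`, `ref2_min` and `ids` are illustrative). Since the answer keeps an id when any (ref1, ref2) pair has ref1 later than ref2, the same test can be written per id as "latest ref1 timestamp > earliest ref2 timestamp", which avoids duplicate index labels altogether:

```python
import pandas as pd

# Same data as in the question
data = [['a1', 'ref1', '2022-11-12 08:58:21'], ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'], ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'], ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'], ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Per id: latest ref1 timestamp and earliest ref2 timestamp (one row per id on each side)
ref1_max = df.loc[df['reference'] == 'ref1'].groupby('id')['timestamp'].max()
ref2_min = df.loc[df['reference'] == 'ref2'].groupby('id')['timestamp'].min()

# Some pair has ref1 > ref2 exactly when max(ref1) > min(ref2);
# ids lacking either reference get NaT, which compares as False
diff = ref1_max - ref2_min
ids = diff[diff > pd.Timedelta(0)].index.tolist()
print(ids)  # ['a2']
```

Swapping the aggregations (`min` for ref1, `max` for ref2) would instead require every pair to satisfy ref1 > ref2, i.e. an "all" rather than an "any".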