Get specific ids after a group-by condition on timestamp difference
Question:
I am trying to compute a time difference within each group (id) and check, for each group, whether ref1's timestamp is later than ref2's timestamp. If it is, I want to store that id in a list.
In the example data frame below, for id a2 the ref1 timestamp is later than the ref2 timestamp, so the output list should contain ['a2'].
Please help me achieve this result.
dataframe:
id reference timestamp
a1 ref1 2022-11-12 08:58:21
a1 ref2 2022-11-12 08:58:26
a1 ref3 2022-11-12 08:58:45
a2 ref2 2022-11-12 08:58:21
a2 ref1 2022-11-12 08:58:45
a3 ref2 2022-11-12 08:58:21
a2 ref3 2022-11-12 08:58:45
Dataframe code:
import pandas as pd
# initialize list of lists
data = [
    ['a1', 'ref1', '2022-11-12 08:58:21'],
    ['a1', 'ref2', '2022-11-12 08:58:26'],
    ['a1', 'ref3', '2022-11-12 08:58:45'],
    ['a2', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref2', '2022-11-12 08:58:40'],
    ['a2', 'ref1', '2022-11-12 08:58:45'],
    ['a3', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref3', '2022-11-12 08:58:45'],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'reference','timestamp'])
df['timestamp'] = pd.to_datetime(df.timestamp)
df
Answers:
You can use pandas indexing:
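# index by (reference, id) so each reference can be selected with .loc
tmp = pd.to_datetime(df.set_index(['reference', 'id'])['timestamp'])
# the subtraction aligns on id; ids missing either reference give NaT
out = tmp.loc['ref1'] - tmp.loc['ref2']
# keep the ids that have at least one positive difference
out = set(out.index[out.gt(pd.Timedelta(0))])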
NB. If an id has several rows for one of the references, all combinations are computed. Here the set acts as an any: an id is kept if at least one combination has ref1 later than ref2.
Output:
{'a2'}
Or:
out = (tmp.loc['ref1'] - tmp.loc['ref2']
      ).loc[lambda x: x.gt(pd.Timedelta(0))].index.unique().tolist()
Output: ['a2']
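If you would rather not rely on the all-combinations alignment, the same "any" rule can be written with one value per id: an id has some ref1 later than some ref2 exactly when its latest ref1 is later than its earliest ref2. A minimal sketch of that idea, using the data from the question (the names ref1_latest, ref2_earliest and ids are just illustrative, not part of the answer above):

```python
import pandas as pd

data = [
    ['a1', 'ref1', '2022-11-12 08:58:21'],
    ['a1', 'ref2', '2022-11-12 08:58:26'],
    ['a1', 'ref3', '2022-11-12 08:58:45'],
    ['a2', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref2', '2022-11-12 08:58:40'],
    ['a2', 'ref1', '2022-11-12 08:58:45'],
    ['a3', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref3', '2022-11-12 08:58:45'],
]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# latest ref1 and earliest ref2 per id (one row per id in each Series)
ref1_latest = df[df['reference'].eq('ref1')].groupby('id')['timestamp'].max()
ref2_earliest = df[df['reference'].eq('ref2')].groupby('id')['timestamp'].min()

# the subtraction aligns on id; ids missing ref1 or ref2 become NaT
diff = ref1_latest - ref2_earliest

# NaT compares as False, so ids without both references are dropped
ids = diff.index[diff > pd.Timedelta(0)].tolist()
print(ids)
```

If you need a stricter rule instead (for example, every ref1 later than every ref2), swapping max and min in the two groupby lines should give that.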