Filter records based on timestamp in pandas dataframe
Question:
I have a pandas dataframe as below:
id | location | timestamp |
---|---|---|
001 | A | 2022-01-01 17:00:00 |
001 | B | 2022-01-01 18:00:00 |
001 | B | 2022-01-01 18:15:00 |
002 | B | 2022-01-01 18:30:00 |
003 | B | 2022-01-01 19:00:00 |
003 | A | 2022-01-01 20:00:00 |
I’d like to filter records where location is B and only after the same id has visited location A (i.e. the timestamp of B is later than the timestamp of A). Desired output as below:
id | location | timestamp |
---|---|---|
001 | B | 2022-01-01 18:00:00 |
001 | B | 2022-01-01 18:15:00 |
Thank you for your help!
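To make the example reproducible, the sample data can be built as below. This is a sketch that assumes `id` is kept as a zero-padded string and that `timestamp` is parsed with `pd.to_datetime`, so that timestamp comparisons operate on datetimes rather than on strings.

```python
import pandas as pd

# Sample data from the question; ids kept as zero-padded strings (assumption)
df = pd.DataFrame(
    {
        "id": ["001", "001", "001", "002", "003", "003"],
        "location": ["A", "B", "B", "B", "B", "A"],
        # Parse to datetime64 so ">" compares points in time
        "timestamp": pd.to_datetime(
            [
                "2022-01-01 17:00:00",
                "2022-01-01 18:00:00",
                "2022-01-01 18:15:00",
                "2022-01-01 18:30:00",
                "2022-01-01 19:00:00",
                "2022-01-01 20:00:00",
            ]
        ),
    }
)
print(df)
```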
Answers:
Assuming compared_record is a single reference row, you can compare the timestamps directly with Python comparison operators:
filtered_records = records[records["timestamp"] > compared_record["timestamp"]]
The location can be filtered in the same way:
filtered_records = records[records["location"] == compared_record["location"]]
So your final code would be:
filtered_records = records[(records["timestamp"] > compared_record["timestamp"]) & (records["location"] == compared_record["location"])]
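As written, the snippet above compares every row against one compared_record, which does not by itself express "after the same id visited A". A minimal sketch of the same comparison idea applied per id, assuming the question's sample data: look up each id's first visit to A and keep only B rows with a strictly later timestamp. The names `first_a` and `a_time` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["001", "001", "001", "002", "003", "003"],
        "location": ["A", "B", "B", "B", "B", "A"],
        "timestamp": pd.to_datetime(
            ["2022-01-01 17:00", "2022-01-01 18:00", "2022-01-01 18:15",
             "2022-01-01 18:30", "2022-01-01 19:00", "2022-01-01 20:00"]
        ),
    }
)

# Earliest visit to A for each id (ids that never visited A are absent)
first_a = df[df["location"].eq("A")].groupby("id")["timestamp"].min()

# Look up each row's first-A timestamp; ids without an A get NaT
a_time = df["id"].map(first_a)

# Keep B visits later than that id's first A visit;
# comparisons against NaT evaluate to False, so such ids are dropped
filtered_records = df[df["location"].eq("B") & (df["timestamp"] > a_time)]
print(filtered_records)
```

Using the earliest A visit means any B that happens after an id's first A is kept, which matches the "only after the same id has visited location A" wording.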
You can use boolean indexing with the help of groupby.cummax:
# True from each id's first visit to A onwards (in timestamp order)
m1 = df.sort_values('timestamp')['location'].eq('A').groupby(df['id']).cummax()
# identify location B
m2 = df['location'].eq('B')
# keep rows where both conditions above are True
df[m1&m2]
output:
id location timestamp
1 1 B 2022-01-01 18:00:00
2 1 B 2022-01-01 18:15:00
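A self-contained version of the cummax idea, sketched on the question's sample data. It sorts the whole frame first and groups by the sorted frame's own id column, so both masks share one index; the `astype(int)` before `cummax` is only a precaution and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["001", "001", "001", "002", "003", "003"],
        "location": ["A", "B", "B", "B", "B", "A"],
        "timestamp": pd.to_datetime(
            ["2022-01-01 17:00", "2022-01-01 18:00", "2022-01-01 18:15",
             "2022-01-01 18:30", "2022-01-01 19:00", "2022-01-01 20:00"]
        ),
    }
)

# Sort chronologically so "visited A before" follows timestamp order
d = df.sort_values("timestamp")

# Within each id the flag becomes 1 at the first A and stays 1 afterwards
seen_a = (
    d["location"].eq("A")
    .astype(int)        # precaution: run cummax on integers
    .groupby(d["id"])
    .cummax()
    .astype(bool)
)

is_b = d["location"].eq("B")

# Rows that are B visits occurring at or after the id's first A
result = d[seen_a & is_b].sort_index()
print(result)
```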
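import pandas as pd
def function1(dd: pd.DataFrame):
    # Latest visits first: 1 for the trailing run of B visits, 0 from the first non-B visit onwards
    ss1 = dd.sort_values(["timestamp", "location"])[::-1].location.eq("B").astype(int).cumprod()
    # Only return that B run if the id also has an earlier non-B (i.e. A) visit
    if ss1.loc[lambda x: x == 0].size > 0:
        return dd.loc[ss1.loc[lambda x: x == 1].index]
df1.groupby('id').apply(function1).reset_index(drop=True)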
out:
id location timestamp
0 1 B 2022-01-01 18:15:00
1 1 B 2022-01-01 18:00:00
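To see what the cumprod in function1 computes, here is a small sketch on hypothetical location sequences for a single id that are already in timestamp order (the helper `trailing_b_mask` is illustrative only). Reading the visits from latest to earliest, the running product stays 1 while the visits are B and drops to 0 at the first non-B visit, so only the trailing run of B visits is marked.

```python
import pandas as pd


def trailing_b_mask(locations):
    """Hypothetical helper: 1 for the trailing run of B visits, else 0."""
    s = pd.Series(locations)
    # Reverse, multiply the B flags cumulatively, then restore the original order
    return s[::-1].eq("B").astype(int).cumprod()[::-1].tolist()


print(trailing_b_mask(["A", "B", "B"]))       # [0, 1, 1]
print(trailing_b_mask(["A", "B", "A", "B"]))  # [0, 0, 0, 1]
```

For the question's data this selects the same rows as the cummax approach. For a sequence such as A, B, A, B, however, it marks only the final B, whereas a cummax mask keeps both B visits that follow the first A.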