Find events in log that occur after a specific event
Question:
I have a log of events to analyze that looks like this:
+----+---------------------+----------+--------+
| id | timestamp | record | event |
+====+=====================+==========+========+
| 1 | 2023-03-01 13:17:05 | record03 | Edit |
+----+---------------------+----------+--------+
| 2 | 2023-03-02 02:57:49 | record02 | Edit |
+----+---------------------+----------+--------+
| 3 | 2023-03-03 00:41:13 | record03 | Locked |
+----+---------------------+----------+--------+
| 4 | 2023-03-03 14:54:34 | record03 | View |
+----+---------------------+----------+--------+
| 5 | 2023-03-04 07:29:55 | record03 | Edit |
+----+---------------------+----------+--------+
| 6 | 2023-03-05 02:15:10 | record02 | Locked |
+----+---------------------+----------+--------+
| 7 | 2023-03-05 04:47:33 | record01 | View |
+----+---------------------+----------+--------+
| 8 | 2023-03-05 15:39:04 | record02 | View |
+----+---------------------+----------+--------+
| 9 | 2023-03-06 08:36:22 | record03 | View |
+----+---------------------+----------+--------+
| 10 | 2023-03-06 18:37:28 | record02 | View |
+----+---------------------+----------+--------+
What I’m looking for is any "Edit" event that occurs after a "Locked" event for a given record. For each record, any event that occurred prior to a "Locked" event can be ignored. Any "Edit" event that occurs after the "Locked" event must be reported.
For example, in the sample data above, only row 5 should be returned as it has an "Edit" event after a "Locked" event. If everything is working properly, there shouldn’t be any "Edit" events after a "Locked" event in the log. Any method that identifies row 5 among a list of results would be acceptable.
I’ve been trying to use groupby() and first(), but I’m struggling to figure out how to return the first occurrence of "Edit" for a given record after any occurrence of "Locked".
Thanks in advance for any tip!
Answers:
Code
# is the event locked?
is_locked = df['event'] == 'Locked'
# flag all rows per record after the first locked event
is_locked = is_locked.groupby(df['record']).cummax()
# Row is flagged and event is edit
mask = df['event'].eq('Edit') & is_locked
# filter all such rows
df[mask]
Result
id timestamp record event
4 5 2023-03-04 07:29:55 record03 Edit
I have a log of events to analyze that looks like this:
+----+---------------------+----------+--------+
| id | timestamp | record | event |
+====+=====================+==========+========+
| 1 | 2023-03-01 13:17:05 | record03 | Edit |
+----+---------------------+----------+--------+
| 2 | 2023-03-02 02:57:49 | record02 | Edit |
+----+---------------------+----------+--------+
| 3 | 2023-03-03 00:41:13 | record03 | Locked |
+----+---------------------+----------+--------+
| 4 | 2023-03-03 14:54:34 | record03 | View |
+----+---------------------+----------+--------+
| 5 | 2023-03-04 07:29:55 | record03 | Edit |
+----+---------------------+----------+--------+
| 6 | 2023-03-05 02:15:10 | record02 | Locked |
+----+---------------------+----------+--------+
| 7 | 2023-03-05 04:47:33 | record01 | View |
+----+---------------------+----------+--------+
| 8 | 2023-03-05 15:39:04 | record02 | View |
+----+---------------------+----------+--------+
| 9 | 2023-03-06 08:36:22 | record03 | View |
+----+---------------------+----------+--------+
| 10 | 2023-03-06 18:37:28 | record02 | View |
+----+---------------------+----------+--------+
What I’m looking for is any "Edit" event that occurs after a "Locked" event for a given record. For each record, any event that occurred prior to a "Locked" event can be ignored. Any "Edit" event that occurs after the "Locked" event must be reported.
For example, in the sample data above, only row 5 should be returned as it has an "Edit" event after a "Locked" event. If everything is working properly, there shouldn’t be any "Edit" events after a "Locked" event in the log. Any method that identifies row 5 among a list of results would be acceptable.
I’ve been trying to use groupby() and first(), but I’m struggling to figure out how to return the first occurrence of "Edit" for a given record after any occurrence of "Locked".
Thanks in advance for any tip!
Code
# is the event locked?
is_locked = df['event'] == 'Locked'
# flag all rows per record after the first locked event
is_locked = is_locked.groupby(df['record']).cummax()
# Row is flagged and event is edit
mask = df['event'].eq('Edit') & is_locked
# filter all such rows
df[mask]
Result
id timestamp record event
4 5 2023-03-04 07:29:55 record03 Edit