Grouping rows by difference of time
Question:
I have a column of datetime values:
import pandas as pd

d = {'dttm': ['2023-01-30 09:02:42', '2023-01-30 09:02:47', '2023-01-30 09:02:48', '2023-01-30 09:02:59',
              '2023-01-30 09:10:23', '2023-01-30 09:10:27', '2023-01-30 09:12:36', '2023-01-30 09:14:13',
              '2023-01-30 09:14:15', '2023-01-30 09:16:12', '2023-01-30 09:16:12', '2023-01-30 09:16:13']}
dt_data = pd.DataFrame(d)
2023-01-30 09:02:42
2023-01-30 09:02:47
2023-01-30 09:02:48
2023-01-30 09:02:59
2023-01-30 09:10:23
2023-01-30 09:10:27
2023-01-30 09:12:36
2023-01-30 09:14:13
2023-01-30 09:14:15
2023-01-30 09:16:12
2023-01-30 09:16:12
2023-01-30 09:16:13
I need to group the rows using a threshold of 5 seconds.
In other words, if the difference between a row's datetime and the datetime of the adjacent row before it is less than or equal to five seconds, the row is included in the same group as that row. Otherwise, it gets the next group number.
Output:
dttm | group |
---|---|
2023-01-30 09:02:42 | 1 |
2023-01-30 09:02:47 | 1 |
2023-01-30 09:02:48 | 1 |
2023-01-30 09:02:59 | 2 |
2023-01-30 09:10:23 | 3 |
2023-01-30 09:10:27 | 3 |
2023-01-30 09:12:36 | 4 |
2023-01-30 09:14:13 | 5 |
2023-01-30 09:14:15 | 5 |
2023-01-30 09:16:12 | 6 |
2023-01-30 09:16:12 | 6 |
2023-01-30 09:16:13 | 6 |
Answers:
Convert the "dttm" column with to_datetime, compute the difference between consecutive rows with diff, then compare it to a threshold of "5s" and increment the group number with cumsum whenever the difference is above the threshold:
dt_data['group'] = pd.to_datetime(dt_data['dttm']).diff().gt('5s').cumsum().add(1)
Output:
dttm group
0 2023-01-30 09:02:42 1
1 2023-01-30 09:02:47 1
2 2023-01-30 09:02:48 1
3 2023-01-30 09:02:59 2
4 2023-01-30 09:10:23 3
5 2023-01-30 09:10:27 3
6 2023-01-30 09:12:36 4
7 2023-01-30 09:14:13 5
8 2023-01-30 09:14:15 5
9 2023-01-30 09:16:12 6
10 2023-01-30 09:16:12 6
11 2023-01-30 09:16:13 6
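To see why the one-liner works, the chain can be split into its intermediate steps. This is only an illustrative sketch of the same approach: the names `ts`, `gap`, `new_group`, `group` and `steps` are made up here, and it assumes the rows are already sorted in ascending time order (otherwise the differences would not be meaningful gaps).

```python
import pandas as pd

dt_data = pd.DataFrame({'dttm': [
    '2023-01-30 09:02:42', '2023-01-30 09:02:47', '2023-01-30 09:02:48',
    '2023-01-30 09:02:59', '2023-01-30 09:10:23', '2023-01-30 09:10:27',
    '2023-01-30 09:12:36', '2023-01-30 09:14:13', '2023-01-30 09:14:15',
    '2023-01-30 09:16:12', '2023-01-30 09:16:12', '2023-01-30 09:16:13',
]})

# Step 1: parse the strings into real timestamps.
ts = pd.to_datetime(dt_data['dttm'])

# Step 2: gap between each row and the previous one (NaT for the first row).
gap = ts.diff()

# Step 3: True where the gap exceeds 5 seconds, i.e. where a new group starts.
# The NaT in the first row compares as False, so that row stays in group 1.
new_group = gap > pd.Timedelta(seconds=5)

# Step 4: running count of group starts, shifted so numbering begins at 1.
group = new_group.cumsum() + 1

# Side-by-side view of every intermediate step (hypothetical helper frame).
steps = pd.DataFrame({'dttm': ts, 'gap': gap, 'new_group': new_group, 'group': group})
print(steps)
```

Because the comparison is strictly "greater than", a gap of exactly five seconds (09:02:42 to 09:02:47) keeps the row in the current group, which matches the "less than or equal to five seconds" rule in the question.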