Create new dataframe column with numbered time windows

Question:

I am having a hard time creating a new column in a pandas dataframe that represents numbered time windows.
I am not trying to do grouping/aggregation.

Consider the following df:

    TIME
0   2018-01-02 06:00:00
1   2018-01-02 06:01:56
2   2018-01-02 06:05:55
3   2018-01-02 06:06:08
4   2018-01-02 06:06:22
5   2018-01-02 06:07:16
6   2018-01-02 06:07:57
7   2018-01-02 06:08:42
8   2018-01-02 06:10:44
9   2018-01-02 06:10:24
10  2018-01-02 06:10:46

For a 5-minute window, I need to get the following:

    TIME                 WINDOW_NUMBER
0   2018-01-02 06:00:00  1
1   2018-01-02 06:01:56  1
2   2018-01-02 06:05:55  2
3   2018-01-02 06:06:08  2
4   2018-01-02 06:06:22  2
5   2018-01-02 06:07:16  2
6   2018-01-02 06:07:57  2
7   2018-01-02 06:08:42  2
8   2018-01-02 06:10:44  3
9   2018-01-02 06:10:24  3
10  2018-01-02 06:10:46  3

I need the windowing parameter to be adjustable in minutes.

Any help is much appreciated.

Asked By: Dominik Novotný


Answers:

You could use a timedelta to specify the window length: subtract a reference start time from the TIME column and floor-divide the result by that interval.

I have used start = df['TIME'][0] as the reference start time, 2018-01-02 06:00:00, which is the first value in the column.

import pandas as pd
from datetime import timedelta

df = pd.DataFrame({'TIME': [
'2018-01-02 06:00:00',
'2018-01-02 06:01:56',
'2018-01-02 06:05:55',
'2018-01-02 06:06:08',
'2018-01-02 06:06:22',
'2018-01-02 06:27:16',
'2018-01-02 06:27:57',
'2018-01-02 06:38:42',
'2018-01-02 06:40:44',
'2018-01-02 06:40:24',
'2018-01-02 06:58:46'
]})
df['TIME'] = pd.to_datetime(df['TIME'])

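diff = timedelta(minutes=5)  # window length; change minutes to adjust it
start = df['TIME'][0]        # reference start: first value in the column
# W = number of whole windows elapsed since start, plus 1 (skips empty windows)
df['W'] = (df['TIME'] - start) // diff + 1

# Renumber so that WINDOW_NUMBER increases by exactly 1 whenever W changes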
df['WINDOW_NUMBER'] = 1
for i in range(1, df.shape[0]):
    if df.loc[i, 'W'] != df.loc[i-1, 'W']:
        df.loc[i, 'WINDOW_NUMBER'] = df.loc[i-1, 'WINDOW_NUMBER'] + 1
    else:
        df.loc[i, 'WINDOW_NUMBER'] = df.loc[i-1, 'WINDOW_NUMBER']

Output

    TIME                W   WINDOW_NUMBER
0   2018-01-02 06:00:00 1   1
1   2018-01-02 06:01:56 1   1
2   2018-01-02 06:05:55 2   2
3   2018-01-02 06:06:08 2   2
4   2018-01-02 06:06:22 2   2
5   2018-01-02 06:27:16 6   3
6   2018-01-02 06:27:57 6   3
7   2018-01-02 06:38:42 8   4
8   2018-01-02 06:40:44 9   5
9   2018-01-02 06:40:24 9   5
10  2018-01-02 06:58:46 12  6
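
The row-by-row loop above can also be written without explicit iteration. The following is a minimal sketch, assuming the same data as in this answer; the helper name number_windows and its minutes parameter are hypothetical and only illustrate how the window length could be made adjustable. It compares each row's window index with the previous row's and takes a cumulative sum of the changes.

```python
import pandas as pd


def number_windows(times: pd.Series, minutes: int = 5) -> pd.Series:
    """Return consecutive 1-based window numbers for a datetime Series.

    Hypothetical helper: windows are measured from the first timestamp.
    """
    window = pd.Timedelta(minutes=minutes)
    # Whole windows elapsed since the first timestamp (0-based, may skip values).
    elapsed = (times - times.iloc[0]) // window
    # A new number starts whenever the value differs from the previous row;
    # the first row compares against NaN, so numbering starts at 1.
    return (elapsed != elapsed.shift()).cumsum()


df = pd.DataFrame({'TIME': pd.to_datetime([
    '2018-01-02 06:00:00',
    '2018-01-02 06:01:56',
    '2018-01-02 06:05:55',
    '2018-01-02 06:06:08',
    '2018-01-02 06:06:22',
    '2018-01-02 06:27:16',
    '2018-01-02 06:27:57',
    '2018-01-02 06:38:42',
    '2018-01-02 06:40:44',
    '2018-01-02 06:40:24',
    '2018-01-02 06:58:46',
])})

df['WINDOW_NUMBER'] = number_windows(df['TIME'], minutes=5)
print(df)
```

Like the loop, this numbers windows in row order, so it should reproduce the WINDOW_NUMBER column shown in the output above; passing a different minutes value changes the window length.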
Answered By: perpetualstudent

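You can also try pd.Grouper with ngroup(). Note that this moves TIME into the index:

# Parse TIME and make it the index so that pd.Grouper can bin on it
df = (
    df.assign(
        TIME=pd.to_datetime(df.TIME))
    .set_index('TIME')
)
# ngroup() gives each row the number of its 5-minute group; +1 makes it 1-based
df = df.assign(
    WINDOW_NUMBER=df.groupby(
        pd.Grouper(freq='5min'))
    .ngroup() + 1
)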
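
For comparison, here is a sketch that keeps TIME as a regular column. It swaps pd.Grouper and ngroup() for Series.dt.floor and pd.factorize, so it is a different technique rather than a restatement of the code above. It assumes the sample data from the question, and the minutes variable is introduced here only for illustration. Note that dt.floor aligns windows to clock boundaries (06:00, 06:05, ...), not to the first timestamp.

```python
import pandas as pd

df = pd.DataFrame({'TIME': pd.to_datetime([
    '2018-01-02 06:00:00',
    '2018-01-02 06:01:56',
    '2018-01-02 06:05:55',
    '2018-01-02 06:06:08',
    '2018-01-02 06:06:22',
    '2018-01-02 06:07:16',
    '2018-01-02 06:07:57',
    '2018-01-02 06:08:42',
    '2018-01-02 06:10:44',
    '2018-01-02 06:10:24',
    '2018-01-02 06:10:46',
])})

minutes = 5  # window length in minutes (illustrative variable name)

# Floor each timestamp to the start of its clock-aligned window.
window_start = df['TIME'].dt.floor(f'{minutes}min')

# factorize(sort=True) maps the distinct window starts to 0, 1, 2, ...
# in chronological order; empty windows get no number.
codes, _ = pd.factorize(window_start, sort=True)
df['WINDOW_NUMBER'] = codes + 1
print(df)
```

Because the numbering comes from the sorted distinct window starts, rows that are slightly out of order (such as 06:10:24 after 06:10:44) still land in the same window number.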
Answered By: Nk03