Create new dataframe column with numbered time windows
Question:
I am struggling to add a new column to a pandas dataframe that assigns each row a numbered time window.
I am not trying to do grouping/aggregation.
Suppose I have the following df:
TIME
0 2018-01-02 06:00:00
1 2018-01-02 06:01:56
2 2018-01-02 06:05:55
3 2018-01-02 06:06:08
4 2018-01-02 06:06:22
5 2018-01-02 06:07:16
6 2018-01-02 06:07:57
7 2018-01-02 06:08:42
8 2018-01-02 06:10:44
9 2018-01-02 06:10:24
10 2018-01-02 06:10:46
I need to get the following for a 5-minute window:
                  TIME  WINDOW_NUMBER
0  2018-01-02 06:00:00              1
1  2018-01-02 06:01:56              1
2  2018-01-02 06:05:55              2
3  2018-01-02 06:06:08              2
4  2018-01-02 06:06:22              2
5  2018-01-02 06:07:16              2
6  2018-01-02 06:07:57              2
7  2018-01-02 06:08:42              2
8  2018-01-02 06:10:44              3
9  2018-01-02 06:10:24              3
10 2018-01-02 06:10:46              3
I need the windowing parameter to be adjustable in minutes.
Any help is much appreciated.
Answers:
You could use timedelta to specify the window length. Subtract a reference start time from the TIME column and floor-divide the result by that window length to get a window index for each row.
I have used start = df['TIME'][0] to set the reference start time to 2018-01-02 06:00:00, which is the first value in the column.
import pandas as pd
from datetime import timedelta
df = pd.DataFrame({'TIME': [
'2018-01-02 06:00:00',
'2018-01-02 06:01:56',
'2018-01-02 06:05:55',
'2018-01-02 06:06:08',
'2018-01-02 06:06:22',
'2018-01-02 06:27:16',
'2018-01-02 06:27:57',
'2018-01-02 06:38:42',
'2018-01-02 06:40:44',
'2018-01-02 06:40:24',
'2018-01-02 06:58:46'
]})
df['TIME'] = pd.to_datetime(df['TIME'])
diff = timedelta(minutes=5)
start = df['TIME'][0]
df['W'] = (df['TIME'] - start) // diff + 1
# to meet your requirement that the window number increments by 1
df['WINDOW_NUMBER'] = 1
for i in range(1, df.shape[0]):
    if df.loc[i, 'W'] != df.loc[i-1, 'W']:
        df.loc[i, 'WINDOW_NUMBER'] = df.loc[i-1, 'WINDOW_NUMBER'] + 1
    else:
        df.loc[i, 'WINDOW_NUMBER'] = df.loc[i-1, 'WINDOW_NUMBER']
Output:
                  TIME   W  WINDOW_NUMBER
0  2018-01-02 06:00:00   1              1
1  2018-01-02 06:01:56   1              1
2  2018-01-02 06:05:55   2              2
3  2018-01-02 06:06:08   2              2
4  2018-01-02 06:06:22   2              2
5  2018-01-02 06:27:16   6              3
6  2018-01-02 06:27:57   6              3
7  2018-01-02 06:38:42   8              4
8  2018-01-02 06:40:44   9              5
9  2018-01-02 06:40:24   9              5
10 2018-01-02 06:58:46  12              6
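The row-by-row loop can probably be replaced by a vectorized comparison with the previous row, which also makes it easy to expose the window length in minutes as a parameter. This is only a sketch: the function name add_window_number and its minutes and time_col arguments are made up for illustration, and it assumes, like the loop, that a new window number starts whenever W differs from the previous row.

```python
import pandas as pd


def add_window_number(df, minutes=5, time_col='TIME'):
    """Hypothetical helper: number windows of `minutes` length from the first row."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    width = pd.Timedelta(minutes=minutes)
    start = out[time_col].iloc[0]  # reference start: first value in the column
    # Raw window index relative to the start time (may skip numbers).
    out['W'] = (out[time_col] - start) // width + 1
    # Increment the counter each time W changes from the previous row.
    out['WINDOW_NUMBER'] = (out['W'] != out['W'].shift()).cumsum()
    return out


df = pd.DataFrame({'TIME': [
    '2018-01-02 06:00:00',
    '2018-01-02 06:01:56',
    '2018-01-02 06:05:55',
    '2018-01-02 06:06:08',
    '2018-01-02 06:06:22',
    '2018-01-02 06:27:16',
    '2018-01-02 06:27:57',
    '2018-01-02 06:38:42',
    '2018-01-02 06:40:44',
    '2018-01-02 06:40:24',
    '2018-01-02 06:58:46',
]})
result = add_window_number(df, minutes=5)
result_10 = add_window_number(df, minutes=10)
print(result)
```

With minutes=5 this should give the same W and WINDOW_NUMBER columns as the loop above; changing minutes adjusts the window length.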
You can also try:
df = (
    df.assign(
        TIME=pd.to_datetime(df.TIME))
    .set_index('TIME')
)
df = df.assign(
    WINDOW_NUMBER=df.groupby(
        pd.Grouper(freq='5Min'))
    .ngroup() + 1
)
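If you would rather keep TIME as a regular column, a similar result can be sketched by flooring each timestamp to the window length and numbering the distinct bins. This is an assumption-laden variant, not the code above: the minutes variable is illustrative, windows are aligned to clock boundaries (06:00, 06:05, ...) rather than to the first row, and bins are numbered in order of first appearance.

```python
import pandas as pd

# Data from the question (note rows 8 and 9 are not in time order).
df = pd.DataFrame({'TIME': pd.to_datetime([
    '2018-01-02 06:00:00',
    '2018-01-02 06:01:56',
    '2018-01-02 06:05:55',
    '2018-01-02 06:06:08',
    '2018-01-02 06:06:22',
    '2018-01-02 06:07:16',
    '2018-01-02 06:07:57',
    '2018-01-02 06:08:42',
    '2018-01-02 06:10:44',
    '2018-01-02 06:10:24',
    '2018-01-02 06:10:46',
])})

minutes = 5  # adjustable window length (illustrative name)

# Round each timestamp down to the start of its clock-aligned window.
bins = df['TIME'].dt.floor(f'{minutes}min')

# factorize assigns 0, 1, 2, ... to bins in order of first appearance.
codes, _ = pd.factorize(bins)
df['WINDOW_NUMBER'] = codes + 1
print(df)
```

On the question's data this yields window numbers 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3.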