Multi date timestamps binned in 15 minute intervals in Pandas

Question:

I have a Pandas Dataframe with datetimes across multiple months:

                 datetime  count
40722 2022-07-21 13:30:00      1
40809 2022-07-21 20:30:00      1
40822 2022-07-21 21:30:00      1
40948 2022-07-22 07:42:15      1
40954 2022-07-22 08:05:17      1
...                   ...    ...
40431 2023-03-02 19:59:13      1
40463 2023-03-02 22:21:47      1

I want to take these times and bin them into 15 minute intervals, ignoring the date, similar to

In [1]: 
dr = pd.date_range('00:00', '23:45', freq='15T')

Out [1]: 
DatetimeIndex(['2023-03-04 00:00:00', '2023-03-04 00:15:00',
               '2023-03-04 00:30:00', '2023-03-04 00:45:00',
               '2023-03-04 01:00:00', '2023-03-04 01:15:00',
               '2023-03-04 01:30:00', '2023-03-04 01:45:00',
               ...
              ])

but again, ignoring the date. I only want the %HH:%MM:%SS part considered. It is also important that the bins are set on the intervals shown (starting at 00:00:00, 00:15:00, ...)

I’ve tried a few different methods, trying to group the data into these bins like aggregate time series dataframe by 15 minute intervals and more comprehensively like How to make a histogram of pandas datetimes per specific time interval?.

However, trying to group the data fails:

df.groupby(df.datetime.dt.time.floor('15T').time).sum()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'floor'

As do my attempts to cut the data by time:

df['time'] = pd.to_datetime(df["datetime"]).dt.time
df.groupby(pd.cut(df.time, pd.date_range('00:00:00', '23:45:00', freq='15T')))

TypeError: '<' not supported between instances of 'int' and 'datetime.time'

How can I group this data into the 15 minute bins like specified, ignoring the date?

Side note, I also seem to have an extra column, maybe an index, on my df. I imported a CSV with df = pd.read_csv('data/export.csv', index_col=False), but it remains.

Asked By: KJ0797

||

Answers:

You have to floor the datetime before extract the time part:

# df['datetime'] = pd.to_datetime(df['datetime'])
>>> df.groupby(df['datetime'].dt.floor('15T').dt.time)['count'].sum()
datetime
07:30:00    1
08:00:00    1
13:30:00    1
19:45:00    1
20:30:00    1
21:30:00    1
22:15:00    1
Name: count, dtype: int64

Update

I still need to find a way to add rows for the empty buckets

df['datetime'] = pd.to_datetime(df['datetime'])
dti = pd.date_range('1970-1-1', periods=4*24, freq='15T').time
out = (df.groupby(df['datetime'].dt.floor('15T').dt.time)['count'].sum()
         .reindex(dti, fill_value=0))
print(out)

# Output
datetime
00:00:00    0
00:15:00    0
00:30:00    0
00:45:00    0
01:00:00    0
           ..
22:45:00    0
23:00:00    0
23:15:00    0
23:30:00    0
23:45:00    0
Name: count, Length: 96, dtype: int64
Answered By: Corralien
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.