PANDAS group by with 30 minute intervals and calculate total difference

Question:

I have a data frame that looks like this:

date week id
20/07/21 12:46:00 1 d1
20/07/21 12:56:00 1 d1
20/07/21 13:09:00 1 d1
20/07/21 14:11:00 1 d1
20/07/21 14:42:00 1 d1

I want to group by date in in 30 minutes interval- so if 2 consecutive rows are more than 30 minutes apart they are on different groups.
The output I need looks like this:

week id min_date max_date
1 d1 20/07/21 12:46:00 20/07/21 13:09:00
1 d1 20/07/21 14:11:00 20/07/21 14:11:00
1 d1 20/07/21 14:42:00 20/07/21 14:42:00

I used this code in order to group by:

x=df.groupby(['id','week', pd.Grouper(key='date', freq='30min',origin="start")]).agg({'date':[np.min, np.max]})

Something isn’t working with the grouper, any suggestions how to improve it?

EDIT:

Here’s an example of my data that causes an issue:

date week id
20/07/21 12:46:00 1 d1
20/07/21 12:56:00 1 d1
20/07/21 13:09:00 1 d1
22/07/21 07:11:00 1 d1
22/07/21 07:14:00 1 d1
22/07/21 07:27:00 1 d1
22/07/21 08:34:00 1 d1
22/07/21 08:36:00 1 d1

The output required is:

week id min_date max_date
1 d1 20/07/21 12:46:00 20/07/21 13:09:00
1 d1 20/07/21 07:11:00 20/07/21 07:27:00
1 d1 20/07/21 08:34:00 20/07/21 08:36:00

This is the output I get:

week id min_date max_date
1 d1 20/07/21 12:46:00 20/07/21 13:09:00
1 d1 20/07/21 07:11:00 20/07/21 08:36:00

I don’t understand why it groups the last rows together when there is more than an hour difference between 20/07/21 07:27:00 and 20/07/21 08:34:00.

Thanks!

Asked By: kri

||

Answers:

You can use:

df['date'] = pd.to_datetime(df['date'])

(df.groupby(df['date'].diff().gt(pd.Timedelta('30min')).cumsum())
 ['date'].agg(['min', 'max'])
)

Or maybe also group by id and week:

df['date'] = pd.to_datetime(df['date'])

(df.groupby(['week', 'id', df['date'].diff().gt(pd.Timedelta('30min')).cumsum()])
 ['date'].agg(['min', 'max'])
  .droplevel(-1).reset_index()
)

Output:

   week  id                 min                 max
0     1  d1 2021-07-20 12:46:00 2021-07-20 13:09:00
1     1  d1 2021-07-20 14:11:00 2021-07-20 14:11:00
2     1  d1 2021-07-20 14:42:00 2021-07-20 14:42:00
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.