PANDAS group by with 30 minute intervals and calculate total difference
Question:
I have a data frame that looks like this:
date | week | id |
---|---|---|
20/07/21 12:46:00 | 1 | d1 |
20/07/21 12:56:00 | 1 | d1 |
20/07/21 13:09:00 | 1 | d1 |
20/07/21 14:11:00 | 1 | d1 |
20/07/21 14:42:00 | 1 | d1 |
I want to group by date in 30-minute intervals: if two consecutive rows are more than 30 minutes apart, they belong to different groups.
The output I need looks like this:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 20/07/21 14:11:00 | 20/07/21 14:11:00 |
1 | d1 | 20/07/21 14:42:00 | 20/07/21 14:42:00 |
I used this code in order to group by:
x = df.groupby(['id', 'week', pd.Grouper(key='date', freq='30min', origin='start')]).agg({'date': [np.min, np.max]})
Something isn’t working with the grouper; any suggestions on how to improve it?
EDIT:
Here’s an example of my data that causes an issue:
date | week | id |
---|---|---|
20/07/21 12:46:00 | 1 | d1 |
20/07/21 12:56:00 | 1 | d1 |
20/07/21 13:09:00 | 1 | d1 |
22/07/21 07:11:00 | 1 | d1 |
22/07/21 07:14:00 | 1 | d1 |
22/07/21 07:27:00 | 1 | d1 |
22/07/21 08:34:00 | 1 | d1 |
22/07/21 08:36:00 | 1 | d1 |
The output required is:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 22/07/21 07:11:00 | 22/07/21 07:27:00 |
1 | d1 | 22/07/21 08:34:00 | 22/07/21 08:36:00 |
This is the output I get:
week | id | min_date | max_date |
---|---|---|---|
1 | d1 | 20/07/21 12:46:00 | 20/07/21 13:09:00 |
1 | d1 | 22/07/21 07:11:00 | 22/07/21 08:36:00 |
I don’t understand why it groups the last rows together when there is more than an hour between 22/07/21 07:27:00 and 22/07/21 08:34:00.
Thanks!
Answers:
You can use:
df['date'] = pd.to_datetime(df['date'])
(df.groupby(df['date'].diff().gt(pd.Timedelta('30min')).cumsum())
['date'].agg(['min', 'max'])
)
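To see what this does, the `diff`/`gt`/`cumsum` chain labels each row with a group number that increments every time the gap to the previous row exceeds 30 minutes. A minimal sketch on the question's first sample (the intermediate series is shown for illustration):

```python
import pandas as pd

# Sample from the question: gaps larger than 30 minutes occur
# between 13:09 and 14:11, and between 14:11 and 14:42.
df = pd.DataFrame({'date': pd.to_datetime([
    '2021-07-20 12:46:00',
    '2021-07-20 12:56:00',
    '2021-07-20 13:09:00',
    '2021-07-20 14:11:00',
    '2021-07-20 14:42:00',
])})

# diff() gives the gap to the previous row (NaT for the first row),
# gt() flags gaps over 30 minutes (NaT compares as False), and
# cumsum() turns the flags into a running group id.
group_id = df['date'].diff().gt(pd.Timedelta('30min')).cumsum()
print(group_id.tolist())  # [0, 0, 0, 1, 2]

out = df.groupby(group_id)['date'].agg(['min', 'max'])
print(out)
```

Each group id then collapses to one output row via `agg(['min', 'max'])`.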
Or maybe also group by id and week:
df['date'] = pd.to_datetime(df['date'])
(df.groupby(['week', 'id', df['date'].diff().gt(pd.Timedelta('30min')).cumsum()])
['date'].agg(['min', 'max'])
.droplevel(-1).reset_index()
)
Output:
week id min max
0 1 d1 2021-07-20 12:46:00 2021-07-20 13:09:00
1 1 d1 2021-07-20 14:11:00 2021-07-20 14:11:00
2 1 d1 2021-07-20 14:42:00 2021-07-20 14:42:00
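As for why the original `pd.Grouper` attempt misbehaves: `freq='30min'` slices the timeline into fixed half-hour bins and assigns each row to whichever bin contains it; it never looks at the gap between consecutive rows. A minimal sketch with illustrative data (not the asker's frame) showing two rows only two minutes apart landing in different bins:

```python
import pandas as pd

# Two timestamps 2 minutes apart, straddling a half-hour boundary.
df = pd.DataFrame({'date': pd.to_datetime([
    '2021-07-20 12:59:00',
    '2021-07-20 13:01:00',
])})

# With the default origin, 30-minute bins start on the half hour:
# 12:30-13:00 catches 12:59, 13:00-13:30 catches 13:01.
binned = df.groupby(pd.Grouper(key='date', freq='30min'))['date'].count()
print(binned)  # two bins of one row each
```

The converse also happens: rows well over 30 minutes apart can share a bin's neighbourhood once several bins are merged across keys, which is why the gap-based `diff`/`cumsum` approach matches the required output and fixed-frequency binning does not.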
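One further note, relevant only if rows of different ids can be interleaved in the frame: a plain `df['date'].diff()` compares each row with the previous row of the whole frame, even when that row belongs to another id. Computing the gaps per `(week, id)` group avoids that. A sketch with hypothetical interleaved data (`d2` here is illustrative):

```python
import pandas as pd

# Two ids; d2's first row is only 1 minute after d1's last row, but the
# gap should be measured within each id, not across ids.
df = pd.DataFrame({
    'week': [1, 1, 1, 1],
    'id':   ['d1', 'd1', 'd2', 'd2'],
    'date': pd.to_datetime([
        '2021-07-20 12:46:00', '2021-07-20 12:56:00',
        '2021-07-20 12:57:00', '2021-07-20 13:40:00',
    ]),
})

# diff() within each (week, id) group: the first row of every group
# gets NaT instead of a gap to another id's last row.
gaps = df.groupby(['week', 'id'])['date'].diff()
grp = gaps.gt(pd.Timedelta('30min')).cumsum()
out = (df.groupby(['week', 'id', grp])['date']
         .agg(['min', 'max'])
         .droplevel(-1).reset_index())
print(out)
```

Here `d2` splits into two sessions (12:57 alone, then 13:40 alone) because its internal gap is 43 minutes.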