Aggregate time series data on a weekly basis

Question:

I have a dataframe that contains 3 years of data, with two columns of interest: remaining useful life (rul) and predicted remaining useful life (pred_rul, stored in the pred_diff column below).

I currently aggregate rul and pred_rul over the full 3 years for each machineID, keeping the latest date each machine has. The original dataframe looks like this:

              rul   pred_diff   machineID   datetime
10476749    870    312.207825   408      2021-05-25 00:00:00
11452943    68     288.517578   447      2023-03-01 12:00:00
12693829    381    273.159698   493      2021-09-16 16:00:00
3413787     331    291.326416   133      2022-10-26 12:00:00
464093       77    341.506195   19       2023-10-10 16:00:00
... ... ... ... ...
11677555    537    310.586090   456      2022-04-07 00:00:00
2334804     551    289.307129   92       2021-09-04 20:00:00
5508311     35     293.721771   214      2023-01-06 04:00:00
12319704    348    322.199219   479      2021-11-11 20:00:00
4777501      87    278.089417   186      2021-06-29 12:00:00
1287421 rows × 4 columns

I aggregate it with this code:

y_test_grp = y_test.groupby('machineID').agg({'datetime':'max', 'rul':'mean', 'pred_diff':'mean'})[['datetime','rul', 'pred_diff']].reset_index()

which gives the following output:

    machineID   datetime    rul         pred_diff
0   1   2023-10-03 20:00:00 286.817681  266.419401
1   2   2023-11-14 00:00:00 225.561953  263.372531
2   3   2023-10-25 00:00:00 304.736237  256.933351
3   4   2023-01-13 12:00:00 204.084899  252.476066
4   5   2023-09-07 00:00:00 208.702431  252.487156
... ... ... ... ...
495 496 2023-10-11 00:00:00 302.445285  298.836798
496 497 2023-08-26 04:00:00 281.601613  263.479885
497 498 2023-11-28 04:00:00 292.593906  263.985034
498 499 2023-06-29 20:00:00 260.887529  263.494844
499 500 2023-11-08 20:00:00 160.223614  257.326034
500 rows × 4 columns

Since this is grouped by machineID only, it gives just 500 rows, which is too few. I want to aggregate rul and pred_rul on a weekly basis so that each machineID gets 52 weeks * 3 years = 156 rows. I can't figure out which function to use to take 7-day intervals and aggregate rul and pred_rul over them.

Asked By: PeakyBlinder

||

Answers:

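You can pass pd.Grouper to groupby on the dataframe to group by machineID and by 7-day bins of datetime (the datetime column must have a datetime dtype):

y_test.groupby(['machineID', pd.Grouper(key='datetime', freq='7D')]).mean()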
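As a rough illustration of the Grouper approach, here is a minimal sketch on a small made-up frame. The toy values and the name `weekly` are assumptions for the example and are not taken from the question's data. It converts datetime with pd.to_datetime first in case the column was loaded as strings, then averages rul and pred_diff per machine and 7-day bin.

```python
import pandas as pd

# Hypothetical toy data standing in for y_test (values invented for illustration).
y_test = pd.DataFrame({
    "machineID": [1, 1, 1, 2, 2],
    "datetime": [
        "2021-01-01 00:00:00", "2021-01-03 12:00:00", "2021-01-09 00:00:00",
        "2021-01-02 00:00:00", "2021-01-20 00:00:00",
    ],
    "rul": [100, 90, 80, 50, 30],
    "pred_diff": [110.0, 95.0, 85.0, 55.0, 35.0],
})

# pd.Grouper(key=...) needs a datetime64 column, so convert if it is stored as text.
y_test["datetime"] = pd.to_datetime(y_test["datetime"])

# Group by machine and by 7-day bins of datetime, then average both columns.
weekly = (
    y_test
    .groupby(["machineID", pd.Grouper(key="datetime", freq="7D")])[["rul", "pred_diff"]]
    .mean()
    .reset_index()
)

print(weekly)
```

With freq='7D' the bins are counted from the earliest day in the data, whereas freq='W' would use calendar weeks ending on Sunday. Only machine/week combinations that actually contain rows appear in the result, so a machine with gaps in its data will have fewer than 156 rows.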
Answered By: Quang Hoang