Calculate difference of rows in Pandas

Question:

I have a timeseries dataframe where there are alerts for some particular rows. The dataframe looks like-

machineID    time    vibration    alerts
    1     2023-02-15    220          1
           11:45  
    1     2023-02-15    221          0
           12:00
    1     2023-02-15    219          0
           12:15
    1     2023-02-15    220          1
           12:30    
    1     2023-02-16    220          1
           11:45  
    1     2023-02-16    221          1
           12:00
    1     2023-02-16    219          0
           12:15
    1     2023-02-16    220          1
           12:30    

I want to calculate difference of alerts columns for each day. But since the date column is in time interval of 15 minutes, I am not getting how to group for whole day i.e., sum the alerts for each day and compare it with the sum of all alerts of the previous day.

In short, I need a way to sum all alerts for each day and substract with previous day. The result should be in another dataframe where there is a date column and difference of alerts column. In this case, the new dataframe will be-

time     diff_alerts
2023-02-16    1

since there is difference of 1 alert on the next day i.e. 16-02-2023

Asked By: PeakyBlinder

||

Answers:

Group by day with a custom pd.Grouper then sum alerts and finally compute the diff with the previous day:

>>> (df.groupby(pd.Grouper(key='time', freq='D'))['alerts'].sum().diff()
       .dropna().rename('diff_alerts').astype(int).reset_index())

        time  diff_alerts
0 2023-02-16            1

Note: the second line of code is just here to have a clean output.

Answered By: Corralien