Use Python to calculate difference between date in current row and date in previous row and previous column

Question:

I am trying to calculate the difference between dates in different rows that are not in the same column. I have the following dataset:

Kiosk Online Offline Online Days
1 12/1/2022 12/5/2022 4
1 12/7/2022 12/17/2022 10
1 12/20/2022 12/21/2022 1
1 12/24/2022 12/29/2022 5
2 11/15/2022 11/30/2022 15
2 12/2/2022 12/7/2022 5
2 12/15/2022 12/25/2022 10
3 10/30/2022 11/15/2022 16
3 11/17/2022 11/22/2022 5
3 11/23/2022 11/30/2022 7
3 12/4/2022 12/15/2022 11
3 12/18/2022 12/20/2022 2

I know I can use the diff function to calculate the difference between different rows in the same column, but I have not found a function that allows me to offset the row AND columns. In my example, I need to calculate the Offline Days as the difference between the Online date in my dataset and the Offline date in the previous row for the same Kiosk.

This is the output that I need in my output dataframe:

Kiosk Online Offline Online Days Offline Days
1 12/1/2022 12/5/2022 4 NaN
1 12/7/2022 12/17/2022 10 2
1 12/20/2022 12/21/2022 1 3
1 12/24/2022 12/29/2022 5 3
2 11/15/2022 11/30/2022 15 NaN
2 12/2/2022 12/7/2022 5 2
2 12/15/2022 12/25/2022 10 8
3 10/30/2022 11/15/2022 16 NaN
3 11/17/2022 11/22/2022 5 2
3 11/23/2022 11/30/2022 7 1
3 12/4/2022 12/15/2022 11 4
3 12/18/2022 12/20/2022 2 3

DataFrame to start with:

import pandas as pd

df=pd.DataFrame({'Kiosk':[1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                 'Online':["12/1/2022", "12/7/2022","12/20/2022", "12/24/2022","11/15/2022", 
                           "12/2/2022","12/15/2022", "10/30/2022","11/17/2022", "11/23/2022",
                           "12/4/2022", "12/18/2022"],
                 'Offline':["12/5/2022", "12/17/2022","12/21/2022", "12/29/2022","11/30/2022", 
                            "12/7/2022","12/25/2022", "11/15/2022","11/22/2022", "11/30/2022",
                            "12/15/2022", "12/20/2022"],
                 'Online Days':[4, 10, 1, 5, 15, 5, 10, 16, 5, 7, 11, 2],
                 })
Asked By: the_zeef

Answers:

Taking mozway's comment into account (I had missed that you want the calculation per kiosk), I suggest the following code instead:

# Parse the month/day/year strings into datetimes
df['Online'] = pd.to_datetime(df['Online'], format='%m/%d/%Y')
df['Offline'] = pd.to_datetime(df['Offline'], format='%m/%d/%Y')

# Current Online date minus the previous Offline date within the same Kiosk
df['Offsets'] = (df['Online'] - df.groupby('Kiosk')['Offline'].shift(1)).dt.days
print(df)

Result

    Kiosk     Online    Offline  Online Days  Offsets
0       1 2022-12-01 2022-12-05            4      NaN
1       1 2022-12-07 2022-12-17           10      2.0
2       1 2022-12-20 2022-12-21            1      3.0
3       1 2022-12-24 2022-12-29            5      3.0
4       2 2022-11-15 2022-11-30           15      NaN
5       2 2022-12-02 2022-12-07            5      2.0
6       2 2022-12-15 2022-12-25           10      8.0
7       3 2022-10-30 2022-11-15           16      NaN
8       3 2022-11-17 2022-11-22            5      2.0
9       3 2022-11-23 2022-11-30            7      1.0
10      3 2022-12-04 2022-12-15           11      4.0
11      3 2022-12-18 2022-12-20            2      3.0
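
If whole-number days are preferred over the floats that .dt.days yields when missing values are present, one option is to cast to pandas' nullable Int64 dtype. What follows is a minimal, self-contained sketch rather than part of the original answer: the column name Offline Days comes from the question, prev_offline and gap_days are illustrative names, and the explicit format string assumes the dates are always month/day/year.

```python
import pandas as pd

df = pd.DataFrame({
    "Kiosk": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "Online": ["12/1/2022", "12/7/2022", "12/20/2022", "12/24/2022",
               "11/15/2022", "12/2/2022", "12/15/2022", "10/30/2022",
               "11/17/2022", "11/23/2022", "12/4/2022", "12/18/2022"],
    "Offline": ["12/5/2022", "12/17/2022", "12/21/2022", "12/29/2022",
                "11/30/2022", "12/7/2022", "12/25/2022", "11/15/2022",
                "11/22/2022", "11/30/2022", "12/15/2022", "12/20/2022"],
})

# Assumption: every date string is month/day/year.
for col in ("Online", "Offline"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Previous Offline date within the same Kiosk (NaT on each kiosk's first row).
prev_offline = df.groupby("Kiosk")["Offline"].shift(1)

# Whole days between the previous Offline date and the current Online date.
gap_days = (df["Online"] - prev_offline).dt.days

# The nullable integer dtype keeps whole numbers and marks missing values as <NA>.
df["Offline Days"] = gap_days.astype("Int64")

print(df)
```

The first row of each kiosk has no previous Offline date, so it stays missing instead of being filled with a misleading number.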
Answered By: Laurent B.

Use .shift to align each Offline date with the following row's Online date, subtract it from the Online date, and then use .where to blank out the result wherever the Kiosk changes [or, alternatively, to drop negative differences].

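# df = pd.read_html('https://stackoverflow.com/questions/75962581')[0]
# Convert both date columns to datetime (needed if they are still strings)
df[['Online','Offline']] = df[['Online','Offline']].apply(pd.to_datetime)

# Current row's Online date minus the previous row's Offline date
offline_days = df['Online'] - df['Offline'].shift(1)
# Keep the difference only where the previous row belongs to the same Kiosk
df['Offline Days'] = offline_days.where(df['Kiosk']==df['Kiosk'].shift(1))
# Alternative: keep only the positive differences
# df['Offline Days'] = offline_days.where(lambda td: td>pd.Timedelta(0))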

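One thing worth checking with a plain .shift is row order: the comparison against the previous row's Kiosk only works when each kiosk's rows sit next to each other. The sketch below is an illustration under assumptions, not part of the answer above. It uses four rows taken from the question's data, and the helper names offline_gap_where and offline_gap_groupby are hypothetical. It compares the .where approach with a per-kiosk groupby shift, first on contiguous rows and then on interleaved ones.

```python
import pandas as pd

# Four rows from the question: the first two of Kiosk 1 and of Kiosk 2.
df = pd.DataFrame({
    "Kiosk": [1, 1, 2, 2],
    "Online": pd.to_datetime(["2022-12-01", "2022-12-07", "2022-11-15", "2022-12-02"]),
    "Offline": pd.to_datetime(["2022-12-05", "2022-12-17", "2022-11-30", "2022-12-07"]),
})

def offline_gap_where(frame):
    """Plain shift, then mask rows whose previous row is a different Kiosk."""
    gap = frame["Online"] - frame["Offline"].shift(1)
    return gap.where(frame["Kiosk"] == frame["Kiosk"].shift(1))

def offline_gap_groupby(frame):
    """Shift within each Kiosk, so row adjacency does not matter."""
    return frame["Online"] - frame.groupby("Kiosk")["Offline"].shift(1)

# Contiguous rows: both approaches agree.
print(offline_gap_where(df))
print(offline_gap_groupby(df))

# Interleaved rows (Kiosk order 1, 2, 1, 2): the plain shift never sees
# a matching previous Kiosk, while the groupby version still finds the gaps.
mixed = df.iloc[[0, 2, 1, 3]]
print(offline_gap_where(mixed))
print(offline_gap_groupby(mixed))
```

If the data cannot be guaranteed to be grouped, sorting by Kiosk and Online first (for example with df.sort_values(['Kiosk', 'Online'])) or using the groupby variant avoids the problem.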

Answered By: Driftr95