Use Python to calculate difference between date in current row and date in previous row and previous column
Question:
I am trying to calculate the difference between dates in different rows that are not in the same column. I have the following dataset:
Kiosk | Online | Offline | Online Days |
---|---|---|---|
1 | 12/1/2022 | 12/5/2022 | 4 |
1 | 12/7/2022 | 12/17/2022 | 10 |
1 | 12/20/2022 | 12/21/2022 | 1 |
1 | 12/24/2022 | 12/29/2022 | 5 |
2 | 11/15/2022 | 11/30/2022 | 15 |
2 | 12/2/2022 | 12/7/2022 | 5 |
2 | 12/15/2022 | 12/25/2022 | 10 |
3 | 10/30/2022 | 11/15/2022 | 16 |
3 | 11/17/2022 | 11/22/2022 | 5 |
3 | 11/23/2022 | 11/30/2022 | 7 |
3 | 12/4/2022 | 12/15/2022 | 11 |
3 | 12/18/2022 | 12/20/2022 | 2 |
I know I can use the diff function to calculate the difference between different rows in the same column, but I have not found a function that allows me to offset the row AND columns. In my example, I need to calculate the Offline Days as the difference between the Online date in my dataset and the Offline date in the previous row for the same Kiosk.
This is the output that I need in my output dataframe:
Kiosk | Online | Offline | Online Days | Offline Days |
---|---|---|---|---|
1 | 12/1/2022 | 12/5/2022 | 4 | NaN |
1 | 12/7/2022 | 12/17/2022 | 10 | 2 |
1 | 12/20/2022 | 12/21/2022 | 1 | 3 |
1 | 12/24/2022 | 12/29/2022 | 5 | 3 |
2 | 11/15/2022 | 11/30/2022 | 15 | NaN |
2 | 12/2/2022 | 12/7/2022 | 5 | 2 |
2 | 12/15/2022 | 12/25/2022 | 10 | 8 |
3 | 10/30/2022 | 11/15/2022 | 16 | NaN |
3 | 11/17/2022 | 11/22/2022 | 5 | 2 |
3 | 11/23/2022 | 11/30/2022 | 7 | 1 |
3 | 12/4/2022 | 12/15/2022 | 11 | 4 |
3 | 12/18/2022 | 12/20/2022 | 2 | 3 |
DataFrame to start with:
import pandas as pd

df = pd.DataFrame({
    'Kiosk': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'Online': ["12/1/2022", "12/7/2022", "12/20/2022", "12/24/2022", "11/15/2022",
               "12/2/2022", "12/15/2022", "10/30/2022", "11/17/2022", "11/23/2022",
               "12/4/2022", "12/18/2022"],
    'Offline': ["12/5/2022", "12/17/2022", "12/21/2022", "12/29/2022", "11/30/2022",
                "12/7/2022", "12/25/2022", "11/15/2022", "11/22/2022", "11/30/2022",
                "12/15/2022", "12/20/2022"],
    'Online Days': [4, 10, 1, 5, 15, 5, 10, 16, 5, 7, 11, 2],
})
Answers:
Taking mozway's comment into account, I propose the following code instead (I had not noticed that you wanted the difference per kiosk):
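# Parse the month/day/year strings into datetimes
# (infer_datetime_format is deprecated since pandas 2.0, so an explicit format is used)
df['Online'] = pd.to_datetime(df['Online'], format='%m/%d/%Y')
df['Offline'] = pd.to_datetime(df['Offline'], format='%m/%d/%Y')
# Online date minus the previous Offline date within the same Kiosk, in whole days
df['Offsets'] = (df['Online'] - df.groupby('Kiosk')['Offline'].shift(1)).dt.days
print(df)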
Result (shown as timedeltas, i.e. before .dt.days is applied; with .dt.days the Offsets column holds plain numbers such as 2.0, and NaN instead of NaT):
    Kiosk      Online     Offline  Online Days  Offsets
0       1  2022-12-01  2022-12-05            4      NaT
1       1  2022-12-07  2022-12-17           10   2 days
2       1  2022-12-20  2022-12-21            1   3 days
3       1  2022-12-24  2022-12-29            5   3 days
4       2  2022-11-15  2022-11-30           15      NaT
5       2  2022-12-02  2022-12-07            5   2 days
6       2  2022-12-15  2022-12-25           10   8 days
7       3  2022-10-30  2022-11-15           16      NaT
8       3  2022-11-17  2022-11-22            5   2 days
9       3  2022-11-23  2022-11-30            7   1 days
10      3  2022-12-04  2022-12-15           11   4 days
11      3  2022-12-18  2022-12-20            2   3 days
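If you want the column to look like the requested output (whole numbers with a missing value in each kiosk's first row), one option is pandas' nullable integer dtype. This is only a sketch built on the question's data. The column name Offline Days is taken from the question, and the cast to "Int64" is my own addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Kiosk": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "Online": ["12/1/2022", "12/7/2022", "12/20/2022", "12/24/2022",
               "11/15/2022", "12/2/2022", "12/15/2022", "10/30/2022",
               "11/17/2022", "11/23/2022", "12/4/2022", "12/18/2022"],
    "Offline": ["12/5/2022", "12/17/2022", "12/21/2022", "12/29/2022",
                "11/30/2022", "12/7/2022", "12/25/2022", "11/15/2022",
                "11/22/2022", "11/30/2022", "12/15/2022", "12/20/2022"],
})

# Parse both date columns with an explicit month/day/year format
for col in ("Online", "Offline"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Previous row's Offline date, looked up separately within each Kiosk
prev_offline = df.groupby("Kiosk")["Offline"].shift(1)

# .dt.days yields floats (NaN where there is no previous row);
# casting to the nullable "Int64" dtype gives integers with <NA> instead
df["Offline Days"] = (df["Online"] - prev_offline).dt.days.astype("Int64")

print(df)
```

The first row of each kiosk stays missing because groupby(...).shift(1) never looks across kiosk boundaries.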
Use .shift to align the Offline dates and subtract them from the Online dates, then use .where to blank out the rows where the Kiosk changes [or to remove negative differences].
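# Optional: load the table straight from the question page
# df = pd.read_html('https://stackoverflow.com/questions/75962581')[0]
# Both date columns must be datetimes before they can be subtracted
df[['Online', 'Offline']] = df[['Online', 'Offline']].apply(pd.to_datetime)
# Online date minus the previous row's Offline date
offline_days = df['Online'] - df['Offline'].shift(1)
# Keep the difference only where the previous row belongs to the same Kiosk
df['Offline Days'] = offline_days.where(df['Kiosk'] == df['Kiosk'].shift(1))
# Alternative: keep only the positive differences
# df['Offline Days'] = offline_days.where(lambda td: td > pd.Timedelta(0))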
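Note that the .shift/.where approach relies on the rows of each kiosk being contiguous and in chronological order, which happens to be true for the question's data. The sketch below assumes the input might arrive unsorted: it shuffles the rows on purpose, sorts them back by Kiosk and Online, and then applies the same idea. The names shuffled, ordered, gap and same_kiosk are only illustrative, and the groupby comparison at the end is there as a sanity check.

```python
import pandas as pd

df = pd.DataFrame({
    "Kiosk": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "Online": ["12/1/2022", "12/7/2022", "12/20/2022", "12/24/2022",
               "11/15/2022", "12/2/2022", "12/15/2022", "10/30/2022",
               "11/17/2022", "11/23/2022", "12/4/2022", "12/18/2022"],
    "Offline": ["12/5/2022", "12/17/2022", "12/21/2022", "12/29/2022",
                "11/30/2022", "12/7/2022", "12/25/2022", "11/15/2022",
                "11/22/2022", "11/30/2022", "12/15/2022", "12/20/2022"],
})
df[["Online", "Offline"]] = df[["Online", "Offline"]].apply(
    pd.to_datetime, format="%m/%d/%Y"
)

# Shuffle the rows to mimic unsorted input (fixed seed for reproducibility)
shuffled = df.sample(frac=1, random_state=0)

# Restore the order the shift-based approach depends on:
# rows grouped by kiosk, oldest Online date first
ordered = shuffled.sort_values(["Kiosk", "Online"]).reset_index(drop=True)

# Online date minus the previous row's Offline date
gap = ordered["Online"] - ordered["Offline"].shift(1)

# True only where the previous row belongs to the same kiosk
same_kiosk = ordered["Kiosk"].eq(ordered["Kiosk"].shift(1))

# Blank out the first row of each kiosk, then convert to days
ordered["Offline Days"] = gap.where(same_kiosk).dt.days

# Sanity check against the groupby-based variant
grouped = (ordered["Online"] - ordered.groupby("Kiosk")["Offline"].shift(1)).dt.days
matches = (ordered["Offline Days"].fillna(-1) == grouped.fillna(-1)).all()
print(ordered)
print("Same as groupby variant:", matches)
```

Without the sort, the previous row could belong to a different kiosk or to a later period, and the differences would be meaningless even though the code would still run.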