[Python], How to calculate 'service response time' per conversation id
Question:
I would like to calculate the average ‘service response time’ per conversation id as a variable in a dataframe (in minutes).
The ‘service response time’ is calculated by the difference between the ‘created_at’ variable from Y and X in minutes:
X = the first row where owner_type == "User" and is_interaction == 1.
Y = the first row after X where owner_type == "Agent" and owner_id != 1.
Update:
id
owner_type
owner_id
conversation_id
message
created_at
is_interaction
260943
Agent
1
26276
a
01/03/2022 15:00
265544
Agent
1
26276
b
05/03/2022 12:01
266749
User
153263
26276
c
05/03/2022 15:49
1
266750
User
153263
26276
d
05/03/2022 15:49
1
266753
Agent
14
26276
e
05/03/2022 15:51
267003
Agent
1
26276
f
06/03/2022 12:01
268900
User
153263
26276
g
06/03/2022 17:01
1
268904
Agent
1
26276
h
07/03/2022 12:00
271141
Agent
1
26276
i
09/03/2022 12:00
271725
User
153263
26276
j
09/03/2022 13:01
1
271728
User
153263
26276
k
09/03/2022 13:01
1
271727
Agent
10
26277
l
09/03/2022 13:01
272085
Agent
1
26276
m
10/03/2022 12:01
Any ideas on how to calculate this?
Update:
The resulted output should look like this:
You should replace the column name "Average Response Time (in minutes)" for "srt" in the dataframe. Ignore the "Average" in this column name, because it’s not and the "Date" column if not needed.
Best regards,
Milan Passchier
Answers:
Update 04.11.2022
If you had a unique ID for each event, then it would be easier. And more: ‘X = the first row where owner_type == "User" and is_interaction == 1.’ This is not the first row at all, but the last one before ‘Y = the first row after X where owner_type == "Agent" and owner_id != 1’.
I offer two options. In both cases, the created_at column is converted to the desired format using pd.to_datetime, and a ‘srt’ column is created with empty values. Explicit loc indexing is used.
In the first one, the main logic is in list comprehensions (they are many times faster than a loop).
More:
First, a list bbb is created, in which the condition is checked at each iteration:
if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1
if it is met, then the iteration number is written and the my_func function is called, which is fed the iteration number and ‘conversation_id’. The function takes a dataframe by slice starting from i to the last one. Finds a row with ‘Agent’ that does not equal 1 and has the same ‘conversation_id’. The first available line is taken:
m = aaa.index[0]
If there are no such strings, then the function returns -1.
Thus, we get the list bbb, in which the User indexes are on the left and Agent on the right.
In the fff list, the last lines where the Agent index stops repeating are copied.
Further in the loop, with the help of the selected indices, the necessary lines are filled using loc.
code list comprehensions:
import numpy as np
import pandas as pd
df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan
def my_func(i, id):
m = -1
aaa = df[i:]
aaa = aaa[(df.loc[i:, 'conversation_id'] == id) & (df.loc[i:, 'owner_type'] == 'Agent')
& (df.loc[i:, 'owner_id'] != 1)]
if len(aaa) > 0:
m = aaa.index[0]
return m
bbb = np.array([[i, my_func(i, df.loc[i, 'conversation_id'])]
for i in range(len(df)) if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1])
fff = [bbb[i] for i in range(len(bbb) - 1) if (bbb[i, 1] != bbb[i + 1:, 1]).all() == True and bbb[i, 1] != -1]
if len(bbb) > 1 and bbb[-1, 1] != -1:
fff.append(bbb[-1])
fff = np.array(fff)
for i in fff:
df.loc[i[0], 'srt'] = (df.loc[i[1], 'created_at'] - df.loc[i[0], 'created_at']) / np.timedelta64(1, 'm')
print(df)
The solution is where all the logic is in a loop.
import numpy as np
import pandas as pd
df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan
ferst_time_user = 0
cid = 0
ind = 0
for i in range(len(df)):
if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1:
ferst_time_user = df.loc[i, 'created_at']
cid = df.loc[i, 'conversation_id']
ind = i
if ferst_time_user != 0 and df.loc[i, 'conversation_id'] == cid and df.loc[i, 'owner_type'] == 'Agent' and df.loc[i, 'owner_id'] != 1:
df.loc[ind, 'srt'] = (df.loc[i, 'created_at'] - ferst_time_user) / np.timedelta64(1, 'm')
ferst_time_user = 0
ind = 0
cid = 0
print(df)
Output
id owner_type owner_id conversation_id message created_at
0 260943 Agent 1 26276 a 2022-01-03 15:00:00
1 265544 Agent 1 26276 b 2022-05-03 12:01:00
2 266749 User 153263 26276 c 2022-05-03 15:49:00
3 266750 User 153263 26276 d 2022-05-03 15:49:00
4 266753 Agent 14 26276 e 2022-05-03 15:51:00
5 267003 Agent 1 26276 f 2022-06-03 12:01:00
6 268900 User 153263 26276 g 2022-06-03 17:01:00
7 268904 Agent 1 26276 h 2022-07-03 12:00:00
8 271141 Agent 1 26276 i 2022-09-03 12:00:00
9 271725 User 153263 26276 j 2022-09-03 13:01:00
10 271728 User 153263 26276 k 2022-09-03 13:01:00
11 271727 Agent 10 26277 l 2022-09-03 13:01:00
12 272085 Agent 1 26276 m 2022-10-03 12:01:00
is_interaction srt
0 NaN NaN
1 NaN NaN
2 1.0 NaN
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 1.0 NaN
7 NaN NaN
8 NaN NaN
9 1.0 NaN
10 1.0 NaN
11 NaN NaN
12 NaN NaN
I would like to calculate the average ‘service response time’ per conversation id as a variable in a dataframe (in minutes).
The ‘service response time’ is calculated by the difference between the ‘created_at’ variable from Y and X in minutes:
X = the first row where owner_type == "User" and is_interaction == 1.
Y = the first row after X where owner_type == "Agent" and owner_id != 1.
Update:
id | owner_type | owner_id | conversation_id | message | created_at | is_interaction |
---|---|---|---|---|---|---|
260943 | Agent | 1 | 26276 | a | 01/03/2022 15:00 | |
265544 | Agent | 1 | 26276 | b | 05/03/2022 12:01 | |
266749 | User | 153263 | 26276 | c | 05/03/2022 15:49 | 1 |
266750 | User | 153263 | 26276 | d | 05/03/2022 15:49 | 1 |
266753 | Agent | 14 | 26276 | e | 05/03/2022 15:51 | |
267003 | Agent | 1 | 26276 | f | 06/03/2022 12:01 | |
268900 | User | 153263 | 26276 | g | 06/03/2022 17:01 | 1 |
268904 | Agent | 1 | 26276 | h | 07/03/2022 12:00 | |
271141 | Agent | 1 | 26276 | i | 09/03/2022 12:00 | |
271725 | User | 153263 | 26276 | j | 09/03/2022 13:01 | 1 |
271728 | User | 153263 | 26276 | k | 09/03/2022 13:01 | 1 |
271727 | Agent | 10 | 26277 | l | 09/03/2022 13:01 | |
272085 | Agent | 1 | 26276 | m | 10/03/2022 12:01 |
Any ideas on how to calculate this?
Update:
The resulted output should look like this:
You should replace the column name "Average Response Time (in minutes)" for "srt" in the dataframe. Ignore the "Average" in this column name, because it’s not and the "Date" column if not needed.
Best regards,
Milan Passchier
Update 04.11.2022
If you had a unique ID for each event, then it would be easier. And more: ‘X = the first row where owner_type == "User" and is_interaction == 1.’ This is not the first row at all, but the last one before ‘Y = the first row after X where owner_type == "Agent" and owner_id != 1’.
I offer two options. In both cases, the created_at column is converted to the desired format using pd.to_datetime, and a ‘srt’ column is created with empty values. Explicit loc indexing is used.
In the first one, the main logic is in list comprehensions (they are many times faster than a loop).
More:
First, a list bbb is created, in which the condition is checked at each iteration:
if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1
if it is met, then the iteration number is written and the my_func function is called, which is fed the iteration number and ‘conversation_id’. The function takes a dataframe by slice starting from i to the last one. Finds a row with ‘Agent’ that does not equal 1 and has the same ‘conversation_id’. The first available line is taken:
m = aaa.index[0]
If there are no such strings, then the function returns -1.
Thus, we get the list bbb, in which the User indexes are on the left and Agent on the right.
In the fff list, the last lines where the Agent index stops repeating are copied.
Further in the loop, with the help of the selected indices, the necessary lines are filled using loc.
code list comprehensions:
import numpy as np
import pandas as pd
df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan
def my_func(i, id):
m = -1
aaa = df[i:]
aaa = aaa[(df.loc[i:, 'conversation_id'] == id) & (df.loc[i:, 'owner_type'] == 'Agent')
& (df.loc[i:, 'owner_id'] != 1)]
if len(aaa) > 0:
m = aaa.index[0]
return m
bbb = np.array([[i, my_func(i, df.loc[i, 'conversation_id'])]
for i in range(len(df)) if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1])
fff = [bbb[i] for i in range(len(bbb) - 1) if (bbb[i, 1] != bbb[i + 1:, 1]).all() == True and bbb[i, 1] != -1]
if len(bbb) > 1 and bbb[-1, 1] != -1:
fff.append(bbb[-1])
fff = np.array(fff)
for i in fff:
df.loc[i[0], 'srt'] = (df.loc[i[1], 'created_at'] - df.loc[i[0], 'created_at']) / np.timedelta64(1, 'm')
print(df)
The solution is where all the logic is in a loop.
import numpy as np
import pandas as pd
df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan
ferst_time_user = 0
cid = 0
ind = 0
for i in range(len(df)):
if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1:
ferst_time_user = df.loc[i, 'created_at']
cid = df.loc[i, 'conversation_id']
ind = i
if ferst_time_user != 0 and df.loc[i, 'conversation_id'] == cid and df.loc[i, 'owner_type'] == 'Agent' and df.loc[i, 'owner_id'] != 1:
df.loc[ind, 'srt'] = (df.loc[i, 'created_at'] - ferst_time_user) / np.timedelta64(1, 'm')
ferst_time_user = 0
ind = 0
cid = 0
print(df)
Output
id owner_type owner_id conversation_id message created_at
0 260943 Agent 1 26276 a 2022-01-03 15:00:00
1 265544 Agent 1 26276 b 2022-05-03 12:01:00
2 266749 User 153263 26276 c 2022-05-03 15:49:00
3 266750 User 153263 26276 d 2022-05-03 15:49:00
4 266753 Agent 14 26276 e 2022-05-03 15:51:00
5 267003 Agent 1 26276 f 2022-06-03 12:01:00
6 268900 User 153263 26276 g 2022-06-03 17:01:00
7 268904 Agent 1 26276 h 2022-07-03 12:00:00
8 271141 Agent 1 26276 i 2022-09-03 12:00:00
9 271725 User 153263 26276 j 2022-09-03 13:01:00
10 271728 User 153263 26276 k 2022-09-03 13:01:00
11 271727 Agent 10 26277 l 2022-09-03 13:01:00
12 272085 Agent 1 26276 m 2022-10-03 12:01:00
is_interaction srt
0 NaN NaN
1 NaN NaN
2 1.0 NaN
3 1.0 2.0
4 NaN NaN
5 NaN NaN
6 1.0 NaN
7 NaN NaN
8 NaN NaN
9 1.0 NaN
10 1.0 NaN
11 NaN NaN
12 NaN NaN