[Python], How to calculate 'service response time' per conversation id

Question:

I would like to calculate the average ‘service response time’ per conversation id as a variable in a dataframe (in minutes).

The ‘service response time’ is calculated by the difference between the ‘created_at’ variable from Y and X in minutes:

X = the first row where owner_type == "User" and is_interaction == 1.

Y = the first row after X where owner_type == "Agent" and owner_id != 1.

Update:

id owner_type owner_id conversation_id message created_at is_interaction
260943 Agent 1 26276 a 01/03/2022 15:00
265544 Agent 1 26276 b 05/03/2022 12:01
266749 User 153263 26276 c 05/03/2022 15:49 1
266750 User 153263 26276 d 05/03/2022 15:49 1
266753 Agent 14 26276 e 05/03/2022 15:51
267003 Agent 1 26276 f 06/03/2022 12:01
268900 User 153263 26276 g 06/03/2022 17:01 1
268904 Agent 1 26276 h 07/03/2022 12:00
271141 Agent 1 26276 i 09/03/2022 12:00
271725 User 153263 26276 j 09/03/2022 13:01 1
271728 User 153263 26276 k 09/03/2022 13:01 1
271727 Agent 10 26277 l 09/03/2022 13:01
272085 Agent 1 26276 m 10/03/2022 12:01

Any ideas on how to calculate this?

Update:

The resulted output should look like this:

enter image description here

You should replace the column name "Average Response Time (in minutes)" for "srt" in the dataframe. Ignore the "Average" in this column name, because it’s not and the "Date" column if not needed.

Best regards,

Milan Passchier

Asked By: Milan Passchier

||

Answers:

Update 04.11.2022

If you had a unique ID for each event, then it would be easier. And more: ‘X = the first row where owner_type == "User" and is_interaction == 1.’ This is not the first row at all, but the last one before ‘Y = the first row after X where owner_type == "Agent" and owner_id != 1’.

I offer two options. In both cases, the created_at column is converted to the desired format using pd.to_datetime, and a ‘srt’ column is created with empty values. Explicit loc indexing is used.

In the first one, the main logic is in list comprehensions (they are many times faster than a loop).

More:

First, a list bbb is created, in which the condition is checked at each iteration:

if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1

if it is met, then the iteration number is written and the my_func function is called, which is fed the iteration number and ‘conversation_id’. The function takes a dataframe by slice starting from i to the last one. Finds a row with ‘Agent’ that does not equal 1 and has the same ‘conversation_id’. The first available line is taken:

m = aaa.index[0]

If there are no such strings, then the function returns -1.

Thus, we get the list bbb, in which the User indexes are on the left and Agent on the right.

In the fff list, the last lines where the Agent index stops repeating are copied.

Further in the loop, with the help of the selected indices, the necessary lines are filled using loc.

code list comprehensions:

import numpy as np
import pandas as pd


df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan

def my_func(i, id):
    m = -1
    aaa = df[i:]
    aaa = aaa[(df.loc[i:, 'conversation_id'] == id) & (df.loc[i:, 'owner_type'] == 'Agent')
              & (df.loc[i:, 'owner_id'] != 1)]

    if len(aaa) > 0:
        m = aaa.index[0]

    return m


bbb = np.array([[i, my_func(i, df.loc[i, 'conversation_id'])]
                for i in range(len(df)) if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1])
fff = [bbb[i] for i in range(len(bbb) - 1) if (bbb[i, 1] != bbb[i + 1:, 1]).all() == True and bbb[i, 1] != -1]

if len(bbb) > 1 and bbb[-1, 1] != -1:
    fff.append(bbb[-1])

fff = np.array(fff)

for i in fff:
    df.loc[i[0], 'srt'] = (df.loc[i[1], 'created_at'] - df.loc[i[0], 'created_at']) / np.timedelta64(1, 'm')

print(df)

The solution is where all the logic is in a loop.

import numpy as np
import pandas as pd
    
df['created_at'] = pd.to_datetime(df['created_at'], errors='raise')
df['srt'] = np.nan


ferst_time_user = 0
cid = 0
ind = 0

for i in range(len(df)):
    if df.loc[i, 'owner_type'] == 'User' and df.loc[i, 'is_interaction'] == 1:
        ferst_time_user = df.loc[i, 'created_at']
        cid = df.loc[i, 'conversation_id']
        ind = i

    if ferst_time_user != 0 and df.loc[i, 'conversation_id'] == cid and df.loc[i, 'owner_type'] == 'Agent' and df.loc[i, 'owner_id'] != 1:
        df.loc[ind, 'srt'] = (df.loc[i, 'created_at'] - ferst_time_user) / np.timedelta64(1, 'm')
        ferst_time_user = 0
        ind = 0
        cid = 0

print(df)

Output

        id owner_type  owner_id  conversation_id message          created_at  
0   260943      Agent         1            26276       a 2022-01-03 15:00:00   
1   265544      Agent         1            26276       b 2022-05-03 12:01:00   
2   266749       User    153263            26276       c 2022-05-03 15:49:00   
3   266750       User    153263            26276       d 2022-05-03 15:49:00   
4   266753      Agent        14            26276       e 2022-05-03 15:51:00   
5   267003      Agent         1            26276       f 2022-06-03 12:01:00   
6   268900       User    153263            26276       g 2022-06-03 17:01:00   
7   268904      Agent         1            26276       h 2022-07-03 12:00:00   
8   271141      Agent         1            26276       i 2022-09-03 12:00:00   
9   271725       User    153263            26276       j 2022-09-03 13:01:00   
10  271728       User    153263            26276       k 2022-09-03 13:01:00   
11  271727      Agent        10            26277       l 2022-09-03 13:01:00   
12  272085      Agent         1            26276       m 2022-10-03 12:01:00   

    is_interaction  srt  
0              NaN  NaN  
1              NaN  NaN  
2              1.0  NaN  
3              1.0  2.0  
4              NaN  NaN  
5              NaN  NaN  
6              1.0  NaN  
7              NaN  NaN  
8              NaN  NaN  
9              1.0  NaN  
10             1.0  NaN  
11             NaN  NaN  
12             NaN  NaN  
Answered By: inquirer
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.