Calculate time difference between each minimum time event of a category using pandas

Question:

I am working on a data in which i need to specifically calculate the time difference between specific events based on minimum time before a particular event. The sample data is below

Sample

In the above image I need to find the difference between each infusion status based on other infusion status. Fox example, first event is STOPPED and the second event is RUNNING. So the time difference between these two events needs to be stored in a new column against row having running status. Likewise 2nd row is RUNNING status and Third Row is Stopped Status. So time difference between these rows needs to be populated under 3rd row against stopped status. Now the difference between third row and sixth row has to go against sixth row and so on. I was able to find the row difference between each row as well as difference between minimum time and each row.

But unable to find a logic to calculate based on my requirement. If anybody can help me with this it would be great.

Below is the link for the data.

Data

Answers:

not sure if I understood correctly, but I think what you want could be achieved with shifting to get previous values, here goes:

df['event_time'] = pd.to_datetime(df['event_time'])
df['infusion_id'] = df['infusion_id'].astype(str) #floats are hard to compare
df['prev_status'] = df['infusion_status'].shift(1)
df['prev_id'] = df['infusion_id'].shift(1)
df['first_time'] = df.groupby(['infusion_id', 'infusion_status'])['event_time'].transform('min').shift(1).values

df.loc[(df['infusion_id']==df['prev_id'])&(df['infusion_status']!=df['prev_status']), 'change_delta'] = df['event_time'] - df['first_time']
Answered By: Ezer K

I was able to find the solution for my requirement with the help of few pieces of code provided by Ezer.

What I did was found the previous event and its time in two separate columns, did a comparison between previous rows to find whether the status has changed. then using the status change flag = true, i subset the entire data which has flag = true. This would give me only values where I had status change. Using this subset I found the difference between current row and previous row which gives me the time difference between each status.

Then merged this data frame with the original frame and whoooo, i got the data frame with all the vales I need.

Below is my code.

## Find the start and end infusion time for each infusion

data['infusion_start_time'] = data.groupby(['infusion_id'])['event_time'].transform('min').values
data['infusion_end_time'] = data.groupby(['infusion_id'])['event_time'].transform('max').values
data['event_time_difference'] = (data['event_time'] - data['infusion_start_time']).apply(lambda x: x/np.timedelta64(1, 's')).fillna(0).astype('int64')

## Find the previous row data for time difference calculation between each status change

data['event_time'] = pd.to_datetime(data['event_time'])
data['infusion_id'] = data['infusion_id'].astype(str) #floats are hard to compare
data['prev_event_time'] = data.groupby(['infusion_id'])['event_time'].shift(1).values
data['prev_event_status'] = data.groupby(['infusion_id'])['infusion_status'].shift(1).values

data['Flag'] = np.where((data['infusion_status'] == data['prev_event_status']), 'False', 'True')

## Create a new data frame to compute the difference between each status change

rslt_df = data[data['Flag'] == 'True'] 
rslt_df["status_time_difference"] = rslt_df['prev_event_time'].diff().apply(lambda x: x/np.timedelta64(1, 's')).fillna(0).astype('int64')
rslt_df.to_csv('status_time_sequence.csv')
Answered By: Karthik Venkatraman
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.