How to iterate recursively over a df and calculate row values

Question:

I have the following df

list_columns = ['id','start', 'end', 'duration']
list_data = [
    [1,'2023-01-01', '2023-04-02', 0], [2,'2023-01-10', 0, 2],[3,'0', 0, 3],[4,'0', 0, 4]]
df= pd.DataFrame(columns=list_columns, data=list_data)

For a specific id, I want to calculate the start and end if they are 0, like this:

For example, if id=3, we check whether start is 0; if so, then

start = end of the previous id from the df
end = start  + duration

But if id=4, the previous row (id=3) also has no start or end yet. How can we check each row above in the df in a simple way and calculate the start and end?

Asked By: user3619789


Answers:

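No recursion is required. You need to convert all your dates to datetime format (which means first replacing '0' with a placeholder date), then fill in each blank start from the previous row's end and each blank end from start plus duration. Because the rows are processed from top to bottom, the row above has always been filled in by the time it is needed. There is probably a neater way, but you can use: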

import pandas as pd

dummy = '1800-01-01'
dummy_date = pd.to_datetime(dummy)

list_columns = ['id','start', 'end', 'duration']
list_data = [
    [1,'2023-01-01', '2023-04-02', 0], [2,'2023-01-10', '0', 2],[3,'0', '0', 3],[4,'0', '0', 4]]
df= pd.DataFrame(columns=list_columns, data=list_data)

#replace '0' date with dummy value then convert all to datetime
df['start'] = df['start'].where(df['start'] != '0', dummy)
df['start'] = pd.to_datetime(df['start'], format = '%Y-%m-%d')
df['end'] = df['end'].where(df['end'] != '0', dummy)
df['end'] = pd.to_datetime(df['end'], format = '%Y-%m-%d')

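# Walk the rows top to bottom so each row can reuse the values already filled in above it.
# Assumes the default 0..n-1 index and that the first row has a real start date.
for row in df.itertuples():
    if row.start == dummy_date:
        df.loc[row.Index, 'start'] = df.loc[row.Index - 1, 'end']
    if row.end == dummy_date:
        df.loc[row.Index, 'end'] = df.loc[row.Index, 'start'] + pd.Timedelta(row.duration, unit='days')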
        
print(df)

which gives:

   id      start        end  duration
0   1 2023-01-01 2023-04-02         0
1   2 2023-01-10 2023-01-12         2
2   3 2023-01-12 2023-01-15         3
3   4 2023-01-15 2023-01-19         4
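
One possible tidier variant, shown only as a sketch: instead of a dummy date, let `pd.to_datetime` turn the '0' placeholders into `NaT` with `errors='coerce'`, then fill the gaps in a single top-to-bottom pass. The function name `fill_dates` is made up for illustration, and the sketch assumes the placeholders are the string '0', that `duration` is a whole number of days, and that the first row has a real start date.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['id', 'start', 'end', 'duration'],
    data=[
        [1, '2023-01-01', '2023-04-02', 0],
        [2, '2023-01-10', '0', 2],
        [3, '0', '0', 3],
        [4, '0', '0', 4],
    ],
)


def fill_dates(frame):
    """Hypothetical helper: fill missing start/end dates row by row."""
    out = frame.copy()
    # errors='coerce' turns anything that does not match the format ('0' here) into NaT
    starts = pd.to_datetime(out['start'], format='%Y-%m-%d', errors='coerce').tolist()
    ends = pd.to_datetime(out['end'], format='%Y-%m-%d', errors='coerce').tolist()
    durations = out['duration'].tolist()

    for i in range(len(out)):
        # A missing start takes the end of the row above (already filled in by then)
        if pd.isna(starts[i]) and i > 0:
            starts[i] = ends[i - 1]
        # A missing end is start + duration days, when a start is available
        if pd.isna(ends[i]) and not pd.isna(starts[i]):
            ends[i] = starts[i] + pd.Timedelta(days=durations[i])

    out['start'] = starts
    out['end'] = ends
    return out


result = fill_dates(df)
print(result)
```

Working on plain lists inside the loop avoids repeated `df.loc` writes and does not depend on the index labels being 0..n-1; the original frame is left untouched because the function works on a copy.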
Answered By: user19077881