How to iterate recursively over a df and calculate row values
Question:
I have the following df
list_columns = ['id', 'start', 'end', 'duration']
list_data = [
    [1, '2023-01-01', '2023-04-02', 0],
    [2, '2023-01-10', 0, 2],
    [3, '0', 0, 3],
    [4, '0', 0, 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)
For a specific id, I want to calculate start and end when they are 0, like this:
For example, for id=3 we check whether start is 0; if so:
start = end of the previous id in the df
end = start + duration
But for id=4, how can we check each preceding row of the df in a simple way and calculate start and end?
Answers:
No recursion is required. Convert all your dates to datetime format (which means first replacing '0' with some placeholder date), then fill in the blank dates from the previous row's end or from the duration. There is probably a neater way, but you can use:
import pandas as pd

dummy = '1800-01-01'
dummy_date = pd.to_datetime(dummy)
list_columns = ['id', 'start', 'end', 'duration']
# the 0 placeholders in 'end' are written as the string '0' so the comparison below catches them
list_data = [
    [1, '2023-01-01', '2023-04-02', 0],
    [2, '2023-01-10', '0', 2],
    [3, '0', '0', 3],
    [4, '0', '0', 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)
# replace '0' dates with the dummy value, then convert all to datetime
df['start'] = df['start'].where(df['start'] != '0', dummy)
df['start'] = pd.to_datetime(df['start'], format='%Y-%m-%d')
df['end'] = df['end'].where(df['end'] != '0', dummy)
df['end'] = pd.to_datetime(df['end'], format='%Y-%m-%d')
# walk the rows top to bottom so each row can use the already-filled previous row
# (assumes the default 0..n-1 index and that the first row has a real start)
for row in df.itertuples():
    if row.start == dummy_date:
        df.loc[row.Index, 'start'] = df.loc[row.Index - 1, 'end']
    if row.end == dummy_date:
        df.loc[row.Index, 'end'] = df.loc[row.Index, 'start'] + pd.Timedelta(row.duration, unit='days')
print(df)
which gives:
   id      start        end  duration
0   1 2023-01-01 2023-04-02         0
1   2 2023-01-10 2023-01-12         2
2   3 2023-01-12 2023-01-15         3
3   4 2023-01-15 2023-01-19         4
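One possible variation on the approach above, offered only as a sketch: instead of a dummy date, let pandas turn the 0 placeholders into NaT with errors='coerce', then fill the gaps in a single top-to-bottom pass. The function name fill_dates is made up for illustration, and the code assumes the rows are already in id order and that duration is a whole number of days. It uses the question's original data, where the empty end values are the integer 0.

```python
import pandas as pd

list_columns = ['id', 'start', 'end', 'duration']
list_data = [
    [1, '2023-01-01', '2023-04-02', 0],
    [2, '2023-01-10', 0, 2],
    [3, '0', 0, 3],
    [4, '0', 0, 4],
]
df = pd.DataFrame(columns=list_columns, data=list_data)


def fill_dates(frame):
    """Hypothetical helper: fill missing start/end dates row by row."""
    out = frame.copy()
    for col in ('start', 'end'):
        # Cast to str so both 0 and '0' fail to parse and become NaT
        out[col] = pd.to_datetime(out[col].astype(str),
                                  format='%Y-%m-%d', errors='coerce')
    prev_end = pd.NaT  # end date of the row processed just before
    for idx in out.index:
        if pd.isna(out.at[idx, 'start']):
            out.at[idx, 'start'] = prev_end
        if pd.isna(out.at[idx, 'end']):
            days = int(out.at[idx, 'duration'])
            out.at[idx, 'end'] = out.at[idx, 'start'] + pd.Timedelta(days=days)
        prev_end = out.at[idx, 'end']
    return out


result = fill_dates(df)
print(result)
```

Because the previous end is carried in a variable rather than looked up by index - 1, this version does not depend on a 0..n-1 index, and a missing start in the very first row simply stays NaT instead of raising a KeyError.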