Create extra rows using date column pandas dataframe

Question:

Imagine I have the following data:

ID  Leave Type  Start Date    End Date
1   Sick        2022-01-01    2022-01-01
1   Holiday     2023-03-28  
2   Holiday     2023-01-01    2023-01-02
3   Work        2023-01-01    2023-01-01

I need to check whether Start Date and End Date have the same value. If they differ, I need to count the number of days by which End Date is ahead and, for each of those days, create a row that adds 1 day, so that Start Date and End Date always match. If End Date is blank, rows should be created until the date reaches 2023-03-30. This should result in the following data:

ID  Leave Type  Start Date    End Date
1   Sick        2022-01-01    2022-01-01
1   Holiday     2023-03-28    2023-03-28
1   Holiday     2023-03-29    2023-03-29
1   Holiday     2023-03-30    2023-03-30
1   Holiday     2023-03-31    2023-03-31
2   Holiday     2023-01-01    2023-01-01
2   Holiday     2023-01-02    2023-01-02
3   Work        2023-01-01    2023-01-01

Thank you!

Asked By: Paulo Cortez

Answers:

You can use:

import pandas as pd

# ensure datetime and fill NA with default date
df[['Start Date', 'End Date']] = df[['Start Date', 'End Date']].apply(pd.to_datetime)
df['End Date'] = df['End Date'].fillna(pd.Timestamp('2023-03-30'))

# repeat index and create output
idx = df.index.repeat(df['End Date'].sub(df['Start Date']).dt.days.add(1))
out = df.loc[idx]

# increment days by each row's position within its original row
out['Start Date'] += pd.to_timedelta(out.groupby(level=0).cumcount().to_numpy(), unit='D')
out['End Date'] = out['Start Date']

Output:

   ID Leave Type Start Date   End Date
0   1       Sick 2022-01-01 2022-01-01
1   1    Holiday 2023-03-28 2023-03-28
1   1    Holiday 2023-03-29 2023-03-29
1   1    Holiday 2023-03-30 2023-03-30
2   2    Holiday 2023-01-01 2023-01-01
2   2    Holiday 2023-01-02 2023-01-02
3   3       Work 2023-01-01 2023-01-01
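
To see why repeating the index works, it can help to print the intermediate values. The following is only a sketch on the sample data; the names n_days and offset are introduced here for illustration and are not part of the answer's code.

```python
import pandas as pd

# Sample data from the question; the blank End Date is represented as None.
df = pd.DataFrame({
    'ID': [1, 1, 2, 3],
    'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
    'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
    'End Date': ['2022-01-01', None, '2023-01-02', '2023-01-01'],
})
df[['Start Date', 'End Date']] = df[['Start Date', 'End Date']].apply(pd.to_datetime)
df['End Date'] = df['End Date'].fillna(pd.Timestamp('2023-03-30'))

# Number of calendar days each leave covers, counting both endpoints.
n_days = df['End Date'].sub(df['Start Date']).dt.days.add(1)

# Each index label is repeated once per day that row must produce.
idx = df.index.repeat(n_days)

# Position of each repeated row within its original row, i.e. the day offset.
offset = df.loc[idx].groupby(level=0).cumcount()

print(n_days.tolist())  # [1, 3, 2, 1]
print(idx.tolist())     # [0, 1, 1, 1, 2, 2, 3]
print(offset.tolist())  # [0, 0, 1, 2, 0, 1, 0]
```

Adding offset (as days) to the repeated Start Date then gives one consecutive date per row.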

Reproducible input:

df = pd.DataFrame({'ID': [1, 1, 2, 3],
                   'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
                   'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
                   'End Date': ['2022-01-01', None, '2023-01-02', '2023-01-01']})
Answered By: mozway

Assuming that you pasted an extra row (the 5th row, 2023-03-31) into the expected output by mistake, you can try this as well:

import pandas as pd
from datetime import timedelta, datetime

# create the dataframe
df = pd.DataFrame({'ID': [1, 1, 2, 3], 
                   'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'], 
                   'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'], 
                   'End Date': ['2022-01-01', '', '2023-01-02', '2023-01-01']})

# convert date columns to datetime format
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

# fill in blank end dates with a maximum date value
df['End Date'] = df['End Date'].fillna(datetime(2023, 3, 30))

# create an empty list to store new rows
create_rows = []

# loop through each row in the dataframe
for index, row in df.iterrows():
    
    # if the start and end dates are not the same, add rows for each day in between
    if row['Start Date'] != row['End Date']:
        delta = row['End Date'] - row['Start Date']
        for i in range(delta.days + 1):
            date = row['Start Date'] + timedelta(days=i)
            create_rows.append({'ID': row['ID'], 'Leave Type': row['Leave Type'], 'Start Date': date, 'End Date': date})
    
    # if the start and end dates are the same, append the original row
    else:
        create_rows.append({'ID': row['ID'], 'Leave Type': row['Leave Type'], 'Start Date': row['Start Date'], 'End Date': row['End Date']})
    
# create a new dataframe with the original rows and the new rows
output_df = pd.DataFrame(create_rows)

# sort the dataframe by ID and Start Date
output_df = output_df.sort_values(['ID', 'Start Date'])

# reset the index
output_df = output_df.reset_index(drop=True)

print(output_df)
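
The same loop can probably be shortened, since a date range that includes both endpoints already covers the case where Start Date equals End Date. Below is a minimal sketch under that assumption; expand_leave and its cutoff parameter are hypothetical names, and the blank End Date is written as None rather than ''.

```python
import pandas as pd

def expand_leave(df, cutoff='2023-03-30'):
    # Hypothetical helper (not from the answer above): one output row per leave day.
    start = pd.to_datetime(df['Start Date'])
    # Blank End Dates (None here) become NaT and are filled with the cutoff.
    end = pd.to_datetime(df['End Date']).fillna(pd.Timestamp(cutoff))
    rows = []
    for rec, s, e in zip(df.to_dict('records'), start, end):
        # date_range includes both endpoints, so s == e yields exactly one row.
        for day in pd.date_range(s, e, freq='D'):
            rows.append({**rec, 'Start Date': day, 'End Date': day})
    return pd.DataFrame(rows)

df = pd.DataFrame({'ID': [1, 1, 2, 3],
                   'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
                   'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
                   'End Date': ['2022-01-01', None, '2023-01-02', '2023-01-01']})

out = expand_leave(df)
print(out)
```

Rows come out in the original order, so no sorting step is needed here. Note that in this sketch a row whose End Date is earlier than its Start Date would produce no output rows at all.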
Answered By: warwick12

import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({'Leave Type': ['Sick', 'Holiday', 'Holiday', 'Work'],
                   'Start Date': ['2022-01-01', '2023-03-28', '2023-01-01', '2023-01-01'],
                   'End Date': ['2022-01-01', '', '2023-01-02', '2023-01-01'],
                   })
# Convert columns 'Start Date' and 'End Date' to datetime (blanks become NaT)
df[['Start Date', 'End Date']] = (
    df[['Start Date', 'End Date']].apply(pd.to_datetime, errors='coerce')
)
# Fill NaT values with the last day of the month
df['End Date'] = df['End Date'].fillna(df['Start Date'] + MonthEnd(0))
# Replace 'End Date' values with the list of dates from start to end
df['End Date'] = [
    pd.date_range(s, e, freq='D').tolist()
    for s, e in zip(df['Start Date'], df['End Date'])
]
# Explode the list
df = df.explode('End Date')

df['Start Date'] = df['End Date']

print(df)

Result

  Leave Type Start Date   End Date
0       Sick 2022-01-01 2022-01-01
1    Holiday 2023-03-28 2023-03-28
1    Holiday 2023-03-29 2023-03-29
1    Holiday 2023-03-30 2023-03-30
1    Holiday 2023-03-31 2023-03-31
2    Holiday 2023-01-01 2023-01-01
2    Holiday 2023-01-02 2023-01-02
3       Work 2023-01-01 2023-01-01
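
Note that this answer fills blank end dates with the month end, which reproduces the 2023-03-31 row of the expected output rather than stopping at the 2023-03-30 cutoff mentioned in the question. As a small sketch of how MonthEnd behaves (the variable names are only for illustration):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# MonthEnd(0) rolls a date forward to the end of its month...
rolled = pd.Timestamp('2023-03-28') + MonthEnd(0)
# ...and leaves a date unchanged if it is already a month end.
unchanged = pd.Timestamp('2023-03-31') + MonthEnd(0)
# MonthEnd(1) on a month end moves to the end of the next month.
next_month = pd.Timestamp('2023-03-31') + MonthEnd(1)

print(rolled)      # 2023-03-31 00:00:00
print(unchanged)   # 2023-03-31 00:00:00
print(next_month)  # 2023-04-30 00:00:00
```

If the fixed cutoff is wanted instead, filling with pd.Timestamp('2023-03-30') in place of the MonthEnd expression should give three Holiday rows for ID 1 instead of four.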
Answered By: Laurent B.