Expand pandas dataframe date ranges to individual rows

Question:

I need to expand a pandas dataframe into individual rows based on each row's start date and end date.

The original dataframe is as follows:

ORIGINAL DATAFRAME

In the final dataframe, each row should be repeated once for every day between its start date and end date. The result needs one row per date, while all columns other than ‘startdate’ and ‘enddate’ are preserved.

For instance, the first row, with startdate = 01-Jan-20 and enddate = 15-Jan-20, should be expanded into 15 separate rows, each representing one date in the series, as shown in the sample resulting dataframe here:

EXPECTED RESULTANT DATAFRAME
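As a quick check of the expected row count: pandas.date_range includes both the start and the end date by default, so the 1 to 15 January range produces exactly 15 daily timestamps. This sketch uses ISO date strings rather than the 01-Jan-20 format from the question.

```python
import pandas as pd

# Both endpoints are included by default, so 1-15 Jan gives 15 days
days = pd.date_range('2020-01-01', '2020-01-15', freq='D')
print(len(days))  # 15
```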

I tried using itertuples to iterate over the dataframe and break each range into individual dates, but this is slow when the dataframe is large.
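The question does not include the loop itself. The following is only a hypothetical reconstruction of such an itertuples-based approach, borrowing the column names and the 01JAN20 date format from the example data in the answer, for comparison. Every input row and every generated day passes through Python-level code, which is why this style tends to slow down as the dataframe grows.

```python
import pandas as pd

df = pd.DataFrame({
    'TRIPNAME': ['HIGHSEASON', 'LOWSEASON'],
    'TRIPCAT': ['H', 'L'],
    'STARTDATE': ['01JAN20', '16SEP20'],
    'ENDDATE': ['15JAN20', '30NOV20'],
    'FARE': [345, 280],
})

# Parse the date strings once (format assumed: day, abbreviated month, 2-digit year)
df['STARTDATE'] = pd.to_datetime(df['STARTDATE'], format='%d%b%y')
df['ENDDATE'] = pd.to_datetime(df['ENDDATE'], format='%d%b%y')

rows = []
# Row-by-row loop: one Python iteration per input row and per generated day
for row in df.itertuples(index=False):
    for day in pd.date_range(row.STARTDATE, row.ENDDATE, freq='D'):
        rows.append({'TRIPNAME': row.TRIPNAME, 'TRIPCAT': row.TRIPCAT,
                     'FARE': row.FARE, 'DATE': day})

slow = pd.DataFrame(rows)
print(len(slow))  # 91
```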

Any more efficient solution would be highly appreciated.

Asked By: NeatCoder


Answers:

Use pandas.date_range in a list comprehension, then DataFrame.explode (the explode method requires pandas 0.25.0 or later):

# Minimal example setup
import pandas as pd

df = pd.DataFrame({
    'TRIPNAME': ['HIGHSEASON', 'LOWSEASON', 'MEDSEASON'],
    'TRIPCAT': ['H', 'L', 'M'],
    'STARTDATE': ['01JAN20', '16SEP20', '29JAN20'],
    'ENDDATE': ['15JAN20', '30NOV20', '19JUL20'],
    'FARE': [345, 280, 250]
})

# One DatetimeIndex of daily dates per row (start and end dates included)
df['DATE'] = [pd.date_range(s, e, freq='D') for s, e in
              zip(pd.to_datetime(df['STARTDATE'], format='%d%b%y'),
                  pd.to_datetime(df['ENDDATE'], format='%d%b%y'))]

# One output row per date; other columns are repeated, index labels are kept
df = df.explode('DATE').drop(['STARTDATE', 'ENDDATE'], axis=1)

print(df)

[out]

      TRIPNAME TRIPCAT  FARE       DATE
0   HIGHSEASON       H   345 2020-01-01
0   HIGHSEASON       H   345 2020-01-02
0   HIGHSEASON       H   345 2020-01-03
0   HIGHSEASON       H   345 2020-01-04
0   HIGHSEASON       H   345 2020-01-05
..         ...     ...   ...        ...
2    MEDSEASON       M   250 2020-07-15
2    MEDSEASON       M   250 2020-07-16
2    MEDSEASON       M   250 2020-07-17
2    MEDSEASON       M   250 2020-07-18
2    MEDSEASON       M   250 2020-07-19

[264 rows x 4 columns]
Answered By: Chris Adams
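As an alternative that is not part of the answer above, the same result can be sketched without building a DatetimeIndex per row: repeat each row by its day count with Index.repeat, number the copies with groupby(...).cumcount(), and add that number as a day offset to the start date. This assumes the same example data and date format; it has not been benchmarked here, so measure it against the explode version on your own data before relying on it for speed.

```python
import pandas as pd

df = pd.DataFrame({
    'TRIPNAME': ['HIGHSEASON', 'LOWSEASON', 'MEDSEASON'],
    'TRIPCAT': ['H', 'L', 'M'],
    'STARTDATE': ['01JAN20', '16SEP20', '29JAN20'],
    'ENDDATE': ['15JAN20', '30NOV20', '19JUL20'],
    'FARE': [345, 280, 250],
})

start = pd.to_datetime(df['STARTDATE'], format='%d%b%y')
end = pd.to_datetime(df['ENDDATE'], format='%d%b%y')

# Number of days in each range, counting both endpoints
n_days = ((end - start).dt.days + 1).to_numpy()

# Repeat each original row n_days times; index labels repeat as well
out = df.loc[df.index.repeat(n_days)].drop(columns=['STARTDATE', 'ENDDATE'])

# 0, 1, 2, ... within each original row, used as a day offset
offset = out.groupby(level=0).cumcount().to_numpy()

# Start date of the original row plus the offset in days
out['DATE'] = start.loc[out.index].to_numpy() + offset.astype('timedelta64[D]')

print(out.shape)  # (264, 4)
```

One difference worth noting: DataFrame.explode returns the exploded column with object dtype, whereas this version produces a datetime64 DATE column directly.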