Find end date for every value in a dataframe

Question:

I have a dataframe with 100 columns filled with start dates. I’m trying to find the next date for each value compared against that row to produce another data frame of end dates. If there is no next date it will add 1 day.

Start dates :

        | gas station | park     | beach    | store 
| Car A | 1/1/2022    | 1/4/2021 | 1/2/2021 | 1/3/2021
| Car B | 2/14/2021   | 2/10/2021| 2/21/2021| 2/5/2021

Stop dates:

        | gas station | park     | beach    | store 
| Car A | 1/2/2022    | 1/5/2021 | 1/3/2021 | 1/4/2021
| Car B | 2/21/2021   | 2/14/2021| 2/22/2021| 2/10/2021

Explanation : The “start dates” is the current dataframe. Car A arrived to the column name locations on the dates shown. Same with car B. I want to create a new dataframe (“stop dates”) based on the start dates. Car A gas station start date compared against all other columns to find the next greatest date. That next greatest date will populate the “stop date” dataframe for car A gas station, etc

Asked By: Plentyoftime

||

Answers:

You can write a custom function that takes in a row as an input, and returns the desired row as a pd.Series, then apply this function to each row using df.apply with the argument axis=1. Also I believe you may have made a typo with your start dates and the first entry should be from the year 2021 as well. Otherwise, the next date after 1/4/2021 in the park column in the same row would be 1/1/2022.

For example:

import numpy as np
import pandas as pd

## recreate your start dates dataframe
df_start = pd.DataFrame(
    columns=['gas station','park','beach','store'],
    data = [
        ['1/1/2021','1/4/2021','1/2/2021','1/3/2021'],
        ['2/14/2021','2/10/2021','2/21/2021','2/5/2021']
    ],
    index=['Car A', 'Car B']
)

for col in df_start.columns:
    df_start[col] = pd.to_datetime(df_start[col])

## custom function that takes in a row as input
## and outputs a row as a series
def get_stop_dates(row):
    sorted_dates = row.unique()
    sorted_dates.sort()
    new_row = []
    for d in row.values:
        idx = np.where(sorted_dates == d)[0][0]
        if idx == len(sorted_dates) - 1:
            new_date = pd.to_datetime(d) + pd.Timedelta("1d")
        else:
            new_date = pd.to_datetime(sorted_dates[idx+1])
        new_row.append(new_date)
    return pd.Series(new_row)

df_stop = df_start.apply(lambda row: get_stop_dates(row), axis=1)
df_stop.columns = df_start.columns

Input:

>>> df_start
      gas station       park      beach      store
Car A  2021-01-01 2021-01-04 2021-01-02 2021-01-03
Car B  2021-02-14 2021-02-10 2021-02-21 2021-02-05

Output:

>>> df_stop
      gas station       park      beach      store
Car A  2021-01-02 2021-01-05 2021-01-03 2021-01-04
Car B  2021-02-21 2021-02-14 2021-02-22 2021-02-10
Answered By: Derek O
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.