Create new column with 7th business day of the month based on year and month columns

Question:

I have a dataframe with two columns, the year and month of another variable which I left out for simplicity. It looks like this:

      YearOfSRC MonthNumberOfSRC
0       2022          3
1       2022          4
2       2022          5
3       2022          6
4       2021          4
... ... ...
20528   2022          1
20529   2022          2
20530   2022          3
20531   2022          4
20532   2022          5

I’d like to create a new column called PredictionDate that contains the date of the 7th business day of that month in that year in the format of YYYY-MM-DD. It seems like I could do something like this, but I’m not sure how to implement in a dataframe with the months and years in separate columns. I am sure it will involve an apply and/or groupby but again, not sure how to approach. Would I need to first create a helper column with the month and year combined, then use that to generate the desired output column?

EDIT The solutions given do work. However, I realized that some months’ 7th business days would be offset by holidays that land in the first week of the month, like New Year’s and July 4th. Therefore, I have posted a new question seeking how to incorporate this fact into the solution here.

Asked By: Julien

||

Answers:

If I understand you correctly you can convert the two columns to datetime format and add pd.tseries.offsets.BusinessDay(n = 7):

df["PredictionDate"] = (
    pd.to_datetime(
        df["YearOfSRC"].astype(str)
        + "-"
        + df["MonthNumberOfSRC"].astype(str)
        + "-01",
        dayfirst=False,
    )
    + pd.tseries.offsets.BusinessDay(n=7)
)

print(df)

Prints:

   YearOfSRC  MonthNumberOfSRC PredictionDate
0       2022                 3     2022-03-10
1       2022                 4     2022-04-12
2       2022                 5     2022-05-10
3       2022                 6     2022-06-10
4       2021                 4     2021-04-12
Answered By: Andrej Kesely

If I understand correctly, you can use pandas.to_datetime to convert to datetime, then pandas.offsets.BusinessDay to get your business day:

df['PredictionDate'] = (pd
 .to_datetime(df[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(pd.offsets.BusinessDay(1))
 .add(pd.offsets.BusinessDay(7))
)

output:

       YearOfSRC  MonthNumberOfSRC PredictionDate
0           2022                 3     2022-03-09
1           2022                 4     2022-04-11
2           2022                 5     2022-05-10
3           2022                 6     2022-06-09
4           2021                 4     2021-04-09
20528       2022                 1     2022-01-11
20529       2022                 2     2022-02-09
20530       2022                 3     2022-03-09
20531       2022                 4     2022-04-11
20532       2022                 5     2022-05-10
Answered By: mozway