Is there an equivalent for the SAS "intnx" command in python?

Question:

I am a beginner in python and stuck on translating a piece of code from SAS to Python (see below). How would I recreate the "intnx"-command in line 10 in Python, which introduces a 3-month-lag to the actual date (datadate) in order to create the variable "HouDate".

data data._1_compustat_prep (keep = gvkey tic datadate fyear HouDate HouYear ib at dvc act che 
lct dlc txp dp ibc oancf);
retain gvkey tic datadate fyear HouDate HouYear;
set data._0_compustat_students;

  if indfmt   = 'INDL' and datafmt = 'STD';

  format HouDate DDMMYY10.;
  format DataDate DDMMYY10.;
  HouDate = intnx("month", datadate, +3, "E");
  if month(HouDate) >= 7 then HouYear = year(HouDate) + 1; else HouYear = year(HouDate);

  if mdy(07,01,1963) <= HouDate <= mdy(06,30,2009);
  if exchg in (11,12,14,15,17);
  if 6000 <= sich <= 6999 then delete;

run; 
Asked By: ChurrrT

||

Answers:

You can use the datetime and [dateutil] libraries to help you with this. In particular the datetime.datetime and datetime.timedelta dateutil.realtivedelta.relativedelta classes.
The dateutil.relativedelta.relativedelta class is better than the datetime.timedelta here, because the latter won’t allow you to specify a delta with units bigger than days, when the former does.

Additionally, I’ve added type hints in the code, to make it easier to understand. I corresponds to the syntax xx:yy where xx is the variable name, and yy its type.

from datetime import datetime
from dateutil.relativedelta import relativedelta

lag:relativedelta = relativedelta(months=3)
datadate:datetime = datetime(year=2022, month=8, day=25, hour=11, minute=37, second=8)
HouDate:datetime = datadate + lag

HouYear:int = HouDate.year
if HouDate.month >= 7:
    HouYear += 1

print("{} || {}".format(HouYear, HouDate))

2023 || 2022-11-25 11:37:08


As the author mentioned in one of their comment:

how would you do the commands above for whole columns in a dataframe?

Here is the way I’d do it when using a pandas.DataFrame:

from datetime import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd

lag:relativedelta = relativedelta(months=3)

df:pd.DataFrame = pd.DataFrame({
    "datadate": [
        datetime(year=2022, month=1, day=1, hour=1, minute=1, second=1),
        datetime(year=2022, month=6, day=2, hour=2, minute=2, second=2),
        datetime(year=2022, month=10, day=3, hour=3, minute=3, second=3)
    ]
})

df["HouDate"] = df.datadate.apply(lambda x: x+lag)
df["HouYear"] = df.HouDate.apply(lambda x: x.year if x.month < 7 else x.year+1)

print(df)

Pre-treatment dataframe:

datadate
0 2022-01-01 01:01:01
1 2022-06-02 02:02:02
2 2022-10-03 03:03:03

Post-treatment dataframe:

datadate HouDate HouYear
0 2022-01-01 01:01:01 2022-04-01 01:01:01 2022
1 2022-06-02 02:02:02 2022-09-02 02:02:02 2023
2 2022-10-03 03:03:03 2023-01-03 03:03:03 2023
Answered By: GregoirePelegrin
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.