Python pandas add a years integer column to a date column

Question:

My question is similar to the one discussed in "How to add a year to a column of dates in pandas";
however, in my case the number of years to add to the date column is stored in another column. This is my code, which does not work:

import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_Date'] = df1['Date'] + pd.offsets.DateOffset(years=df1['Years'])

The goal is to add 5 years to the current date in row 1, 3 years to the current date in row 2, and so on.
Any suggestions? Thank you

Asked By: Angelo

||

Answers:

import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]],  columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
# DateOffset expects a single value for years, so build one offset per row
df1['Final_date'] = df1.apply(lambda g: g['Date'] + pd.offsets.DateOffset(years=g['Years']), axis=1)


print(df1)

Try this. You were passing the whole column to pd.offsets.DateOffset(years=df1['Years']) instead of a single value from the column.

EDIT: I switched from iterrows to apply because of iterrows's poor performance. Note that apply with axis=1 still runs row by row, so it is not truly vectorized.
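
As a rough sketch of the same idea (one scalar offset per row), a list comprehension over the two columns also works and avoids apply. The fixed date 2021-07-13 is an assumption used here only to make the output reproducible; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Jane", "Peter"], "Years": [5, 3, 1]})
# Assumption: a fixed date instead of today's date, so results are stable.
df["Date"] = pd.Timestamp("2021-07-13")

# DateOffset takes one scalar per offset, so pair each date with its own
# number of years and add them one row at a time.
df["Final_Date"] = [
    date + pd.DateOffset(years=int(years))
    for date, years in zip(df["Date"], df["Years"])
]
print(df)
```

This is still a Python-level loop, so it is not expected to scale much better than apply on large frames.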

Answered By: tbessho

Assuming the number of distinct values in Years is limited, you can group by Years and apply pd.DateOffset once per group:

# group_keys=False keeps the original index, so the result aligns with df1
df1['new_date'] = (
    df1.groupby('Years', group_keys=False)
       ['Date'].apply(lambda x: x + pd.DateOffset(years=x.name))
)
print(df1)
    Name  Years        Date   new_date
0    Tom      5  2021-07-13 2026-07-13
1   Jane      3  2021-07-13 2024-07-13
2  Peter      1  2021-07-13 2022-07-13

Otherwise, you can extract the year, month and day, add the Years column to the year, and rebuild a datetime column:

df1['Date'] = pd.to_datetime(df1['Date'])
df1['new_date'] = (
    df1.assign(year=lambda x: x['Date'].dt.year+x['Years'], 
               month=lambda x: x['Date'].dt.month,
               day=lambda x: x['Date'].dt.day, 
               new_date=lambda x: pd.to_datetime(x[['year','month','day']]))
       ['new_date']
)

This gives the same result.
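
One edge case worth checking with the year/month/day approach is a leap day. The sketch below uses a hypothetical date of 2020-02-29 (not from the question) to compare it with pd.DateOffset, which rolls back to the last valid day of the month.

```python
import pandas as pd

leap_day = pd.Timestamp("2020-02-29")  # hypothetical example date

# DateOffset clips to the last valid day of the target month.
shifted = leap_day + pd.DateOffset(years=1)
print(shifted)

# Rebuilding from components does no clipping: 2021-02-29 does not exist.
parts = pd.DataFrame({"year": [2021], "month": [2], "day": [29]})
try:
    pd.to_datetime(parts)
    rebuild_failed = False
except ValueError:
    rebuild_failed = True
print(rebuild_failed)

# errors="coerce" turns the invalid date into NaT instead of raising.
coerced = pd.to_datetime(parts, errors="coerce")
print(coerced)
```

If the data can contain February 29, the offset-based approaches are probably the safer choice.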

Answered By: Ben.T

Convert the years to days, build a timedelta from them, and add it to the date column converted to datetime:

# Parentheses are needed so the expression can continue on the next line
df1['Final_Date'] = (
    pd.to_datetime(df1['Date'])
    + pd.to_timedelta(df1['Years'] * 365, unit='D')
)

Using to_timedelta with unit='Y' for years is no longer supported and raises a ValueError, because a year is not a fixed length of time.

Edit: Multiplying by 365 ignores leap days, so the result can be off by a day or more. If you need day-exact results, you will need to go row by row and update the date objects accordingly, as the other answers explain.
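
To illustrate how far the 365-day approximation can drift, here is a small sketch comparing it with a calendar-exact pd.DateOffset per row. The fixed date 2021-07-13 and the column names Approx, Exact and Drift are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"Years": [5, 3, 1]})
# Assumption: a fixed date instead of today's date, so results are stable.
df["Date"] = pd.Timestamp("2021-07-13")

# Approximation: every year counted as exactly 365 days.
df["Approx"] = df["Date"] + pd.to_timedelta(df["Years"] * 365, unit="D")

# Calendar-exact: one DateOffset per row.
df["Exact"] = [
    date + pd.DateOffset(years=int(years))
    for date, years in zip(df["Date"], df["Years"])
]

# Number of days by which the approximation falls short.
df["Drift"] = (df["Exact"] - df["Approx"]).dt.days
print(df)
```

With these inputs, the 5-year and 3-year rows cross 29 February 2024 and come out one day early, while the 1-year row matches.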

Answered By: ifly6