Python pandas add a years integer column to a date column
Question:
I have a question somewhat similar to the one discussed here: How to add a year to a column of dates in pandas
However, in my case the number of years to add to the date column is stored in another column. This is my non-working code:
import datetime
import pandas as pd
df1 = pd.DataFrame( [ ["Tom",5], ['Jane',3],['Peter',1]], columns = ["Name","Years"])
df1['Date'] = datetime.date.today()
df1['Final_Date'] = df1['Date'] + pd.offsets.DateOffset(years=df1['Years'])
The goal is to add 5 years to the current date in row 1, 3 years in row 2, and so on.
Any suggestions? Thank you
Answers:
import datetime
import pandas as pd

df1 = pd.DataFrame([["Tom", 5], ["Jane", 3], ["Peter", 1]], columns=["Name", "Years"])
df1['Date'] = datetime.date.today()
# apply with axis=1 passes one row at a time, so g['Years'] is a single integer
df1['Final_date'] = df1.apply(lambda g: g['Date'] + pd.offsets.DateOffset(years=int(g['Years'])), axis=1)
print(df1)
Try this. Your code passed the whole column to pd.offsets.DateOffset(years=df1['Years']), but DateOffset expects a single integer for years, so the offset has to be built from one value at a time.
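To make the point concrete, here is a minimal sketch of DateOffset applied to one scalar value taken from a single row. It assumes a fixed date (2021-07-13) instead of today() so the result is reproducible.

```python
import pandas as pd

df1 = pd.DataFrame([["Tom", 5], ["Jane", 3], ["Peter", 1]], columns=["Name", "Years"])
df1["Date"] = pd.Timestamp("2021-07-13")  # fixed date instead of today(), for reproducibility

# DateOffset takes one integer for `years`; here we pass the single value from row 0
years_row0 = int(df1.loc[0, "Years"])
shifted = df1.loc[0, "Date"] + pd.DateOffset(years=years_row0)
print(shifted)  # 2026-07-13 00:00:00
```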
EDIT: I switched from iterrows to apply because of iterrows's poor performance. Note that apply with axis=1 still calls the lambda once per row, so it is not truly vectorized.
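If performance matters, a plain list comprehension over zip avoids building a row Series for every row and is usually faster than apply(axis=1), while still doing one DateOffset per row. This is a swapped-in technique, not the answer's own code, and it assumes a fixed date (2021-07-13) so the output is reproducible.

```python
import pandas as pd

df1 = pd.DataFrame([["Tom", 5], ["Jane", 3], ["Peter", 1]], columns=["Name", "Years"])
df1["Date"] = pd.Timestamp("2021-07-13")  # fixed date for a reproducible example

# Pair each date with its own Years value and build one offset per pair
df1["Final_Date"] = [
    d + pd.DateOffset(years=int(n)) for d, n in zip(df1["Date"], df1["Years"])
]
print(df1)
```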
Assuming the number of distinct values in Years is limited, you can use groupby and apply pd.DateOffset once per group, like this:
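df1['Date'] = pd.to_datetime(df1['Date'])  # make Date a datetime64 column
df1['new_date'] = (
    df1.groupby('Years', group_keys=False)  # keep the original index so the result aligns with df1
    ['Date'].apply(lambda x: x + pd.DateOffset(years=x.name))  # x.name is the group's Years value
)
print(df1)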
    Name  Years       Date   new_date
0    Tom      5 2021-07-13 2026-07-13
1   Jane      3 2021-07-13 2024-07-13
2  Peter      1 2021-07-13 2022-07-13
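A variant of the same idea uses transform instead of apply: transform always returns a result aligned to the original index, so it can be assigned straight back to df1. This is a sketch, not the answer's own code; it assumes a fixed date (2021-07-13) for reproducibility and relies on pandas setting each group's name to its key, as it does for apply.

```python
import pandas as pd

df1 = pd.DataFrame([["Tom", 5], ["Jane", 3], ["Peter", 1]], columns=["Name", "Years"])
df1["Date"] = pd.Timestamp("2021-07-13")  # fixed date for reproducibility

# One DateOffset per distinct Years value; x.name holds that group's Years value
df1["new_date"] = df1.groupby("Years")["Date"].transform(
    lambda x: x + pd.DateOffset(years=int(x.name))
)
print(df1)
```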
Alternatively, you can extract the year, month and day, add the Years column to the year, and rebuild a datetime column:
df1['Date'] = pd.to_datetime(df1['Date'])
df1['new_date'] = (
    df1.assign(
        year=lambda x: x['Date'].dt.year + x['Years'],  # shifted year
        month=lambda x: x['Date'].dt.month,
        day=lambda x: x['Date'].dt.day,
        # to_datetime assembles a date from year/month/day columns
        new_date=lambda x: pd.to_datetime(x[['year', 'month', 'day']]),
    )['new_date']
)
Same result.
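One difference between the two approaches shows up on leap days. The sketch below uses a hypothetical row dated 2020-02-29 shifted by one year: rebuilding from year/month/day asks for 2021-02-29, which does not exist (errors="coerce" turns it into NaT instead of raising), whereas DateOffset clamps to the last valid day of the month.

```python
import pandas as pd

# Hypothetical leap-day row, shifted by one year
df = pd.DataFrame({"Date": pd.to_datetime(["2020-02-29"]), "Years": [1]})

parts = pd.DataFrame({
    "year": df["Date"].dt.year + df["Years"],
    "month": df["Date"].dt.month,
    "day": df["Date"].dt.day,
})
# 2021-02-29 is not a valid date; errors="coerce" yields NaT instead of raising
rebuilt = pd.to_datetime(parts, errors="coerce")

# DateOffset clamps the day to the end of February instead
offset = df["Date"].iloc[0] + pd.DateOffset(years=1)

print(rebuilt.iloc[0])  # NaT
print(offset)           # 2021-02-28 00:00:00
```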
Convert the years to days (approximating every year as 365 days), turn them into a timedelta, and add that to a converted datetime column:
df1['Final_Date'] = (pd.to_datetime(df1['Date'])
                     + pd.to_timedelta(df1['Years'] * 365, unit='D'))
Passing unit='Y' to to_timedelta for years is no longer supported (it was deprecated) and raises a ValueError.
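Edit: because every year is counted as 365 days, the result drifts by one day for each 29 February in the interval. If you need day-exact dates, you have to go row by row and update the date objects accordingly, as the other answers explain.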
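The drift is easy to see by putting the 365-day approximation next to a calendar-exact DateOffset per row. This sketch assumes a fixed start date of 2021-07-13; the 5- and 3-year intervals both cross 2024-02-29, so the approximation lands one day early for those rows.

```python
import pandas as pd

df1 = pd.DataFrame([["Tom", 5], ["Jane", 3], ["Peter", 1]], columns=["Name", "Years"])
df1["Date"] = pd.Timestamp("2021-07-13")  # fixed date for reproducibility

# 365-day approximation; parentheses are not needed here because it fits on one line
approx = df1["Date"] + pd.to_timedelta(df1["Years"] * 365, unit="D")

# Calendar-exact: one DateOffset per row
exact = pd.Series(
    [d + pd.DateOffset(years=int(n)) for d, n in zip(df1["Date"], df1["Years"])],
    index=df1.index,
)

print(pd.DataFrame({"approx": approx, "exact": exact}))
```

Here approx gives 2026-07-12, 2024-07-12 and 2022-07-13, while exact gives 2026-07-13, 2024-07-13 and 2022-07-13.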