Subtract a year from a datetime column in pandas
Question:
I have a datetime column like this:
>>> df['ACC_DATE'].head(2)
538 2006-04-07
550 2006-04-12
Name: ACC_DATE, dtype: datetime64[ns]
Now I want to subtract a year from each row of this column. How can I do this, and which library should I use?
The expected output:
ACC_DATE NEW_DATE
538 2006-04-07 2005-04-07
550 2006-04-12 2005-04-12
Answers:
You could use pd.Timedelta:
df["NEW_DATE"] = df["ACC_DATE"] - pd.Timedelta(days=365)
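A fixed 365-day `Timedelta` only equals one calendar year when no 29 February falls inside the interval. The sketch below uses hypothetical dates (not from the question), picked so that the second one crosses 29 February 2008:

```python
import pandas as pd

# Hypothetical sample: the year before 2008-03-01 contains 2008-02-29,
# so it is 366 days long rather than 365.
s = pd.Series(pd.to_datetime(["2006-04-07", "2008-03-01"]))

shifted = s - pd.Timedelta(days=365)
print(shifted.dt.strftime("%Y-%m-%d").tolist())
# ['2005-04-07', '2007-03-02']  <- the second date lands one day late
```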
Or Timestamp.replace:
df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x.replace(year=x.year - 1))
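`replace` keeps the month and day and only swaps the year, which breaks on 29 February: the target date does not exist in a non-leap year. A minimal sketch with a hypothetical leap-day row (`minus_one_year_replace` is just an illustrative name):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2006-04-07", "2008-02-29"]))

def minus_one_year_replace(ts):
    # Same month and day, previous year (hypothetical helper name).
    return ts.replace(year=ts.year - 1)

print(minus_one_year_replace(s[0]))  # 2005-04-07 00:00:00

try:
    minus_one_year_replace(s[1])  # 2007-02-29 does not exist
    leap_error = None
except ValueError as exc:
    leap_error = exc
print(repr(leap_error))
```

Inside `df["ACC_DATE"].apply(...)`, that same `ValueError` would abort the whole column.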
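But neither handles leap years correctly (a fixed 365 days drifts by one day when the interval contains 29 February, and replace raises a ValueError for 29 February), so you could use dateutil.relativedelta: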
from dateutil.relativedelta import relativedelta
df["NEW_DATE"] = df["ACC_DATE"].apply(lambda x: x - relativedelta(years=1))
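`relativedelta` does calendar arithmetic: it moves the year back and, if the resulting day does not exist, clips it to the last valid day of that month. A quick check on hypothetical dates around a leap day:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

s = pd.Series(pd.to_datetime(["2006-04-07", "2008-02-29", "2008-03-01"]))

# Row-by-row calendar shift; 2008-02-29 is clipped to 2007-02-28.
new = s.apply(lambda x: x - relativedelta(years=1))
new_dates = [d.strftime("%Y-%m-%d") for d in new]
print(new_dates)
# ['2005-04-07', '2007-02-28', '2007-03-01']
```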
You can use DateOffset to achieve this:
In[88]:
df['NEW_DATE'] = df['ACC_DATE'] - pd.DateOffset(years=1)
df
Out[88]:
ACC_DATE NEW_DATE
index
538 2006-04-07 2005-04-07
550 2006-04-12 2005-04-12
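The same idea as a self-contained script, with one extra hypothetical row (index 551) holding a leap day. `DateOffset(years=1)` is applied to the whole column at once, without a Python-level `apply`, and clips 29 February the same way `relativedelta` does:

```python
import pandas as pd

df = pd.DataFrame(
    {"ACC_DATE": pd.to_datetime(["2006-04-07", "2006-04-12", "2008-02-29"])},
    index=[538, 550, 551],  # 551 is a hypothetical extra row, not from the question
)
df.index.name = "index"

# Calendar-aware shift of the whole column; 2008-02-29 becomes 2007-02-28.
df["NEW_DATE"] = df["ACC_DATE"] - pd.DateOffset(years=1)
print(df)
```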
Use DateOffset:
df["NEW_DATE"] = df["ACC_DATE"] - pd.offsets.DateOffset(years=1)
print(df)
ACC_DATE NEW_DATE
index
538 2006-04-07 2005-04-07
550 2006-04-12 2005-04-12
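`pd.offsets.DateOffset` and `pd.DateOffset` are the same class, so this answer and the previous one are equivalent. The sketch below also checks, on hypothetical dates and a hypothetical `n`, that the vectorised offset agrees with the row-by-row `relativedelta` approach, including for leap days:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

# pd.DateOffset is the top-level alias of pd.offsets.DateOffset.
print(pd.offsets.DateOffset is pd.DateOffset)  # True

s = pd.Series(pd.to_datetime(["2006-04-07", "2008-02-29", "2012-02-29"]))
n = 2  # hypothetical number of years to subtract

vectorised = s - pd.offsets.DateOffset(years=n)
row_by_row = s.apply(lambda x: x - relativedelta(years=n))

# Compare as lists of Timestamps so a dtype difference cannot matter.
print(vectorised.tolist() == row_by_row.tolist())  # True
```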
If you have a single pd.Timestamp object rather than a column:
- Using pd.DateOffset(years=n) is not ideal, as it produces:
  UserWarning: Discarding nonzero nanoseconds in conversion
- pd.Timedelta() doesn't accept years.
The only approach that worked for me in this case is pd.Timestamp.replace:
import pandas as pd
n = 1  # number of years to subtract
t = pd.Timestamp.now()
t = t.replace(year=t.year - n)
This was hinted at in the answer by Padriac but it needed further clarity.
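One caveat: `replace` still raises `ValueError` when `t` is 29 February and `t.year - n` is not a leap year. A sketch of a helper (the name `years_ago` is hypothetical) that keeps `replace` for the normal case, which preserves nanoseconds, and falls back to `relativedelta` only for that edge case:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

def years_ago(t: pd.Timestamp, n: int) -> pd.Timestamp:
    """Subtract n calendar years from one Timestamp (hypothetical helper)."""
    try:
        # Same month/day/time-of-day, earlier year; nanoseconds are kept.
        return t.replace(year=t.year - n)
    except ValueError:
        # Reached only for 29 February when the target year is not a leap
        # year; relativedelta clips the day to 28 February instead.
        return t - relativedelta(years=n)

print(years_ago(pd.Timestamp("2006-04-07 10:30:00.000000001"), 1))
print(years_ago(pd.Timestamp("2008-02-29"), 1))
```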