How to find the difference between two dates in years
Question:
I have two columns in my dataframe that I have converted to datetime. I’m trying to subtract these dates and find the difference in years. This is the code I’m using:
from dateutil.relativedelta import relativedelta
difference_in_years = relativedelta(x['start'], x['end']).year
However, I’m getting the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What is the issue?
Answers:
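relativedelta works on single datetime values, not on whole Series, which is what raises the ValueError. Use the attribute .years (not .year) with apply and axis=1 to process the DataFrame row by row: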
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'start':['2015-10-02','2014-11-05'],
                   'end':['2018-01-02','2018-10-05']})
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df['y'] = df.apply(lambda x: relativedelta(x['end'], x['start']).years, axis=1)
Or use a list comprehension:
df['y'] = [relativedelta(i, j).years for i, j in zip(df['end'], df['start'])]
print (df)
start end y
0 2015-10-02 2018-01-02 2
1 2014-11-05 2018-10-05 3
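For large frames the row-wise apply can be slow. Below is a minimal vectorized sketch of the same whole-year count. It assumes both columns are datetime64 with no missing values and that end is not earlier than start. The names y_vectorized and not_reached are made up for illustration, and edge cases such as a 29 February start may be treated differently than by relativedelta.

```python
import pandas as pd

df = pd.DataFrame({'start': ['2015-10-02', '2014-11-05'],
                   'end': ['2018-01-02', '2018-10-05']})
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

# Raw difference of the calendar years
years = df['end'].dt.year - df['start'].dt.year

# True where the anniversary of 'start' has not yet been reached in the end year
not_reached = (
    (df['end'].dt.month < df['start'].dt.month)
    | ((df['end'].dt.month == df['start'].dt.month)
       & (df['end'].dt.day < df['start'].dt.day))
)

# Subtract one year for those rows (assumes end >= start, no NaT)
df['y_vectorized'] = years - not_reached.astype(int)
print(df)
```

On the sample data this gives the same values as the relativedelta version (2 and 3).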
EDIT: If some dates are missing, compute the difference only for rows where both dates are present:
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'start':['2015-10-02','2014-11-05'],
                   'end':['2018-01-02',np.nan]})
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
# mask of rows where both dates are present
m = df[['start','end']].notnull().all(axis=1)
df.loc[m, 'y'] = df[m].apply(lambda x: relativedelta(x['end'], x['start']).years, axis=1)
print (df)
start end y
0 2015-10-02 2018-01-02 2.0
1 2014-11-05 NaT NaN
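Because of the missing value, the y column above ends up as float. If whole-number years are preferred, one option is to cast to pandas' nullable integer dtype Int64. This is only a sketch that repeats the masking approach above; the cast is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({'start': ['2015-10-02', '2014-11-05'],
                   'end': ['2018-01-02', np.nan]})
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

# Only rows where both dates are present are passed to relativedelta
m = df[['start', 'end']].notna().all(axis=1)
df.loc[m, 'y'] = df[m].apply(
    lambda x: relativedelta(x['end'], x['start']).years, axis=1)

# Nullable integer dtype: keeps integers and marks missing rows as <NA>
df['y'] = df['y'].astype('Int64')
print(df)
```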
See also this answer: Calculate the difference between two datetime.date() dates in years and months.
from dateutil import relativedelta as rdelta
from datetime import date
d1 = date(2001,5,1)
d2 = date(2012,1,1)
rd = rdelta.relativedelta(d2,d1)
rd
relativedelta(years=+10, months=+8)
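The result has separate relative fields. The short sketch below reads them individually and also shows why .year from the question does not give the difference: the singular attribute is the absolute-year field, which is None on a delta computed from two dates. The name total_months is made up for illustration.

```python
from datetime import date
from dateutil.relativedelta import relativedelta

rd = relativedelta(date(2012, 1, 1), date(2001, 5, 1))

print(rd.years)   # 10 (relative difference in whole years)
print(rd.months)  # 8  (remaining months after the whole years)
print(rd.year)    # None (absolute field, not the difference)

# Total difference expressed in months
total_months = rd.years * 12 + rd.months
print(total_months)  # 128
```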
You can do it by
(df['end'] - df['start'])/pd.Timedelta(1, 'Y')
and round the result if needed.
In pandas v0.23.4 and later (before v0.25, see the update below) you can do
(df['end'] - df['start'])//pd.Timedelta(1, 'Y')
to get whole year difference straight away.
Update: In pandas v0.25 and later, pd.Timedelta(1, 'Y') is not supported, because a year is not a consistent unit of measure (sometimes 365 days, sometimes 366 days). You can do this instead if the 365-day approximation is acceptable:
(df['end'] - df['start'])/pd.Timedelta(365, 'D')
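If a slightly closer approximation is wanted, one option is to divide by the mean Gregorian year of 365.2425 days, which is also the length NumPy assigns to its 'Y' unit. The sketch below is still an approximation, not an exact calendar calculation. The names MEAN_YEAR and whole_years are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'start': pd.to_datetime(['2015-10-02', '2014-11-05']),
                   'end': pd.to_datetime(['2018-01-02', '2018-10-05'])})

# Mean length of a Gregorian year (an approximation, not a calendar year)
MEAN_YEAR = pd.Timedelta(days=365.2425)

# Fractional years, then truncated to whole years
df['years'] = (df['end'] - df['start']) / MEAN_YEAR
df['whole_years'] = np.floor(df['years'])
print(df)
```

Near an exact anniversary the approximation can land just below or above a whole number, so relativedelta is the safer choice when exact calendar years matter.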
You can divide a timedelta series by year units and, if necessary, round:
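import numpy as np

# data from jezrael
df['years'] = (df['end'] - df['start']) / np.timedelta64(1, 'Y')
# round() rounds to the nearest integer; use np.floor(df['years']) to truncate instead
df['years_floor'] = df['years'].round()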
print(df)
start end years years_floor
0 2015-10-02 2018-01-02 2.253297 2.0
1 2014-11-05 NaT NaN NaN