Pandas Timedelta in months
Question:
How can I calculate the elapsed months using pandas? I have written the following, but this code is not elegant. Could you tell me a better way?
import pandas as pd
df = pd.DataFrame([pd.Timestamp('20161011'),
                   pd.Timestamp('20161101')], columns=['date'])
df['today'] = pd.Timestamp('20161202')
df = df.assign(
    elapsed_months=(12 *
                    (df["today"].map(lambda x: x.year) -
                     df["date"].map(lambda x: x.year)) +
                    (df["today"].map(lambda x: x.month) -
                     df["date"].map(lambda x: x.month))))
# Out[34]:
# date today elapsed_months
# 0 2016-10-11 2016-12-02 2
# 1 2016-11-01 2016-12-02 1
Answers:
The following approximates this by treating every month as 30 days:
df["elapsed_months"] = (df["today"] - df["date"]).map(
    lambda x: round(x.days / 30))
# Out[34]:
# date today elapsed_months
# 0 2016-10-11 2016-12-02 2
# 1 2016-11-01 2016-12-02 1
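A vectorized sketch of the same 30-day approximation, using the .dt.days accessor instead of a Python lambda. The fixed 30-day month is the assumption behind this approach, so it can disagree with the calendar month count:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2016-10-11", "2016-11-01"])})
df["today"] = pd.Timestamp("2016-12-02")

# Whole days between the two dates, divided by an assumed 30-day month
days = (df["today"] - df["date"]).dt.days
df["elapsed_months"] = (days / 30).round().astype(int)
print(df)
```

For example, 2016-01-31 to 2016-03-01 is 30 days, so this gives 1 even though the calendar months differ by 2.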
Update for pandas 0.24.0:
Since pandas 0.24.0 changed the API so that period subtraction returns a MonthEnd object, you can compute the whole-month difference manually:
12 * (df.today.dt.year - df.date.dt.year) + (df.today.dt.month - df.date.dt.month)
# 0 2
# 1 1
# dtype: int64
Wrap in a function:
def month_diff(a, b):
    # Calendar-month difference between two datetime Series (day of month ignored)
    return 12 * (a.dt.year - b.dt.year) + (a.dt.month - b.dt.month)
month_diff(df.today, df.date)
# 0 2
# 1 1
# dtype: int64
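month_diff counts calendar-month boundaries and ignores the day of the month. If you instead want only complete months, one possible adjustment (an assumption about what "elapsed" should mean; full_month_diff is a hypothetical name) is to subtract one whenever a's day of month is earlier than b's:

```python
import pandas as pd

def full_month_diff(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: complete months from b to a (assumes a >= b)."""
    months = 12 * (a.dt.year - b.dt.year) + (a.dt.month - b.dt.month)
    # A month only counts as complete once the day of month is reached again
    incomplete = a.dt.day < b.dt.day
    return months - incomplete.astype(int)

df = pd.DataFrame({"date": pd.to_datetime(["2016-10-11", "2016-11-01"])})
df["today"] = pd.Timestamp("2016-12-02")
print(full_month_diff(df.today, df.date))
```

This ignores the time of day and treats, say, 2016-01-31 to 2016-02-29 as zero complete months; decide whether that matches your definition before using it.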
Prior to pandas 0.24.0, you can round the dates to months with to_period() and then subtract the results:
df['elapsed_months'] = df.today.dt.to_period('M') - df.date.dt.to_period('M')
df
# date today elapsed_months
#0 2016-10-11 2016-12-02 2
#1 2016-11-01 2016-12-02 1
You could also try the following, which gives fractional months:
import numpy as np
df['months'] = (df['today'] - df['date']) / np.timedelta64(1, 'M')
df
# date today months
#0 2016-10-11 2016-12-02 1.708454
#1 2016-11-01 2016-12-02 1.018501
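The fractional values above correspond to an average month of 365.2425 / 12 = 30.436875 days (52 / 30.436875 ≈ 1.708454). Recent pandas versions may reject the 'M' unit for timedeltas, so here is a sketch that spells out that average month explicitly; AVG_MONTH is an assumed constant chosen to reproduce the output above:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2016-10-11", "2016-11-01"])})
df["today"] = pd.Timestamp("2016-12-02")

# Assumed average Gregorian month: 365.2425 days / 12 = 30.436875 days
AVG_MONTH = pd.Timedelta(days=365.2425 / 12)

# A timedelta Series divided by a Timedelta gives a float Series
df["months"] = (df["today"] - df["date"]) / AVG_MONTH
print(df)
```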
More simply, you can call to_period on individual timestamps and subtract the resulting periods:
pd.to_datetime('today').to_period('M') - pd.to_datetime('2020-01-01').to_period('M')
# [Out]:
# <7 * MonthEnds>
If you just want the integer value, use (<above_code>).n
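Because pd.to_datetime('today') changes every day, here is the same idea with two fixed, illustrative dates so the result is reproducible:

```python
import pandas as pd

start = pd.Timestamp("2020-01-01").to_period("M")  # Period('2020-01', 'M')
end = pd.Timestamp("2020-08-15").to_period("M")    # Period('2020-08', 'M')

diff = end - start   # a MonthEnd offset with n == 7
print(diff, diff.n)  # .n holds the integer month count
```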
Update for pandas 1.3
If you want integers instead of MonthEnd objects:
df['elapsed_months'] = df.today.dt.to_period('M').view(dtype='int64') - df.date.dt.to_period('M').view(dtype='int64')
df
# Out[11]:
# date today elapsed_months
# 0 2016-10-11 2016-12-02 2
# 1 2016-11-01 2016-12-02 1
This works with pandas 1.1.1:
df['elapsed_months'] = df.today.dt.to_period('M').astype(int) - df.date.dt.to_period('M').astype(int)
df
# Out[11]:
# date today elapsed_months
# 0 2016-10-11 2016-12-02 2
# 1 2016-11-01 2016-12-02 1
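Both integer versions work because a monthly Period is stored as an ordinal that counts months from 1970-01, so subtracting ordinals gives the month difference. A sketch that reads that value through the public Period.ordinal attribute instead of view or astype (month_ordinal is a hypothetical helper name):

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2016-10-11", "2016-11-01"])})
df["today"] = pd.Timestamp("2016-12-02")

def month_ordinal(s: pd.Series) -> pd.Series:
    # Months since 1970-01 for each timestamp, e.g. 2016-12 -> 563
    return s.dt.to_period("M").map(lambda p: p.ordinal)

df["elapsed_months"] = month_ordinal(df["today"]) - month_ordinal(df["date"])
print(df)
```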
You can use .n to get the number of months as an integer:
(pd.to_datetime('today').to_period('M') - pd.to_datetime('2020-01-01').to_period('M')).n
On a DataFrame, you can use it with .apply:
df["n_months"] = (df["today"].dt.to_period("M") - df["date"].dt.to_period("M")).apply(lambda x: x.n)
This also avoids the pandas 1.3.2 int conversion issue and any rounding issues from converting to ints first.
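The lambda can also be replaced by operator.attrgetter("n"), which reads the same attribute from each offset. A sketch using the question's columns:

```python
import operator
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2016-10-11", "2016-11-01"])})
df["today"] = pd.Timestamp("2016-12-02")

# Subtracting period Series gives an object Series of MonthEnd offsets
offsets = df["today"].dt.to_period("M") - df["date"].dt.to_period("M")
df["n_months"] = offsets.map(operator.attrgetter("n"))
print(df)
```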
If you don’t mind ignoring the days, you can use NumPy directly:
import numpy as np
# Truncate both columns to month precision (today minus date), then divide by one month
df['elapsed month'] = ((df.today.values.astype('datetime64[M]') -
                        df.date.values.astype('datetime64[M]'))
                       / np.timedelta64(1, 'M'))
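The same truncation works on NumPy scalars, which makes it easy to see what astype('datetime64[M]') does: it drops the day, and subtracting two month-precision values gives a timedelta64 in months.

```python
import numpy as np

today = np.datetime64("2016-12-02").astype("datetime64[M]")  # 2016-12
start = np.datetime64("2016-10-11").astype("datetime64[M]")  # 2016-10

delta = today - start                    # numpy.timedelta64 in months
months = delta / np.timedelta64(1, "M")  # float month count
print(delta, months)
```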