Pandas: Checking if a date is a holiday and assigning boolean value
Question:
I have a pandas data frame with date column, and I am trying to add a new column of boolean values indicating whether a given date is a holiday or not.
Following is the code, but it does not work (all the values are False) because the types seem to be different, and I can’t figure out how to get the ‘date’ in the pandas data frame to be of the same type as the holidays:
cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=train_df['date'].min(),
end=train_df['date'].max()).to_pydatetime()
train_df['holiday'] = train_df['date'].isin(holidays)
print type(train_df['date'][1])
print type(holidays[0])
Answers:
You don’t need to convert anything. Just compare straight up. pandas
is smart enough to compare a lot of different types with regards to dates and times. You have to have a slightly more esoteric format if you’re having issues with date/time compatibility.
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
dr = pd.date_range(start='2015-07-01', end='2015-07-31')
df = pd.DataFrame()
df['Date'] = dr
cal = calendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())
df['Holiday'] = df['Date'].isin(holidays)
print df
Result:
Date Holiday
0 2015-07-01 False
1 2015-07-02 False
2 2015-07-03 True
3 2015-07-04 False
4 2015-07-05 False
5 2015-07-06 False
6 2015-07-07 False
7 2015-07-08 False
8 2015-07-09 False
9 2015-07-10 False
10 2015-07-11 False
11 2015-07-12 False
12 2015-07-13 False
13 2015-07-14 False
14 2015-07-15 False
15 2015-07-16 False
16 2015-07-17 False
17 2015-07-18 False
18 2015-07-19 False
19 2015-07-20 False
20 2015-07-21 False
21 2015-07-22 False
22 2015-07-23 False
23 2015-07-24 False
24 2015-07-25 False
25 2015-07-26 False
26 2015-07-27 False
27 2015-07-28 False
28 2015-07-29 False
29 2015-07-30 False
30 2015-07-31 False
Note that July 4, 2015 falls on a Saturday.
I had the same problem as the author, and the other fix provided didn’t work for me. This is what did work:
train_df['holiday'] = train_df['date'].dt.date.astype('datetime64').isin(holidays)
I have a pandas data frame with date column, and I am trying to add a new column of boolean values indicating whether a given date is a holiday or not.
Following is the code, but it does not work (all the values are False) because the types seem to be different, and I can’t figure out how to get the ‘date’ in the pandas data frame to be of the same type as the holidays:
cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=train_df['date'].min(),
end=train_df['date'].max()).to_pydatetime()
train_df['holiday'] = train_df['date'].isin(holidays)
print type(train_df['date'][1])
print type(holidays[0])
You don’t need to convert anything. Just compare straight up. pandas
is smart enough to compare a lot of different types with regards to dates and times. You have to have a slightly more esoteric format if you’re having issues with date/time compatibility.
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as calendar
dr = pd.date_range(start='2015-07-01', end='2015-07-31')
df = pd.DataFrame()
df['Date'] = dr
cal = calendar()
holidays = cal.holidays(start=dr.min(), end=dr.max())
df['Holiday'] = df['Date'].isin(holidays)
print df
Result:
Date Holiday
0 2015-07-01 False
1 2015-07-02 False
2 2015-07-03 True
3 2015-07-04 False
4 2015-07-05 False
5 2015-07-06 False
6 2015-07-07 False
7 2015-07-08 False
8 2015-07-09 False
9 2015-07-10 False
10 2015-07-11 False
11 2015-07-12 False
12 2015-07-13 False
13 2015-07-14 False
14 2015-07-15 False
15 2015-07-16 False
16 2015-07-17 False
17 2015-07-18 False
18 2015-07-19 False
19 2015-07-20 False
20 2015-07-21 False
21 2015-07-22 False
22 2015-07-23 False
23 2015-07-24 False
24 2015-07-25 False
25 2015-07-26 False
26 2015-07-27 False
27 2015-07-28 False
28 2015-07-29 False
29 2015-07-30 False
30 2015-07-31 False
Note that July 4, 2015 falls on a Saturday.
I had the same problem as the author, and the other fix provided didn’t work for me. This is what did work:
train_df['holiday'] = train_df['date'].dt.date.astype('datetime64').isin(holidays)