Convert a DataFrame with Periods ("from" and "to" date columns) to a Series
Question:
I have a DataFrame with school holidays. Each holiday has a "From" and a "To" date column. Can you provide me with a neat and short way to convert it to an "is_holiday" Series with one entry for every day?
I have:
idx | From | To | Name |
---|---|---|---|
0 | 2017-12-25 | 2018-01-05 | Xmas holiday |
1 | 2018-02-12 | 2018-02-23 | Sport holiday |
2 | 2018-03-29 | 2018-04-02 | Easter holiday |
…
I want:
Date | is_holiday |
---|---|
.. | |
2017-12-24 | False |
2017-12-25 | True |
2017-12-26 | True |
.. | |
2018-01-04 | True |
2018-01-05 | True |
.. | |
and so on..
…
Example DataFrame for your convenience:
import pandas as pd
df = pd.DataFrame({
"From": ["2017-12-25", "2018-02-12", "2018-03-29"],
"To": ["2018-01-05","2018-02-23","2018-04-02"],
})
df.From = pd.to_datetime(df.From)
df.To = pd.to_datetime(df.To)
Answers:
This covers all dates from the lowest From to the highest To, but you can tune the interval as you wish:
df = pd.DataFrame({
    "From": ["2017-12-25", "2018-02-12", "2018-03-29"],
    "To": ["2018-01-05", "2018-02-23", "2018-04-02"],
})
df.From = pd.to_datetime(df.From)
df.To = pd.to_datetime(df.To)
# Collect every day covered by any holiday period
holidays = []
for ix, row in df.iterrows():
    holidays += pd.date_range(row.From, row.To).tolist()
# One row per day between the earliest From and the latest To
all_dates = pd.DataFrame({'dates': pd.date_range(df.From.min(), df.To.max())})
all_dates['is_holiday'] = False
all_dates.loc[all_dates.dates.isin(holidays), 'is_holiday'] = True
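To illustrate the "tune the interval" remark, here is a minimal sketch that builds the calendar from explicit bounds instead of df.From.min() and df.To.max(). The names calendar_start and calendar_end, and the dates assigned to them, are assumptions chosen only for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "From": pd.to_datetime(["2017-12-25", "2018-02-12", "2018-03-29"]),
    "To": pd.to_datetime(["2018-01-05", "2018-02-23", "2018-04-02"]),
})

# Hypothetical calendar bounds, wider than the holiday data itself.
calendar_start = "2017-12-01"
calendar_end = "2018-04-30"

# Collect every day covered by any holiday period.
holidays = []
for row in df.itertuples(index=False):
    holidays += pd.date_range(row.From, row.To).tolist()

# One row per day of the chosen calendar; isin() already yields booleans.
all_dates = pd.DataFrame({"dates": pd.date_range(calendar_start, calendar_end)})
all_dates["is_holiday"] = all_dates["dates"].isin(holidays)
```

Days outside every holiday period, including those before the first From or after the last To, simply come out as False.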
EDIT, cleaner code:
def holiday_dates(x):
    # All days in one holiday period, as a list of Timestamps
    return pd.date_range(x.From, x.To).tolist()
# Summing the per-row lists concatenates them into one flat list
holidays = df.apply(holiday_dates, axis=1).sum()
all_dates = pd.DataFrame({'dates': pd.date_range(df.From.min(), df.To.max())})
all_dates['is_holiday'] = False
all_dates.loc[all_dates.dates.isin(holidays), 'is_holiday'] = True
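The .sum() call above works because adding Python lists concatenates them. As an alternative sketch (my own assumption, not the answerer's code), itertools.chain can flatten the per-period ranges without building a new intermediate list for every row:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "From": pd.to_datetime(["2017-12-25", "2018-02-12", "2018-03-29"]),
    "To": pd.to_datetime(["2018-01-05", "2018-02-23", "2018-04-02"]),
})

# One date_range per holiday period, flattened into a single list of Timestamps.
holidays = list(chain.from_iterable(
    pd.date_range(start, end) for start, end in zip(df["From"], df["To"])
))

all_dates = pd.DataFrame({"dates": pd.date_range(df["From"].min(), df["To"].max())})
all_dates["is_holiday"] = all_dates["dates"].isin(holidays)
```

For a handful of holiday periods the difference is unlikely to matter; this is mainly a readability choice.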
This is the smallest solution I came up with in the end. It is based on @imburningbabe's first solution. Many thanks for the inspiration! I wouldn't have been able to do it without your answer.
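df = pd.DataFrame({
    "From": ["2017-12-25", "2018-02-12", "2018-03-29"],
    "To": ["2018-01-05", "2018-02-23", "2018-04-02"],
})
df.From = pd.to_datetime(df.From)
df.To = pd.to_datetime(df.To)
# Index by day so each holiday period can be set with a label slice
all_dates = pd.DataFrame(index=pd.date_range(df.From.min(), df.To.max()))
all_dates['is_holiday'] = False
# Select only From/To so extra columns such as Name do not break the unpacking
for from_, to in df[["From", "To"]].itertuples(index=False):
    all_dates.loc[from_:to, 'is_holiday'] = True  # label slices include both ends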
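Since the question asks for a Series, here is a loop-free sketch that returns one directly. It swaps the slice assignment above for NumPy broadcasting: every day is compared against every period at once. This is my own variation under the assumption that the number of periods is small enough for a days-by-periods boolean array to fit in memory; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "From": pd.to_datetime(["2017-12-25", "2018-02-12", "2018-03-29"]),
    "To": pd.to_datetime(["2018-01-05", "2018-02-23", "2018-04-02"]),
})

# Every day between the earliest From and the latest To.
days = pd.date_range(df["From"].min(), df["To"].max(), freq="D")

# Shape (n_days, 1) against shape (n_periods,) broadcasts to (n_days, n_periods).
day_col = days.to_numpy()[:, None]
starts = df["From"].to_numpy()
ends = df["To"].to_numpy()

# A day is a holiday if it lies inside at least one period (both ends inclusive).
mask = ((day_col >= starts) & (day_col <= ends)).any(axis=1)

is_holiday = pd.Series(mask, index=days, name="is_holiday")
```

The result is a boolean Series indexed by date, so is_holiday.loc[pd.Timestamp("2018-02-12")] looks up a single day.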