Adding column to a Pandas df checking whether date range ever falls on a given month in any year

Question:

We have a dataframe of entries, and we want to know which entries ever existed within a given month of any year. A simplified example:

import pandas as pd
import datetime as dt

df = pd.DataFrame(
    {
         "start": [dt.datetime(2020,1,1), dt.datetime(2020,8,1), dt.datetime(2020,8,1)],
         "finish": [dt.datetime(2021,12,1), dt.datetime(2021,6,1), dt.datetime(2022,6,1)],
     })

How can we add a column indicating which entries existed during July of any year? If we only cared about July 2020 we could write df['existed_in_july_2020'] = (df['start'] < dt.datetime(2020,7,1)) & (df['finish'] >= dt.datetime(2020,8,1)), but this does not cover other years, and the third entry existed in July 2021.

In this example df, the column existed_in_july would be:

df = pd.DataFrame(
    {
         "start": [dt.datetime(2020,1,1), dt.datetime(2020,8,1), dt.datetime(2020,8,1)],
         "finish": [dt.datetime(2021,12,1), dt.datetime(2021,6,1), dt.datetime(2022,6,1)],
         "existed_in_july": [True, False, True]
     })

How can we create this column?

Answers:

One option is to check whether the July date of the start year or of the finish year falls between the two dates, or whether more than one year elapsed between them:

m1 = df['start'].add(pd.DateOffset(month=7)).between(df['start'], df['finish'])

m2 = df['finish'].add(pd.DateOffset(month=7)).between(df['start'], df['finish'])

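# recent pandas versions reject 'Y' as a Timedelta unit, so the year is given in days
m3 = df['finish'].sub(df['start']).gt(pd.Timedelta(days=365))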

df['existed_in_july'] = m1|m2|m3

Output:

       start     finish  existed_in_july
0 2020-01-01 2021-12-01             True
1 2020-08-01 2021-06-01            False
2 2020-08-01 2022-06-01             True
Answered By: mozway
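
To see how the three masks behave on boundary cases, here is a minimal sketch that wraps them in a helper and runs it on a few extra rows. The function name existed_in_july and the edge_cases frame are hypothetical, introduced only for illustration, and the "more than one year" test is written as 365 days. pandas may emit a PerformanceWarning here, since a DateOffset that replaces the month is applied element by element.

```python
import datetime as dt
import pandas as pd


def existed_in_july(frame):
    # Hypothetical helper (not part of the answer) wrapping the three masks.
    start, finish = frame["start"], frame["finish"]
    july = pd.DateOffset(month=7)  # replaces the month, keeps year and day
    m1 = start.add(july).between(start, finish)   # July date of the start year
    m2 = finish.add(july).between(start, finish)  # July date of the finish year
    m3 = finish.sub(start).gt(pd.Timedelta(days=365))  # span longer than a year
    return m1 | m2 | m3


# Illustrative rows: entirely inside July, just missing July on both
# sides, and ending exactly on 1 July (between() is inclusive).
edge_cases = pd.DataFrame(
    {
        "start": [dt.datetime(2020, 7, 10), dt.datetime(2020, 8, 1), dt.datetime(2020, 6, 15)],
        "finish": [dt.datetime(2020, 7, 20), dt.datetime(2021, 6, 30), dt.datetime(2020, 7, 1)],
    }
)
result = existed_in_july(edge_cases)
print(result.tolist())  # [True, False, True]
```

Because between() includes both endpoints, an entry that finishes on 1 July counts as having existed in July.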

You can build the monthly periods between the two dates and test whether any of them is July, using a list comprehension:

df['existed_in_july'] = [(pd.period_range(a, b, freq='M').month == 7).any()
                         for a, b in zip(df['start'], df['finish'])]
print(df)
 
        start     finish  existed_in_july
0 2020-01-01 2021-12-01             True
1 2020-08-01 2021-06-01            False
2 2020-08-01 2022-06-01             True
Answered By: jezrael
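
Building a period range per row can get slow on large frames. A possible vectorized sketch of the same month-level test, assuming both endpoints count as inclusive (as they do in the period_range version): number the months consecutively, find the first July at or after the start month, and check that it does not lie past the finish month. The variable names here are illustrative only.

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame(
    {
        "start": [dt.datetime(2020, 1, 1), dt.datetime(2020, 8, 1), dt.datetime(2020, 8, 1)],
        "finish": [dt.datetime(2021, 12, 1), dt.datetime(2021, 6, 1), dt.datetime(2022, 6, 1)],
    }
)

# Number the months consecutively so that adjacent months differ by exactly 1.
start_idx = df["start"].dt.year * 12 + df["start"].dt.month - 1
finish_idx = df["finish"].dt.year * 12 + df["finish"].dt.month - 1

target = 7 - 1  # July as a zero-based month number
# Index of the first July that is not before the start month
# (% returns a non-negative result for a positive divisor).
first_july = start_idx + (target - start_idx) % 12

df["existed_in_july"] = first_july <= finish_idx
print(df["existed_in_july"].tolist())  # [True, False, True]
```

Changing target to another zero-based month number checks a different month with the same arithmetic.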