Pandas: Group by and count days before and after specific date

Question:

Dataframe to start with:

df = pd.DataFrame([
    {'date': date(2023, 1, 1), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 2), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 5), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 6), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 20), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 28), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 29), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 2, 1), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 2), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 5), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 6), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 20), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 27), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 28), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 1, 1), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 2), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 5), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 6), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 20), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 28), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 1, 29), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
    {'date': date(2023, 2, 1), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 2), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 5), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 6), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 20), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 27), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
    {'date': date(2023, 2, 28), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
])

I want to add two additional columns:

  • days_after – counts days after "third_friday" and until next third friday (grouped by "name" column)
  • days_before – counts days before "third_friday" (grouped by "name" column)

Expected output:

          date name third_friday  days_after  days_before
0   2023-01-01   AA   2023-01-20           1            4
1   2023-01-02   AA   2023-01-20           2            3
2   2023-01-05   AA   2023-01-20           3            2
3   2023-01-06   AA   2023-01-20           4            1
4   2023-01-20   AA   2023-01-20           0            0
5   2023-01-28   AA   2023-01-20           1            6
6   2023-01-29   AA   2023-01-20           2            5
7   2023-02-01   AA   2023-02-17           3            4
8   2023-02-02   AA   2023-02-17           4            3
9   2023-02-05   AA   2023-02-17           5            2
10  2023-02-06   AA   2023-02-17           6            1
11  2023-02-20   AA   2023-02-17           1            3
12  2023-02-27   AA   2023-02-17           2            2
13  2023-02-28   AA   2023-02-17           3            1
14  2023-01-01   BB   2023-01-20           1            4
15  2023-01-02   BB   2023-01-20           2            3
16  2023-01-05   BB   2023-01-20           3            2
17  2023-01-06   BB   2023-01-20           4            1
18  2023-01-20   BB   2023-01-20           0            0
19  2023-01-28   BB   2023-01-20           1            6
20  2023-01-29   BB   2023-01-20           2            5
21  2023-02-01   BB   2023-02-17           3            4
22  2023-02-02   BB   2023-02-17           4            3
23  2023-02-05   BB   2023-02-17           5            2
24  2023-02-06   BB   2023-02-17           6            1
25  2023-02-20   BB   2023-02-17           1            3
26  2023-02-27   BB   2023-02-17           2            2
27  2023-02-28   BB   2023-02-17           3            1
Asked By: Ivan R

||

Answers:

You can use:

# get difference between 2 columns
s = df['date'].sub(df['third_friday'])
# is it > 0?
s2 = s.ge('0')
# is it not 0?
m = s.ne('0')
# form groups starting on each first True of s2
group = (s2&~s2.shift(fill_value=False)).cumsum()

# set up grouper
g = df[m].groupby(['name', group])

# up and down count per group
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)

Output:

         date name third_friday  days_after  days_before
0  2023-01-01   AA   2023-01-20           1            4
1  2023-01-02   AA   2023-01-20           2            3
2  2023-01-05   AA   2023-01-20           3            2
3  2023-01-06   AA   2023-01-20           4            1
4  2023-01-20   AA   2023-01-20           0            0
5  2023-01-28   AA   2023-01-20           1            6
6  2023-01-29   AA   2023-01-20           2            5
7  2023-02-01   AA   2023-02-17           3            4
8  2023-02-02   AA   2023-02-17           4            3
9  2023-02-05   AA   2023-02-17           5            2
10 2023-02-06   AA   2023-02-17           6            1
11 2023-02-20   AA   2023-02-17           1            3
12 2023-02-27   AA   2023-02-17           2            2
13 2023-02-28   AA   2023-02-17           3            1
14 2023-01-01   BB   2023-01-20           1            4
15 2023-01-02   BB   2023-01-20           2            3
16 2023-01-05   BB   2023-01-20           3            2
17 2023-01-06   BB   2023-01-20           4            1
18 2023-01-20   BB   2023-01-20           0            0
19 2023-01-28   BB   2023-01-20           1            6
20 2023-01-29   BB   2023-01-20           2            5
21 2023-02-01   BB   2023-02-17           3            4
22 2023-02-02   BB   2023-02-17           4            3
23 2023-02-05   BB   2023-02-17           5            2
24 2023-02-06   BB   2023-02-17           6            1
25 2023-02-20   BB   2023-02-17           1            3
26 2023-02-27   BB   2023-02-17           2            2
27 2023-02-28   BB   2023-02-17           3            1
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.