Pandas: Group by and count days before and after specific date
Question:
Dataframe to start with:
import pandas as pd
from datetime import date

df = pd.DataFrame([
{'date': date(2023, 1, 1), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'AA', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'AA', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 1, 1), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 2), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 5), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 6), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 20), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 28), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 1, 29), 'name': 'BB', 'third_friday': date(2023, 1, 20)},
{'date': date(2023, 2, 1), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 2), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 5), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 6), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 20), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 27), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
{'date': date(2023, 2, 28), 'name': 'BB', 'third_friday': date(2023, 2, 17)},
])
I want to add two additional columns:
- days_after – counts days after "third_friday" and until the next third Friday (grouped by the "name" column)
- days_before – counts days before "third_friday" (grouped by the "name" column)
Rows that fall on the third Friday itself should get 0 in both columns.
Expected output:
          date name third_friday  days_after  days_before
0   2023-01-01   AA   2023-01-20           1            4
1   2023-01-02   AA   2023-01-20           2            3
2   2023-01-05   AA   2023-01-20           3            2
3   2023-01-06   AA   2023-01-20           4            1
4   2023-01-20   AA   2023-01-20           0            0
5   2023-01-28   AA   2023-01-20           1            6
6   2023-01-29   AA   2023-01-20           2            5
7   2023-02-01   AA   2023-02-17           3            4
8   2023-02-02   AA   2023-02-17           4            3
9   2023-02-05   AA   2023-02-17           5            2
10  2023-02-06   AA   2023-02-17           6            1
11  2023-02-20   AA   2023-02-17           1            3
12  2023-02-27   AA   2023-02-17           2            2
13  2023-02-28   AA   2023-02-17           3            1
14  2023-01-01   BB   2023-01-20           1            4
15  2023-01-02   BB   2023-01-20           2            3
16  2023-01-05   BB   2023-01-20           3            2
17  2023-01-06   BB   2023-01-20           4            1
18  2023-01-20   BB   2023-01-20           0            0
19  2023-01-28   BB   2023-01-20           1            6
20  2023-01-29   BB   2023-01-20           2            5
21  2023-02-01   BB   2023-02-17           3            4
22  2023-02-02   BB   2023-02-17           4            3
23  2023-02-05   BB   2023-02-17           5            2
24  2023-02-06   BB   2023-02-17           6            1
25  2023-02-20   BB   2023-02-17           1            3
26  2023-02-27   BB   2023-02-17           2            2
27  2023-02-28   BB   2023-02-17           3            1
Answers:
You can use:
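# convert both columns to datetime64 so their difference is a timedelta Series
df[['date', 'third_friday']] = df[['date', 'third_friday']].apply(pd.to_datetime)
# signed difference between the 2 columns
s = df['date'].sub(df['third_friday'])
# is the row on or after its third Friday (difference >= 0)?
s2 = s.ge(pd.Timedelta(0))
# is the row not the third Friday itself (difference != 0)?
m = s.ne(pd.Timedelta(0))
# form groups starting on each first True of s2
group = (s2 & ~s2.shift(fill_value=False)).cumsum()
# set up grouper on the non-Friday rows only
g = df[m].groupby(['name', group])
# up and down count per group; the excluded third Friday rows are filled with 0
df['days_after'] = g.cumcount().add(1).reindex(df.index, fill_value=0)
df['days_before'] = g.cumcount(ascending=False).add(1).reindex(df.index, fill_value=0)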
Output:
          date name third_friday  days_after  days_before
0   2023-01-01   AA   2023-01-20           1            4
1   2023-01-02   AA   2023-01-20           2            3
2   2023-01-05   AA   2023-01-20           3            2
3   2023-01-06   AA   2023-01-20           4            1
4   2023-01-20   AA   2023-01-20           0            0
5   2023-01-28   AA   2023-01-20           1            6
6   2023-01-29   AA   2023-01-20           2            5
7   2023-02-01   AA   2023-02-17           3            4
8   2023-02-02   AA   2023-02-17           4            3
9   2023-02-05   AA   2023-02-17           5            2
10  2023-02-06   AA   2023-02-17           6            1
11  2023-02-20   AA   2023-02-17           1            3
12  2023-02-27   AA   2023-02-17           2            2
13  2023-02-28   AA   2023-02-17           3            1
14  2023-01-01   BB   2023-01-20           1            4
15  2023-01-02   BB   2023-01-20           2            3
16  2023-01-05   BB   2023-01-20           3            2
17  2023-01-06   BB   2023-01-20           4            1
18  2023-01-20   BB   2023-01-20           0            0
19  2023-01-28   BB   2023-01-20           1            6
20  2023-01-29   BB   2023-01-20           2            5
21  2023-02-01   BB   2023-02-17           3            4
22  2023-02-02   BB   2023-02-17           4            3
23  2023-02-05   BB   2023-02-17           5            2
24  2023-02-06   BB   2023-02-17           6            1
25  2023-02-20   BB   2023-02-17           1            3
26  2023-02-27   BB   2023-02-17           2            2
27  2023-02-28   BB   2023-02-17           3            1
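If the grouping step is hard to follow, here is a minimal sketch of it in isolation. The flags in on_or_after are made-up toy data (not taken from the question), and the names starts, block, up and down are only illustrative. The idea is that a block begins wherever the flag flips from False to True, and a cumulative sum of those start markers numbers the blocks.

```python
import pandas as pd

# Toy flags (hypothetical data): True where a row is on or after its third Friday.
on_or_after = pd.Series([False, False, True, True, False, True, True, True])

# A block starts where the flag is True and the previous row's flag was False.
starts = on_or_after & ~on_or_after.shift(fill_value=False)

# The running sum of the start markers gives every row a block number.
block = starts.cumsum()

# Count rows upward and downward within each block.
up = block.groupby(block).cumcount().add(1)
down = block.groupby(block).cumcount(ascending=False).add(1)

print(pd.DataFrame({'on_or_after': on_or_after, 'block': block,
                    'up': up, 'down': down}))
```

Note that this relies on the rows being sorted by name and then by date, as they are in the question. In the full answer the third Friday rows are dropped with the mask m before counting, and "name" is added to the grouper so that counts never run across names.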