Find n consecutive business date where values are equal in Panadas data frames

Question:

We need to return the list of dates for n consecutive business days ( Friday to Monday is 1 business day ). where values are not changed. Do not assume that dates column have every single dates.
Data frame structure would be as given below

date        Value                
2022-07-19 44.43000000
2022-07-20 44.43000000
2022-07-21 44.43000000
2022-07-22 44.43000000
2022-07-25 44.43000000

... ...
2022-09-02  86.40000000
2022-09-06  85.13000000
2022-09-07  86.86000000
2022-09-08  88.44000000
2022-09-09  89.44000000

If we assume n is 5. We need to return list of 5 consecutive dates. For above examples answer would be

[2022-07-22,2022-07-20,2022-07-21,2022-07-22,2022-07-25]

I tried below code to get consecutive dates present in data frame but I am unable to get consecutive business days.

for k, v in px_dirty.groupby((px_dirty['value'].shift() != px_dirty['value']).cumsum()):

if len(v) == 5:
    print(f'[group {k}]')
    print(v)

I am not able to figure out how to get consecutive business days.

Asked By: Mithila Palkar

||

Answers:

First, create a field with business days that is the product of (weekday < 6) and (week number of the year). Then group and form lists with a length of at least 5 unique elements for each group

df = pd.DataFrame(pd.date_range('2022-01-01', '2022-03-01'))
df['BD'] = df[0].dt.weekday.lt(5).astype(int) * df[0].dt.isocalendar().week
df1 = df.groupby('BD').apply(lambda x: x[0].to_list() if len(set(x[0])) > 4 else None).dropna()
print(df1)

Prints:

BD
1    [2022-01-03 00:00:00, 2022-01-04 00:00:00, 202...
2    [2022-01-10 00:00:00, 2022-01-11 00:00:00, 202...
3    [2022-01-17 00:00:00, 2022-01-18 00:00:00, 202...
4    [2022-01-24 00:00:00, 2022-01-25 00:00:00, 202...
5    [2022-01-31 00:00:00, 2022-02-01 00:00:00, 202...
6    [2022-02-07 00:00:00, 2022-02-08 00:00:00, 202...
7    [2022-02-14 00:00:00, 2022-02-15 00:00:00, 202...
8    [2022-02-21 00:00:00, 2022-02-22 00:00:00, 202...
dtype: object
Answered By: Алексей Р

Use a date offset:

from pandas.tseries.offsets import BDay

df['date'] = pd.to_datetime(df['date'])

# identify breaks in successive values
m1 = df['Value'].ne(df['Value'].shift())
# identify breaks in business days
m2 = df['date'].ne(df['date'].shift().add(BDay()))

# group by either break
for k,g in df.groupby((m1|m2).cumsum()):
    if len(g) == 5:
        print(f'[group {k}]')
        print(g)

Output:

[group 1]
        date  Value
0 2022-07-19  44.43
1 2022-07-20  44.43
2 2022-07-21  44.43
3 2022-07-22  44.43
4 2022-07-25  44.43

Intermediates:

        date  Value     m1     m2  m1|m2  group  len
0 2022-07-19  44.43   True   True   True      1    5
1 2022-07-20  44.43  False  False  False      1    5
2 2022-07-21  44.43  False  False  False      1    5
3 2022-07-22  44.43  False  False  False      1    5
4 2022-07-25  44.43  False  False  False      1    5
5 2022-09-02  86.40   True   True   True      2    1
6 2022-09-06  85.13   True   True   True      3    1
7 2022-09-07  86.86   True  False   True      4    1
8 2022-09-08  88.44   True  False   True      5    1
9 2022-09-09  89.44   True  False   True      6    1
Answered By: mozway