How to get start and end datetime indices of groups of consecutive values of data in pandas including repeated values?

Question

There are many answers based on numerical indices, but I am looking for a solution that works with a DatetimeIndex and I am really stuck. The closest answer I found uses a numerical index and does not work for my example.

I want to get the group start and end as DateTime for groups of n consecutive values in a DataFrame column.

Sample data:

import pandas as pd


index = pd.date_range(
    start=pd.Timestamp("2023-03-20 12:00:00+0000", tz="UTC"),
    end=pd.Timestamp("2023-03-20 15:00:00+0000", tz="UTC"),
    freq="15Min",
)
data = {
    "values_including_constant_groups": [
        2.0,
        1.0,
        1.0,
        3.0,
        3.0,
        3.0,
        4.0,
        4.0,
        4.0,
        2.0,
        3.0,
        3.0,
        1.0,
    ],
}
df = pd.DataFrame(
    index=index,
    data=data,
)

print(df)

yields:

                           values_including_constant_groups
2023-03-20 12:00:00+00:00                               2.0
2023-03-20 12:15:00+00:00                               1.0
2023-03-20 12:30:00+00:00                               1.0
2023-03-20 12:45:00+00:00                               3.0
2023-03-20 13:00:00+00:00                               3.0
2023-03-20 13:15:00+00:00                               3.0
2023-03-20 13:30:00+00:00                               4.0
2023-03-20 13:45:00+00:00                               4.0
2023-03-20 14:00:00+00:00                               4.0
2023-03-20 14:15:00+00:00                               2.0
2023-03-20 14:30:00+00:00                               3.0
2023-03-20 14:45:00+00:00                               3.0
2023-03-20 15:00:00+00:00                               1.0

Desired output (I would be flexible here but this would be my first idea):

                           values_including_constant_groups   group_start      group_end
2023-03-20 12:00:00+00:00                               2.0   NaN              NaN
2023-03-20 12:15:00+00:00                               1.0   True             False
2023-03-20 12:30:00+00:00                               1.0   False            True
2023-03-20 12:45:00+00:00                               3.0   True             False
2023-03-20 13:00:00+00:00                               3.0   False            False
2023-03-20 13:15:00+00:00                               3.0   False            True
2023-03-20 13:30:00+00:00                               4.0   True             False
2023-03-20 13:45:00+00:00                               4.0   False            False
2023-03-20 14:00:00+00:00                               4.0   False            True
2023-03-20 14:15:00+00:00                               2.0   NaN              NaN
2023-03-20 14:30:00+00:00                               3.0   True             False
2023-03-20 14:45:00+00:00                               3.0   False            True
2023-03-20 15:00:00+00:00                               1.0   NaN              NaN

So only runs of n >= 2 identical consecutive values should count as groups; "single" values are excluded. Moreover, a value that recurs later (such as 3.0 above) should start a new group each time it reappears.

Any hints are very welcome!

Asked By: Cord Kaldemeyer

Answer 1

Code

c = 'values_including_constant_groups'

# Compare each row with the previous row (group start)
# and with the next row (group end) to flag the group boundaries
s, e = df[c] != df[c].shift(), df[c] != df[c].shift(-1)

# Mask (set to NaN) the flags on rows where both group_start and
# group_end are True, i.e. single values where n == 1
df['group_start'], df['group_end'] = s.mask(s & e), e.mask(s & e)

Result

                           values_including_constant_groups group_start group_end
2023-03-20 12:00:00+00:00                               2.0         NaN       NaN
2023-03-20 12:15:00+00:00                               1.0        True     False
2023-03-20 12:30:00+00:00                               1.0       False      True
2023-03-20 12:45:00+00:00                               3.0        True     False
2023-03-20 13:00:00+00:00                               3.0       False     False
2023-03-20 13:15:00+00:00                               3.0       False      True
2023-03-20 13:30:00+00:00                               4.0        True     False
2023-03-20 13:45:00+00:00                               4.0       False     False
2023-03-20 14:00:00+00:00                               4.0       False      True
2023-03-20 14:15:00+00:00                               2.0         NaN       NaN
2023-03-20 14:30:00+00:00                               3.0        True     False
2023-03-20 14:45:00+00:00                               3.0       False      True
2023-03-20 15:00:00+00:00                               1.0         NaN       NaN
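
If the goal is the start and end timestamps themselves rather than boolean flags, one possible variation is to give each run an id and aggregate the index per run. This is only a sketch and not part of the answer above: `find_runs` and `min_length` are names made up for illustration, and rows are treated as consecutive in their existing order.

```python
import pandas as pd


def find_runs(series: pd.Series, min_length: int = 2) -> pd.DataFrame:
    """Return one row per run of identical consecutive values (hypothetical helper)."""
    # A new run starts wherever the value differs from the previous row;
    # the cumulative sum of those flags gives every run its own integer id.
    run_id = (series != series.shift()).cumsum()

    # Put the index into an ordinary column so it can be aggregated per run.
    frame = pd.DataFrame(
        {
            "timestamp": series.index,
            "value": series.to_numpy(),
            "run_id": run_id.to_numpy(),
        }
    )
    runs = frame.groupby("run_id").agg(
        value=("value", "first"),
        group_start=("timestamp", "first"),
        group_end=("timestamp", "last"),
        length=("timestamp", "count"),
    )
    # Keep only runs that are long enough; single values drop out here.
    return runs[runs["length"] >= min_length].reset_index(drop=True)


# Same sample data as in the question.
index = pd.date_range("2023-03-20 12:00", "2023-03-20 15:00", freq="15min", tz="UTC")
values = [2.0, 1.0, 1.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 2.0, 3.0, 3.0, 1.0]
runs = find_runs(pd.Series(values, index=index))
print(runs)
```

On the sample data this should leave four runs (values 1.0, 3.0, 4.0 and 3.0 again), each with its `group_start` and `group_end` as timezone-aware timestamps. Raising `min_length` would restrict the result to longer runs.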

Answered By: Shubham Sharma
