Find how many consecutive days have a specific value in pandas

Question:

I have the following pandas dataframe:

Date           Value
2019-01-01       0
2019-01-02       0
2019-01-03       0
2019-01-04       0
2019-01-05       1
2019-01-06       1
2019-01-10       1
2019-01-11       0
2019-01-12       0
2019-01-13       0
2019-01-14       0

I would like to get the start date and end date of each group of consecutive days whose value equals 0, and obtain something like this:

Start Date  End Date    N Days
2019-01-01  2019-01-04    4
2019-01-11  2019-01-14    4
Asked By: Marco


Answers:

Create the subgroup key with cumsum, then use groupby with agg:

s = df.Value.ne(0).cumsum()
out = df[df.Value.eq(0)].groupby(s).Date.agg(['first','last','count'])
out
Out[295]: 
            first        last  count
Value                               
0      2019-01-01  2019-01-04      4
3      2019-01-11  2019-01-14      4

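Update: to also start a new group whenever two rows are not exactly one day apart, add the date difference to the key: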

s = (df.Value.ne(0) | df.Date.diff().dt.days.ne(1)).cumsum()
out = df[df.Value.eq(0)].groupby(s).Date.agg(['first','last','count'])
out
Out[306]: 
       first       last  count
1 2019-01-01 2019-01-04      4
4 2019-01-11 2019-01-14      4
5 2020-01-01 2020-01-01      1

Input data

         Date  Value
0  2019-01-01      0
1  2019-01-02      0
2  2019-01-03      0
3  2019-01-04      0
4  2019-01-05      1
5  2019-01-06      1
6  2019-01-10      1
7  2019-01-11      0
8  2019-01-12      0
9  2019-01-13      0
10 2019-01-14      0
11 2020-01-01      0
Answered By: BENY
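
For readers who want to run this end to end, here is a minimal sketch of the two grouping keys from the answer above, built on its input data. It assumes the Date column is parsed with pd.to_datetime (the thread does not show how the frame was built), and the names value_key and date_key, as well as the column renaming to match the question's layout, are illustrative choices rather than part of the original answer.

```python
import pandas as pd

# Input data from the answer above; Date is parsed so that .diff() yields timedeltas.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-05", "2019-01-06", "2019-01-10", "2019-01-11",
        "2019-01-12", "2019-01-13", "2019-01-14", "2020-01-01",
    ]),
    "Value": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
})

# Value-only key: increases at every non-zero row, constant across a run of zeros.
value_key = df["Value"].ne(0).cumsum()
print(value_key.tolist())  # [0, 0, 0, 0, 1, 2, 3, 3, 3, 3, 3, 3]

# Date-aware key: also increases when the previous row is not exactly one day earlier.
date_key = (df["Value"].ne(0) | df["Date"].diff().dt.days.ne(1)).cumsum()
print(date_key.tolist())   # [1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 5]

# Keep only the zero rows, group them by the date-aware key, and summarise each run.
out = (
    df[df["Value"].eq(0)]
    .groupby(date_key)["Date"]
    .agg(["first", "last", "count"])
    .rename(columns={"first": "Start Date", "last": "End Date", "count": "N Days"})
    .reset_index(drop=True)
)
print(out)
```

With the value-only key, the 2020-01-01 row shares key 3 with the 2019-01-11 to 2019-01-14 run, so it would be counted in that group; the date-aware key gives it its own key 5.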

"[BENY’s answer] works partially, because it doesn’t take into account the actual date. In fact, if I add another line ‘2019-01-17’ with value 0 at the end, the count of the second group become 5, but that is not correct because there are some days missing in between ‘2019-01-14’ and ‘2019-01-17’."

This can be solved as follows:

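# Day gap between consecutive zero-valued rows (Date must be a datetime column)
t = df[df['Value'] == 0]['Date'].diff().dt.days
# The first zero row has no predecessor; treat it as continuing a run
t = t.fillna(1.0)
# Start a new group whenever the gap is not exactly one day
t = t.ne(1.0).cumsum()
# Rows with a non-zero Value have no key in t and are dropped by groupby
result = df.groupby(t)['Date'].agg(['first', 'last', 'count'])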
Answered By: Dominic Istha
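
As a quick check of the date-gap approach, the sketch below runs it on the question's data with the extra '2019-01-17' row mentioned in the comment. The function name zero_runs is hypothetical, Date is assumed to be parsed with pd.to_datetime, and the filtered frame is grouped directly instead of the full frame, which should give the same result because rows without a key are dropped either way.

```python
import pandas as pd

def zero_runs(df):
    """Hypothetical helper: summarise runs of consecutive days where Value == 0."""
    zeros = df[df["Value"] == 0]
    # Day gap to the previous zero row; the first row is treated as a gap of 1.
    gap_days = zeros["Date"].diff().dt.days.fillna(1.0)
    # A gap other than one day starts a new run.
    key = gap_days.ne(1.0).cumsum()
    return zeros.groupby(key)["Date"].agg(["first", "last", "count"])

# Question data plus the extra 2019-01-17 row from the comment.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-05", "2019-01-06", "2019-01-10", "2019-01-11",
        "2019-01-12", "2019-01-13", "2019-01-14", "2019-01-17",
    ]),
    "Value": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
})

result = zero_runs(df)
print(result)
```

Here the second group keeps a count of 4 and 2019-01-17 forms a separate one-day group, since it is three days after 2019-01-14.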