Find how many consecutive days have a specific value in pandas
Question:
I have the following pandas dataframe:
Date Value
2019-01-01 0
2019-01-02 0
2019-01-03 0
2019-01-04 0
2019-01-05 1
2019-01-06 1
2019-01-10 1
2019-01-11 0
2019-01-12 0
2019-01-13 0
2019-01-14 0
I would like to get the start date and end date of each group of consecutive days whose value equals 0, and obtain something like this:
Start Date End Date N Days
2019-01-01 2019-01-04 4
2019-01-11 2019-01-14 4
Answers:
Create the subgroups with cumsum, then groupby with agg:
s = df.Value.ne(0).cumsum()
out = df[df.Value.eq(0)].groupby(s).Date.agg(['first','last','count'])
out
Out[295]:
first last count
Value
0 2019-01-01 2019-01-04 4
3 2019-01-11 2019-01-14 4
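To see why the cumsum trick works, it can help to look at the helper labels themselves. The following is only a minimal sketch: it rebuilds the question's data by hand and assumes the Date column has been parsed with pd.to_datetime, which the question does not show.

```python
import pandas as pd

# Question's data, rebuilt by hand (datetime dtype is an assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-05", "2019-01-06", "2019-01-10",
        "2019-01-11", "2019-01-12", "2019-01-13", "2019-01-14",
    ]),
    "Value": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
})

# Every non-zero row bumps the counter, so each run of zeros shares one label.
s = df["Value"].ne(0).cumsum()
labels = s.tolist()  # [0, 0, 0, 0, 1, 2, 3, 3, 3, 3, 3]
```

Note that the last non-zero row (2019-01-10) carries the same label, 3, as the zeros that follow it. That is why the rows are filtered with df.Value.eq(0) before grouping.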
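Update: to also start a new group whenever the dates are not consecutive, add a check on the date difference: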
s = (df.Value.ne(0) | df.Date.diff().dt.days.ne(1)).cumsum()
out = df[df.Value.eq(0)].groupby(s).Date.agg(['first','last','count'])
out
Out[306]:
first last count
1 2019-01-01 2019-01-04 4
4 2019-01-11 2019-01-14 4
5 2020-01-01 2020-01-01 1
Input data
Date Value
0 2019-01-01 0
1 2019-01-02 0
2 2019-01-03 0
3 2019-01-04 0
4 2019-01-05 1
5 2019-01-06 1
6 2019-01-10 1
7 2019-01-11 0
8 2019-01-12 0
9 2019-01-13 0
10 2019-01-14 0
11 2020-01-01 0
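If the column names from the question (Start Date, End Date, N Days) are wanted, the result can be renamed afterwards. This is a sketch under assumptions rather than part of the answer itself: the input is rebuilt by hand, Date is assumed to be a datetime column, and the rename and reset_index steps are additions.

```python
import pandas as pd

# Input data from above, rebuilt by hand (datetime dtype is an assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-05", "2019-01-06", "2019-01-10",
        "2019-01-11", "2019-01-12", "2019-01-13", "2019-01-14",
        "2020-01-01",
    ]),
    "Value": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
})

# A new group starts at every non-zero row or wherever the gap is not 1 day.
s = (df["Value"].ne(0) | df["Date"].diff().dt.days.ne(1)).cumsum()

out = (
    df[df["Value"].eq(0)]
    .groupby(s)["Date"]
    .agg(["first", "last", "count"])
    # Renaming and dropping the helper index are additions for presentation.
    .rename(columns={"first": "Start Date", "last": "End Date", "count": "N Days"})
    .reset_index(drop=True)
)
```

Here count is the number of rows in each group, which equals the number of days only because each group contains one row per consecutive date.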
"[BENY’s answer] works partially, because it doesn’t take into account the actual date. In fact, if I add another line ‘2019-01-17’ with value 0 at the end, the count of the second group become 5, but that is not correct because there are some days missing in between ‘2019-01-14’ and ‘2019-01-17’."
This can be solved as follows:
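zeros = df[df['Value'] == 0]        # keep only the rows with value 0
t = zeros['Date'].diff().dt.days    # gap in days to the previous zero row (Date must be datetime)
t = t.fillna(1.0)                   # the first row has no predecessor
t = t.ne(1.0).cumsum()              # any gap other than 1 day starts a new group
result = zeros.groupby(t)['Date'].agg(['first', 'last', 'count'])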
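The case described in the quoted comment can be checked directly. The sketch below assumes the question's data plus the extra 2019-01-17 row, with Date parsed as datetime. The names zeros and gap are illustrative only.

```python
import pandas as pd

# Question's data plus the extra row from the comment (2019-01-17, value 0).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
        "2019-01-05", "2019-01-06", "2019-01-10",
        "2019-01-11", "2019-01-12", "2019-01-13", "2019-01-14",
        "2019-01-17",
    ]),
    "Value": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
})

zeros = df[df["Value"] == 0]

# Day gap to the previous zero row; the first row has none, so treat it as 1.
gap = zeros["Date"].diff().dt.days.fillna(1.0)

# Any gap other than exactly one day opens a new group.
t = gap.ne(1.0).cumsum()

result = zeros.groupby(t)["Date"].agg(["first", "last", "count"])
counts = result["count"].tolist()  # [4, 4, 1]
```

With this grouping, 2019-01-17 forms its own one-day group instead of inflating the second group to 5.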