Calculating length of sequence of zeros in Pandas

Question

I have a table like this:

Unit	status	date
One	1	1
One	1	2
One	1	3
One	0	4
One	0	5
One	1	6
One	1	7

I want to create a new column, gap, that holds the length of the run of consecutive zeros in the status column that each row belongs to, and 0 for rows where status is 1. For the example above, the output would be:

Unit	status	date	gap
One	1	1	0
One	1	2	0
One	1	3	0
One	0	4	2
One	0	5	2
One	1	6	0
One	1	7	0

This has to be done separately for each unit in the DataFrame. I based my attempt on this question, but I'm stuck on the part where I set the total size on every row that belongs to the gap.

Asked By: Eduardo Pacheco

Answer 1

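The usual way to label blocks of one value is to take a cumulative sum over the other values. The running sum of status only increases on rows where status is 1, so every row in an unbroken run of zeros shares the same sum, and grouping by Unit, status and that sum isolates each run. Assuming your data is sorted by Unit and status is 0 or 1, as in your example: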

df['gap'] = (df.groupby(['Unit', 'status', df['status'].cumsum()])
             ['status'].transform('size')
             .where(df['status'].eq(0), other=0)
            )

Output:

  Unit  status  date  gap
0  One       1     1    0
1  One       1     2    0
2  One       1     3    0
3  One       0     4    2
4  One       0     5    2
5  One       1     6    0
6  One       1     7    0

Answered By: Quang Hoang
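
To see why Unit belongs in the grouping key in Answer 1, here is a minimal self-contained sketch of the same cumsum grouping. The extra units Two and Three are made-up sample data, not from the question; they are there so that a run of zeros ends one unit right where another run starts the next. The names block and run_size are illustrative only.

```python
import pandas as pd

# Unit "One" is the table from the question; "Two" and "Three" are
# hypothetical rows added so that zeros touch across a unit boundary.
df = pd.DataFrame({
    "Unit":   ["One"] * 7 + ["Two"] * 4 + ["Three"] * 2,
    "status": [1, 1, 1, 0, 0, 1, 1,  1, 0, 0, 0,  0, 1],
    "date":   [1, 2, 3, 4, 5, 6, 7,  1, 2, 3, 4,  1, 2],
})

# The running sum only increases on rows where status is 1, so all rows
# of an unbroken run of zeros carry the same value.
block = df["status"].cumsum().rename("block")

# Size of each (Unit, status, block) group, broadcast back to every row.
run_size = df.groupby(["Unit", "status", block])["status"].transform("size")

# Keep the size on zero rows and write 0 everywhere else.
df["gap"] = run_size.where(df["status"].eq(0), other=0)

print(df)
```

The last three rows of Two and the first row of Three share the same running sum, but because Unit is part of the key they end up in different groups, with gaps of 3 and 1 respectively.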

Answer 2

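Another approach is run-length encoding with the python-rle package. Note that this encodes the whole status column at once, so it does not treat units separately:

import rle

# encode returns two parallel sequences: the run values and the run lengths
values, counts = rle.encode(df.status)

# replace each run's value with its length if the run is zeros, else 0,
# then expand the runs back to one entry per row
df['gap'] = rle.decode([c if v == 0 else 0 for v, c in zip(values, counts)], counts)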

Output:

  Unit  status  date  gap
0  One       1     1    0
1  One       1     2    0
2  One       1     3    0
3  One       0     4    2
4  One       0     5    2
5  One       1     6    0
6  One       1     7    0

Answered By: PaulS
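
Answer 2 encodes the whole status column at once, so a run of zeros that continues across a unit boundary is counted as a single run. A possible per-unit variant is sketched below, using itertools.groupby from the standard library rather than python-rle. The helper zero_run_lengths and the extra units in the sample data are hypothetical, introduced only for illustration.

```python
from itertools import groupby

import pandas as pd


def zero_run_lengths(status):
    """For each row, return the length of its run if the run is zeros, else 0."""
    out = []
    for value, run in groupby(status):
        n = len(list(run))  # length of this run of equal values
        out.extend([n if value == 0 else 0] * n)
    # Return a Series on the group's own index so pandas can align the result.
    return pd.Series(out, index=status.index)


# Unit "One" is the table from the question; the other rows are made up.
df = pd.DataFrame({
    "Unit":   ["One"] * 7 + ["Two"] * 4 + ["Three"] * 2,
    "status": [1, 1, 1, 0, 0, 1, 1,  1, 0, 0, 0,  0, 1],
    "date":   [1, 2, 3, 4, 5, 6, 7,  1, 2, 3, 4,  1, 2],
})

# Run the encoding separately inside each unit.
df["gap"] = df.groupby("Unit", sort=False)["status"].transform(zero_run_lengths)

print(df)
```

Because the helper only ever sees one unit at a time, the zeros at the end of Two and the zero at the start of Three are counted as separate runs.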
