Pandas – Count consecutive rows greater than zero on the group level

Question:

I would like to translate this R code into Python’s pandas:

gg <- c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b')
val <- c(-2, 2, -1, 2, 3, 3, -7, 5, 6, 2, 8)
df = data.table(gg=gg, val=val)
df[, gg_count := seq(.N), by = gg]
df[, counter := gg_count - na.locf(ifelse(val <= 0, gg_count, NA), na.rm=F), by=gg]
print(df)

Output:

  gg val gg_count counter
   a  -2        1       0
   a   2        2       1
   b  -1        1       0
   a   2        3       2
   b   3        2       1
   b   3        3       2
   a  -7        4       0
   a   5        5       1
   a   6        6       2
   a   2        7       3
   b   8        4       3

Basically, I have the columns "gg" and "val" and need to produce the column "counter". In R, I also used a helper column, "gg_count".

As the example shows, the "counter" column counts how many consecutive rows have "val" greater than zero. Whenever a value less than or equal to zero appears, the counter resets to zero and counting starts again. Everything needs to be done at the group level, using the column "gg".

Asked By: Ivan R


Answers:

You can use:

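# Within each 'gg' group, start a new block at every non-positive value
g = df['val'].le(0).groupby(df['gg']).cumsum()
# Within each (gg, block) pair, count the positive values seen so far
df['val'].gt(0).groupby([df['gg'], g]).cumsum()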

Output:

0     0
1     1
2     0
3     2
4     1
5     2
6     0
7     1
8     2
9     3
10    3
Name: val, dtype: int64
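
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample data from the question and stores the result in a new column. The names block_id and counter are only illustrative choices, not something the snippet above requires.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "gg": ["a", "a", "b", "a", "b", "b", "a", "a", "a", "a", "b"],
    "val": [-2, 2, -1, 2, 3, 3, -7, 5, 6, 2, 8],
})

# Step 1: within each gg group, every non-positive value starts a new block.
# The cumulative sum of the boolean mask gives each block a running id.
block_id = df["val"].le(0).groupby(df["gg"]).cumsum()

# Step 2: within each (gg, block) pair, the cumulative sum of "val > 0"
# counts the consecutive positive rows since the last reset.
df["counter"] = df["val"].gt(0).groupby([df["gg"], block_id]).cumsum()

print(df)
```

The row that triggers a reset contributes False to the second cumulative sum, so it gets 0, and the positive rows after it count up from 1. Rows that come before the first non-positive value of a group land in block 0 and are counted from 1 as well.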
Answered By: Mykola Zotko