Count consecutive row values but reset count with every 0 in row

Question:

Within a dataframe, I need to count consecutive values in column A and write the totals to a new column, column B.

Working down column A, the script should count each consecutive run of 1s. When a 0 appears, it writes the run's total count to column B on that row, resets the count, and continues through the remaining data.

Desired outcome:

A   B
0   0
1   0
1   0
1   0
1   0
0   4
0   0
1   0
1   0
0   2

I’ve tried using .shift() along with various if-statements but have been unsuccessful.

Asked By: Markus


Answers:

Here is one way to do it, though I get the feeling there might be better ways. You can try this for now:

  • The routine function is used to increment the counter variable until it encounters a value of 0 in the A column, at which point it grabs the total count and then resets the counter variable.
  • I use a for-loop to iterate through the A column and append the returned B values to a list.
  • This list is then inserted into the dataframe as column B.
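
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

def routine(row, c):
    # val is the B value for this row; it stays 0 unless a run of 1s just ended
    val = 0
    if row:
        # A is non-zero: extend the current run
        c += 1
    else:
        # A is 0: report the length of the finished run and reset the counter
        val = c
        c = 0
    return val, c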

B_vals = []
counter = 0
for item in df['A'].values:
    b, counter = routine(item, counter)
    B_vals.append(b)

df['B'] = B_vals
print(df)

OUTPUT:

   A  B
0  0  0
1  1  0
2  1  0
3  1  0
4  1  0
5  0  4
6  0  0
7  1  0
8  1  0
9  0  2
Answered By: ScottC

This is one way to do it, though a more elegant solution probably exists.

df['B'] = df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()).cumsum().shift(fill_value=0) * (df['A'].diff() == -1)

The part df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()) groups the data by runs of consecutive equal values.
We then take the cumsum, which gives the running sum within each group. Next we shift the results down by one row, because you want the count on the row after the group. Finally we mask out every row that is not the row immediately after the last row of a run of 1s, i.e. every row where A does not drop from 1 to 0 (df['A'].diff() == -1). This relies on A containing only 0s and 1s.
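
To make the steps above easier to follow, here is a sketch that splits the one-liner into intermediate Series. The names run_id, running, shifted and run_ended are only illustrative and not part of the original answer, and the final mask uses Series.where rather than multiplication, which should give the same result here. It assumes column A contains only 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

# Step 1: give each run of identical consecutive values its own id.
# A new id starts wherever a value differs from the one above it.
run_id = df["A"].ne(df["A"].shift()).cumsum()

# Step 2: running sum of A within each run
# (for a run of 1s this is the count so far; for a run of 0s it stays 0).
running = df["A"].groupby(run_id).cumsum()

# Step 3: move every running total down one row, so the final total
# of a run lands on the first row after that run.
shifted = running.shift(fill_value=0)

# Step 4: True only where A drops from 1 to 0, i.e. right after a run of 1s.
run_ended = df["A"].diff().eq(-1)

# Keep the shifted total on those rows and write 0 everywhere else.
df["B"] = shifted.where(run_ended, 0)
print(df)
```

Printing run_id, running and shifted one at a time is a quick way to see what each stage contributes before the mask is applied.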

Answered By: sunnytown