Count consecutive boolean values in Python/pandas array for whole subset

Question

I am looking for a way to group a pandas DataFrame by runs of consecutive equal values and apply an aggregation such as count or max to each run.

For example, if I have one column in df:

the result needs to be:

Why: there are two 0s at the beginning, then three 1s,…

What I need is similar to this answer, but every element in a group needs to get the same value.

The preferred answer would show how to group the consecutive equal elements and apply an aggregation function to each group, so that I could also compute, for example, the max value:

    my_column  other_value
0           0            7
1           0            4
2           1            1
3           1            0
4           1            5
5           0            1
6           0            1
7           0            2
8           0            8
9           1            1
10          1            0
11          0            2

and the result would be

Asked By: Marko Zadravec

Answer 1

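You can label each run of consecutive equal values, then use GroupBy.transform to broadcast the per-run count back to every row: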

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

out = df.groupby(g)["my_column"].transform("count")

Output:

print(out)

    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

NB: to get the max of other_value within each run, use df.groupby(g)["other_value"].transform("max").
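
A self-contained sketch of the approach, using the sample data from the question. The column names run_count and run_max are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those starts gives each run its own label.
g = df["my_column"].ne(df["my_column"].shift()).cumsum()

# transform broadcasts the per-run result back to every row of that run.
# run_count / run_max are illustrative names, not from the original post.
df["run_count"] = df.groupby(g)["my_column"].transform("count")
df["run_max"] = df.groupby(g)["other_value"].transform("max")

print(df)
```

Because transform returns a result aligned to the original index, the new columns can be assigned directly without a merge.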

Answered By: Timeless

Answer 2

If you check the linked answer, it shows exactly how to group by consecutive values:

(y != y.shift()).cumsum()

So if you create the consecutive groups for column my_column, the output is:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

print (g)
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8     3
9     4
10    4
11    5
Name: my_column, dtype: int32
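
To see why this key works, it may help to split the one-liner into its three steps. This is only a sketch; the intermediate names prev and starts are hypothetical:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0], name="my_column")

prev = s.shift()      # previous row's value; NaN for the first row
starts = s.ne(prev)   # True where a new run begins (the first row is always True)
g = starts.cumsum()   # running count of run starts, used as the run label

print(pd.DataFrame({"value": s, "starts": starts, "group": g}))
```

Each True in starts increments the label, so all rows of one run share the same number.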

You can then use GroupBy.transform with Series.to_frame to get a one-column DataFrame:

df1 = df.groupby(g)['my_column'].transform('size').to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

Or use Series.map with Series.value_counts:

df1 = g.map(g.value_counts()).to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1
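
As a quick check, the two variants above can be compared on the question's data. This is a minimal sketch; the names by_transform, by_map and same are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"my_column": [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]})
g = df["my_column"].ne(df["my_column"].shift()).cumsum()

# Variant 1: size of each run, broadcast to every row via transform.
by_transform = df.groupby(g)["my_column"].transform("size")

# Variant 2: count how often each run label occurs, then map the labels
# back to those counts.
by_map = g.map(g.value_counts())

same = by_transform.tolist() == by_map.tolist()
print(same)
```

Note that "size" counts every row in a run, while "count" skips missing values, so the two can differ when the aggregated column contains NaN.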
