Count consecutive boolean values in Python/pandas array for whole subset

Question:

I am looking for a way to group a pandas DataFrame by runs of consecutive equal values and apply an aggregation such as count or max to each run.

For example, if I have one column in df:

    my_column
0        0  
1        0  
2        1  
3        1  
4        1  
5        0  
6        0  
7        0  
8        0  
9        1  
10       1  
11       0

the result needs to be:

    result
0        2  
1        2  
2        3  
3        3  
4        3  
5        4  
6        4  
7        4  
8        4  
9        2  
10       2  
11       1

Why: there are two 0s at the beginning, then three 1s,…

What I need is similar to this answer, but every element in a group should get the same value.

The preferred answer would show how to group the consecutive equal elements and apply an aggregation function to each group, so that I could also compute, for example, the max value:

    my_column    other_value
0        0           7
1        0           4
2        1           1
3        1           0
4        1           5
5        0           1
6        0           1
7        0           2
8        0           8
9        1           1
10       1           0
11       0           2

and the result would be

    result
0        7  
1        7  
2        5  
3        5  
4        5  
5        8  
6        8  
7        8  
8        8  
9        1  
10       1  
11       2
Asked By: Marko Zadravec

||

Answers:

You can use:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

out = df.groupby(g)["my_column"].transform("count")

Output:

print(out)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

NB: to get the max, use df.groupby(g)["other_value"].transform("max").
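
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies both the count and the max variants. The variable names run_length and run_max are chosen here for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those starts gives each run its own id.
g = df["my_column"].ne(df["my_column"].shift()).cumsum()

# transform broadcasts the per-run aggregate back onto every row of the run.
run_length = df.groupby(g)["my_column"].transform("count")
run_max = df.groupby(g)["other_value"].transform("max")

print(run_length.tolist())
print(run_max.tolist())
```

Because transform returns a Series aligned with the original index, either result can be assigned straight back as a new column of df.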

Answered By: Timeless

If you check the linked answer, it shows exactly how to group by consecutive values:

(y != y.shift()).cumsum()

So if you create consecutive groups from the my_column column, the output is:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

print (g)
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8     3
9     4
10    4
11    5
Name: my_column, dtype: int32
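
To see why this expression labels the runs, it may help to look at the intermediate steps side by side. The sketch below is only an illustration; the names prev, starts and steps are made up for it.

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0], name="my_column")

prev = s.shift()     # previous row's value; NaN for the first row
starts = s.ne(prev)  # True where a new run begins (first row: 0 != NaN)
g = starts.cumsum()  # running count of run starts, used as the run id

# Put the steps next to each other for inspection.
steps = pd.DataFrame({"value": s, "prev": prev, "starts": starts, "g": g})
print(steps)
```

Each True in starts bumps the cumulative sum by one, so all rows of the same run share one id, and two separate runs of the same value (for example the two runs of 0) get different ids.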

You can then use GroupBy.transform with Series.to_frame to get a one-column DataFrame:

df1 = df.groupby(g)['my_column'].transform('size').to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1

Or use Series.map with Series.value_counts:

df1 = g.map(g.value_counts()).to_frame()
print (df1)
    my_column
0           2
1           2
2           3
3           3
4           3
5           4
6           4
7           4
8           4
9           2
10          2
11          1
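
As a quick sanity check, the two routes to the run length can be compared on the sample data. This is a small sketch; via_transform and via_map are names invented here.

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0], name="my_column")
g = s.ne(s.shift()).cumsum()

# Route 1: size of each run, broadcast back to every row of the run.
via_transform = s.groupby(g).transform("size")

# Route 2: count how often each run id occurs, then look that count up per row.
via_map = g.map(g.value_counts())

print(via_transform.tolist() == via_map.tolist())
```

The map route only yields counts, whereas the transform route also accepts other aggregations such as max.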

The second problem can be solved in a similar way:

g = df["my_column"].ne(df["my_column"].shift()).cumsum()

df1 = df.groupby(g)['other_value'].transform('max').to_frame(name='result')
print (df1)
    result
0        7
1        7
2        5
3        5
4        5
5        8
6        8
7        8
8        8
9        1
10       1
11       2
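
Since the question asks for something that works with any aggregation, the pattern could be wrapped in a small function. This is a sketch only: transform_runs is a hypothetical helper name, not a pandas API, and it assumes func is something GroupBy.transform accepts (for example "size", "max" or "sum").

```python
import pandas as pd


def transform_runs(df, key, value, func):
    """Hypothetical helper: apply `func` to `value` within each run of equal
    consecutive `key` values, returning one result per original row."""
    run_id = df[key].ne(df[key].shift()).cumsum()
    return df.groupby(run_id)[value].transform(func)


# Sample data copied from the question.
df = pd.DataFrame({
    "my_column":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    "other_value": [7, 4, 1, 0, 5, 1, 1, 2, 8, 1, 0, 2],
})

df["run_size"] = transform_runs(df, "my_column", "my_column", "size")
df["run_max"] = transform_runs(df, "my_column", "other_value", "max")
df["run_sum"] = transform_runs(df, "my_column", "other_value", "sum")
print(df)
```
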
Answered By: jezrael