Pandas DataFrame: count consecutive occurrences of a value

Question:

I have the following data frame ‘A’:

Index 1or0
1 0
2 0
3 0
8 0
9 1
10 1

I want to count how many times the 0 (or 1) occurs consecutively, that is, in rows whose Index values directly follow one another, and write the result into a new data frame ‘B’:

StartNum EndNum Size
1 3 3
8 8 1
9 10 2

What is the fastest or best way to do this? Should I just iterate as I would with an array, or is there a better way using pandas?

Asked By: Natoshi SakiSaki


Answers:

IIUC, you can use this:

# Flag every row whose value differs from the previous row (a 0 <-> 1 transition),
# then cumsum the flags to get one label per run.
ser = A["1or0"].ne(A["1or0"].shift().bfill()).cumsum()

B = (
    A.groupby(ser, as_index=False)
     .agg({"Index": ["first", "last", "count"]})
     .set_axis(["StartNum", "EndNum", "Size"], axis=1)
)


Output:

print(B)

   StartNum  EndNum  Size
0         1       3     3
1         4       7     4
2         8       8     1
3         9      10     2
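To make the shift/cumsum step easier to follow, here is a minimal self-contained sketch that prints the intermediate columns. It rebuilds the frame from the "DataFrame used" listing in this answer; the names `prev`, `changed`, and `run_id` are only illustrative and do not appear in the original code.

```python
import pandas as pd

# Sample frame matching the "DataFrame used" listing in this answer.
A = pd.DataFrame({
    "Index": list(range(1, 11)),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Previous row's value; the first row has no predecessor, so it is NaN.
prev = A["1or0"].shift()

# True where the value differs from the previous row.
# bfill() fills the leading NaN so the first row is not flagged.
changed = A["1or0"].ne(prev.bfill())

# Running total of the flags: every row in the same run gets the same label.
run_id = changed.cumsum()

# Show the steps side by side.
print(A.assign(prev=prev, changed=changed, run_id=run_id))
```

Rows that share a `run_id` belong to the same uninterrupted run, which is why grouping by that label yields one output row per run.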

Update (based on comments), adding the value of each run:

B = (
    A.groupby(ser, as_index=False)
     .agg({"Index": ["first", "last", "count"],
           "1or0": "unique"})
     .set_axis(["StartNum", "EndNum", "Size", "Value"], axis=1)
     .assign(Value=lambda d: d["Value"].astype(str).str.strip("[]"))
)

print(B)

   StartNum  EndNum  Size Value
0         1       3     3     0
1         4       7     4     1
2         8       8     1     0
3         9      10     2     1
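Since every row within a run holds the same value, one possible variation is to take the "first" value of each group instead of "unique" plus string stripping, which keeps Value as an integer column. The sketch below is an alternative under that assumption, not the code from the answer; it uses pandas named aggregation, and the grouper name "run" is made up for illustration.

```python
import pandas as pd

# Sample frame matching the "DataFrame used" listing in this answer.
A = pd.DataFrame({
    "Index": list(range(1, 11)),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Same run labels as in the answer.
ser = A["1or0"].ne(A["1or0"].shift().bfill()).cumsum()

B = (
    A.groupby(ser.rename("run"))          # "run" is a hypothetical label name
     .agg(
         StartNum=("Index", "first"),
         EndNum=("Index", "last"),
         Size=("Index", "count"),
         Value=("1or0", "first"),         # constant within a run, so "first" suffices
     )
     .reset_index(drop=True)              # drop the run label from the result
)

print(B)
```

With this variant, Value stays numeric, so it can be filtered directly (for example `B[B["Value"] == 1]`) without converting from strings.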

DataFrame used:

print(A)

   Index  1or0
0      1     0
1      2     0
2      3     0
3      4     1
4      5     1
5      6     1
6      7     1
7      8     0
8      9     1
9     10     1
Answered By: Timeless
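Note that the sample in the question jumps from Index 3 to Index 8 while the value stays 0, and the expected output treats that jump as the start of a new run. The answer above only splits on value changes. If gaps in Index should also split a run, a minimal sketch could look like the following. It assumes Index is an integer column where consecutive rows differ by exactly 1; the names `value_changed`, `index_jumped`, and `run_id` are illustrative.

```python
import pandas as pd

# Data frame 'A' exactly as given in the question.
A = pd.DataFrame({
    "Index": [1, 2, 3, 8, 9, 10],
    "1or0": [0, 0, 0, 0, 1, 1],
})

# True where the value differs from the previous row (also True on the first row).
value_changed = A["1or0"].ne(A["1or0"].shift())

# True where Index does not directly follow the previous Index (also True on the first row).
index_jumped = A["Index"].diff().ne(1)

# A new run starts whenever either condition holds.
run_id = (value_changed | index_jumped).cumsum()

B = (
    A.groupby(run_id.rename("run"))
     .agg(
         StartNum=("Index", "first"),
         EndNum=("Index", "last"),
         Size=("Index", "count"),
     )
     .reset_index(drop=True)
)

print(B)
```

On the question's data this produces the three runs 1 to 3, 8 to 8, and 9 to 10, matching the desired data frame ‘B’.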