Pandas DataFrame: count occurrences that happen consecutively
Question:
I have the following data frame ‘A’
Index | 1or0 |
---|---|
1 | 0 |
2 | 0 |
3 | 0 |
… | … |
8 | 0 |
9 | 1 |
10 | 1 |
… | … |
I want to count how many times the 0 (or the 1) occurs consecutively, following the order of the Index column, and write each run into a new dataframe ‘B’ like this:
StartNum | EndNum | Size |
---|---|---|
1 | 3 | 3 |
8 | 8 | 1 |
9 | 10 | 2 |
What is the fastest or best way to do this? Should I just iterate as I would over an array, or is there a better way using pandas?
Answers:
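If I understand correctly (IIUC), use this:
```python
# Flag every row whose value differs from the previous row (a 0<>1 transition),
# then cumsum the flags so each run of identical values gets its own id
ser = A["1or0"].ne(A["1or0"].shift().bfill()).cumsum()

B = (
    A.groupby(ser, as_index=False)
    .agg({"Index": ["first", "last", "count"]})
    .set_axis(["StartNum", "EndNum", "Size"], axis=1)
)
```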
Output:
```
print(B)

   StartNum  EndNum  Size
0         1       3     3
1         4       7     4
2         8       8     1
3         9      10     2
```
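To see what the run id `ser` looks like, the sketch below rebuilds the sample frame from the "DataFrame used" listing and prints each intermediate step. `shift()` moves every value down one row, `bfill()` fills the resulting leading NaN so the first row compares equal to itself, `ne` marks the rows where the value changes, and `cumsum` turns those marks into consecutive group numbers. The intermediate names `prev`, `changed` and `run_id` are only for illustration.

```python
import pandas as pd

# Sample data, matching the "DataFrame used" listing
A = pd.DataFrame({
    "Index": range(1, 11),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})

col = A["1or0"]
prev = col.shift().bfill()  # previous row's value; row 0 gets its own value
changed = col.ne(prev)      # True where a new run starts
run_id = changed.cumsum()   # increments at each change: one id per run

print(pd.DataFrame({"1or0": col, "prev": prev, "changed": changed, "run_id": run_id}))
```

Grouping by `run_id` puts rows 1-3, 4-7, 8 and 9-10 into separate groups, which is exactly what the `first`/`last`/`count` aggregation then summarizes.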
Update (based on comments):
```python
B = (
    A.groupby(ser, as_index=False)
    .agg({"Index": ["first", "last", "count"],
          "1or0": "unique"})
    .set_axis(["StartNum", "EndNum", "Size", "Value"], axis=1)
    # each run holds one value, so unique() gives a one-element array;
    # turn it into a string and strip the brackets
    .assign(Value=lambda d: d["Value"].astype(str).str.strip("[]"))
)
```
```
print(B)

   StartNum  EndNum  Size  Value
0         1       3     3      0
1         4       7     4      1
2         8       8     1      0
3         9      10     2      1
```
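A possible variant of the update, sketched under the assumption that the pandas version supports named aggregation (0.25 or later): since every run contains a single value by construction, `"first"` can replace `"unique"` plus the `astype(str).str.strip("[]")` step, and `Value` stays an integer column. This sketch also groups with the default `as_index=True` and drops the run id afterwards with `reset_index(drop=True)`, instead of relying on how `as_index=False` treats a grouping Series that is not a column of `A`, which may differ between pandas releases. The names `run` and `run_id` are illustrative.

```python
import pandas as pd

A = pd.DataFrame({
    "Index": range(1, 11),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Same run id as in the answer, renamed so it does not clash with the "1or0" column
run_id = A["1or0"].ne(A["1or0"].shift().bfill()).cumsum().rename("run")

B = (
    A.groupby(run_id)
    .agg(
        StartNum=("Index", "first"),
        EndNum=("Index", "last"),
        Size=("Index", "count"),
        Value=("1or0", "first"),  # all rows in a run share this value
    )
    .reset_index(drop=True)  # discard the run id, keep a 0..n-1 index
)
print(B)
```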
DataFrame used:
```
print(A)

   Index  1or0
0      1     0
1      2     0
2      3     0
3      4     1
4      5     1
5      6     1
6      7     1
7      8     0
8      9     1
9     10     1
```
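For comparison with the "just iterate like an array" option from the question, here is a minimal plain-Python loop over the two columns. The helper name `runs_by_loop` is hypothetical. It makes a single pass in Python, so on large frames it will usually be slower than the vectorized `shift`/`cumsum` approach above, but it shows the same run-detection logic explicitly.

```python
import pandas as pd

def runs_by_loop(index_values, flags):
    """Hypothetical helper: one row (StartNum, EndNum, Size, Value) per run of equal flags."""
    rows = []
    start = prev_idx = prev_flag = None
    size = 0
    for idx, flag in zip(index_values, flags):
        if size and flag == prev_flag:
            size += 1  # same value as the previous row: extend the current run
        else:
            if size:   # value changed: close the previous run
                rows.append((start, prev_idx, size, prev_flag))
            start, size = idx, 1
        prev_idx, prev_flag = idx, flag
    if size:           # close the final run
        rows.append((start, prev_idx, size, prev_flag))
    return pd.DataFrame(rows, columns=["StartNum", "EndNum", "Size", "Value"])

A = pd.DataFrame({
    "Index": range(1, 11),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
})
B_loop = runs_by_loop(A["Index"].tolist(), A["1or0"].tolist())
print(B_loop)
```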