How to change the value of a Series element that is preceded by N or more consecutive identical values to the value preceding it?

Question:

Consider the following Series s:

0    0
1    0
2    0
3    1
4    1
5    1
6    0

For N = 3, we can see that the items at indices 3 and 6 are each preceded by at least N consecutive occurrences of the same value. Hence, their values should be changed to the value of the items preceding them!

Output:

0    0
1    0
2    0
3    0
4    1
5    1
6    1
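To pin the rule down, here is a minimal plain-Python sketch (not from the question; the helper name replace_after_runs is illustrative). It assumes, as the answers below do, that the run of N identical values is checked on the original values, and that a replaced item takes the value of the item just before it in the output.

```python
def replace_after_runs(values, n):
    """Hypothetical helper: replace values[i] by the previous output value
    when the n original values right before it are all identical."""
    out = list(values)
    for i in range(n, len(values)):
        window = values[i - n:i]  # the n original values preceding position i
        if all(v == window[0] for v in window):
            out[i] = out[i - 1]
    return out

print(replace_after_runs([0, 0, 0, 1, 1, 1, 0], 3))  # [0, 0, 0, 0, 1, 1, 1]
```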

So far, I have come up with this:

(s != s.shift()).cumsum()

It assigns a group id to each run of consecutive identical values, but I am not sure how to proceed from there.

Asked By: ttsak
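One way to continue from the asker's group-id expression (a sketch, not from the original thread): number the items within each run with groupby().cumcount(), so that an item is preceded by N identical values exactly when the run ending just before it has length N or more. The masking and forward-filling step mirrors the answer below.

```python
import pandas as pd

s = pd.Series([0, 0, 0, 1, 1, 1, 0])
N = 3

# group id for each run of consecutive identical values (the asker's expression)
g = (s != s.shift()).cumsum()
# 1-based length of the current run up to and including each position
run_len = s.groupby(g).cumcount() + 1
# an item qualifies if the run ending just before it has at least N items
m = run_len.shift().ge(N)
# replace qualifying items by the previous value, then restore the original dtype
out = s.mask(m).ffill().astype(s.dtype)
print(out.tolist())  # [0, 0, 0, 0, 1, 1, 1]
```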


Answers:

IIUC, you can use rolling.apply with nunique:

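import pandas as pd

s = pd.Series([0, 0, 0, 1, 1, 1, 0])
N = 3
# check if an item is identical to the N-1 previous ones
# then shift (i.e. is one item preceded by N identical items?)
m = s.rolling(N).apply(lambda x: x.nunique()).shift().eq(1)
# mask those values and replace them by the previous value;
# astype restores the dtype (ffill's downcast argument is deprecated since pandas 2.1)
out = s.mask(m).ffill().astype(s.dtype)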

Intermediates:

   s  rolling_nunique  shift  eq(1)  masked  ffill
0  0              NaN    NaN  False     0.0      0
1  0              NaN    NaN  False     0.0      0
2  0              1.0    NaN  False     0.0      0
3  1              2.0    1.0   True     NaN      0
4  1              2.0    2.0  False     1.0      1
5  1              1.0    2.0  False     1.0      1
6  0              2.0    1.0   True     NaN      1
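A possible variant of the same mask (a swapped-in technique, not part of the original answer): for numeric data, a window holds a single distinct value exactly when its rolling min equals its rolling max, which avoids calling a Python lambda for every window. This assumes orderable values; nunique works for any hashable values.

```python
import pandas as pd

s = pd.Series([0, 0, 0, 1, 1, 1, 0])
N = 3

# a window is constant exactly when its min equals its max
# (the first N-1 windows are NaN, and NaN == NaN is False)
r = s.rolling(N)
same = r.min().eq(r.max())
# shift so each item looks at the N items *before* it
m = same.shift(fill_value=False)
out = s.mask(m).ffill().astype(s.dtype)
print(out.tolist())  # [0, 0, 0, 0, 1, 1, 1]
```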

Same logic with numpy's sliding_window_view:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

N = 3
# get "rolling" values as 2D view
v = swv(np.r_[np.full(N-1, np.nan), s], N)
# are all N values equal?
m = np.r_[False, (v == v[:, [0]]).all(1)[:-1]]

# mask, forward-fill, and restore the original dtype
out = s.mask(m).ffill().astype(s.dtype)

Output:

0    0
1    0
2    0
3    0
4    1
5    1
6    1
dtype: int64
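A quick sanity check, not part of the original answer: wrapping both versions in the illustrative helpers with_rolling and with_swv (using astype(s.dtype) in place of ffill's deprecated downcast argument), they should agree on random 0/1 data for several window sizes.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv


def with_rolling(s, N):
    # hypothetical helper: rolling.apply + nunique version
    m = s.rolling(N).apply(lambda x: x.nunique()).shift().eq(1)
    return s.mask(m).ffill().astype(s.dtype)


def with_swv(s, N):
    # hypothetical helper: sliding_window_view version (NaN padding never compares equal)
    v = swv(np.r_[np.full(N - 1, np.nan), s], N)
    m = np.r_[False, (v == v[:, [0]]).all(1)[:-1]]
    return s.mask(m).ffill().astype(s.dtype)


rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 2, size=200))
for N in (2, 3, 4):
    assert with_rolling(s, N).equals(with_swv(s, N))
```

The first N items are never masked by either version, so the forward fill never leaves a leading NaN and the cast back to the integer dtype is safe.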
Answered By: mozway