Retrieve number of rows between the current row and the last/next positive value

Question:

I have the following dataframe:

   feature
0        1
1        0
2        0
3        0
4        0
5        1
6        0
7        1

I would like to create two columns containing the number of rows between the current row and the previous and next positive values.
The output dataframe should look something like this:

   feature    previous_feat        next_feat
0        1               NA                5
1        0                1                4
2        0                2                3
3        0                3                2
4        0                4                1
5        1                5                2
6        0                1                1
7        1                2               NA

I have tried combinations of the shift and mask methods, but I can't get them to work.
Note that the result can be either the number of rows or the index difference; it doesn't really matter to me. The same goes for the missing values: they can be NA or 0.

import pandas as pd

df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})

# df["previous_feat"] = df.shift().mask(df["feature"] != 0)
Asked By: imM4TT

Answers:

You can use groupby.cumcount and boolean masks:

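# form groups: g1 counts the positives from the bottom up, g2 from the top down
g1 = df.loc[::-1, 'feature'].eq(1).cumsum()
g2 = df['feature'].eq(1).cumsum()

# mask the first positive (no previous one) and the last positive (no next one)
m1 = g2.eq(1) & df['feature'].eq(1)
m2 = g1.eq(1) & df['feature'].eq(1)

# compute cumcount: forward within g1, on the reversed frame within g2
df['previous_feat'] = df.groupby(g1).cumcount().add(1).mask(m1)
df['next_feat'] = df[::-1].groupby(g2).cumcount().add(1).mask(m2)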

Output:

   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
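Answered By: mozway

# label each row by the number of positives strictly before it
groups = df["feature"].shift().cumsum()
# count up within each label: rows since the previous positive
df["previous_feat"] = df["feature"].groupby(groups).cumcount().add(1)
# count down within each label, then shift up one row: rows until the next positive
df["next_feat"] = df["feature"].groupby(groups).cumcount(ascending=False).add(1).shift(-1)

Output:
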
   feature  previous_feat  next_feat
0        1            NaN        5.0
1        0            1.0        4.0
2        0            2.0        3.0
3        0            3.0        2.0
4        0            4.0        1.0
5        1            5.0        2.0
6        0            1.0        1.0
7        1            2.0        NaN
Answered By: Chrysophylaxs
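
Since the question says the index difference would also be acceptable, another option is to work with row positions directly instead of groupby. This is a sketch of that alternative, not taken from either answer: distances_to_positives is a hypothetical helper name, "positive" is assumed to mean feature > 0, and the result is cast to the nullable Int64 dtype so the counts stay integers.

```python
import numpy as np
import pandas as pd


def distances_to_positives(feature: pd.Series) -> pd.DataFrame:
    """Row distances to the previous and next positive value (hypothetical helper)."""
    # Row positions 0..n-1, so the result does not depend on the index labels.
    pos = pd.Series(np.arange(len(feature)), index=feature.index)
    # Keep the position only on rows that hold a positive value.
    pos_of_positive = pos.where(feature.gt(0))
    # Position of the latest positive strictly before each row,
    # and of the nearest positive strictly after it.
    prev_pos = pos_of_positive.ffill().shift()
    next_pos = pos_of_positive.bfill().shift(-1)
    return pd.DataFrame(
        {
            "previous_feat": (pos - prev_pos).astype("Int64"),
            "next_feat": (next_pos - pos).astype("Int64"),
        }
    )


df = pd.DataFrame({"feature": [1, 0, 0, 0, 0, 1, 0, 1]})
df = df.join(distances_to_positives(df["feature"]))
print(df)
```

For the question's data this gives the same counts as the expected table, with <NA> in place of NaN. Rows that come before the first positive, or after the last one, have nothing to measure against and are left as <NA> in the corresponding column.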