Pandas How to flag consecutive values ignoring the first occurrence

Question

I have the following code:

data={'id':[1,2,3,4,5,6,7,8,9,10,11],
  'value':[1,0,1,0,1,1,1,0,0,1,0]}
  df=pd.DataFrame.from_dict(data)
  df
  Out[8]: 
      id  value
 0    1      1
 1    2      0
 2    3      1
 3    4      0
 4    5      1
 5    6      1
 6    7      1
 7    8      0
 8    9      0
 9    10     1
10    11     0

I want to create a flag column that indicate with 1 consecutive values starting from the second occurrence and ignoring the first.

With the actual solution:

df['flag'] = 
df.value.groupby([df.value,df.flag.diff().ne(0).cumsum()]).transform('size').ge(3).astype(int)


Out[8]: 
        id  value  flag
    0    1      1     0
    1    2      0     0
    2    3      1     0
    3    4      0     0
    4    5      1     1
    5    6      1     1
    6    7      1     1
    7    8      0     1
    8    9      0     1
    9    10     1     0
   10    11     0     0

While I need a solution like this, where the first occurence is flagged as 0 and 1 starting from the second:

Out[8]: 
        id  value  flag
    0    1      1    0
    1    2      0    0
    2    3      1    0
    3    4      0    0
    4    5      1    0
    5    6      1    1
    6    7      1    1
    7    8      0    0
    8    9      0    1
    9    10     1    0
   10    11     0    0

Asked By: pasq

||

Source

Answer 1

Create consecutive groups by compared Series.shifted values by not equal and Series.cumsum, create counter by GroupBy.cumcount and compare if greater values like 0 by Series.gt, last map True, False to 1, 0 by casting to integers by Series.astype:

df['flag'] = (df.groupby(df['value'].ne(df['value'].shift()).cumsum())
                .cumcount()
                .gt(0)
                .astype(int))
print (df)
    id  value  flag
0    1      1     0
1    2      0     0
2    3      1     0
3    4      0     0
4    5      1     0
5    6      1     1
6    7      1     1
7    8      0     0
8    9      0     1
9   10      1     0
10  11      0     0

How it working:

print (df.assign(g = df['value'].ne(df['value'].shift()).cumsum(),
                 counter = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount(),
                 mask = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount().gt(0)))

    id  value  g  counter   mask
0    1      1  1        0  False
1    2      0  2        0  False
2    3      1  3        0  False
3    4      0  4        0  False
4    5      1  5        0  False
5    6      1  5        1   True
6    7      1  5        2   True
7    8      0  6        0  False
8    9      0  6        1   True
9   10      1  7        0  False
10  11      0  8        0  False

Answered By: jezrael

Answer 2

Use groupby.cumcount and a custom grouper:

# group by identical successive values
grp = df['value'].ne(df['value'].shift()).cumsum()

# flag all but the first one (>0)
# convert the booleans True/False to integers 1/0
df['flag'] = df.groupby(grp).cumcount().gt(0).astype(int)

Generic code to skip first N:

N = 1

grp = df['value'].ne(df['value'].shift()).cumsum()
df['flag'] = df.groupby(grp).cumcount().ge(N).astype(int)

Output:

    id  value  flag
0    1      1     0
1    2      0     0
2    3      1     0
3    4      0     0
4    5      1     0
5    6      1     1
6    7      1     1
7    8      0     0
8    9      0     1
9   10      1     0
10  11      0     0

Answered By: mozway

Answer 3

df.assign(flag=df.value.diff().eq(0).astype(int))

out：

    id  value  flag
0    1      1     0
1    2      0     0
2    3      1     0
3    4      0     0
4    5      1     0
5    6      1     1
6    7      1     1
7    8      0     0
8    9      0     1
9   10      1     0
10  11      0     0

Answered By: G.G

Pandas How to flag consecutive values ignoring the first occurrence

Question:

Answers: