How to create incrementing group column counter

Question:

Consider the following data set:

How can I generate the expected value, ExpectedGroup such that the same value exists when True, but changes and increments by 1, when we run into a False statement in case_id.

df = pd.DataFrame([
            ['A', 'P', 'O', 2, np.nan],
            ['A', 'O', 'O', 5, 1],
            ['A', 'O', 'O', 10, 1],
            ['A', 'O', 'P', 4, np.nan],
            ['A', 'P', 'P', 300, np.nan],
            ['A', 'P', 'O', 2, np.nan],
            ['A', 'O', 'O', 5, 2],
            ['A', 'O', 'O', 10, 2],
            ['A', 'O', 'P', 4, np.nan],
            ['A', 'P', 'P', 300, np.nan],
            ['B', 'P', 'O', 2, np.nan],
            ['B', 'O', 'O', 5, 3],
            ['B', 'O', 'O', 10, 3],
            ['B', 'O', 'P', 4, np.nan],
            ['B', 'P', 'P', 300, np.nan],
            ],
        columns = ['ID', 'FromState', 'ToState', 'Hours', 'ExpectedGroup'])
# create boolean mask
df['case_id'] = ( (df.FromState == 'O') & (df.ToState == 'O')  )

0     False
1      True
2      True
3     False
4     False
5     False
6      True
7      True
8     False
9     False
10    False
11     True
12     True
13    False
14    False
Name: case_id, dtype: boo
# but how to get incrementing groups?
np.where(df['case_id'] != False, df['case_id'].cumsum(), np.nan)
Asked By: John Stud

||

Answers:

You can use diff to select only the first item of each stretch of True:

df['ExpectedGroup'] = (df['case_id'].diff()
                      &df['case_id']
                      ).cumsum().where(df['case_id'])

If you don’t want the intermediate column:

s = (df.FromState == 'O') & (df.ToState == 'O')
# or
# s = df[['FromState', 'ToState']].eq('O').all(axis=1)

df['ExpectedGroup'] = (s.diff()&s).cumsum().where(s)
# or
# df.loc[s, 'ExpectedGroup'] = (s.diff()&s).cumsum()

Output:

  ID FromState ToState  Hours  ExpectedGroup  case_id
0  A         P       O      2            NaN    False
1  A         O       O      5            1.0     True
2  A         O       O     10            1.0     True
3  A         O       P      4            NaN    False
4  A         P       P    300            NaN    False
5  B         P       O      2            NaN    False
6  B         O       O      5            2.0     True
7  B         O       O     10            2.0     True
8  B         O       P      4            NaN    False
9  B         P       P    300            NaN    False
Answered By: mozway

Let’s use cumsum to create counter then reencode the counter using factorize

m = df['case_id']
df.loc[m, 'ExpectedGroup'] = (~m).cumsum()[m].factorize()[0] + 1

   ID FromState ToState  Hours  ExpectedGroup  case_id
0   A         P       O      2            NaN    False
1   A         O       O      5            1.0     True
2   A         O       O     10            1.0     True
3   A         O       P      4            NaN    False
4   A         P       P    300            NaN    False
5   A         P       O      2            NaN    False
6   A         O       O      5            2.0     True
7   A         O       O     10            2.0     True
8   A         O       P      4            NaN    False
9   A         P       P    300            NaN    False
10  B         P       O      2            NaN    False
11  B         O       O      5            3.0     True
12  B         O       O     10            3.0     True
13  B         O       P      4            NaN    False
14  B         P       P    300            NaN    False
Answered By: Shubham Sharma
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.