How to count cumulatively with conditions on a groupby?

Question:

Say I have a data-frame, filled as below, with the column 'Key' having one of five possible values A, B, C, D, X. I would like to add a new column 'Res' that counts the number of these letters cumulatively and resets each time it hits and X.

For example:

   Key    Res
0  D      1
1  X      0
2  B      1
3  C      2
4  D      3
5  X      0
6  A      1
7  C      2
8  X      0
9  X      0

May anyone assist in how I can achieve this?

Asked By: vicefreak04

||

Answers:

A possible solution:

a = df.Key.ne('X')
df['new'] = ((a.cumsum()-a.cumsum().where(~a).ffill().fillna(0)).astype(int))

Another possible solution, which is more basic than the previous one, but much faster (several orders of magnitude):

s = np.zeros(len(df), dtype=int)

for i in range(len(df)):
    if df.Key[i] != 'X':
        s[i] = s[i-1] + 1
        
df['new'] = s

Output:

  Key  Res  new
0   D    1    1
1   X    0    0
2   B    1    1
3   C    2    2
4   D    3    3
5   X    0    0
6   A    1    1
7   C    2    2
8   X    0    0
9   X    0    0
    
Answered By: PaulS

Example

df = pd.DataFrame(list('DXBCDXACXX'), columns=['Key'])

df

    Key
0   D
1   X
2   B
3   C
4   D
5   X
6   A
7   C
8   X
9   X

Code

df1 = pd.concat([df.iloc[[0]], df])
grouper = df1['Key'].eq('X').cumsum()
df1.assign(Res=df1.groupby(grouper).cumcount()).iloc[1:]

result:

    Key Res
0   D   1
1   X   0
2   B   1
3   C   2
4   D   3
5   X   0
6   A   1
7   C   2
8   X   0
9   X   0
Answered By: Panda Kim
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.