How can I count the number of times an entry is a repeat of the previous entry within a column while grouping by another column in Python?

Question:

Consider the following table for example:

import pandas as pd

data = {'Group':["AGroup", "AGroup", "AGroup", "AGroup", "BGroup", "BGroup", "BGroup", "BGroup", "CGroup", "CGroup", "CGroup", "CGroup"],
        'Status':["Low", "Low", "High", "High", "High", "Low", "High", "Low", "Low", "Low", "High", "High"],
        'CountByGroup':[1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2]}

pd.DataFrame(data)

This creates the following table:

Group   Status  CountByGroup
AGroup  Low     1
AGroup  Low     2
AGroup  High    1
AGroup  High    2
BGroup  High    1
BGroup  Low     1
BGroup  High    1
BGroup  Low     1
CGroup  Low     1
CGroup  Low     2
CGroup  High    1
CGroup  High    2

The CountByGroup column is what I am trying to create. Here you can see that "Low" appeared once so far for the "AGroup" in the first row, so it has an entry of 1. "Low" directly follows the same entry "Low" in the second row, so it has an entry of 2. If it were to appear a third time in a row in the third row, CountByGroup would display an entry of 3.

We’re also grouping these "Group", so the first entry for a new group is always 1 since it is the first time any entry has appeared for the group.

This was solved in a previous question I had using R which is available here, but I’m not sure how to solve this using Python.

Asked By: GM01

||

Answers:

Assuming your dataframe is correctly sorted or use df.sort_values('Group', kind='stable') before:

cols = ['Group', 'Status']
df['CountByGroup2'] = df[cols].eq(df[cols].shift()).all(axis=1).astype(int).add(1)
print(df)

# Output
     Group Status  CountByGroup  CountByGroup2
0   AGroup    Low             1              1
1   AGroup    Low             2              2
2   AGroup   High             1              1
3   AGroup   High             2              2
4   BGroup   High             1              1
5   BGroup    Low             1              1
6   BGroup   High             1              1
7   BGroup    Low             1              1
8   CGroup    Low             1              1
9   CGroup    Low             2              2
10  CGroup   High             1              1
11  CGroup   High             2              2
Answered By: Corralien

Usually, you would use cumsum on the row differences to identify continuous blocks. Then you can group by the Group and the Block then cumcount:

# assuming data is sorted by Group as in the example
blocks = df['Status'].ne(df['Status'].shift()).cumsum()
df['CountByGroup'] = df.groupby(['Group', blocks]).cumcount() + 1

Note if the data is not sorted by Group, you would need to sort before creating the blocks:

blocks = df['Status'].ne(df.sort_values('Group', kind='stable')['Status'].shift()).cumsum()

or a groupby:

blocks = df['Status'].ne(df.groupby('Group')['Status'].shift()).cumsum()

Output:

     Group Status  CountByGroup
0   AGroup    Low             1
1   AGroup    Low             2
2   AGroup   High             1
3   AGroup   High             2
4   BGroup   High             1
5   BGroup    Low             1
6   BGroup   High             1
7   BGroup    Low             1
8   CGroup    Low             1
9   CGroup    Low             2
10  CGroup   High             1
11  CGroup   High             2
Answered By: Quang Hoang
import pandas as pd

data = {'Group':["AGroup", "AGroup", "AGroup", "AGroup", "BGroup", "BGroup", "BGroup", "BGroup", "CGroup", "CGroup", "CGroup", "CGroup"],
        'Status':["Low", "Low", "High", "High", "High", "Low", "High", "Low", "Low", "Low", "High", "High"],
        'CountByGroup':[1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1, 2]}

df = pd.DataFrame(data)

s = df[['Group', 'Status']].groupby(['Group', 'Status']).ngroup()
s = s.groupby(lambda x: s[x]).cumcount()+1

print(pd.concat([df[['Group', 'Status']], s.rename("CountByGroup")], axis=1))
      Group Status  CountByGroup
0   AGroup    Low             1
1   AGroup    Low             2
2   AGroup   High             1
3   AGroup   High             2
4   BGroup   High             1
5   BGroup    Low             1
6   BGroup   High             2
7   BGroup    Low             2
8   CGroup    Low             1
9   CGroup    Low             2
10  CGroup   High             1
11  CGroup   High             2
Answered By: Laurent B.
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.