How can I calculate number of consecutive values in a column within a group in a pandas dataframe?

Question:

I have a dataframe with all of a fighter’s fights, the fight number (i.e. if it is their first, second, etc.), and whether or not they won the fight. I would like to calculate the number of consecutive wins a fighter had gotten before their current fight (i.e. not including if they won the current fight). I am currently working with Python 3.7 in Spyder.

Suppose we have the following dataframe, where win = 1 if the fighter won the fight:

df = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 
                  'fight_number' :  ['1', '2', '3', '4', '1', '2', '3', '1', '2'],
                  'win' : [0, 0, 1, 1, 1, 1, 0, 1, 1]})
  fighter  fight_number  win
0       A             1     0
1       A             2     0
2       A             3     1
3       A             4     1
4       B             1     1
5       B             2     1
6       B             3     0
7       C             1     1
8       C             2     1

I know that to calculate win streaks across all rows, I can implement the solution proposed here with:

grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(grouper).cumsum()

which produces:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       1
3       A            4    1       2
4       B            1    1       3
5       B            2    1       4
6       B            3    0       0
7       C            1    1       1
8       C            2    1       2

But what I need is to apply this approach to subgroups of the dataframe (i.e. to each fighter) and to not include the outcome of the current fight in the count of the streak. So, I am basically trying to have the current win streak of the fighter when they enter the fight.

The target output in this example would therefore be:

  fighter fight_number  win  streak
0       A            1    0       0
1       A            2    0       0
2       A            3    1       0
3       A            4    1       1
4       B            1    1       0
5       B            2    1       1
6       B            3    0       2
7       C            1    1       0
8       C            2    1       1

I appreciate any advice I can get on this, as I am pretty new to Python.

Asked By: Joe

||

Answers:

One solution I came up with was inspired by an earlier answer posted (but deleted) by jezrael:

grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(['fighter', grouper]).cumsum()
df['streak'] = df.groupby('fighter')['streak'].shift(1).fillna(0)

which produces the target output:

  fighter fight_number  win  streak
0       A            1    0     0.0
1       A            2    0     0.0
2       A            3    1     0.0
3       A            4    1     1.0
4       B            1    1     0.0
5       B            2    1     1.0
6       B            3    0     2.0
7       C            1    1     0.0
8       C            2    1     1.0

and it also seems to work on other test examples:

df2 = pd.DataFrame({'fighter' : ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'], 
                  'fight number' :  ["1", "2", "3", "4", "5", "6", "1", "2", "3", "1", "2"],
                  'win' : [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1]}) 

grouper = (df2.win != df2.win.shift()).cumsum()
df2['streak'] = df2.groupby(['fighter', grouper]).cumsum()
df2['streak'] = df2.groupby('fighter')['streak'].shift(1).fillna(0)

   fighter fight number  win  streak
0        A            1    1     0.0
1        A            2    1     1.0
2        A            3    0     2.0
3        A            4    1     0.0
4        A            5    0     1.0
5        A            6    1     0.0
6        B            1    1     0.0
7        B            2    1     1.0
8        B            3    0     2.0
9        C            1    1     0.0
10       C            2    1     1.0

Answered By: Joe

For some reason joe’s answer didn’t quite work, but this did:

df = df.sort_values(['fighter', 'date'])
grouper = (df.win != df.win.shift()).cumsum()
df['streak'] = df.groupby(['fighter', grouper])['win'].cumsum()
df.sort_index(inplace=True)
Answered By: flyingtriangle
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.