Python Groupby columns and apply function

Question:

I have a dataframe that looks like this which contains all divisions and both conferences from 2000-2022.

Tm        Conference   Division    W-L%.    Year  
Bills         AFC        East      0.813    2022   
Dolphins      AFC        East      0.529    2022  
Patriots      AFC        East      0.471    2022    
Jets          AFC        East      0.412    2022   
Cowboys       NFC        East      0.706    2022   
Giants        NFC        East      0.559    2022   
Eagles        NFC        East      0.824    2022  
Commanders    NFC        East      0.500    2022    

I want to groupby Team, Conference, and year, and create a new column called ‘Division W-L%’ which finds the average W-L% for every team in a particular division, conference, and year except for the team we are calculating it on. I know the formula to find the Division W-L% which is:
df[‘Division_W-L%’] = (df[‘W-L%’].sum() – df[‘W-L%’]) / (len(df) -1).

This is what I want the dataframe to look like. For example, for the ‘Bills’ we would calculate the Division W-L% by doing (0.529 + 0.471 + 0.412)/3 since those 3 teams are in the same conference, division, and year.

Tm        Conference   Division    W-L%.    Year  Division W-L%
Bills         AFC        East      0.813    2022    0.470667
Dolphins      AFC        East      0.529    2022    0.565333
Patriots      AFC        East      0.471    2022    0.584667
Jets          AFC        East      0.412    2022    0.604333
Cowboys       NFC        East      0.706    2022    0.627667
Giants        NFC        East      0.559    2022    0.676667
Eagles        NFC        East      0.824    2022    0.588333
Commanders    NFC        East      0.500    2022    0.696333

I tried doing what I described above, which is grouping by those three categories, and then applying that formula to the W-L% column, however I continue to receive errors. All help is appreciated!

Asked By: bobfreedy00

||

Answers:

Try:

df['Division W-L% NEW'] = df.groupby(['Conference', 'Division', 'Year'])['W-L%.'].transform('sum') - df['W-L%.']
df['Division W-L% NEW'] = df.groupby(['Conference', 'Division', 'Year'])['Division W-L% NEW'].transform(lambda x: x / (len(x)-1))
print(df)

Prints:

           Tm Conference Division  W-L%.  Year  Division W-L%  Division W-L% NEW
0       Bills        AFC     East  0.813  2022       0.470667           0.470667
1    Dolphins        AFC     East  0.529  2022       0.565333           0.565333
2    Patriots        AFC     East  0.471  2022       0.584667           0.584667
3        Jets        AFC     East  0.412  2022       0.604333           0.604333
4     Cowboys        NFC     East  0.706  2022       0.627667           0.627667
5      Giants        NFC     East  0.559  2022       0.676667           0.676667
6      Eagles        NFC     East  0.824  2022       0.588333           0.588333
7  Commanders        NFC     East  0.500  2022       0.696333           0.696333
Answered By: Andrej Kesely

You can use transform instead of apply. Compute the sum for the group and subtract the W-L%. of the current row then divide by the size of the group minus 1 (because you want to exclude the row itself):

df['Division W-L%'] = (df.groupby(['Conference', 'Division', 'Year'])['W-L%.']
                         .transform(lambda x: (x.sum() - x) / (len(x) - 1)))

Output:

>>> df
           Tm Conference Division  W-L%.  Year  Division W-L%
0       Bills        AFC     East  0.813  2022       0.470667
1    Dolphins        AFC     East  0.529  2022       0.565333
2    Patriots        AFC     East  0.471  2022       0.584667
3        Jets        AFC     East  0.412  2022       0.604333
4     Cowboys        NFC     East  0.706  2022       0.627667
5      Giants        NFC     East  0.559  2022       0.676667
6      Eagles        NFC     East  0.824  2022       0.588333
7  Commanders        NFC     East  0.500  2022       0.696333
Answered By: Corralien
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.