Pandas Average If Across Multiple Columns

Question:

In pandas, I’d like to calculate the average age and weight for the people who play each sport. I know I could loop over the sports, but I'm wondering what the most efficient way is.

import pandas as pd

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])
   Plays Basketball  Plays Soccer  Plays Football  Age  Weight
0                 0             1               0   30     150
1                 1             1               1   25     200
2                 1             0               0   20     175

I tried groupby, but grouping by the indicator columns creates a group for every combination of sports played. I just need the average age and weight for each sport on its own.

Result should be:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
Asked By: Scott Guthart


Answers:

You could run one groupby per indicator column and keep only the group where that indicator is 1:

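import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]
rows = []
for indicator in indicators:
    # Group by this single indicator (0 or 1) and average every column per group.
    average = df.groupby(indicator).mean()
    # Keep the "plays this sport" group (label 1) and name the row after the sport.
    rows.append(average.loc[1][keep_cols].rename(indicator))
# Each Series becomes one row, labelled by its name.
output = pd.DataFrame(rows)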
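A variant of the same loop, not part of the original answer: instead of grouping and then discarding the 0 group, select the rows where each indicator equals 1 with a boolean mask and average only those. The DataFrame from the question is rebuilt so the snippet runs on its own; `indicators`, `keep_cols`, and `output` mirror the answer's names.

```python
import pandas as pd

# Rebuilt from the question so this sketch is self-contained.
df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"])

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]

# For each sport, keep only the rows where its indicator is 1 and average the
# value columns. The dict gives one column per sport; .T turns sports into rows.
output = pd.DataFrame(
    {sport: df.loc[df[sport] == 1, keep_cols].mean() for sport in indicators}
).T
print(output)
```

This computes only the mean that is actually needed per sport, but it still loops over the indicator columns in Python, so its cost grows with the number of sports.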
Answered By: Nick ODell

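Because the indicator columns contain only 0s and 1s, a dot product of the transposed indicators with Age and Weight gives the per-sport sums; dividing by the number of players in each sport (the column sums of the indicators) gives the mean: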

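# Select the 0/1 indicator columns (every column whose name contains "Plays").
df2 = df.filter(like='Plays')

# Per-sport sums of Age and Weight, divided row-wise by the per-sport player counts.
out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)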
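To see why the dot product works, here is the same computation split into its intermediate steps. This is a sketch assuming the indicator columns hold only 0 and 1; the names `indicators_df`, `sums`, `counts`, and `means` are illustrative and not from the answer.

```python
import pandas as pd

# Rebuilt from the question so this sketch is self-contained.
df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"])

indicators_df = df.filter(like="Plays")  # 0/1 matrix, one column per sport
values = df[["Age", "Weight"]]

# Row i of the transposed indicators marks who plays sport i, so the dot
# product adds up Age and Weight over exactly those people.
sums = indicators_df.T.dot(values)
# Plays Basketball: Age 45, Weight 375
# Plays Soccer:     Age 55, Weight 350
# Plays Football:   Age 25, Weight 200

# Column sums of the indicators = number of players per sport.
counts = indicators_df.sum()  # 2, 2, 1

# Divide each sport's row of sums by that sport's player count.
means = sums.div(counts, axis=0)
print(means)
```

If a sport had no players, its count would be 0 and that row would come out as NaN rather than raising an error.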

Output:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
Answered By: mozway
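Another option, not taken from either answer above: the groupby in the question produced one group per combination of sports because it grouped on all the indicator columns at once. Reshaping to long format with `melt` first gives one row per (person, sport) pair, so a single groupby on the sport name yields one group per sport. The DataFrame is rebuilt from the question so the sketch is self-contained; `long_df` and the `Sport`/`Plays` column names are illustrative.

```python
import pandas as pd

# Rebuilt from the question so this sketch is self-contained.
df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"])

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]

# Long format: one row per (person, sport) pair, with the 0/1 flag in "Plays".
long_df = df.melt(id_vars=["Age", "Weight"], value_vars=indicators,
                  var_name="Sport", value_name="Plays")

# Keep only the pairs where the person plays that sport, then one groupby
# on the sport name gives exactly one group per sport.
result = (
    long_df[long_df["Plays"] == 1]
    .groupby("Sport", sort=False)[["Age", "Weight"]]
    .mean()
)
print(result)
```

`sort=False` keeps the sports in the order of the original columns; without it the rows would be sorted alphabetically by sport name.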