Pandas Average If Across Multiple Columns

Question

In pandas, I’d like to calculate the average age and weight of the people who play each sport. I know I can loop, but I was wondering what the most efficient way is.

import pandas as pd

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])

Plays Basketball	Plays Soccer	Plays Football	Age	Weight
0	1	0	30	150
1	1	1	25	200
1	0	0	20	175

I tried groupby on the three sport columns, but it creates a group for every combination of sports played. I just need the average age and weight for each sport.

Result should be:

	Age	Weight
Plays Basketball	22.5	187.5
Plays Soccer	27.5	175.0
Plays Football	25.0	200.0

Asked By: Scott Guthart

Answer 1

You could run one groupby per indicator column and keep only the row for group 1, i.e. the people who play that sport:

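import pandas as pd

# df is the DataFrame defined in the question
indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]
rows = []
for indicator in indicators:
    # Mean of every other column, split by the 0/1 value of this indicator
    average = df.groupby(indicator).mean()
    # Group 1 holds the people who play this sport
    rows.append(average.loc[1][keep_cols].rename(indicator))
output = pd.DataFrame(rows)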

Answered By: Nick ODell
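
The loop in Answer 1 can also be written with a boolean mask instead of a groupby, so that only the players of each sport are averaged. This is a minimal sketch rather than part of the original answer; it assumes the indicator columns contain only 0 and 1, and the name `masked` is chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]

# For each sport, keep the rows where the indicator is 1 and average Age/Weight.
# The dict builds one column per sport, so transpose to get one row per sport.
masked = pd.DataFrame({
    indicator: df.loc[df[indicator] == 1, keep_cols].mean()
    for indicator in indicators
}).T

print(masked)
```

A sport that nobody plays would produce a row of NaN here, whereas the `.loc[1]` lookup in the groupby version would raise a KeyError.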

Answer 2

Use a dot product and normalize by the count to get the mean:

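# Select the 0/1 indicator columns
df2 = df.filter(like='Plays')

# Per-sport sums of Age and Weight, divided by the number of players per sport
out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)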

Output:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0

Answered By: mozway
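
To see why the dot product in Answer 2 yields the mean, it may help to split the one-liner into its intermediate steps. Because the indicators are 0 or 1, multiplying the transposed indicator matrix by the Age/Weight columns sums those columns over the players of each sport, and the column sums of the indicators give the player counts. The sketch below is illustrative only and assumes 0/1 indicators; the names `flags`, `sums`, `counts` and `means` are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

# 0/1 indicator columns, one per sport
flags = df.filter(like="Plays")

# (sports x people) dot (people x [Age, Weight]) = per-sport totals
sums = flags.T.dot(df[["Age", "Weight"]])
# Basketball: Age 45, Weight 375; Soccer: 55, 350; Football: 25, 200

# Number of players per sport: 2, 2, 1
counts = flags.sum()

# Divide each sport's totals by its player count to get the means
means = sums.div(counts, axis=0)

print(means)
```

The whole computation is a single matrix product plus a division, so no Python-level loop over the sports is needed.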
