Pandas Average If Across Multiple Columns
Question:
In pandas, I’d like to calculate the average age and weight for people playing each sport. I know I can loop, but was wondering what the most efficient way is.
import pandas as pd

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])
Plays Basketball | Plays Soccer | Plays Football | Age | Weight |
---|---|---|---|---|
0 | 1 | 0 | 30 | 150 |
1 | 1 | 1 | 25 | 200 |
1 | 0 | 0 | 20 | 175 |
I tried groupby, but it creates a group for every possible combination of sports played. I just need the average age and weight for each sport.
Result should be:
| Age | Weight |
---|---|---|
Plays Basketball | 22.5 | 187.5 |
Plays Soccer | 27.5 | 175.0 |
Plays Football | 25.0 | 200.0 |
Answers:
You could use one groupby for each column you want to summarize:
import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]
rows = []
for indicator in indicators:
    # Group on one indicator at a time; the group keyed 1 holds that sport's players.
    average = df.groupby(indicator)[keep_cols].mean()
    rows.append(average.loc[1].rename(indicator))
# Each Series becomes one row, labelled with its sport.
output = pd.DataFrame(rows)
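If you only care about the rows where the indicator is 1, a boolean mask does the same job without building the 0 group at all. This is a minimal sketch of that variant, not a different algorithm: it still makes one pass per sport. The name mask_output is just illustrative, and it assumes the indicator columns contain only 0 and 1.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]

# For each sport, keep only the rows where the indicator equals 1
# and average the columns of interest. The dict of Series yields one
# column per sport, so transpose to get one row per sport.
mask_output = pd.DataFrame(
    {ind: df.loc[df[ind] == 1, keep_cols].mean() for ind in indicators}
).T

print(mask_output)
```

On the sample data this should print the same three rows as the expected result in the question.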
Use a dot product and normalize by the count to get the mean:
df2 = df.filter(like='Plays')
out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)
Output:
                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
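To see why the one-liner works, it may help to split it into its two halves. Because the indicators are 0/1, the dot product adds up Age and Weight over exactly the rows where a sport is played, and the column sums of the indicators count the players. The sketch below uses the intermediate names values, sums and counts, which are mine and not part of the answer. It assumes the indicators are strictly 0/1 and that Age and Weight have no missing values, since a NaN would propagate through the dot product instead of being skipped as mean() does.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

df2 = df.filter(like="Plays")      # the 0/1 indicator columns
values = df[["Age", "Weight"]]     # the columns to average

# (sports x people) dot (people x values): per-sport totals.
sums = df2.T.dot(values)

# Column sums of the indicators: number of players per sport.
counts = df2.sum()

# Divide each sport's totals by its player count, aligning on the sport labels.
out = sums.div(counts, axis=0)

print(sums)
print(counts)
print(out)
```

For the sample data, sums holds 45 and 375 for basketball (25 + 20 and 200 + 175) and counts holds 2, which gives the 22.5 and 187.5 in the output above.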