Python pandas: mean and sum groupby on different columns at the same time
Question:
I have a pandas dataframe which looks like the following:
Name Missed Credit Grade
A 1 3 10
A 1 1 12
B 2 3 10
B 1 2 20
And my desired output is:
Name Sum1 Sum2 Average
A 2 4 11
B 3 5 15
Basically to get the sum of column Credit
and Missed
and to do average on Grade
. What I am doing right now is two groupby on Name
and then get sum and average and finally merge the two output dataframes which does not seem to be the best way of doing this. I have also found this on SO which makes sense if I want to work only on one column:
df.groupby('Name')['Credit'].agg(['sum','average'])
But not sure how to do a one-liner for both columns?
Answers:
You need agg
by dictionary
and then rename
columns names:
d = {'Missed':'Sum1', 'Credit':'Sum2','Grade':'Average'}
df=df.groupby('Name').agg({'Missed':'sum', 'Credit':'sum','Grade':'mean'}).rename(columns=d)
print (df)
Sum1 Sum2 Average
Name
A 2 4 11
B 3 5 15
If want also create column from Name
:
df = (df.groupby('Name', as_index=False)
.agg({'Missed':'sum', 'Credit':'sum','Grade':'mean'})
.rename(columns={'Missed':'Sum1', 'Credit':'Sum2','Grade':'Average'}))
print (df)
Name Sum1 Sum2 Average
0 A 2 4 11
1 B 3 5 15
Solution with named aggregations:
df = df.groupby('Name', as_index=False).agg(Sum1=('Missed','sum'),
Sum2= ('Credit','sum'),
Average=('Grade','mean'))
print (df)
Name Sum1 Sum2 Average
0 A 2 4 11
1 B 3 5 15
A = pd.DataFrame.from_dict({'Name':['A','A','B','B'],'Missed':[1,1,2,1],'Credit':[3,1,3,2],'Grades':[10,12,10,20]})
A.groupby('Name').agg({'Missed':'sum','Credit':'sum','Grades':'mean'})
I have a pandas dataframe which looks like the following:
Name Missed Credit Grade
A 1 3 10
A 1 1 12
B 2 3 10
B 1 2 20
And my desired output is:
Name Sum1 Sum2 Average
A 2 4 11
B 3 5 15
Basically to get the sum of column Credit
and Missed
and to do average on Grade
. What I am doing right now is two groupby on Name
and then get sum and average and finally merge the two output dataframes which does not seem to be the best way of doing this. I have also found this on SO which makes sense if I want to work only on one column:
df.groupby('Name')['Credit'].agg(['sum','average'])
But not sure how to do a one-liner for both columns?
You need agg
by dictionary
and then rename
columns names:
d = {'Missed':'Sum1', 'Credit':'Sum2','Grade':'Average'}
df=df.groupby('Name').agg({'Missed':'sum', 'Credit':'sum','Grade':'mean'}).rename(columns=d)
print (df)
Sum1 Sum2 Average
Name
A 2 4 11
B 3 5 15
If want also create column from Name
:
df = (df.groupby('Name', as_index=False)
.agg({'Missed':'sum', 'Credit':'sum','Grade':'mean'})
.rename(columns={'Missed':'Sum1', 'Credit':'Sum2','Grade':'Average'}))
print (df)
Name Sum1 Sum2 Average
0 A 2 4 11
1 B 3 5 15
Solution with named aggregations:
df = df.groupby('Name', as_index=False).agg(Sum1=('Missed','sum'),
Sum2= ('Credit','sum'),
Average=('Grade','mean'))
print (df)
Name Sum1 Sum2 Average
0 A 2 4 11
1 B 3 5 15
A = pd.DataFrame.from_dict({'Name':['A','A','B','B'],'Missed':[1,1,2,1],'Credit':[3,1,3,2],'Grades':[10,12,10,20]})
A.groupby('Name').agg({'Missed':'sum','Credit':'sum','Grades':'mean'})