Mean, modus and median per group
Question:
I have a dataset df with valuations for several hospitals, identified by name in df['Hospital'], like below (just a small part of it; 6500 rows in total):
In the end I have a list with the number of valuations per hospital, like below:
What I need in the end is the MEAN, MODUS (mode) and MEDIAN over the whole group of hospitals. In this case the MEAN is 2167, the MODUS is 3000 and the MEDIAN is 2500. But what should the script look like? I know how to calculate this for a whole column (df['Hospital'].mean()), but how do I do it when a mean per hospital has to be calculated first?
Answers:
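This can be done with groupby and agg, as below:
import pandas as pd

df = pd.DataFrame({"Hospital": ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   "value": [1, 1, 2, 100, 200, 20, 2]})

# Per-hospital mean, median and mode; value_counts() sorts by frequency,
# so index[0] is the most frequent value
df.groupby('Hospital').agg(Mean_value=('value', 'mean'),
                           Median_value=('value', 'median'),
                           Modus_value=('value', lambda x: x.value_counts().index[0])
                           ).reset_index()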
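One thing to be aware of with the lambda above: when several values are equally frequent, I would not rely on which of them value_counts() happens to list first. A minimal sketch that makes the tie-break explicit, assuming you want the smallest of the tied values (smallest_mode is a helper name made up for this example, not something from pandas):

```python
import pandas as pd

df = pd.DataFrame({"Hospital": ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   "value": [1, 1, 2, 100, 200, 20, 2]})


def smallest_mode(values: pd.Series):
    # Series.mode() returns all modes in sorted order,
    # so the first entry is the smallest of the tied values
    return values.mode().iloc[0]


per_hospital = (
    df.groupby('Hospital')
      .agg(Mean_value=('value', 'mean'),
           Median_value=('value', 'median'),
           Modus_value=('value', smallest_mode))
      .reset_index()
)
print(per_hospital)
```

Whether the smallest value is the right choice for a tie depends on your data; the point is only that the rule is written down instead of left to the sort order.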
As the OP asked in the comments for the total mean, median and mode, here is code for that as well:
You can split the code below into two DataFrames at the point marked with the comment ## Total calculation starts here.
(
    df.groupby('Hospital')
      .agg(Mean_value=('value', 'mean'),
           Median_value=('value', 'median'),
           Modus_value=('value', lambda x: x.value_counts().index[0]))
      .reset_index()[['Mean_value', 'Median_value', 'Modus_value']]
      ## Total calculation starts here
      .assign(grp_col='1')  # constant key, so all hospitals end up in one group
      .groupby('grp_col')
      .agg(Mean_value=('Mean_value', 'mean'),
           Median_value=('Median_value', 'median'),
           Modus_value=('Modus_value', lambda x: x.value_counts().index[0]))
      .reset_index()
)
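If what is needed is the mean, median and mode of the number of valuations per hospital, as described in the question, it may be simpler to count the rows per hospital first and then summarise those counts. A minimal sketch on the same sample data, assuming every row is one valuation (the names counts, total_mean, total_median and total_modes are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Hospital": ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
                   "value": [1, 1, 2, 100, 200, 20, 2]})

# Step 1: number of valuations (rows) per hospital
counts = df.groupby('Hospital').size()

# Step 2: statistics over those per-hospital counts
total_mean = counts.mean()
total_median = counts.median()
total_modes = counts.mode().tolist()  # mode() returns every tied value

print(counts)
print(total_mean, total_median, total_modes)
```

With the sample data the counts are 3, 2 and 2, so the median is 2 and the only mode is 2. Series.mode() can return more than one value when there is a tie, which is why the result is kept as a list here.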