Python pandas sorting based on aggregate statistic after group by

Question:

I have a pandas data-frame (there are other columns that I’m not including for sake of simplicity) that looks like this

Country    City       Rank
  USA     New York    7
  UK      London      6
  UK      London      6
  USA     New York    4
  Spain   Madrid      9
  USA     New York    8
  USA     Boston      5
 ...       ...       ...       

I’m trying to group by country and then by city. Moreover, I want to compute count, mean and median and then sort first by mean and then by count.

I have been able to get the data in the following format using this bit of code:

df_3 = df_2[["Country", "City", "Rank"]].groupby(
    ["Country", "City"]).agg(['count', 'mean', 'median'])

                        Rank                  
                       count       mean median
Country City                                 
Spain   Madrid             1        --     --
USA     Boston             1        --     --
        New York           3        --     --
UK      London             2        --     --

Now, for each country, I want to be able to sort cities first by mean and then by count, both in descending order. I don’t know how to do it. Any help highly appreciated.

Asked By: Rhombus

||

Answers:

I think you would want to maintain the grouping in level=Country of multiindex in such a case you first have to reset the index then sort the values on Country, mean and count then set the index again

cols = ['Country', 'City']
(
    df
    .groupby(cols)['Rank']
    .agg(['count', 'mean', 'median'])
    .reset_index()
    .sort_values(['Country', 'mean', 'count'], ascending=[1, 0, 0])
    .set_index(cols)
)

Result

                  count      mean  median
Country City                             
Spain   Madrid        1  9.000000     9.0
UK      London        2  6.000000     6.0
USA     New York      3  6.333333     7.0
        Boston        1  5.000000     5.0
Answered By: Shubham Sharma
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.