Multi Index Sorting in Pandas

Question:

I have a dataset with multi-index columns in a pandas df that I would like to sort by values in a specific column. My dataset looks like:

    Group1    Group2
    A B C     A B C
1   1 0 3     2 5 7
2   5 6 9     1 0 0
3   7 0 2     0 3 5 

I want to sort all data and the index by column C in Group 1 in descending order so my results look like:

   Group1    Group2
   A B C     A B C
2  5 6 9     1 0 0
1  1 0 3     2 5 7
3  7 0 2     0 3 5 

Is it possible to do this sort with the structure that my data is in, or should I be swapping Group1 to the index side?

Asked By: MattB


Answers:

When sorting by a column of a MultiIndex, wrap the tuple that identifies the column in a list*:

In [11]: df.sort_values([('Group1', 'C')], ascending=False)
Out[11]: 
  Group1       Group2      
       A  B  C      A  B  C
2      5  6  9      1  0  0
1      1  0  3      2  5  7
3      7  0  2      0  3  5

* so as not to confuse pandas into thinking you want to sort first by Group1 then by C.


Note: This answer originally used .sort, which was deprecated and then removed in pandas 0.20 in favor of .sort_values.
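For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same sort. The construction via `MultiIndex.from_product` is my own assumption about how the data was built; the question only shows the printed table.

```python
import pandas as pd

# Rebuild the frame from the question: two column levels, rows labelled 1..3.
# (How the asker actually created it is not shown; this is one possible way.)
columns = pd.MultiIndex.from_product([["Group1", "Group2"], ["A", "B", "C"]])
df = pd.DataFrame(
    [[1, 0, 3, 2, 5, 7],
     [5, 6, 9, 1, 0, 0],
     [7, 0, 2, 0, 3, 5]],
    index=[1, 2, 3],
    columns=columns,
)

# The tuple names one column; the list around it marks it as a single sort key.
result = df.sort_values([("Group1", "C")], ascending=False)
print(result.index.tolist())  # [2, 1, 3]
```

The rows move together with their index labels, so there is no need to move Group1 to the index side.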

Answered By: Andy Hayden

  1. You can also select the sort column by position (e.g. the third column). The square brackets are optional: a bare tuple identifying the column works too.

    # sort in descending order by the third column
    df.sort_values(('Group1', 'C'), ascending=False)
    
    df.sort_values(df.columns[2], ascending=False)   # same as above
    


  2. If you want to sort by multiple columns, use a list of tuples (or select the columns by position). You can also pass a list to ascending to set the sort direction for each column separately.

    # sort by (Group1, B) in descending order and (Group1, A) in ascending order
    df.sort_values(by=[('Group1', 'B'), ('Group1', 'A')], ascending=[False, True])
    
    df.sort_values(df.columns[[1, 0]].tolist(), ascending=[False, True])
    


  3. If you’re here to find code to sort a dataframe by the levels of its MultiIndex rather than by column values, use sort_index. For example, to sort by the second level in ascending order and then by the first level in descending order:

    # select levels by name
    df.sort_index(level=['Name', 'Groups'], ascending=[True, False])
    
    # select levels by position (this works even if the levels are unnamed)
    df.sort_index(level=[1, 0], ascending=[True, False])
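The snippets in points 2 and 3 can be tried with the rough sketch below. The first frame mirrors the question's data. The second frame, `df2`, is made up for illustration: only the level names Groups and Name come from the snippet above, and I am assuming Groups is the first level and Name the second.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["Group1", "Group2"], ["A", "B", "C"]])
df = pd.DataFrame(
    [[1, 0, 3, 2, 5, 7],
     [5, 6, 9, 1, 0, 0],
     [7, 0, 2, 0, 3, 5]],
    index=[1, 2, 3],
    columns=columns,
)

# Two sort keys: (Group1, B) descending, ties broken by (Group1, A) ascending.
by_two = df.sort_values(by=[("Group1", "B"), ("Group1", "A")],
                        ascending=[False, True])
print(by_two.index.tolist())  # [2, 1, 3]

# Hypothetical frame with a two-level row index (values are invented).
rows = pd.MultiIndex.from_tuples(
    [("g1", "b"), ("g2", "a"), ("g1", "a"), ("g2", "b")],
    names=["Groups", "Name"],
)
df2 = pd.DataFrame({"value": [10, 20, 30, 40]}, index=rows)

# Sort by Name ascending, then by Groups descending.
by_name = df2.sort_index(level=["Name", "Groups"], ascending=[True, False])
by_pos = df2.sort_index(level=[1, 0], ascending=[True, False])
print(by_name.index.tolist())
# [('g2', 'a'), ('g1', 'a'), ('g2', 'b'), ('g1', 'b')]
```

Note that sort_index works on the row index by default; the frame in the question carries its MultiIndex on the columns, which is why sort_values is the right tool there.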
    


Answered By: cottontail