Multi Index Sorting in Pandas
Question:
I have a dataset with multi-index columns in a pandas df that I would like to sort by values in a specific column. My dataset looks like:
Group1 Group2
A B C A B C
1 1 0 3 2 5 7
2 5 6 9 1 0 0
3 7 0 2 0 3 5
I want to sort all data and the index by column C
in Group 1
in descending order so my results look like:
Group1 Group2
A B C A B C
2 5 6 9 1 0 0
1 1 0 3 2 5 7
3 7 0 2 0 3 5
Is it possible to do this sort with the structure that my data is in, or should I be swapping Group1
to the index side?
Answers:
When sorting by a MultiIndex you need to contain the tuple describing the column inside a list*:
In [11]: df.sort_values([('Group1', 'C')], ascending=False)
Out[11]:
Group1 Group2
A B C A B C
2 5 6 9 1 0 0
1 1 0 3 2 5 7
3 7 0 2 0 3 5
* so as not to confuse pandas into thinking you want to sort first by Group1 then by C.
Note: Originally used .sort
since deprecated then removed in 0.20, in favor of .sort_values
.
-
You can sort by indexing the columns (e.g. by the third column etc.). Also, you don’t need the square brackets, so a tuple to index the column works.
# sort in descending order by the third column
df.sort_values(('Group1', 'C'), ascending=False)
df.sort_values(df.columns[2], ascending=False) # same as above
-
If you want to sort by multiple columns, then use a list of tuples (or simply index the columns). Also may pass a list to ascending
to choose whether to make the sort ascending or not on that column.
# sort by (Group1, B) in descending order and (Group1, A) in ascending order
df.sort_values(by=[('Group1', 'B'), ('Group1', 'A')], ascending=[False, True])
df.sort_values(df.columns[[1, 0]].tolist(), ascending=[False, True])
-
If you’re here to find code to sort a multi-indexed dataframe, then you can use sort_index
. For example, if you want to sort the second level in descending order and the first level in ascending order:
# select levels by name
df.sort_index(level=['Name', 'Groups'], ascending=[True, False])
# select levels by index (this works even if indices are unnamed)
df.sort_index(level=[1, 0], ascending=[True, False])
I have a dataset with multi-index columns in a pandas df that I would like to sort by values in a specific column. My dataset looks like:
Group1 Group2
A B C A B C
1 1 0 3 2 5 7
2 5 6 9 1 0 0
3 7 0 2 0 3 5
I want to sort all data and the index by column C
in Group 1
in descending order so my results look like:
Group1 Group2
A B C A B C
2 5 6 9 1 0 0
1 1 0 3 2 5 7
3 7 0 2 0 3 5
Is it possible to do this sort with the structure that my data is in, or should I be swapping Group1
to the index side?
When sorting by a MultiIndex you need to contain the tuple describing the column inside a list*:
In [11]: df.sort_values([('Group1', 'C')], ascending=False)
Out[11]:
Group1 Group2
A B C A B C
2 5 6 9 1 0 0
1 1 0 3 2 5 7
3 7 0 2 0 3 5
* so as not to confuse pandas into thinking you want to sort first by Group1 then by C.
Note: Originally used .sort
since deprecated then removed in 0.20, in favor of .sort_values
.
-
You can sort by indexing the columns (e.g. by the third column etc.). Also, you don’t need the square brackets, so a tuple to index the column works.
# sort in descending order by the third column df.sort_values(('Group1', 'C'), ascending=False) df.sort_values(df.columns[2], ascending=False) # same as above
-
If you want to sort by multiple columns, then use a list of tuples (or simply index the columns). Also may pass a list to
ascending
to choose whether to make the sort ascending or not on that column.# sort by (Group1, B) in descending order and (Group1, A) in ascending order df.sort_values(by=[('Group1', 'B'), ('Group1', 'A')], ascending=[False, True]) df.sort_values(df.columns[[1, 0]].tolist(), ascending=[False, True])
-
If you’re here to find code to sort a multi-indexed dataframe, then you can use
sort_index
. For example, if you want to sort the second level in descending order and the first level in ascending order:# select levels by name df.sort_index(level=['Name', 'Groups'], ascending=[True, False]) # select levels by index (this works even if indices are unnamed) df.sort_index(level=[1, 0], ascending=[True, False])