Ordering multi-indexed pandas dataframe on two levels, with different criteria for each level

Question

Consider the dataframe df_counts, constructed as follows:

import pandas as pd

df2 = pd.DataFrame({
    "word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
    "letter1": ["A", "A", "A", "B", "B", "B", "B"],
    "letter2": ["A", "C", "C", "A", "B", "B", "B"]
})
df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()

Output:

                 word
letter1 letter2      
A       A           1
        C           2
B       A           1
        B           3

What I would like to do from here is to sort first by the letter1 totals, so that the rows for letter1 == "B" appear first (there are four words starting with B versus only three with A), and then, within each letter1 group, by the values in the word column, largest first.

So the final output should be:

                 word
letter1 letter2      
B       B           3
        A           1
A       C           2
        A           1

Is this possible to do?

Asked By: butterflyknife


Answer 1

When you have a complex sort order, numpy.lexsort is often a convenient choice:

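import numpy as np

# np.lexsort takes the minor sort key first and the major one last;
# negating a numeric key reverses its order (descending)
order = np.lexsort([-df_counts['word'],
                    -df_counts.groupby('letter1')['word'].transform('sum')])

# reorder the rows by position
out = df_counts.iloc[order]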
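The key order in lexsort is easy to get backwards, so here is a small standalone sketch of it. The arrays major and minor are made-up values chosen only for illustration; they are not part of the question's data.

```python
import numpy as np

# Hypothetical keys, chosen only to illustrate the ordering rules.
major = np.array([1, 0, 1, 0])
minor = np.array([3, 2, 1, 0])

# The last key is the primary one: sort by major, then by minor (both ascending).
ascending = np.lexsort([minor, major])

# Negating numeric keys flips each of them to descending order.
descending = np.lexsort([-minor, -major])

print(ascending)
print(descending)
```

The result is an array of positions, which is why the answer passes it to iloc rather than loc.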

The pandas equivalent would be:

(df_counts
 .assign(total=df_counts.groupby('letter1')['word'].transform('sum'))
 .sort_values(by=['total', 'word'], ascending=False)
 .drop(columns='total')
)

Output:

                 word
letter1 letter2      
B       B           3
        A           1
A       C           2
        A           1
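If this kind of sort is needed in more than one place, the pandas version can be wrapped in a small helper. The following is only a sketch: the function name sort_by_group_total and the temporary column name _total are made up for illustration, and the data is rebuilt from the question so the block runs on its own.

```python
import pandas as pd


def sort_by_group_total(counts, group_level, value_col):
    """Sort groups by descending total of value_col, then rows within
    each group by descending value_col (hypothetical helper)."""
    # Broadcast each group's total back onto its rows.
    totals = counts.groupby(level=group_level)[value_col].transform("sum")
    return (
        counts
        .assign(_total=totals)  # temporary helper column
        .sort_values(by=["_total", value_col], ascending=False)
        .drop(columns="_total")
    )


df2 = pd.DataFrame({
    "word": ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
    "letter1": ["A", "A", "A", "B", "B", "B", "B"],
    "letter2": ["A", "C", "C", "A", "B", "B", "B"],
})
df_counts = df2.groupby(["letter1", "letter2"]).count()

out = sort_by_group_total(df_counts, "letter1", "word")
print(out)
```

One caveat: if two letter1 groups have the same total, their rows can interleave, because only the total and the count are used as sort keys. Adding the group label as a further tie-breaker would be one way around that.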

Answered By: mozway

Answer 2

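Use sort_index, passing the index level names together with a matching list of ascending flags (True or False), one per level. Note that this sorts by the index labels rather than by the counts, so it reproduces the desired output here only because the labels happen to sort in the same order as the counts.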

import pandas as pd

df2 = pd.DataFrame({
    "word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
    "letter1": ["A", "A", "A", "B", "B", "B", "B"],
    "letter2": ["A", "C", "C", "A", "B", "B", "B"]
})

df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()

print(df_counts.index)
print(df_counts.sort_index(level=["letter1", "letter2"], ascending=[False, False]))

Output:

                 word
letter1 letter2      
B       B           3
        A           1
A       C           2
        A           1
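Since sort_index looks only at the index labels, it may be worth checking how it behaves on data where label order and count order disagree. The sketch below uses a made-up word list (not the question's data) in which A has more words than B, and compares the label-based sort with a count-based one.

```python
import pandas as pd

# Hypothetical data in which label order and count order disagree.
words = ["AA", "AA", "AA", "AC", "BA", "BA", "BB"]
df_alt = pd.DataFrame({
    "word": words,
    "letter1": [w[0] for w in words],
    "letter2": [w[1] for w in words],
})
counts_alt = df_alt.groupby(["letter1", "letter2"]).count()

# Sorting by index labels only: B comes before A regardless of the counts.
by_labels = counts_alt.sort_index(level=["letter1", "letter2"],
                                  ascending=[False, False])

# Sorting by counts: letter1 totals first, then the per-row count.
by_counts = (
    counts_alt
    .assign(total=counts_alt.groupby("letter1")["word"].transform("sum"))
    .sort_values(by=["total", "word"], ascending=False)
    .drop(columns="total")
)

print(by_labels.index.tolist())
print(by_counts.index.tolist())
```

On this data the two results differ, which suggests the sort_index approach fits only when the desired order happens to coincide with the label order.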

Answered By: Golden Lion
