Ordering multi-indexed pandas dataframe on two levels, with different criteria for each level
Question:
Consider the dataframe df_counts
, constructed as follows:
df2 = pd.DataFrame({
"word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
"letter1": ["A", "A", "A", "B", "B", "B", "B"],
"letter2": ["A", "C", "C", "A", "B", "B", "B"]
})
df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()
Output:
What I would like to do from here, is to order first by letter1
totals, so the rows for letter1 == "B"
appear first (there are four words starting with B
, vs only three with A
), and then ordered within each grouping of letter1
by the values in the word
column.
So the final output should be:
word
letter1 letter2
B B 3
A 1
A C 2
A 1
Is this possible to do?
Answers:
When you have a complex sorting order, it’s always easy to use numpy.lexsort
:
# minor sorting order first, major one last
# - to inverse the order
order = np.lexsort([-df_counts['word'],
-df_counts.groupby('letter1')['word'].transform('sum')])
out = df_counts.iloc[order]
The pandas equivalent would be:
(df_counts
.assign(total=df_counts.groupby('letter1')['word'].transform('sum'))
.sort_values(by=['total', 'word'], ascending=False)
.drop(columns='total')
)
Output:
word
letter1 letter2
B B 3
A 1
A C 2
A 1
use sort_index with index name and ascending pair as True or False
df2 = pd.DataFrame({
"word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
"letter1": ["A", "A", "A", "B", "B", "B", "B"],
"letter2": ["A", "C", "C", "A", "B", "B", "B"]
})
df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()
print(df_counts.index)
print(df_counts.sort_index(level=["letter1","letter2"],ascending=[False,False]))
output:
word
letter1 letter2
B B 3
A 1
A C 2
A 1
Consider the dataframe df_counts
, constructed as follows:
df2 = pd.DataFrame({
"word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
"letter1": ["A", "A", "A", "B", "B", "B", "B"],
"letter2": ["A", "C", "C", "A", "B", "B", "B"]
})
df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()
Output:
What I would like to do from here, is to order first by letter1
totals, so the rows for letter1 == "B"
appear first (there are four words starting with B
, vs only three with A
), and then ordered within each grouping of letter1
by the values in the word
column.
So the final output should be:
word
letter1 letter2
B B 3
A 1
A C 2
A 1
Is this possible to do?
When you have a complex sorting order, it’s always easy to use numpy.lexsort
:
# minor sorting order first, major one last
# - to inverse the order
order = np.lexsort([-df_counts['word'],
-df_counts.groupby('letter1')['word'].transform('sum')])
out = df_counts.iloc[order]
The pandas equivalent would be:
(df_counts
.assign(total=df_counts.groupby('letter1')['word'].transform('sum'))
.sort_values(by=['total', 'word'], ascending=False)
.drop(columns='total')
)
Output:
word
letter1 letter2
B B 3
A 1
A C 2
A 1
use sort_index with index name and ascending pair as True or False
df2 = pd.DataFrame({
"word" : ["AA", "AC", "AC", "BA", "BB", "BB", "BB"],
"letter1": ["A", "A", "A", "B", "B", "B", "B"],
"letter2": ["A", "C", "C", "A", "B", "B", "B"]
})
df_counts = df2[["word", "letter1", "letter2"]].groupby(["letter1", "letter2"]).count()
print(df_counts.index)
print(df_counts.sort_index(level=["letter1","letter2"],ascending=[False,False]))
output:
word
letter1 letter2
B B 3
A 1
A C 2
A 1