Sorting the order of bars in pandas/matplotlib bar plots
Question:
What is the Pythonic/pandas way of sorting ‘levels’ within a column to give a specific ordering of bars in a bar plot?
For example, given:
import pandas as pd
df = pd.DataFrame({
'group': ['a', 'a', 'a', 'a', 'a', 'a', 'a',
'b', 'b', 'b', 'b', 'b', 'b', 'b'],
'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')
I can generate the following pair of plots:
The order of the bars follows the row order.
What’s the best way of reordering the data so that the bar charts have bars ordered Mon-Sun?
UPDATE: this rubbish solution works – but it’s far from elegant in the way it uses an extra sorting column:
df2 = pd.DataFrame({
'day': ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'],
'num': [0, 1, 2, 3, 4, 5, 6]})
df = pd.merge(df, df2, on='day')
df = df.sort_values('num')
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')
FURTHER GENERALISATION:
Is there a solution that also fixes the order of bars in a ‘dodged’ bar plot:
df.pivot(index='day', columns='group', values='amount').plot(kind='bar')
Answers:
You’ll have to provide a mapping to specify how to order the day names. (If they were stored as proper dates, there would be other ways to do this.)
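As a rough illustration of the "proper dates" remark, here is a minimal sketch. It assumes a hypothetical `date` column of real timestamps, which is not part of the original question's data, and sorts the rows by weekday number without adding a helper column.

```python
import pandas as pd

# Hypothetical data: a 'date' column holding real timestamps
# (the question's data only has day-name strings)
df_dates = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-05', '2024-01-01',
                            '2024-01-07', '2024-01-03']),
    'amount': [4, 1, 1, 2]})

# dt.dayofweek numbers Monday as 0 and Sunday as 6,
# so it can serve directly as the sort key
df_dates = df_dates.sort_values('date', key=lambda s: s.dt.dayofweek)

day_names = df_dates['date'].dt.day_name().tolist()
```

With real dates the weekday order comes from the data itself, so no hand-written mapping is needed.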
Updated:
Build the key. You could write out a dictionary explicitly or use something clever like this dict comprehension.
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df['day'].map(mapping)
And the sorting is simple:
df.iloc[key.argsort()]
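The same mapping can also be passed straight to `sort_values` through its `key` argument (available in pandas 1.1 and later, as far as I know). This is a sketch of a variation on the answer above, not the answerer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}

# Sort on the mapped weekday number; mergesort is stable, so rows
# that share a day keep their original relative order
df_sorted = df.sort_values('day', key=lambda s: s.map(mapping),
                           kind='mergesort')

# Each group now lists its days from Mon to Sun
days_a = df_sorted.loc[df_sorted['group'] == 'a', 'day'].tolist()
days_b = df_sorted.loc[df_sorted['group'] == 'b', 'day'].tolist()
```

From here, `df_sorted.groupby('group').plot(kind='bar', x='day')` should draw the bars in weekday order, since the bars follow the row order.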
Below I provide code that extends Dan’s answer to address the “FURTHER GENERALISATION” section of the OP’s question. First, a complete example for the simple case (just one variable), based on Dan’s solution:
import pandas as pd
# Create dataframe
df=pd.DataFrame({
'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})
# Calculate the total amount for each day
df_grouped = df.groupby('day')['amount'].sum().reset_index()
# Use Dan's trick to order days names in the table created by groupby
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_grouped['day'].map(mapping)
df_grouped = df_grouped.iloc[key.argsort()]
# Draw the bar chart
df_grouped.plot(kind='bar', x='day')
And now, we use the same ordering technique to order the rows of the pivot table (instead of the rows created by groupby).
import pandas as pd
# Create dataframe
df=pd.DataFrame({
'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})
# Get the amount for each day AND EACH GROUP
df_grouped = df.groupby(['group', 'day'])['amount'].sum().reset_index()
# Pivot so that each day is a row and each group is a column, the layout pandas needs to plot multiple series
df_pivot = df_grouped.pivot(index='day', columns='group', values='amount').reset_index()
# Use Dan's trick to order days names in the table created by PIVOT (not the table created by groupby, in the previous example)
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_pivot['day'].map(mapping)
df_pivot = df_pivot.iloc[key.argsort()]
# Draw the bar chart
df_pivot.plot(kind='bar', x='day')
The result is shown below:
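For comparison, a variation on the ordering step (a sketch, not taken from the answers here) is to make `day` an ordered categorical. The weekday order is then stored in the column itself, and sorting the pivoted index follows it without a separate mapping.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# Store the weekday order in the column itself
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# sort_index orders a categorical index by category order,
# not alphabetically
df_pivot = (df.pivot(index='day', columns='group', values='amount')
              .sort_index())
```

Calling `df_pivot.plot(kind='bar')` on this table should then give the dodged bars in Mon to Sun order.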
I know this response is late, but a simple solution to the two cases presented, without a dictionary or mapping, would be something like what I’ve posted below.
Setting ‘day’ as the index lets you use .loc to select rows in a specific order.
1) For the two separate plots
df=pd.DataFrame({'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]})
order = ['Mon', 'Tues', 'Weds','Thurs','Fri','Sat','Sun']
df.set_index('day').loc[order].groupby('group').plot(kind='bar')
2) For the pivot example with the dodged plot:
order = ['Mon', 'Tues', 'Weds','Thurs','Fri','Sat','Sun']
df.pivot(index='day', columns='group', values='amount').loc[order].plot(kind='bar')
Note that pivot already puts day in the index, so you can use .loc here again.
Edit: use .loc rather than .ix in these solutions; .ix was deprecated and has since been removed from pandas, and it could give surprising results when column names and index labels are numbers.
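One caveat with `.loc[order]`: in recent pandas versions it raises a `KeyError` if any label in `order` is absent from the index. When the index labels are unique, `reindex` is a tolerant alternative. The sketch below uses a hypothetical subset of the data with some days missing:

```python
import pandas as pd

# Hypothetical subset of the data with no 'Weds', 'Thurs' or 'Sat' rows
df_small = pd.DataFrame({
    'day': ['Fri', 'Mon', 'Sun', 'Tues'],
    'amount': [4, 1, 1, 2]})

order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# reindex keeps every label in `order`; days absent from the data
# come back as NaN instead of raising an error
by_day = df_small.set_index('day')['amount'].reindex(order)
```

Plotting `by_day` as a bar chart would then leave an empty slot for each missing day while keeping the Mon to Sun order.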