How to create a proportional horizontal stacked bar chart with percentages

Question:

I have been struggling to create a horizontal bar chart using python, pandas, and matplotlib.

Ideally I’d like to have percentages on the chart as well.

I merged 2 datasets together to get this:

# dataframe
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
                   'Other Violations': [54165, 75619, 48567, 33587, 127639]},
                  index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])

                        Moving Violation  Other Violations
asian/pacific islander            103281             54165
black                              75376             75619
hispanic                           66957             48567
other                              73071             33587
white                             244090            127639

I am now looking to create a stacked bar chart that looks something like this:

Bar chart I want it to look like

I am struggling to figure this out. I have tried plt.barh(), but it has not worked very well.

Asked By: TheDon

||

Answers:

You can do it with a temporary DataFrame that holds each row's values as percentages of the row total, as in the example below. The DataFrame I created has different data, but it has the same structure as yours.

import matplotlib.pyplot as plt
import pandas as pd

# df similar to what you have
df = pd.DataFrame(index=["a", "b", "c"], data={"aa": [1,2,3], "bb": [5,6,7]})

# percentages: divide each row by its row total
df_perc = df.apply(lambda x: x / x.sum(), axis=1) * 100

# plot
fig, ax = plt.subplots()
df_perc.plot(ax=ax, kind="barh", stacked=True)
plt.show()

Output plot

Answered By: the_pr0blem
  • How to create a 100% stacked barplot from a categorical dataframe is very similar, but it requires additional steps, due to the structure of the DataFrame, which aren’t required for this DataFrame.
  • The easiest way to plot horizontal stacked bars is to plot the DataFrame directly, using pandas.DataFrame.plot with kind='barh', and stacked=True.
  • Calculate percent
    • Use .sum with the correct axis=, to create totals for each row.
    • Divide df by totals using .div with the correct axis=, then use .mul to multiply by 100 and .round to 2 decimal places to clean up the presentation.
    • We do not use .apply because it’s not a fast vectorized operation.
    • As a single line: df.div(df.sum(axis=1), axis=0).mul(100).round(2)
  • Use .bar_label to annotate the stacked bars.
  • Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1
import pandas as pd

# dataframe
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
                   'Other Violations': [54165, 75619, 48567, 33587, 127639]},
                  index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])

# get the totals for each row
totals = df.sum(axis=1)

# calculate the percent for each row
percent = df.div(totals, axis=0).mul(100).round(2)

# create the plot
ax = percent.plot(kind='barh', stacked=True, figsize=(9, 5), color=['green', 'gray'], xticks=[])
# move the legend
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=2, frameon=False)

# remove ticks
ax.tick_params(left=False, bottom=False)
# remove all spines
ax.spines[['top', 'bottom', 'left', 'right']].set_visible(False)

# iterate through each container
for c in ax.containers:
    
    # build custom percent labels; use an empty string so zero-width bars get no label
    labels = [f'{w:0.2f}%' if (w := v.get_width()) > 0 else '' for v in c]
    
    # add annotations
    ax.bar_label(c, labels=labels, label_type='center', padding=0.3, color='w')

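If none of the bars can have zero width, the custom labels list in the loop above may not be needed. A minimal sketch, assuming matplotlib 3.4 or later (where bar_label is available): the fmt parameter formats each bar's width directly. The Agg backend line and the label_texts list are additions for illustration only, so the sketch runs without a display and the label strings can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import pandas as pd

# same data as in the question
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
                   'Other Violations': [54165, 75619, 48567, 33587, 127639]},
                  index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])

# row-wise percentages, as in the answer
percent = df.div(df.sum(axis=1), axis=0).mul(100).round(2)

ax = percent.plot(kind='barh', stacked=True, figsize=(9, 5), color=['green', 'gray'])

# label_texts is a hypothetical helper list, kept only to inspect the labels
label_texts = []
for c in ax.containers:
    # fmt is applied to each bar's width; '%%' prints a literal percent sign
    annotations = ax.bar_label(c, fmt='%.1f%%', label_type='center', color='w')
    label_texts.append([a.get_text() for a in annotations])
```

Unlike the custom labels, this version would also print a label on a zero-width bar, so the walrus-based list comprehension remains the safer choice when zeros can occur.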

DataFrame Views

df

                        Moving Violation  Other Violations
asian/pacific islander            103281             54165
black                              75376             75619
hispanic                           66957             48567
other                              73071             33587
white                             244090            127639

totals

asian/pacific islander    157446
black                     150995
hispanic                  115524
other                     106658
white                     371729
dtype: int64

percent

                        Moving Violation  Other Violations
asian/pacific islander             65.60             34.40
black                              49.92             50.08
hispanic                           57.96             42.04
other                              68.51             31.49
white                              65.66             34.34
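As a sanity check on the percent calculation, the following sketch compares the vectorized .div form with the row-wise .apply form from the other answer and confirms that each row adds up to 100 before rounding. The names percent_div, percent_apply, and row_sums are only illustrative.

```python
import pandas as pd

# same data as in the question
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
                   'Other Violations': [54165, 75619, 48567, 33587, 127639]},
                  index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])

# vectorized: divide every row by its total, aligning on the index (axis=0)
percent_div = df.div(df.sum(axis=1), axis=0).mul(100)

# row-wise: call a Python function once per row
percent_apply = df.apply(lambda row: row / row.sum(), axis=1) * 100

# both approaches should produce the same frame
pd.testing.assert_frame_equal(percent_div, percent_apply)

# each row of percentages should total 100 (up to floating point error)
row_sums = percent_div.sum(axis=1)
```

Both forms give the same numbers; .div simply avoids the per-row Python function call, which tends to matter more as the number of rows grows.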
Answered By: Trenton McKinney