How to create a proportional horizontal stacked bar chart with percentages
Question:
I have been struggling to create a horizontal bar chart using python, pandas, and matplotlib.
Ideally I’d like to have percentages on the chart as well.
I merged 2 datasets together to get this:
# dataframe
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
'Other Violations': [54165, 75619, 48567, 33587, 127639]},
index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])
                        Moving Violation  Other Violations
asian/pacific islander            103281             54165
black                              75376             75619
hispanic                           66957             48567
other                              73071             33587
white                             244090            127639
I am now looking to create a stacked bar chart that looks something like this:
I am struggling to figure this out. I have tried plt.barh(), but it has not worked very well.
Answers:
You can do it with a temporary DataFrame that holds the percentage proportions of your data, as in the example below. The DataFrame I created has different data, but it has the same structure as yours.
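import pandas as pd
import matplotlib.pyplot as plt

# df similar to what you have
df = pd.DataFrame(index=["a", "b", "c"], data={"aa": [1,2,3], "bb": [5,6,7]})
# percentages: divide each row by its own total
df_perc = df.apply(lambda x: x / x.sum(), axis=1) * 100
# plot on an explicitly created Axes
fig, ax = plt.subplots()
df_perc.plot(ax=ax, kind="barh", stacked=True)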
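As a quick sanity check, the row-wise .apply approach above can be run on the data from the question. This is only a sketch: the names perc_apply and row_sums are illustrative and not part of the original answer. Each row of the result should add up to roughly 100.

```python
import pandas as pd

# data from the question
df = pd.DataFrame(
    {'Moving Violation': [103281, 75376, 66957, 73071, 244090],
     'Other Violations': [54165, 75619, 48567, 33587, 127639]},
    index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])

# row-wise apply: each row is divided by its own total, then scaled to percent
perc_apply = df.apply(lambda row: row / row.sum(), axis=1) * 100

# every row should now sum to (approximately) 100
row_sums = perc_apply.sum(axis=1)
print(row_sums.round(6).tolist())
```

If a row does not sum to 100, the division was most likely done along the wrong axis.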
- How to create a 100% stacked barplot from a categorical dataframe is very similar, but because of how that DataFrame is structured, it requires additional steps that aren’t needed for this DataFrame.
- The easiest way to plot horizontal stacked bars is to plot the DataFrame directly, using pandas.DataFrame.plot with kind='barh' and stacked=True.
- Calculate percent
  - Use .sum with axis=1 to create totals for each row.
  - Divide df by totals using .div with axis=0, then .mul by 100 and .round to 2 decimal places to clean up the presentation.
  - We do not use .apply because it’s not a fast vectorized operation.
  - As a single line: df.div(df.sum(axis=1), axis=0).mul(100).round(2)
- Use .bar_label to annotate the stacked bars.
  - See How to add value labels on a bar chart for a thorough explanation and more examples with this method, which is available with matplotlib >= v3.4.0.
- Tested in python 3.10, pandas 1.4.3, matplotlib 3.5.1
import pandas as pd
# dataframe
df = pd.DataFrame({'Moving Violation': [103281, 75376, 66957, 73071, 244090],
'Other Violations': [54165, 75619, 48567, 33587, 127639]},
index=['asian/pacific islander', 'black', 'hispanic', 'other', 'white'])
# get the totals for each row
totals = df.sum(axis=1)
# calculate the percent for each row
percent = df.div(totals, axis=0).mul(100).round(2)
# create the plot
ax = percent.plot(kind='barh', stacked=True, figsize=(9, 5), color=['green', 'gray'], xticks=[])
# move the legend
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=2, frameon=False)
# remove ticks
ax.tick_params(left=False, bottom=False)
# remove all spines
ax.spines[['top', 'bottom', 'left', 'right']].set_visible(False)
# iterate through each container
for c in ax.containers:
    # custom label formats the percent, and uses an empty string so 0 value bars don't have a number
    labels = [f'{w:0.2f}%' if (w := v.get_width()) > 0 else '' for v in c]
    # add annotations
    ax.bar_label(c, labels=labels, label_type='center', padding=0.3, color='w')
DataFrame Views
df
                        Moving Violation  Other Violations
asian/pacific islander            103281             54165
black                              75376             75619
hispanic                           66957             48567
other                              73071             33587
white                             244090            127639
totals
asian/pacific islander    157446
black                     150995
hispanic                  115524
other                     106658
white                     371729
dtype: int64
percent
                        Moving Violation  Other Violations
asian/pacific islander             65.60             34.40
black                              49.92             50.08
hispanic                           57.96             42.04
other                              68.51             31.49
white                              65.66             34.34
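The data in the question has no zero values, so the empty-string branch of the custom labels never fires there. The sketch below uses a small made-up frame (the columns x and y and the rows a and b are hypothetical) in which one segment is zero, to show what the label list looks like in that case. The Agg backend is selected only so the sketch runs without a display. It needs python >= 3.8 for the := operator and matplotlib >= 3.4.0 for .bar_label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# hypothetical data: row "b" has nothing in column "y"
df = pd.DataFrame({"x": [3, 4], "y": [1, 0]}, index=["a", "b"])
percent = df.div(df.sum(axis=1), axis=0).mul(100).round(2)

ax = percent.plot(kind="barh", stacked=True)

all_labels = []
for c in ax.containers:
    # formatted percent for visible segments, empty string for zero-width ones
    labels = [f"{w:0.2f}%" if (w := v.get_width()) > 0 else "" for v in c]
    ax.bar_label(c, labels=labels, label_type="center", color="w")
    all_labels.append(labels)

print(all_labels)
```

There is one container per column, so all_labels holds one list per column, and the zero-width segment for row "b" in column "y" gets an empty label rather than "0.00%".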