Include both % and N as bar labels

Question:

I have created a bar plot with percentages. However, since attrition is possible, I would like to include N, the number of observations (sample size), in brackets as part of the bar labels. In other words, N should be the count of baseline and endline values, respectively.

import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import seaborn as sns
import pandas as pd
import numpy as np

data = {
'id': [1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15],
'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'baseline', 'endline', 'baseline', 'endline', 'baseline', 'endline', ],
'growth': [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
}

df = pd.DataFrame(data)

sns.set_style('white')
ax = sns.barplot(data = df,
                 x = 'survey', y = 'growth',
                 estimator = lambda x: np.sum(x) / np.size(x) * 100, ci = None,
                 color = 'cornflowerblue')
ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)

sns.despine(ax = ax, left = True)
ax.grid(True, axis = 'y')
ax.yaxis.set_major_formatter(PercentFormatter(100))
ax.set_xlabel('')
ax.set_ylabel('')
plt.tight_layout()
plt.show()

I would appreciate guidance on how to achieve this.
Thanks in advance!

Asked By: Stephen Okiya


Answers:

One approach could be as follows.

  • First, use df.groupby on column survey and calculate the number of observations for each survey by applying count to column growth.
  • Chain Series.to_numpy to convert the resulting series to an array.
N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
# array([15, 12], dtype=int64), i.e. N for baseline == 15, for endline == 12
  • Within ax.bar_label, we can now create custom labels, combining the values from the array N with the values that are already stored in ax.containers[0].datavalues, which is itself a similar array (i.e. array([60. , 41.66666667])). So, we can do something as follows:
# Turn `N` into italic type
N_it = r'$\it{N}$'

# Use list comprehension with `zip` to loop through `datavalues` and `N`
# simultaneously and use `f-strings` to produce the custom strings
labels=[f'{np.round(perc,1)}% ({N_it} = {n})' 
        for perc, n in zip(ax.containers[0].datavalues, N)]

# Pass `labels` to `ax.bar_label` 
ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
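
Note that zip pairs bars and counts purely by position, so the order of N has to match the order of the bars. As far as I know, seaborn draws string categories in order of first appearance, which is why sort=False is used above; by default groupby sorts the group keys. A minimal sketch with a made-up frame (toy is hypothetical, not the data from the question) where the two orders differ:

```python
import pandas as pd

# Hypothetical toy data: 'endline' appears first but sorts after 'baseline'
toy = pd.DataFrame({
    'survey': ['endline', 'baseline', 'endline'],
    'growth': [1, 0, 1],
})

# Keys in order of first appearance (the order the bars are assumed to follow)
n_appearance = toy.groupby('survey', sort=False)['growth'].count()

# Default behaviour: group keys sorted alphabetically
n_sorted = toy.groupby('survey')['growth'].count()

print(n_appearance.index.tolist(), n_appearance.tolist())
print(n_sorted.index.tolist(), n_sorted.tolist())
```

With the data in the question both orders happen to coincide, since baseline comes first either way.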

So, we can include this in the code snippet that you have provided as follows:

ax = sns.barplot(...)

# ---- start
# ax.bar_label(ax.containers[0], fmt = '%.1f %%', fontsize = 20)

N = df.groupby('survey', sort=False)['growth'].count().to_numpy()
N_it = r'$\it{N}$'
labels=[f'{np.round(perc,1)}% ({N_it} = {n})' 
        for perc, n in zip(ax.containers[0].datavalues, N)]

ax.bar_label(ax.containers[0], labels = labels, fontsize = 20)
# ---- end

sns.despine(ax = ax, left = True)
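
As a possible variant of the part between the start and end markers, the percentage and N can also come from a single groupby, so that both values belong to the same row. This is only a sketch under assumptions: the frame toy is made up, the names stats, pct and n are my own, and it assumes growth is a 0/1 column without missing values (so that the mean equals the share of ones).

```python
import pandas as pd

# Hypothetical toy data, not the data from the question
toy = pd.DataFrame({
    'survey': ['baseline', 'endline', 'baseline', 'baseline', 'endline'],
    'growth': [1, 0, 1, 0, 1],
})

# Named aggregation: one row per survey with the share of ones and the count
stats = toy.groupby('survey', sort=False)['growth'].agg(pct='mean', n='count')

# Build one label per group, e.g. '66.7% (N = 3)'
labels = [f'{pct:.1f}% (N = {n})'
          for pct, n in zip(stats['pct'] * 100, stats['n'])]

print(labels)
```

The resulting list can be passed to ax.bar_label via labels = labels, in the same way as in the snippet.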

Result

[Figure: result with custom labels]

On the italic type, see this SO post.
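
For reference, the italic N relies on Matplotlib's mathtext: whatever sits between dollar signs is parsed as math. A raw string keeps the backslash intact. Below is a minimal sketch with plain Matplotlib bars instead of seaborn, using the percentages and counts reported above; it assumes \mathit (the long-form mathtext command for italic) and a headless backend, both of which are my choices for the illustration.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Values reported above: percentages per survey and the matching counts
percentages = [60.0, 41.7]
counts = [15, 12]

fig, ax = plt.subplots()
bars = ax.bar(['baseline', 'endline'], percentages, color='cornflowerblue')

# Raw string, so the backslash reaches mathtext unchanged
N_it = r'$\mathit{N}$'
labels = [f'{pct:.1f}% ({N_it} = {n})' for pct, n in zip(percentages, counts)]

# bar_label returns the created text objects; collect their strings
texts = ax.bar_label(bars, labels=labels, fontsize=20)
label_strings = [t.get_text() for t in texts]
print(label_strings)
```

Drop the matplotlib.use('Agg') line when working interactively and call plt.show() as in the question.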

Answered By: ouroboros1