How to add error bars on a grouped barplot from a pandas column

Question:

I have a data frame df that has four columns: Candidate, Sample_Set, Values, and Error. The Candidate column has, say, three unique entries: [X, Y, Z] and we have three sample sets, such that Sample_Set has three unique values as well: [1,2,3]. The df would roughly look like this.

import pandas as pd

data = {'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
        'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}
df = pd.DataFrame(data)

# display(df)
  Candidate  Sample_Set  Values  Error
0         X           1      20      5
1         Y           1      10      2
2         Z           1      10      3
3         X           2     200     30
4         Y           2     101     30
5         Z           2      99     30
6         X           3    1999     10
7         Y           3     998     10
8         Z           3    1003     10

I am using seaborn to create a grouped barplot out of this with x="Candidate", y="Values", hue="Sample_Set". All is good until I try to add error bars along the y-axis using the values in the Error column. I am using the following code.

import seaborn as sns

# factorplot was renamed catplot (and size renamed height) in seaborn 0.9
ax = sns.catplot(x="Candidate", y="Values", hue="Sample_Set", data=df,
                 height=8, kind="bar")

How do I incorporate the error?

I would appreciate a solution or a more elegant approach to the task.

Asked By: EFL

Answers:

You can get close to what you need using pandas' plotting functionality:

bars = df.groupby("Candidate").plot(kind='bar', x="Sample_Set", y="Values", yerr="Error")

This does not do exactly what you want (it draws a separate plot for each candidate rather than one grouped plot), but it comes pretty close. Unfortunately ggplot2 for python currently does not render error bars properly. Personally, I would resort to R ggplot2 in this case:

data <- read.csv("~/repos/tmp/test.csv")
data
library(ggplot2)
ggplot(data, aes(x=Candidate, y=Values, fill=factor(Sample_Set))) + 
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Values-Error, ymax=Values+Error), width=.1, position=position_dodge(.9)) 
Answered By: Dima Lituiev

As @ResMar pointed out in the comments, there seems to be no built-in functionality in seaborn to easily set individual errorbars.

If you care more about the result than about how you get there, the following (not so elegant) solution, which builds on matplotlib.pyplot.bar, might be helpful. The seaborn import is only used to get the same style (in seaborn 0.8 and later, the style is applied only after calling sns.set()).

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

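def grouped_barplot(df, cat, subcat, val, err):
    u = df[cat].unique()        # categories shown on the x-axis
    x = np.arange(len(u))       # one tick position per category
    subx = df[subcat].unique()  # subcategories: one bar per tick each
    # horizontal shift of each subcategory's bars, centred on the tick
    offsets = (np.arange(len(subx)) - np.arange(len(subx)).mean()) / (len(subx) + 1.)
    # bar width equals the spacing between neighbouring offsets
    width = np.diff(offsets).mean()
    for i, gr in enumerate(subx):
        # rows of this subcategory; assumed to list the categories in the same order as u
        dfg = df[df[subcat] == gr]
        plt.bar(x + offsets[i], dfg[val].values, width=width,
                label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    plt.xlabel(cat)
    plt.ylabel(val)
    plt.xticks(x, u)
    plt.legend()
    plt.show()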


cat = "Candidate"
subcat = "Sample_Set"
val = "Values"
err = "Error"

# call the function with df from the question
grouped_barplot(df, cat, subcat, val, err )
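
To see where the bars end up, the offset and width arithmetic used inside grouped_barplot can be run on its own. This is a minimal sketch, assuming three subcategories as in the question; the variable n_sub is introduced here for illustration only.

```python
import numpy as np

n_sub = 3  # assumed number of subcategories, e.g. Sample_Set values 1, 2, 3

# same formulas as in grouped_barplot
idx = np.arange(n_sub)
offsets = (idx - idx.mean()) / (n_sub + 1.0)
width = np.diff(offsets).mean()

print(offsets.tolist())  # [-0.25, 0.0, 0.25]
print(float(width))      # 0.25
```

Since plt.bar centres each bar on its x position by default, a cluster of three bars spans 0.75 of the unit distance between ticks, which leaves a gap of 0.25 between neighbouring clusters.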

Note that by simply swapping the category and subcategory

cat = "Sample_Set"
subcat = "Candidate"

you can get a different grouping, with one cluster of bars per sample set and one bar per candidate.

I suggest extracting the position coordinates from patches attributes, and then plotting the error bars.

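import seaborn as sns

ax = sns.barplot(data=df, x="Candidate", y="Values", hue="Sample_Set")
# seaborn draws the bars one hue level at a time, which matches the row order of
# this df; if your rows are ordered differently, reorder the Error column to match
x_coords = [p.get_x() + 0.5 * p.get_width() for p in ax.patches]
y_coords = [p.get_height() for p in ax.patches]
ax.errorbar(x=x_coords, y=y_coords, yerr=df["Error"], fmt="none", c="k")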
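
The patch-based approach depends on the error values being listed in the same order as the bars are drawn. Below is a minimal sketch of making that order explicit, assuming the df from the question. It draws the bars with pandas' own bar plot rather than seaborn, purely to keep the example short, and the names vals, errs and err_in_draw_order are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

# wide tables: one row per candidate, one column per sample set
vals = df.pivot(index="Candidate", columns="Sample_Set", values="Values")
errs = df.pivot(index="Candidate", columns="Sample_Set", values="Error")

ax = vals.plot(kind="bar", rot=0)  # bars only, no error bars yet

# pandas draws one column (sample set) at a time, so flatten the errors
# column by column to follow the order of ax.patches
err_in_draw_order = errs.to_numpy().ravel(order="F")

# centre and top of every bar, in drawing order
x_coords = [p.get_x() + 0.5 * p.get_width() for p in ax.patches]
y_coords = [p.get_height() for p in ax.patches]
ax.errorbar(x=x_coords, y=y_coords, yerr=err_in_draw_order, fmt="none", ecolor="k")
```

With seaborn the same idea applies: the errors need to be arranged one hue level at a time before they are passed to ax.errorbar.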

Answered By: michael

  • seaborn plots generate error bars when aggregating data; however, this data is already aggregated and has a specified error column.
  • The easiest solution is to create the bar chart with pandas.DataFrame.plot and kind='bar'.
    • matplotlib is used by default as the plotting backend, and the plot API has a yerr parameter, which accepts the following:
      • As a DataFrame or dict of errors with column names matching the columns attribute of the plotting DataFrame or matching the name attribute of the Series.
      • As a str indicating which of the columns of plotting DataFrame contain the error values.
      • As raw values (list, tuple, or np.ndarray). Must be the same length as the plotting DataFrame/Series.
  • This can be accomplished by reshaping the dataframe from long form to wide form with pandas.DataFrame.pivot
  • See pandas User Guide: Plotting with error bars
  • Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3

# reshape the dataframe into a wide format for Values
vals = df.pivot(index='Candidate', columns='Sample_Set', values='Values')

# reshape the dataframe into a wide format for Errors
yerr = df.pivot(index='Candidate', columns='Sample_Set', values='Error')

# plot vals with yerr
ax = vals.plot(kind='bar', yerr=yerr, logy=True, rot=0, figsize=(6, 5))
_ = ax.legend(title='Sample Set', bbox_to_anchor=(1, 1.02), loc='upper left')

vals

Sample_Set   1    2     3
Candidate                
X           20  200  1999
Y           10  101   998
Z           10   99  1003

yerr

Sample_Set  1   2   3
Candidate            
X           5  30  10
Y           2  30  10
Z           3  30  10
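
The pivot-and-plot steps above can be wrapped in a small helper, which makes it easy to swap the category and subcategory. This is only a sketch: the function name grouped_bar_with_errors and its parameters are made up for illustration and are not part of pandas.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd


def grouped_bar_with_errors(df, cat, subcat, val, err, **plot_kwargs):
    """Pivot long-form data to wide form and draw grouped bars with error bars.

    Hypothetical helper: cat becomes the x-axis, subcat the bars within each group.
    """
    vals = df.pivot(index=cat, columns=subcat, values=val)
    errs = df.pivot(index=cat, columns=subcat, values=err)
    # errs has the same column names as vals, so each bar gets its own error
    return vals.plot(kind="bar", yerr=errs, rot=0, **plot_kwargs)


df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

ax = grouped_bar_with_errors(df, "Candidate", "Sample_Set", "Values", "Error")
ax.legend(title="Sample_Set")
```

Calling it with "Sample_Set" as cat and "Candidate" as subcat gives the other grouping, and extra keyword arguments such as figsize are passed through to DataFrame.plot.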
Answered By: Trenton McKinney