How to add error bars on a grouped barplot from a pandas column
Question:
I have a pandas data frame df that has four columns: Candidate, Sample_Set, Values, and Error. The Candidate column has, say, three unique entries: [X, Y, Z] and we have three sample sets, such that Sample_Set has three unique values as well: [1,2,3]. The df would roughly look like this.
import pandas as pd
data = {'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
        'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}
df = pd.DataFrame(data)
# display(df)
  Candidate  Sample_Set  Values  Error
0         X           1      20      5
1         Y           1      10      2
2         Z           1      10      3
3         X           2     200     30
4         Y           2     101     30
5         Z           2      99     30
6         X           3    1999     10
7         Y           3     998     10
8         Z           3    1003     10
I am using seaborn to create a grouped barplot out of this with x="Candidate", y="Values", hue="Sample_Set". All’s good, until I try to add an error bar along the y-axis using the values under the column named Error. I am using the following code.
import seaborn as sns
ax = sns.factorplot(x="Candidate", y="Values", hue="Sample_Set", data=df,
                    size=8, kind="bar")
How do I incorporate the error?
I would appreciate a solution or a more elegant approach on the task.
Answers:
You can get close to what you need using pandas plotting functionalities: see this answer
# df is the DataFrame from the question (there, `data` is the plain dict used to build it)
bars = df.groupby("Candidate").plot(kind='bar', x="Sample_Set", y="Values", yerr=df['Error'])
This does not do exactly what you want, but pretty close. Unfortunately ggplot2 for python currently does not render error bars properly. Personally, I would resort to R ggplot2 in this case:
data <- read.csv("~/repos/tmp/test.csv")
data
library(ggplot2)
ggplot(data, aes(x=Candidate, y=Values, fill=factor(Sample_Set))) +
  geom_bar(position=position_dodge(), stat="identity") +
  geom_errorbar(aes(ymin=Values-Error, ymax=Values+Error), width=.1, position=position_dodge(.9))
As @ResMar pointed out in the comments, there seems to be no built-in functionality in seaborn to easily set individual errorbars.
If you care more about the result than about the way to get there, the following (not so elegant) solution, which builds on matplotlib.pyplot.bar, might be helpful. The seaborn import is just used to get the same style.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
def grouped_barplot(df, cat, subcat, val, err):
    u = df[cat].unique()
    x = np.arange(len(u))
    subx = df[subcat].unique()
    # horizontal offsets of the bars within each group, centred on the tick
    offsets = (np.arange(len(subx)) - np.arange(len(subx)).mean()) / (len(subx) + 1.)
    width = np.diff(offsets).mean()
    for i, gr in enumerate(subx):
        dfg = df[df[subcat] == gr]
        plt.bar(x + offsets[i], dfg[val].values, width=width,
                label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    plt.xlabel(cat)
    plt.ylabel(val)
    plt.xticks(x, u)
    plt.legend()
    plt.show()
cat = "Candidate"
subcat = "Sample_Set"
val = "Values"
err = "Error"
# call the function with df from the question
grouped_barplot(df, cat, subcat, val, err )
Note that by simply swapping the category and subcategory
cat = "Sample_Set"
subcat = "Candidate"
you get a different grouping, with one group of bars per sample set.
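The function above pairs values and errors with bar positions purely by row order inside each subgroup, which works for the question's df because every sample set lists the candidates in the same order. As a minimal sketch, assuming exactly one row per (category, subcategory) pair, a variant can align the rows explicitly. The name grouped_barplot_aligned, the ax parameter, the 0.8 group width and the capsize value are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def grouped_barplot_aligned(df, cat, subcat, val, err, ax=None):
    """Hypothetical variant of grouped_barplot that aligns rows by category."""
    if ax is None:
        _, ax = plt.subplots()
    cats = df[cat].unique()
    subcats = df[subcat].unique()
    x = np.arange(len(cats))
    width = 0.8 / len(subcats)  # the bars of one group share 80% of a tick slot
    for i, sub in enumerate(subcats):
        # index by category and reindex so the row order always matches `cats`
        rows = df[df[subcat] == sub].set_index(cat).reindex(cats)
        offset = (i - (len(subcats) - 1) / 2) * width
        ax.bar(x + offset, rows[val].to_numpy(), width=width,
               yerr=rows[err].to_numpy(), capsize=3,
               label="{} {}".format(subcat, sub))
    ax.set_xlabel(cat)
    ax.set_ylabel(val)
    ax.set_xticks(x)
    ax.set_xticklabels(cats)
    ax.legend()
    return ax


# df from the question
df = pd.DataFrame({'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
                   'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
                   'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]})

ax = grouped_barplot_aligned(df, "Candidate", "Sample_Set", "Values", "Error")
```

Returning the Axes instead of calling plt.show() inside the function leaves it to the caller to tweak the figure (for example ax.set_yscale("log")) before displaying it.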
I suggest extracting the position coordinates from the patches attributes, and then plotting the error bars.
ax = sns.barplot(data=df, x="Candidate", y="Values", hue="Sample_Set")
x_coords = [p.get_x() + 0.5*p.get_width() for p in ax.patches]
y_coords = [p.get_height() for p in ax.patches]
ax.errorbar(x=x_coords, y=y_coords, yerr=df["Error"], fmt="none", c= "k")
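The snippet above matches df["Error"] to the bars by position only, so it relies on the rows of df being in the same order as ax.patches. With the question's df that holds because the rows are already ordered by Sample_Set and then Candidate, which is the order the answer's code implies for the bars. A small sketch, assuming that hue-major bar order (it may differ between seaborn versions, and some versions may add extra patches, so the lengths are worth checking), that puts an arbitrarily ordered frame into that order first:

```python
import pandas as pd

# df from the question, deliberately reversed to simulate an arbitrary row order
df = pd.DataFrame({'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
                   'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
                   'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}).iloc[::-1]

# Assumed bar order: all candidates of the first sample set, then the second, ...
ordered = df.sort_values(["Sample_Set", "Candidate"])
yerr_in_bar_order = ordered["Error"].to_numpy()
yval_in_bar_order = ordered["Values"].to_numpy()
```

Comparing yval_in_bar_order with the y_coords taken from the patches is a cheap way to confirm that the assumed order really matches before drawing the error bars.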
- seaborn plots generate error bars when aggregating data; however, this data is already aggregated and has a specified error column.
- The easiest solution is to use pandas to create the bar chart with pandas.DataFrame.plot and kind='bar'.
- matplotlib is used by default as the plotting backend, and the plot API has a yerr parameter, which accepts the following:
  - As a DataFrame or dict of errors with column names matching the columns attribute of the plotting DataFrame or matching the name attribute of the Series.
  - As a str indicating which of the columns of plotting DataFrame contain the error values.
  - As raw values (list, tuple, or np.ndarray). Must be the same length as the plotting DataFrame/Series.
- This can be accomplished by reshaping the dataframe from long form to wide form with pandas.DataFrame.pivot.
- See pandas User Guide: Plotting with error bars
- Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3
# reshape the dataframe into a wide format for Values
vals = df.pivot(index='Candidate', columns='Sample_Set', values='Values')
# reshape the dataframe into a wide format for Errors
yerr = df.pivot(index='Candidate', columns='Sample_Set', values='Error')
# plot vals with yerr
ax = vals.plot(kind='bar', yerr=yerr, logy=True, rot=0, figsize=(6, 5))
_ = ax.legend(title='Sample Set', bbox_to_anchor=(1, 1.02), loc='upper left')
vals
Sample_Set   1    2     3
Candidate
X           20  200  1999
Y           10  101   998
Z           10   99  1003
yerr
Sample_Set  1   2   3
Candidate
X           5  30  10
Y           2  30  10
Z           3  30  10
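The two pivot calls above can also be folded into one. The sketch below is a variation, not the answer's own code: passing a list to values makes pivot return columns with two levels (value name, then Sample_Set), from which the Values and Error blocks are selected. The name wide is an illustrative choice.

```python
import pandas as pd

# df from the question
df = pd.DataFrame({'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
                   'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
                   'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]})

# a single pivot; the result has MultiIndex columns (value name, Sample_Set)
wide = df.pivot(index='Candidate', columns='Sample_Set', values=['Values', 'Error'])
vals, yerr = wide['Values'], wide['Error']

# vals and yerr share index and columns, so pandas can match the errors per bar
ax = vals.plot(kind='bar', yerr=yerr, logy=True, rot=0, figsize=(6, 5))
_ = ax.legend(title='Sample Set', bbox_to_anchor=(1, 1.02), loc='upper left')
```

Because both blocks come from the same pivot, their row and column labels are guaranteed to line up, which is what the DataFrame form of yerr relies on.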