Set confidence intervals for error bars plot in matplotlib

Question:

I have this dataset:

mydf = pd.DataFrame({'Feature':['Pysch','Physio'],'log_or':[0.3126,0.2022],
'se':[0.0712,0.0568], 'conf_low':[0.1729,0.0907], 'conf_high':[0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')
mydf
    Feature log_or  se      conf_low  conf_high
1   Physio  0.2022  0.0568  0.0907    0.3136
0   Pysch   0.3126  0.0712  0.1729    0.4522

I want to create an error bar plot using my calculated confidence intervals in conf_low and conf_high.

I tried this first, but the error bars don't match my calculated confidence intervals:

plt.errorbar(mydf['log_or'], mydf['Feature'], 
xerr=mydf['se'], marker='s', mfc='Tomato')
plt.show()


For example, for the Physio variable the plotted error bar runs from roughly 0.14 to 0.26, but my tabulated confidence intervals go from 0.091 to 0.316.

So I tried to set up my custom intervals, with this:

lowr = mydf['conf_low'].to_numpy()
uppr = mydf['conf_high'].to_numpy()
intervals = [lowr, uppr]
plt.errorbar(mydf['log_or'], mydf['Feature'], xerr=intervals, marker='s', mfc='Tomato')
plt.show()

Now the Physio interval runs from approximately 0.1 to 0.5, which is wrong. What am I doing wrong? How can I use my custom intervals in this plot?

Asked By: Alexis


Answers:

I think you are misunderstanding what the values passed to xerr are meant to represent. Have a look at the plt.errorbar documentation (under the xerr, yerr parameters).

In your first attempt, xerr=mydf['se'] is interpreted as follows:

shape(N,): Symmetric +/-values for each data point.

In your second attempt, xerr=intervals is interpreted as follows:

shape(2, N): Separate - and + values for each bar. First row contains the lower errors, the second row contains the upper errors.

So the values you pass to xerr are treated as the lengths of the error bars, measured from each data point (+/-). However, the values in mydf.conf_low and mydf.conf_high are not lengths. They are absolute x-values, the positions where each bar should end. As you mention for Physio:

my tabulated confidence intervals go from 0.091 to 0.316.
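
To make the difference concrete, here is a minimal sketch of the arithmetic for the Physio row, using the numbers from the question's table. The variable names are only illustrative, and the endpoints are computed by hand from the documented meaning of xerr rather than read back from matplotlib.

```python
# Physio row from the question's table
center, se = 0.2022, 0.0568
conf_low, conf_high = 0.0907, 0.3136

# First attempt: xerr=se is a symmetric offset around the center
first = (center - se, center + se)                # about 0.145 to 0.259

# Second attempt: the bounds themselves are treated as offsets
second = (center - conf_low, center + conf_high)  # about 0.112 to 0.516

# Intended: the offsets are the distances from the center to each bound
lower_err = center - conf_low
upper_err = conf_high - center
intended = (center - lower_err, center + upper_err)  # about 0.091 to 0.314

for name, span in [("first", first), ("second", second), ("intended", intended)]:
    print(name, [round(v, 4) for v in span])
```

The second line of output matches the "0.1 to 0.5" bar seen in the question: the lower bound was subtracted from the point and the upper bound was added to it.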

The solution, then, is to calculate the distance from each point to its lower and upper bound, and pass those distances to xerr. Like so:

import pandas as pd
import matplotlib.pyplot as plt

mydf = pd.DataFrame({'Feature': ['Pysch', 'Physio'], 'log_or': [0.3126, 0.2022],
                     'se': [0.0712, 0.0568], 'conf_low': [0.1729, 0.0907],
                     'conf_high': [0.4522, 0.3136]})
mydf = mydf.sort_values(by='log_or')

# xerr expects distances from each point, not absolute x-values
lower_err = mydf['log_or'] - mydf['conf_low']
upper_err = mydf['conf_high'] - mydf['log_or']

plt.errorbar(mydf['log_or'], mydf['Feature'],
             xerr=[lower_err, upper_err], marker='s', mfc='Tomato')
plt.show()

Result:

errorbars
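
If you do this conversion in more than one place, it may help to wrap it in a small function. The sketch below is one possible way to do that; bounds_to_offsets is a hypothetical helper name, not part of matplotlib or pandas. It also checks that each bound lies on the expected side of its point, since a negative offset would not be a meaningful bar length (and, as far as I know, recent matplotlib versions reject negative error values).

```python
def bounds_to_offsets(centers, lows, highs):
    """Hypothetical helper: convert absolute bounds to [lower, upper] offsets."""
    lower, upper = [], []
    for c, lo, hi in zip(centers, lows, highs):
        # Each interval must bracket its point, otherwise an offset is negative
        if not lo <= c <= hi:
            raise ValueError(f"bounds ({lo}, {hi}) do not bracket center {c}")
        lower.append(c - lo)
        upper.append(hi - c)
    # Shape (2, N): first row lower offsets, second row upper offsets
    return [lower, upper]


# Values from the question's table, in sorted order (Physio, Pysch)
xerr = bounds_to_offsets([0.2022, 0.3126], [0.0907, 0.1729], [0.3136, 0.4522])
print(xerr)
```

With the DataFrame from above, the result could then be passed as xerr=bounds_to_offsets(mydf['log_or'], mydf['conf_low'], mydf['conf_high']) in the plt.errorbar call.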

Answered By: ouroboros1