QQ-Plot in Python using Plotnine
Question:
I want to plot an array of values against a theoretical distribution using a QQ-plot in Python. Ideally, I want to create the plot using the library plotnine.
But when I try to create the plot, I’m getting error messages… here’s my code with example data:
from scipy.stats import beta
from plotnine import *
import statsmodels.api as sm
import numpy as np
n = 207
values = -1 + np.random.beta(n/2-1, n/2-1, 100) * 2 # my data
dist = beta(n/2-1, n/2-1, loc = -1, scale = 2) # theoretical distribution
# 1. try:
ggplot(aes(sample = values)) + stat_qq(distribution = dist)
# gives ValueError: Unknown continuous distribution '<scipy.stats._distn_infrastructure.rv_frozen object at 0x0000029755C5C070>'
# 2. try:
params = {'a':n/2-1, 'b':n/2-1, 'loc':-1, 'scale':2}
ggplot(aes(sample = values)) + stat_qq(distribution = 'beta', dparams = params)
# gives TypeError: '>' not supported between instances of 'numpy.ndarray' and 'int'
Does anyone know what I’m doing wrong?
When I try to plot using statsmodels, it seems to work fine:
sm.qqplot(values, dist, line = '45')
As always, any help is highly appreciated!
Answers:
This is a bug in plotnine. Until it is fixed, you can pass the arguments as a tuple instead of a dict. However, be careful about the positional matching of the arguments (a, b, loc, scale).
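The order matters because scipy's distribution methods take the shape parameters first, then loc and scale. Here is a minimal sketch to check the tuple before handing it to plotnine. It assumes the tuple is forwarded positionally to the distribution's ppf, which is my guess at what the affected plotnine version does internally, and it compares the result against the frozen distribution from the question:

```python
import numpy as np
from scipy.stats import beta

n = 207
a = b = n / 2 - 1

# Tuple in the order scipy expects: shape parameters (a, b), then loc, then scale.
dparams = (a, b, -1, 2)

# Frozen distribution from the question, used as the reference.
dist = beta(a, b, loc=-1, scale=2)

# Compare theoretical quantiles on a grid of probabilities.
q = np.linspace(0.01, 0.99, 25)
positional = beta.ppf(q, *dparams)
reference = dist.ppf(q)

# True when the positional tuple describes the same distribution.
same = np.allclose(positional, reference)
print(same)
```

If the check passes, the same tuple is what you would pass as dparams in stat_qq(distribution = 'beta', dparams = dparams) on a version that still has the bug.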
Edit
The bug is fixed in the current development version of plotnine, and you can use a dict to pass the arguments.
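With a version that includes the fix, the second attempt from the question should work more or less as written. A sketch, assuming such a version is installed: the sample goes into a DataFrame column, and the column name values and the helper qq_plot are only illustrative. The geom_abline layer is my addition to mimic line = '45' from the statsmodels call.

```python
import numpy as np
import pandas as pd
from scipy.stats import beta

n = 207
a = b = n / 2 - 1

# Example data as in the question, with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({'values': -1 + rng.beta(a, b, 100) * 2})

# Keyword parameters of the theoretical distribution.
params = {'a': a, 'b': b, 'loc': -1, 'scale': 2}


def qq_plot(data, dparams):
    """Build the QQ-plot (hypothetical helper; needs a plotnine version with the fix)."""
    # Imported here so the data preparation above runs without plotnine.
    from plotnine import ggplot, aes, stat_qq, geom_abline

    return (
        ggplot(data, aes(sample='values'))
        + stat_qq(distribution='beta', dparams=dparams)
        + geom_abline(intercept=0, slope=1)  # 45-degree reference line
    )
```

Calling qq_plot(df, params) returns the plot object, which is displayed in a notebook or can be saved with its save method.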