How to draw distribution plot for discrete variables in seaborn
Question:
When I draw a displot for discrete variables, the distribution does not always look the way I expect. For example, there are gaps between the bars of the histogram, so the kdeplot curve sits “lower” on the y axis than the bars.
In my own work the effect was even worse.
I think this may be because the “width” or “weight” of each bar is not 1, but I could not find a parameter to adjust it.
Answers:
One way to deal with this problem might be to adjust the “bandwidth” of the KDE (see the documentation for seaborn.kdeplot()):
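import numpy as np
import seaborn as sns
n = np.round(np.random.normal(5, 2, size=(10000,)))  # integer-valued sample
sns.distplot(n, kde_kws={'bw': 1})  # note: distplot is deprecated since seaborn 0.11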
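To illustrate what the bandwidth controls, here is a minimal NumPy-only sketch of a Gaussian KDE with a fixed absolute bandwidth. It is not seaborn's implementation; the function name `gaussian_kde_1d` and the two bandwidth values are assumptions chosen for illustration. With a narrow bandwidth the estimate spikes at each integer and drops to almost zero in between; with a bandwidth near 1 the spikes merge into a smooth curve.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE with a fixed absolute bandwidth on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # The data are discrete, so weight each distinct value by its relative frequency.
    values, counts = np.unique(samples, return_counts=True)
    weights = counts / counts.sum()
    # One Gaussian kernel per distinct value, evaluated at every grid point.
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels @ weights

rng = np.random.default_rng(0)
n = np.round(rng.normal(5, 2, size=10_000))

grid = np.linspace(-5, 15, 2001)                  # step of 0.01
narrow = gaussian_kde_1d(n, grid, bandwidth=0.1)  # spiky: one peak per integer
wide = gaussian_kde_1d(n, grid, bandwidth=1.0)    # smooth: peaks merged

print("bandwidth 0.1: density at 5.0 and 5.5:", narrow[1000], narrow[1050])
print("bandwidth 1.0: density at 5.0 and 5.5:", wide[1000], wide[1050])
```

Both curves integrate to roughly 1; only the way that area is spread between the integers changes.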
EDIT: Here is an alternative that uses separate y-axis scales for the bars and the KDE:
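import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
n = np.round(np.random.normal(5, 2, size=(10000,)))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y axis sharing the same x axis
sns.distplot(n, kde=False, ax=ax1)  # histogram (counts) on the left axis
sns.distplot(n, hist=False, ax=ax2, kde_kws={'bw': 1})  # KDE on the right axis
plt.show()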
If the problem is that there are some empty bins in the histogram, it probably makes sense to specify the bins so that they match the data. In this case, use bins=np.arange(0,16) to get one bin for each integer in the data.
import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
import seaborn as sns
n = np.random.randint(0,15,10000)
sns.distplot(n, bins=np.arange(0,16), hist_kws=dict(ec="k"))
plt.show()
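To see why the choice of bins changes the bar heights, the density can be checked with np.histogram alone. This is a small sketch, not what seaborn does internally; the choice of 30 narrow bins is an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = rng.integers(0, 15, size=10_000)  # integers 0..14

# Width-1 bins, one per integer (the last bin [14, 15] holds the value 14).
aligned, _ = np.histogram(n, bins=np.arange(0, 16), density=True)

# 30 equal-width bins over [0, 14]: each is narrower than 1,
# so some bins cannot contain any integer and stay empty.
fine, _ = np.histogram(n, bins=30, density=True)

print("width-1 bins, tallest bar:", aligned.max())
print("narrow bins, tallest bar: ", fine.max())
print("empty narrow bins:", int((fine == 0).sum()))
```

A density histogram divides each bin's share of the data by the bin width, so bins narrower than 1 produce taller bars separated by empty gaps, while a KDE drawn over them keeps a total area of 1 and therefore looks lower.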
It seems sns.distplot (or displot https://seaborn.pydata.org/generated/seaborn.displot.html) is meant for plotting histograms, not bar plots. Both the histogram and the KDE (which is an approximation of the probability density function) make sense only for continuous random variables.
So in your case, since you want to plot the distribution of a discrete random variable, you should use a bar plot of the probability mass function (PMF) instead.
import numpy as np
import matplotlib.pyplot as plt
array = np.random.randint(15, size=10000)
unique, counts = np.unique(array, return_counts=True)
freq = counts / array.size  # convert counts to relative frequencies
# draw one bar per distinct value
plt.bar(unique, freq)
# label the x axis
plt.xlabel('Value')
# label the y axis
plt.ylabel('Frequency')
# add a title
plt.title("Discrete uniform distribution")
# display the plot
plt.show()