How to draw a distribution plot for discrete variables in seaborn

Question:

When I draw a displot for a discrete variable, the distribution does not look the way I expect. For example, there are gaps between the bars, so the KDE curve sits “lower” on the y axis than the bars.

With my own data, it was even worse.

I think it may be because the “width” or “weight” of each bar is not 1, but I couldn’t find any parameter to adjust it.

I’d like to draw a smoother curve.

Asked By: Zealseeker


Answers:

One way to deal with this problem might be to adjust the “bandwidth” of the KDE (see the documentation for seaborn.kdeplot()):

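import numpy as np
import seaborn as sns
n = np.round(np.random.normal(5, 2, size=10000))  # normal sample rounded to integers
sns.distplot(n, kde_kws={'bw': 1})  # 'bw' sets the KDE bandwidth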


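EDIT: Here is an alternative that uses separate y scales for the bars and the KDE:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

n = np.round(np.random.normal(5, 2, size=10000))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y axis sharing the same x axis

sns.distplot(n, kde=False, ax=ax1)  # histogram (counts) on the left axis
sns.distplot(n, hist=False, ax=ax2, kde_kws={'bw': 1})  # KDE (density) on the right axis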


Answered By: Diziet Asahi

If the problem is that there are some empty bins in the histogram, it probably makes sense to specify the bins to match the data. In this case, use bins=np.arange(0,16) to get one bin for each integer in the data.

import numpy as np; np.random.seed(1)
import matplotlib.pyplot as plt
import seaborn as sns

n = np.random.randint(0,15,10000)
sns.distplot(n, bins=np.arange(0,16), hist_kws=dict(ec="k"))

plt.show()
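
With bins=np.arange(0,16), each bar spans from k to k+1 rather than being centered on k. If centered bars are preferred, one option is to place the edges at half-integers. This is a sketch under that assumption; integer_bin_edges is a hypothetical helper name, and it expects integer-valued data.

```python
import numpy as np


def integer_bin_edges(values):
    """Return edges at half-integers so every integer gets its own centered bin."""
    lo, hi = int(np.min(values)), int(np.max(values))
    return np.arange(lo - 0.5, hi + 1.5)


rng = np.random.default_rng(1)
n = rng.integers(0, 15, size=10_000)  # integers from 0 to 14

edges = integer_bin_edges(n)
counts, _ = np.histogram(n, bins=edges)  # one count per integer value
```

The resulting edges can be passed as bins=edges to the plotting call above.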


It seems sns.distplot (or displot, https://seaborn.pydata.org/generated/seaborn.displot.html) is meant for plotting histograms, not bar plots. Both the histogram and the KDE (which is an estimate of the probability density function) are meant for continuous random variables.
So in your case, since you want to plot the distribution of a discrete random variable, you should use a bar plot of the probability mass function (PMF) instead.

import numpy as np
import matplotlib.pyplot as plt

array = np.random.randint(15, size=10000)
unique, counts = np.unique(array, return_counts=True)
freq = counts / counts.sum()  # convert counts to relative frequencies

# plot one bar per distinct value
plt.bar(unique, freq)

# name the x axis
plt.xlabel('Value')
# name the y axis
plt.ylabel('Frequency')

# title
plt.title("Discrete uniform distribution")

# show the plot
plt.show()
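
The same idea can be written without fixing the sample size in advance. This is a small sketch using only the standard library; empirical_pmf is a hypothetical helper name, and the input list is made-up illustration data.

```python
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to its relative frequency in the sample."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}


pmf = empirical_pmf([0, 1, 1, 2, 2, 2])
```

The keys and values can then be passed to plt.bar(list(pmf), list(pmf.values())).
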
Answered By: Quentin VOITURON