Change the width of merged bins in Matplotlib and Seaborn

Question:

I have a table of grades and I want all of the bins to be drawn with the same width.

I want the bin edges to be [0,56,60,65,70,80,85,90,95,100], so that the first bin covers 0-56, the next 56-60, and so on, with every bin drawn at the same width.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
newBins = [0,56,60,65,70,80,85,90,95,100]

sns.displot(data=scores, bins=newBins)

plt.xlabel('grade')
plt.xlim(0,100)
plt.xticks(newBins);

[Image: histogram plot with unbalanced bins]

Expected output:

[Image: expected output]

How can I balance the width of the bins?

Asked By: itay k


Answers:

You are assigning a list to the bins argument of sns.displot(). This specifies the edges of the bins. Since these edges are not evenly spaced, the widths of the bins vary.
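To make the uneven spacing concrete, here is a minimal sketch that computes the width of each bin from the edges in the question. The variable names `new_bins` and `widths` are only illustrative.

```python
import numpy as np

# bin edges taken from the question
new_bins = [0, 56, 60, 65, 70, 80, 85, 90, 95, 100]

# width of each bin = difference between consecutive edges
widths = np.diff(new_bins)
print(widths.tolist())  # [56, 4, 5, 5, 10, 5, 5, 5, 5]
```

The first bin is 56 units wide while most of the others are 5, which is why a histogram drawn on a numeric axis shows bars of very different widths.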

I think that you may want to use a bar plot (sns.barplot()) and not a histogram. You would need to compute how many data points fall in each bin, and then plot the bars without showing what range of values each bar represents. Something like this:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('darkgrid')

# sample data
data = np.random.randint(0, 100, 200)
newBins = [0,56,60,65,70,80,85,90,95,100]

# compute bar heights
hist, _ = np.histogram(data, bins=newBins)
# plot a bar diagram
sns.barplot(x = list(range(len(hist))), y = hist)
plt.show()

It gives:

[Image: resulting bar plot]
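If you also want each bar to show which range it stands for, one possible variation is to build a label per bin from the edges. This is only a sketch using plain Matplotlib: the helper name `plot_equal_width_bins` and the sample data are made up for illustration and are not part of the question.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_equal_width_bins(values, edges):
    """Draw one equal-width bar per bin and label each bar with its range."""
    # count how many values fall into each (unevenly spaced) bin
    counts, _ = np.histogram(values, bins=edges)
    # one text label per bin, e.g. "0-56", "56-60", ...
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

    fig, ax = plt.subplots()
    positions = np.arange(len(counts))  # evenly spaced bar positions
    ax.bar(positions, counts, width=1.0, edgecolor="white")
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_xlabel("grade")
    ax.set_ylabel("count")
    return fig, ax, counts


# made-up sample grades between 0 and 100 inclusive
rng = np.random.default_rng(0)
scores = rng.integers(0, 101, size=200)
new_bins = [0, 56, 60, 65, 70, 80, 85, 90, 95, 100]

fig, ax, counts = plot_equal_width_bins(scores, new_bins)
```

Call plt.show() afterwards to display the figure. The bars sit at positions 0, 1, 2, ... so they all have the same width, while the tick labels carry the real ranges.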

Answered By: bb1

Just change the list of values that you are using as bins:

import numpy

# edges 0, 1, ..., 100; the stop value is exclusive, so 101 is needed to include 100
newBins = numpy.arange(0, 101, 1)

Answered By: Jonathan Pacheco

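You need to cheat a bit. Define your own bins and name the bins with a linear range. Here is an example:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(100, size=100000))
bins = [-0.1, 50, 75, 95, 101]
# replace each value with the index of its bin (0, 1, 2, ...)
s2 = pd.cut(s, bins=bins, labels=range(len(bins)-1))
# histogram of the bin indices: one equal-width bar per bin
ax = s2.astype(int).plot.hist(bins=len(bins)-1)
# relabel the evenly spaced ticks with the original bin edges
ax.set_xticks(np.linspace(0, len(bins)-2, len(bins)))
ax.set_xticklabels(bins)

Output:

[Image: resulting histogram]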

Old answer:

Why don’t you let seaborn pick the bins for you:

sns.displot(data=scores, bins='auto')

Or set the number of bins that you want:

sns.displot(data=scores, bins=10)

Either way, the bins will be evenly spaced.
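The seaborn documentation states that the bins argument is passed to numpy.histogram_bin_edges(), so you can inspect the edges that an integer bin count produces. A small sketch, with made-up scores and an explicit range of 0 to 100 chosen here only so the result is predictable:

```python
import numpy as np

# made-up sample grades; any data within 0..100 gives the same edges here
scores = np.array([12, 47, 58, 63, 71, 88, 95, 100])

# ten equal-width bins over a fixed range of 0..100
edges = np.histogram_bin_edges(scores, bins=10, range=(0, 100))
print(edges.tolist())
```

Without the explicit range, the edges run from the smallest to the largest value in the data instead.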

Answered By: mozway

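You can use the bins parameter of histplot, but to get exactly these bins you have to use pd.cut() to create your own bins.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

np.random.seed(101)
# sample data; build the Series first so it can be reused for the bin column
scores = pd.Series(np.random.randint(100, size=175))
df = pd.DataFrame({'scores': scores,
                   'bins_created': pd.cut(scores, bins=[0,55,60,65,70,75,80,85,90,95,100],
                                          include_lowest=True)})

# count the scores in each bin, keeping the bins in ascending order
new_data = df['bins_created'].value_counts().sort_index()

plt.figure(figsize=(10,5),dpi=100)
plots = sns.barplot(x=new_data.index.astype(str), y=new_data.values)
plt.xlabel('grades')
plt.ylabel('counts')

# write the count above each bar
for bar in plots.patches:
    plots.annotate(format(bar.get_height(), '.2f'),
                   (bar.get_x() + bar.get_width() / 2,
                    bar.get_height()), ha='center', va='center',
                   size=10, xytext=(0,5),
                   textcoords='offset points')

plt.show()

[Image: histplot output as example]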

Answered By: TONY