Seaborn plot displot with hue and dual y-scale (twinx)

Question:

I am trying to plot the output of an ML model's predict step: a Target with classes 1 and 0, and a Score. Because the dataset is imbalanced, there are few 1s.

When I plot a simple displot with Target as the hue parameter, the plot is useless for describing the 1s:

sns.set_theme()
sns.set_palette(sns.color_palette('rocket', 3))
sns.displot(df, x='Score', hue='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
plt.show()

I want to use a different scale for the 1s in the same plot, with a second y-axis on the right created with twinx.

I have tried the following snippets, which work around the problem with two plots, but I need a single plot and I could not get twinx to work.

g = sns.displot(df, x='Score', col='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,400)
plt.show()

g = sns.FacetGrid(df, hue='Target')
g = g.map(sns.displot, 'Score', bins=30, linewidth=0, height=3, kde=True, aspect=1.6)

A reproducible example with the titanic dataset:

df_ = sns.load_dataset('titanic')
sns.displot(df_, x='fare', hue='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)

g = sns.displot(df_, x='fare', col='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,150)
plt.show()

Asked By: Napoleón Cortés

Answers:

I am not sure, but is this what you are looking for?

import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df_ = sns.load_dataset('titanic')
sns.histplot(df_[df_['survived']==1]['fare'], bins=30, linewidth=0, kde=True, color='red')
ax2 = plt.twinx()
sns.histplot(df_[df_['survived']==0]['fare'], bins=30, linewidth=0, kde=True, ax=ax2, color='blue')

Answered By: max12525k

To compare the shape of distributions with different numbers of observations, you can normalize them by setting stat="density". By default, this normalizes each distribution using the same denominator, but you can normalize each one independently by setting common_norm=False:

titanic = sns.load_dataset('titanic')
sns.displot(
    titanic, x='fare', hue='survived',
    bins=30, linewidth=0, kde=True,
    stat="density", common_norm=False,
    height=5, aspect=1.6
)

The peak of the two distributions is not at the same y value, but that is a real feature of the data: the population of survivors is spread over a wider range of fares and is less clustered at the lower end. Having two independent y axes and scaling them to equalize the height of each distribution’s peak would be misleading.
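
To make the difference between the two normalizations concrete, here is a small NumPy-only sketch. The samples are synthetic (hypothetical values, not the titanic data), and the arithmetic is a plain restatement of what density normalization means, not seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical imbalanced sample: 5000 values for class 0, 200 for class 1
class0 = rng.exponential(20.0, 5000)
class1 = rng.exponential(45.0, 200)

# One set of bin edges covering both groups
edges = np.histogram_bin_edges(np.concatenate([class0, class1]), bins=30)
widths = np.diff(edges)

# Independent normalization (what common_norm=False asks for):
# each group is divided by its own size, so each histogram has area 1
dens0, _ = np.histogram(class0, bins=edges, density=True)
dens1, _ = np.histogram(class1, bins=edges, density=True)
area0 = (dens0 * widths).sum()
area1 = (dens1 * widths).sum()

# Common normalization (the default): both groups are divided by the
# total number of observations, so each area equals that group's share
counts1, _ = np.histogram(class1, bins=edges)
total = class0.size + class1.size
common1 = counts1 / (total * widths)
share1 = (common1 * widths).sum()

print(area0, area1, share1)
```

With a common denominator the minority class keeps an area proportional to its share of the data, which is why it is barely visible; with independent denominators both histograms have the same total area and only their shapes differ.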

Answered By: mwaskom

displot() does not list stat and common_norm in its own signature (they are used in mwaskom’s answer); it forwards them as keyword arguments to histplot(). Similar output can also be obtained by calling the kdeplot() and histplot() functions directly.

In kdeplot(), the stat and kde parameters are not required as the function already calculates the density estimates:

sns.kdeplot(
    data=titanic, x='fare', hue='survived',
    linewidth=0, common_norm=False, fill=True
)
plt.xlim(-50, 300)

The histplot() alternative:

sns.histplot(
    data=titanic, x='fare', hue='survived',
    # common_norm only has an effect with a normalized statistic such as "density"
    stat='density', common_norm=False,
    linewidth=0, kde=True
)
plt.xlim(0, 100)

It is worth noting that the kde parameter of histplot() computes the same kind of kernel density estimate as kdeplot(), and you can control that computation by passing your own settings through kde_kws.

Docs: kdeplot, histplot

Answered By: Pengshe