Seaborn plot displot with hue and dual y-scale (twinx)

Question

I am trying to plot the predictions of an ML model: the Target has two classes (1 and 0), and each row has a Score. Because the dataset is imbalanced, there are few 1’s.

When I plot a simple displot with Target as the hue parameter, the plot is useless for describing the 1’s:

sns.set_theme()
sns.set_palette(sns.color_palette('rocket', 3))
sns.displot(df, x='Score', hue='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
plt.show()

I want to change the scale for the 1’s in the same plot by adding a second y-axis on the right with twinx.

I have tried the following code, which works around the problem by using two plots, but I need a single plot. I couldn’t get twinx to work.

g = sns.displot(df, x='Score', col='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,400)
plt.show()

g = sns.FacetGrid(df, hue='Target')
g = g.map(sns.displot, 'Score', bins=30, linewidth=0, height=3, kde=True, aspect=1.6)

A reproducible example using the titanic dataset:

df_ = sns.load_dataset('titanic')
sns.displot(df_, x='fare', hue='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)

g = sns.displot(df_, x='fare', col='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,150)
plt.show()

Asked By: Napoleón Cortés

Answer 1

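I am not sure, but is this what you are looking for?

import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df_ = sns.load_dataset('titanic')
# Survivors (1) are drawn on the default axes, with the y-scale on the left
sns.histplot(df_[df_['survived']==1]['fare'], bins=30, linewidth=0, kde=True, color='red')
# twinx() creates a second axes that shares x and has its own y-scale on the right
ax2 = plt.twinx()
sns.histplot(df_[df_['survived']==0]['fare'], bins=30, linewidth=0, kde=True, ax=ax2, color='blue')
plt.show()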

Answered By: max12525k

Answer 2

To compare the shape of distributions with different numbers of observations, you can normalize them by setting stat="density". By default, this normalizes each distribution using the same denominator, but you can normalize each one independently by setting common_norm=False:

titanic = sns.load_dataset('titanic')
sns.displot(
    titanic, x='fare', hue='survived',
    bins=30, linewidth=0, kde=True,
    stat="density", common_norm=False,
    height=5, aspect=1.6
)

The peak of the two distributions is not at the same y value, but that is a real feature of the data: the population of survivors is spread over a wider range of fares and is less clustered at the lower end. Having two independent y axes and scaling them to equalize the height of each distribution’s peak would be misleading.
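To make the effect of common_norm concrete, here is a minimal numeric sketch using plain NumPy histograms. It only mirrors the idea of the two normalizations and is not seaborn's internal code; the Beta-distributed scores and the 950/50 class split are made-up stand-ins for an imbalanced Score/Target dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical imbalanced scores: 950 negatives and 50 positives, all in (0, 1)
scores_0 = rng.beta(2, 5, size=950)
scores_1 = rng.beta(5, 2, size=50)

bins = np.linspace(0.0, 1.0, 31)  # 30 equal-width bins
width = np.diff(bins)

counts_0, _ = np.histogram(scores_0, bins=bins)
counts_1, _ = np.histogram(scores_1, bins=bins)
total = counts_0.sum() + counts_1.sum()

# Shared denominator (the idea behind common_norm=True):
# both groups are divided by the combined number of observations
common_0 = counts_0 / (total * width)
common_1 = counts_1 / (total * width)

# Independent denominators (the idea behind common_norm=False):
# each group is divided by its own number of observations
indep_0 = counts_0 / (counts_0.sum() * width)
indep_1 = counts_1 / (counts_1.sum() * width)

# Area under each histogram for the minority class
area_common_1 = float((common_1 * width).sum())  # share of the minority class
area_indep_1 = float((indep_1 * width).sum())    # a full density of its own

print(f"shared denominator:      area = {area_common_1:.3f}")
print(f"independent denominator: area = {area_indep_1:.3f}")
```

With a shared denominator the minority class only covers an area equal to its share of the data, which is why it looks flat next to the majority class. With independent denominators each class integrates to 1, so the shapes become comparable on a single y-axis.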

Answered By: mwaskom

Answer 3

displot() does not list stat and common_norm in its own signature (in mwaskom’s answer they are passed through to histplot() as extra keyword arguments), and similar outputs can also be obtained by calling the kdeplot() and histplot() functions directly.

In kdeplot(), the stat and kde parameters are not required as the function already calculates the density estimates:

sns.kdeplot(
    data=titanic, x='fare', hue='survived',
    linewidth=0, common_norm=False, fill=True)
plt.xlim(-50, 300)

The histplot() alternative:

sns.histplot(
    data=titanic, x='fare', hue='survived',
    linewidth=0, kde=True,
    # common_norm only matters for a normalized stat, not the default "count"
    stat='density', common_norm=False)
plt.xlim(0, 100)

It is worth noting that the kde parameter relies on the same density estimation as kdeplot(), and that you can pass your own KDE settings through kde_kws.

Docs: kdeplot, histplot

Answered By: Pengshe
