Seaborn plot displot with hue and dual y-scale (twinx)
Question:
I am trying to plot the predictions of an ML model. There are two columns: the Target (classes 1 and 0) and the Score. Because the dataset is imbalanced, there are few 1s.
When I plot a simple displot with Target as the hue parameter, the plot is useless for describing the 1s:
sns.set_theme()
sns.set_palette(sns.color_palette('rocket', 3))
sns.displot(df, x='Score', hue='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
plt.show()
I want to use a different scale for the 1s in the same plot, with a second y-axis on the right via twinx.
I have tried the following code, which may solve the problem with two plots, but I need a single plot. I couldn't get twinx to work.
g = sns.displot(df, x='Score', col='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,400)
plt.show()
g = sns.FacetGrid(df, hue='Target')
g = g.map(sns.displot, 'Score', bins=30, linewidth=0, height=3, kde=True, aspect=1.6)
A reproducible example could be with the titanic dataset:
df_ = sns.load_dataset('titanic')
sns.displot(df_, x='fare', hue='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
g = sns.displot(df_, x='fare', col='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,150)
plt.show()
Answers:
I am not sure, but are you looking for this?
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df_ = sns.load_dataset('titanic')
sns.histplot(df_[df_['survived']==1]['fare'], bins=30, linewidth=0, kde=True, color='red')
ax2 = plt.twinx()
sns.histplot(df_[df_['survived']==0]['fare'], bins=30, linewidth=0, kde=True, ax=ax2, color='blue')
To compare the shape of distributions with different numbers of observations, you can normalize them by setting stat="density". By default, this normalizes each distribution using the same denominator, but you can normalize each one independently by setting common_norm=False:
titanic = sns.load_dataset('titanic')
sns.displot(
titanic, x='fare', hue='survived',
bins=30, linewidth=0, kde=True,
stat="density", common_norm=False,
height=5, aspect=1.6
)
The peak of the two distributions is not at the same y value, but that is a real feature of the data: the population of survivors is spread over a wider range of fares and is less clustered at the lower end. Having two independent y axes and scaling them to equalize the height of each distribution’s peak would be misleading.
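To make the effect of the denominator concrete, here is a small pure-Python sketch of the arithmetic. The helper names and the toy data are made up for illustration and are not seaborn internals. With a shared denominator, each group's bars cover an area equal to its share of all observations; with independent denominators, each group's bars cover an area of 1.

```python
def density(values, edges, norm_total):
    """Bar heights such that the total bar area equals len(values) / norm_total."""
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = hi == edges[-1]  # the last bin also includes its right edge
        n = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        heights.append(n / (norm_total * (hi - lo)))
    return heights


def area(heights, edges):
    """Total area covered by the bars."""
    return sum(h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:]))


edges = [0, 10, 20, 30, 40]
majority = [2, 3, 5, 7, 8, 9, 12, 15, 18, 25, 31, 38]  # 12 observations
minority = [4, 22, 27, 35]                              # 4 observations
total = len(majority) + len(minority)

# Shared denominator (the common_norm=True behavior): areas are 12/16 and 4/16
shared = (area(density(majority, edges, total), edges),
          area(density(minority, edges, total), edges))

# Independent denominators (the common_norm=False behavior): each area is 1
independent = (area(density(majority, edges, len(majority)), edges),
               area(density(minority, edges, len(minority)), edges))

print(shared)
print(independent)
```

Under the shared denominator the minority class stays small on the plot, which is why its shape is hard to read; normalizing independently puts both classes on a comparable density scale without a second axis.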
displot() no longer has stat and common_norm attributes (which are mentioned in mwaskom’s answer), but similar outputs can be obtained using the kdeplot() and histplot() functions.
In kdeplot(), the stat and kde parameters are not required as the function already calculates the density estimates:
sns.kdeplot(
data=titanic, x='fare', hue='survived',
linewidth=0, common_norm=False, fill=True)
plt.xlim(-50,300)
The histplot() alternative:
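# stat='density' is needed for common_norm=False to have an effect;
# with the default stat='count', no normalization is applied
sns.histplot(
data=titanic, x='fare', hue='survived',
linewidth=0, stat='density', common_norm=False, kde=True)
plt.xlim(0,100)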
It is worth noting that the kde parameter uses kdeplot() in the background, and it’s possible to specify your own KDE settings through kde_kws.