violinplot not correctly scaling by count

Question:

I am trying to scale my violin plot by count, but the three final violins, which are based on three data points each, are vastly bigger than the first three, which are based on many more.

My code is as follows:

import matplotlib.pyplot as plt
import seaborn as sns

# olivinedata is my data (not shown), with Sample, FoContent and RimCore columns
fig = plt.figure(figsize=(20, 10))
grid = plt.GridSpec(1, 1, wspace=0.15, hspace=0)

plotol = fig.add_subplot(grid[0, 0])
olivine = sns.violinplot(x=olivinedata.Sample, y=olivinedata.FoContent, scale='count',
                         hue=olivinedata.RimCore,
                         order=["85B", "95B", "98", "LZa* (Tranquil)", "LZa* (Banded)",
                                "LZb* ", "LZa", "LZb", "LZc"],
                         ax=plotol)
plotol.set_xticklabels(plotol.get_xticklabels(),
                       rotation=20, fontsize=15,
                       horizontalalignment='right')
plotol.set_yticklabels(plotol.get_yticks(), size=15)

plotol.set_xlabel("Sample", size=24, alpha=0.7)
plotol.set_ylabel("Fo# (mol. %)", size=24, alpha=0.7)
# create the titled legend first, then resize its text
plotol.legend(title="Measurement Type")
plt.setp(plotol.get_legend().get_texts(), fontsize='22')

I am also getting a warning message

UserWarning: FixedFormatter should only be used together with FixedLocator
if sys.path[0] == '':

which results from inclusion of the line:

plotol.set_yticklabels(plotol.get_yticks(), size=15)

and I have no idea why. Any help is appreciated!

[Image: violin plot]

Asked By: Saffy


Answers:

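You probably want scale_hue=False. With the default scale_hue=True, the scaling is computed separately within each x category, so a violin is only compared against the other hue level at the same x position. A violin built from just three points can then still be drawn at full width. With scale_hue=False, the scaling is computed across all the violins in the plot.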

Here is a comparison of the scale options, with and without scale_hue:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

df1 = pd.DataFrame({'sample': np.repeat([*'ABC'], 20),
                    'hue': np.repeat([*'BBRBRB'], 10),
                    'val': np.random.uniform(10, 20, 60)})
df2 = pd.DataFrame({'sample': np.repeat([*'XYZ'], 3),
                    'hue': np.repeat([*'BBB'], 3),
                    'val': np.random.uniform(10, 20, 9)})
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(24, 8))
for row, scale_hue in zip([0, 1], [True, False]):
    for ax, scale in zip(axes[row, :], ['area', 'count', 'width']):
        sns.violinplot(data=pd.concat([df1, df2]), x='sample', y='val', hue='hue',
                       scale=scale, scale_hue=scale_hue, ax=ax)
        ax.set_title(f"scale='{scale}', scale_hue={scale_hue}", size=16)
plt.tight_layout()
plt.show()

[Image: comparison of the violinplot scale options, with and without scale_hue]
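
As for the FixedFormatter warning in the question: set_yticklabels installs fixed label texts while the tick positions are still chosen automatically, so the two can drift apart, and recent matplotlib versions warn about that. If the aim is only to enlarge the tick labels, one option is to leave the labels alone and change their size with tick_params. A minimal sketch, assuming made-up placeholder data in place of the real violin plot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Placeholder data (an assumption); any plot on the axes behaves the same way.
ax.plot([0, 1, 2], [60.0, 75.5, 90.0])

# Resize the y tick labels without replacing their text,
# so no FixedFormatter is installed and no warning is triggered.
ax.tick_params(axis="y", labelsize=15)
```

In the question's code this would mean replacing the plotol.set_yticklabels(...) line with plotol.tick_params(axis='y', labelsize=15).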

Answered By: JohanC