How to make dots in Swarmplot (Seaborn) overlap with each other?
Question:
I have made a swarmplot with seaborn, but I can’t find an option to make the dots overlap with each other.
At the moment they overlap only at the sides of each swarm.
I would like them to overlap everywhere when there is not enough room for them to fit.
import seaborn as sns
data = sns.load_dataset('iris')
sns.swarmplot(data=data, y="sepal_length", x="species", edgecolor="black", alpha=.5, s=15, linewidth=1.0)
Answers:
I don’t think it’s possible to make the markers overlap deliberately with swarmplot. Of course, smaller markers would not overlap at all, if that is what you want.
Otherwise, a hacky workaround is to use the fact that seaborn fixes the distance between markers for a specific figure size. If you plot on a huge figure, where no overlap happens, and then make the figure smaller afterwards, the overlap should be pretty high.
import seaborn as sns
import matplotlib.pyplot as plt
data = sns.load_dataset('iris')
fig, ax = plt.subplots(figsize=(19,4.8))
sns.swarmplot(data=data, y="sepal_length", x="species",
              edgecolor="black", alpha=.5, s=15, linewidth=1.0, ax=ax)
fig.set_size_inches(6.4,4.8)
plt.show()
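Here you would need to find good values for the figsize, such that you’re happy with the result. This only works if seaborn computes the marker positions once, when swarmplot is called; a version that recomputes the swarm each time the figure is drawn would undo the effect.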
You could also use a stripplot instead of a swarmplot. As far as I know, the whole point of swarmplot is to produce output similar to stripplot, but with points that don’t overlap.
import seaborn as sns
data = sns.load_dataset('iris')
sns.stripplot(data=data, y="sepal_length", x="species", edgecolor="black", alpha=.5, s=15, linewidth=1.0)
In addition, you can control the amount of overlap using the jitter= keyword.
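To get a feel for what jitter= does, it can help to draw the same stripplot with a few different values side by side. The following is only a sketch: the helper name plot_jitter_levels and the chosen jitter values are made up for illustration, and it assumes a DataFrame with "species" and "sepal_length" columns, like the iris dataset used above.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_jitter_levels(data, jitters=(0.05, 0.2, 0.4)):
    """Draw one stripplot per jitter value, side by side, and return (fig, axes).

    Hypothetical helper for illustration; the column names are assumptions
    that match the iris dataset.
    """
    fig, axes = plt.subplots(1, len(jitters), figsize=(4 * len(jitters), 4.8),
                             sharey=True, squeeze=False)
    axes = axes[0]  # squeeze=False always gives a 2D array; take the single row
    for ax, jitter in zip(axes, jitters):
        # A smaller jitter keeps the points closer to the centre line,
        # so more of them end up on top of each other.
        sns.stripplot(data=data, x="species", y="sepal_length", jitter=jitter,
                      size=15, alpha=.5, edgecolor="black", linewidth=1.0, ax=ax)
        ax.set_title(f"jitter={jitter}")
    return fig, axes
```

With the iris data this would be called as plot_jitter_levels(sns.load_dataset('iris')), followed by plt.show().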
Another workaround that gives both clustering according to the distribution (not possible with stripplot) and overlap between items (and thus speed) is to define a custom density_jitter function:
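import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def density_jitter(values, width=1.0, cluster_factor=1.0):
    """
    Add jitter to a 1D array of values: points that fall in the same
    bin (a coarse, histogram-style density estimate) are spread
    alternately to either side of the centre line.
    """
    values = np.asarray(values, dtype=float)
    N = len(values)
    # Shuffle so that the layer a point gets within its bin is random.
    perm = np.random.permutation(N)
    shuffled = values[perm]
    nbins = 100
    # Quantise each value to one of nbins bins over the data range.
    quant = np.round(nbins * (shuffled - np.min(shuffled)) / (np.max(shuffled) - np.min(shuffled) + 1e-8))
    # Visit the points bin by bin; the tiny noise breaks ties in random order.
    order = np.argsort(quant + np.random.randn(N) * 1e-6)
    layer = 0
    last_bin = -1
    ys = np.zeros(N)
    for ind in order:
        if quant[ind] != last_bin:
            layer = 0
        # Layers 0, 1, 2, 3, 4, ... map to offsets 0, +1, -1, +2, -2, ...
        ys[ind] = cluster_factor * (np.ceil(layer / 2) * ((layer % 2) * 2 - 1))
        layer += 1
        last_bin = quant[ind]
    ys *= 0.9 * (width / np.max(ys + 1))
    # Undo the shuffle so that the i-th offset belongs to values[i].
    out = np.empty(N)
    out[perm] = ys
    return out

data = sns.load_dataset('iris')
species_names = data.species.unique()
for ind, species in enumerate(species_names):
    lengths = data[data.species == species].sepal_length.values
    ys = density_jitter(lengths, width=0.4, cluster_factor=0.2)
    plt.scatter(ind + ys, lengths, alpha=0.3, color=plt.cm.tab10(ind))
plt.xticks(np.arange(3), species_names)
plt.show()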
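The core of density_jitter is the expression np.ceil(layer / 2) * ((layer % 2) * 2 - 1), which decides on which side of the centre line the next point in a bin goes. As a rough illustration, here is the same pattern written with integer arithmetic. The helper name layer_offset is made up for this sketch, and it corresponds to cluster_factor=1 before the final rescaling.

```python
def layer_offset(layer):
    """Signed offset for the layer-th point in a bin: 0, +1, -1, +2, -2, ..."""
    distance = (layer + 1) // 2      # 0, 1, 1, 2, 2, ... (same as ceil(layer / 2))
    side = 1 if layer % 2 else -1    # odd layers go to one side, even layers to the other
    return distance * side


offsets = [layer_offset(k) for k in range(7)]
print(offsets)  # [0, 1, -1, 2, -2, 3, -3]
```

So the first point in a bin stays on the centre line and each further point is pushed one step further out, alternating sides. Points in different bins can still overlap, which is what allows the markers to sit on top of each other.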