Horizontal barplot with offset in seaborn

Question:

My dataset looks like this; each cell holds its data points as a pandas object (a comma-separated string).
[image: screenshot of the dataset]

Here is the dataset:
https://github.com/aebk2015/multipleboxplot.git

I want a bar plot for each entry of the "Location" column (P1 to P14), for each of the categories (92A11, 92B11, 82B11); something like this:
[image: sketch of the desired plot]

I have tried something like this. I can get a bar plot for each individual Pi (i=1…14), but not only is it laborious, it also does not look like what I want:

fig, ax = plt.subplots(2, 3, figsize=(8,2))

sns.stripplot(data=df.loc[7]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[0,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92A11'].split(','),  ax=ax[0,0], color='orange', orient='h')

sns.stripplot(data=df.loc[7]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[0,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92B11'].split(','),  ax=ax[0,1], color='orange', orient='h')

sns.stripplot(data=df.loc[7]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[0,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['82B11'].split(','),  ax=ax[0,2], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[1,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92A11'].split(','),  ax=ax[1,0], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[1,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92B11'].split(','),  ax=ax[1,1], color='orange', orient='h')

sns.stripplot(data=df.loc[6]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[1,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['82B11'].split(','),  ax=ax[1,2], color='orange', orient='h')

ax[0,0].set_xlim(0,200)
ax[0,1].set_xlim(0,200)
ax[0,2].set_xlim(0,200)
ax[1,0].set_xlim(0,200)
ax[1,1].set_xlim(0,200)
ax[1,2].set_xlim(0,200)

ax[1,0].set_xlabel('92A11')
ax[1,1].set_xlabel('92B11')  # label each column of axes with its own dataset
ax[1,2].set_xlabel('82B11')

ax[0,0].set_ylabel('P8')
ax[1,0].set_ylabel('P7')
fig.tight_layout()

[image: result of the code above]

Asked By: A.E


Answers:

Seaborn works most easily with data in long form. pd.melt (or DataFrame.melt) converts the 3 individual columns into a variable-name column and a value column. DataFrame.explode turns each list stored in a cell into multiple rows. As the current entries are strings, they need to be converted to lists of numbers before calling .explode().

The following code assumes you want box plots, not bars. It also assumes that an entry consisting only of '0' should stay empty.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np  # needed for np.nan below

df = pd.read_csv('raw_data.csv')
df = df.rename(columns={df.columns[0]: 'Location'}) # make sure the first column is named 'Location'

df_melted = df.melt(id_vars='Location', var_name='Dataset', value_name='Datapoint')
df_melted['Datapoint'] = df_melted['Datapoint'].map(
    lambda s: [np.nan] if s == '0' else [float(v) for v in s.split(',')])
df_exploded = df_melted.explode(column='Datapoint', ignore_index=True)
df_exploded['Datapoint'] = pd.to_numeric(df_exploded['Datapoint']) # make sure the column is numeric

g = sns.catplot(data=df_exploded, col='Dataset', kind='box', x='Datapoint', y='Location')
g.set(xlabel='', ylabel='') # remove x and y label

[image: sns.catplot from wide dataframe]
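The attempt in the question drew the individual points on top of each distribution. The same could be done on the faceted plot with FacetGrid.map_dataframe. Below is a minimal sketch; the demo frame is hypothetical toy data standing in for df_exploded, and the fixed order list is an assumption added so that every facet shows the same rows in the same order:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for df_exploded (long form, one value per row)
demo = pd.DataFrame({
    'Location': ['P2', 'P2', 'P3', 'P3', 'P3', 'P2', 'P3', 'P3'],
    'Dataset': ['92A11'] * 5 + ['92B11'] * 3,
    'Datapoint': [10.0, 15.0, 100.0, 40.0, 60.0, 5.0, 20.0, 30.0],
})
order = ['P2', 'P3']  # explicit row order shared by all facets

# One box plot facet per dataset
g = sns.catplot(data=demo, col='Dataset', kind='box',
                x='Datapoint', y='Location', order=order, color='orange')
# Overlay the raw points on every facet
g.map_dataframe(sns.stripplot, x='Datapoint', y='Location',
                order=order, color='black', jitter=False)
g.set(xlabel='', ylabel='')  # remove x and y label
```

Passing the same order to both calls keeps the boxes and the points aligned even when a location has no data in one of the datasets.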

PS: df_melted (after converting the strings to lists) looks like:

   Location Dataset                                          Datapoint
0        P1   92A11                                              [nan]
1        P2   92A11                                       [10.0, 15.0]
2        P3   92A11  [100.0, 40.0, 60.0, 50.0, 70.0, 50.0, 30.0, 40...
3        P4   92A11  [55.0, 40.0, 20.0, 15.0, 15.0, 15.0, 15.0, 10....
4        P5   92A11                                              [nan]
...

df_exploded puts each value from the list into a separate row:

   Location Dataset Datapoint
0        P1   92A11       NaN
1        P2   92A11      10.0
2        P2   92A11      15.0
3        P3   92A11     100.0
4        P3   92A11      40.0
...
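
To try the reshaping without the CSV file, the same steps can be run on a small in-memory frame. This is only a sketch: the toy values and the parse_cell helper are made up for illustration and are not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mimicking the layout in the question:
# one row per location, one comma-separated string per dataset column.
toy = pd.DataFrame({
    'Location': ['P1', 'P2', 'P3'],
    '92A11': ['0', '10,15', '100,40,60'],
    '92B11': ['5,7', '0', '20'],
})

def parse_cell(s):
    """Turn '10,15' into [10.0, 15.0]; a lone '0' becomes [nan] (treated as empty)."""
    s = str(s).strip()
    return [np.nan] if s == '0' else [float(v) for v in s.split(',')]

# Wide -> long: one row per (Location, Dataset) pair
toy_melted = toy.melt(id_vars='Location', var_name='Dataset', value_name='Datapoint')
# Strings -> lists of floats
toy_melted['Datapoint'] = toy_melted['Datapoint'].map(parse_cell)
# Lists -> one row per value, with a fresh 0..n-1 index
toy_long = toy_melted.explode('Datapoint', ignore_index=True)
toy_long['Datapoint'] = pd.to_numeric(toy_long['Datapoint'])

print(toy_long)
```

With ignore_index=True the exploded frame gets a new consecutive index; without it, every row produced from the same list would keep the index of its original row.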
Answered By: JohanC