Boxplot of Multiple Columns of a Pandas Dataframe on the Same Figure (seaborn)

Question:

I feel I am probably missing something obvious. I want to put the box plot of every column of a dataframe in the same figure, with the column names on the x-axis. In seaborn.boxplot() terms, this would be like grouping by every column.

In pandas I would do

import numpy as np
import pandas as pd
df = pd.DataFrame(data=np.random.random(size=(4, 4)), columns=['A', 'B', 'C', 'D'])
df.boxplot()

which yields


Now I would like to get the same thing in seaborn. But when I try sns.boxplot(df), I get only one grouped boxplot. How do I reproduce the same figure in seaborn?

Asked By: Duccio Piovani


Answers:

The seaborn equivalent of

df.boxplot()

is

sns.boxplot(x="variable", y="value", data=pd.melt(df))

or just

sns.boxplot(data=df)

Either call draws a separate box for each numeric column in a single figure. The second form works directly on the wide-form DataFrame, so there is no need to convert it to long format first (checked with seaborn v0.11.1).

Complete example with melt:

import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(data = np.random.random(size=(4,4)), columns = ['A','B','C','D'])

sns.boxplot(x="variable", y="value", data=pd.melt(df))

plt.show()


This works because pd.melt converts a wide-form dataframe

          A         B         C         D
0  0.374540  0.950714  0.731994  0.598658
1  0.156019  0.155995  0.058084  0.866176
2  0.601115  0.708073  0.020584  0.969910
3  0.832443  0.212339  0.181825  0.183405

to long-form

   variable     value
0         A  0.374540
1         A  0.156019
2         A  0.601115
3         A  0.832443
4         B  0.950714
5         B  0.155995
6         B  0.708073
7         B  0.212339
8         C  0.731994
9         C  0.058084
10        C  0.020584
11        C  0.181825
12        D  0.598658
13        D  0.866176
14        D  0.969910
15        D  0.183405
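If "variable" and "value" are not the axis labels you want, pd.melt accepts var_name and value_name to rename the two long-form columns, and seaborn then uses those names as the axis labels. The names "column" and "measurement" below are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)
df = pd.DataFrame(np.random.random(size=(4, 4)), columns=['A', 'B', 'C', 'D'])

# Rename the melted columns; stacking is column by column (all of A, then B, ...).
long_df = pd.melt(df, var_name="column", value_name="measurement")
print(long_df.shape)  # 4 rows x 4 columns -> 16 long-form rows, 2 columns

fig, ax = plt.subplots()
# The renamed long-form columns become the x and y axis labels.
sns.boxplot(x="column", y="measurement", data=long_df, ax=ax)
print(ax.get_xlabel(), ax.get_ylabel())
```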

You could use the built-in pandas method df.plot(kind='box'), as suggested in another question.
I realize this answer will not help you if you have to use seaborn, but it may be useful for people with simpler requirements.

import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data = np.random.random(size=(4,4)), columns = ['A','B','C','D'])

df.plot(kind='box')
plt.show()
Answered By: RafaP
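The same pandas plot can also be spelled with the df.plot.box() accessor, which is equivalent to df.plot(kind='box'). A short sketch that draws onto an explicit Axes and reads the column names back from the x axis:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame(np.random.random(size=(4, 4)), columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots()
# Accessor form of df.plot(kind='box'): one box per column on one Axes.
df.plot.box(ax=ax)
ax.set_ylabel("value")

fig.canvas.draw()  # render once so the tick label text is populated
tick_names = [label.get_text() for label in ax.get_xticklabels()]
print(tick_names)
```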
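
# df1 and df2 must each be 1-D (e.g. single DataFrame columns); Matplotlib 3.9+ renames labels= to tick_labels=
plt.boxplot([df1, df2], boxprops=dict(color='red'), labels=['title 1', 'title 2'])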
Answered By: DataYoda
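A self-contained sketch of the plt.boxplot approach, assuming df1 and df2 stand for single columns (Series) of the example DataFrame, since plt.boxplot expects each entry of the list to be 1-D. Here the tick labels are set on the Axes afterwards instead of through the labels=/tick_labels= keyword, so the same code should run on Matplotlib versions before and after that rename.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
df = pd.DataFrame(np.random.random(size=(4, 4)), columns=['A', 'B', 'C', 'D'])

# Assumption: the two "dataframes" are really single columns (1-D Series).
series_to_plot = [df['A'], df['B']]

fig, ax = plt.subplots()
result = ax.boxplot(series_to_plot, boxprops=dict(color='red'))
# Boxes sit at x = 1, 2 by default; relabel those existing ticks.
ax.set_xticklabels(['title 1', 'title 2'])

fig.canvas.draw()  # render once so the tick label text is populated
tick_names = [label.get_text() for label in ax.get_xticklabels()]
print(len(result['boxes']), tick_names)
```

The returned dict holds the drawn artists ('boxes', 'whiskers', 'medians', ...), which is handy for further styling.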

The other answers are great and should work well for most use cases.

However, if one column has a much larger range of values than the others (possibly because it is on a different scale), the boxes for the other columns get squashed and you cannot see anything useful in them. In that case you can use subplots to give each column its own y-axis within the same figure.

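import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(data=np.random.random(size=(4, 4)), columns=['A', 'B', 'C', 'D'])

# Store the list of columns
columns_to_plot = ['A', 'B', 'C', 'D']

# Create the figure with one subplot per column (each gets its own y-axis)
fig, axes = plt.subplots(ncols=len(columns_to_plot))

# Draw one seaborn boxplot per column, each on its own subplot
for column, axis in zip(columns_to_plot, axes):
    sns.boxplot(data=df[column], ax=axis)
    axis.set_title(column)
    # axis.set(xticks=[], xticklabels=[], ylabel=column)

# Show the plot
plt.tight_layout()
plt.show()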

I have also added a commented-out line that removes the redundant x ticks and their labels (which I found really annoying) and sets the y-axis label to the column name.

Answered By: MattSt
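To see why separate y-axes help, here is a sketch with hypothetical data in which column 'D' is scaled up by 1000. It uses the same one-subplot-per-column loop, passes each column as y=, and applies a variant of the commented-out clean-up line: set_xticks([]) removes the ticks and their labels in one call. plt.subplots does not share the y-axis by default, so each panel autoscales to its own column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)
# Hypothetical data: column 'D' lives on a ~1000x larger scale than A, B, C.
df = pd.DataFrame(np.random.random(size=(20, 4)), columns=['A', 'B', 'C', 'D'])
df['D'] *= 1000

columns_to_plot = ['A', 'B', 'C', 'D']

# sharey defaults to False, so every subplot gets an independent y-axis.
fig, axes = plt.subplots(ncols=len(columns_to_plot))
for column, axis in zip(columns_to_plot, axes):
    sns.boxplot(y=df[column], ax=axis)
    axis.set_xticks([])       # drop the redundant x tick and its label
    axis.set_ylabel(column)   # name the panel via its y label instead

plt.tight_layout()
fig.canvas.draw()  # make sure autoscaled limits are final before reading them

y_tops = {column: axis.get_ylim()[1] for column, axis in zip(columns_to_plot, axes)}
print(y_tops)
```

On a single shared axis, A, B and C would be flattened near zero next to D; with independent axes each box stays readable.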