How to draw a boxplot from a DataFrame with lists in values?
Question:
I have a dataframe with the following structure:
data = [
[12, [0.1, 0.2, 0.3, 0.4, 0.5]],
[14, [0.8, 0.7, 0.6, 0.4, 0.2]]
# .... and so on
]
df = pd.DataFrame(data, columns=['index', 'distribution'])
How can I build boxplot chart(s) where:
- each box-and-whisker shows the distribution (box, whiskers, outliers) of the 'distribution' column for each 'index'
- each box-and-whisker aggregates the distributions with the same 'index' (e.g. if the 'index' value is the same, the 'distribution' lists are merged)
Answers:
You can use pandas’ .explode() to convert the lists into a long-form dataframe with one row per value. Seaborn is probably the easiest way to create a matplotlib-style boxplot from a dataframe, and it automatically groups values belonging to the same ‘index’.
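import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = [
    [12, [0.1, 0.2, 0.3, 0.4, 0.5]],
    [14, [0.8, 0.7, 0.6, 0.4, 0.2]],
    # .... and so on
]
df = pd.DataFrame(data, columns=['index', 'distribution'])

# One row per list element; cast to float because explode() leaves an object-dtype column
long_df = df.explode('distribution').astype({'distribution': float})

# Rows sharing the same 'index' value end up in the same box
# (seaborn >= 0.13 warns about palette without hue; pass hue='index', legend=False there)
sns.boxplot(data=long_df, x='index', y='distribution', palette='magma')
plt.show()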
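To check the merging behaviour from the second bullet of the question, here is a minimal sketch that skips the plotting. It assumes a third, made-up row that repeats the index value 12 (this row is not in the original data), and counts how many values would fall into each box after exploding:

```python
import pandas as pd

# Hypothetical sample: index 12 appears twice to illustrate the merge.
data = [
    [12, [0.1, 0.2, 0.3, 0.4, 0.5]],
    [14, [0.8, 0.7, 0.6, 0.4, 0.2]],
    [12, [0.9, 1.0]],
]
df = pd.DataFrame(data, columns=['index', 'distribution'])

# One row per list element; the 'index' value is repeated for each element.
long_df = df.explode('distribution').astype({'distribution': float})

# Number of values that end up in each box when grouping by 'index'.
counts = long_df.groupby('index')['distribution'].count()
print(counts)
```

Both lists for index 12 are pooled into a single group, so passing long_df to sns.boxplot with x='index' would draw one box per distinct index value.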