How to draw a boxplot from a DataFrame with lists in values?

Question:

I have a dataframe with the following structure:

import pandas as pd

data = [
    [12, [0.1, 0.2, 0.3, 0.4, 0.5]],
    [14, [0.8, 0.7, 0.6, 0.4, 0.2]]
    # .... and so on
]
df = pd.DataFrame(data, columns=['index', 'distribution'])

How can I build boxplot chart(s) where:

  1. each box-and-whisker will show the distribution (using the box/whiskers/outliers) of the distribution column above for each index
  2. each box-and-whisker will aggregate the distributions with the same index (e.g. if the index value is the same, the distribution will be merged)
Asked By: Ribtoks

||

Answers:

You can use pandas’ .explode() to convert the pesky lists into a long-form dataframe, with one row per list element. Seaborn is by far the easiest way to create a matplotlib-style boxplot from a dataframe. Seaborn will automatically group values belonging to the same ‘index’, so rows that share an index are merged into a single box.

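import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = [
    [12, [0.1, 0.2, 0.3, 0.4, 0.5]],
    [14, [0.8, 0.7, 0.6, 0.4, 0.2]]
    # .... and so on
]
df = pd.DataFrame(data, columns=['index', 'distribution'])

# One row per list element; exploded values have object dtype, so cast to float.
long_df = df.explode('distribution').astype({'distribution': float})

# One box per distinct 'index'; rows sharing an index end up in the same box.
# (Newer seaborn releases may warn that palette is used without hue.)
sns.boxplot(data=long_df, x='index', y='distribution', palette='magma')
plt.show()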

[Figure: seaborn boxplot from lists in columns]
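
To see what the boxplot is actually fed, it can help to inspect the exploded frame directly. The following is only a minimal sketch, not part of the original answer: the sample data is made up, with index 12 deliberately repeated to illustrate the merging asked about in point 2, and the names long_df and merged are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample data: index 12 appears twice to show the merging behaviour.
data = [
    [12, [0.1, 0.2, 0.3]],
    [14, [0.8, 0.7, 0.6]],
    [12, [0.4, 0.5]],
]
df = pd.DataFrame(data, columns=['index', 'distribution'])

# explode() emits one row per list element and repeats the 'index' value.
long_df = df.explode('distribution', ignore_index=True)

# Exploded values have object dtype; cast to float for numeric work.
long_df['distribution'] = long_df['distribution'].astype(float)

# Grouping by 'index' shows which values each box would be drawn from.
merged = long_df.groupby('index')['distribution'].apply(list).to_dict()

print(long_df)
print(merged)
```

Both rows with index 12 contribute to the same group here, which is the grouping a plot with x='index' relies on when it draws a single box for that index.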

Answered By: JohanC