Plotting categorical data with pandas and matplotlib

Question:

I have a data frame with categorical data:

     colour  direction
1    red     up
2    blue    up
3    green   down
4    red     left
5    red     right
6    yellow  down
7    blue    down

I want to generate some graphs, like pie charts and histograms based on the categories. Is it possible without creating dummy numeric variables? Something like

df.plot(kind='hist')
Asked By: Ivan

||

Answers:

You can simply use value_counts on the series:

df['colour'].value_counts().plot(kind='bar')

enter image description here

Answered By: Alexander

like this :

df.groupby('colour').size().plot(kind='bar')
Answered By: steboc

You might find useful mosaic plot from statsmodels. Which can also give statistical highlighting for the variances.

from statsmodels.graphics.mosaicplot import mosaic
plt.rcParams['font.size'] = 16.0
mosaic(df, ['direction', 'colour']);

enter image description here

But beware of the 0 sized cell – they will cause problems with labels.

See this answer for details

Answered By: Primer

You could also use countplot from seaborn. This package builds on pandas to create a high level plotting interface. It gives you good styling and correct axis labels for free.

import pandas as pd
import seaborn as sns
sns.set()

df = pd.DataFrame({'colour': ['red', 'blue', 'green', 'red', 'red', 'yellow', 'blue'],
                   'direction': ['up', 'up', 'down', 'left', 'right', 'down', 'down']})
sns.countplot(df['colour'], color='gray')

enter image description here

It also supports coloring the bars in the right color with a little trick

sns.countplot(df['colour'],
              palette={color: color for color in df['colour'].unique()})

enter image description here

Answered By: Jarno

To plot multiple categorical features as bar charts on the same plot, I would suggest:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {
        "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
        "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)

categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    df[categorical_feature].value_counts().plot("bar", ax=ax[i]).set_title(categorical_feature)
fig.show()

enter image description here

Answered By: Roman Orac

You can simply use value_counts with sort option set to False. This will preserve ordering of the categories

df['colour'].value_counts(sort=False).plot.bar(rot=0)

link to image

Answered By: msenior_

Using plotly

import plotly.express as px
px.bar(df["colour"].value_counts())
Answered By: Biman Pal

Roman’s answer is very helpful and correct but in latest version you also need to specify kind as the parameter’s order can change.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {
    "colour": ["red", "blue", "green", "red", "red", "yellow", "blue"],
    "direction": ["up", "up", "down", "left", "right", "down", "down"],
    }
)

categorical_features = ["colour", "direction"]
fig, ax = plt.subplots(1, len(categorical_features))
for i, categorical_feature in enumerate(df[categorical_features]):
    df[categorical_feature].value_counts().plot(kind="bar", ax=ax[i]).set_title(categorical_feature)
fig.show()
Answered By: ahtasham nazeer

Pandas.Series.plot.pie

https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.pie.html

We can do a little better than that without straying from the built-in functionality.

People love to hate on pie charts, but they have the same benefit as a mosaic/tree; they help keep proportion-to-the-whole interpretable.

kwargs = dict(
    startangle = 90,
    colormap   = 'Pastel2',
    fontsize   = 13,
    explode    = (0.1,0.1,0.1),
    figsize    = (60,5),
    autopct    = '%1.1f%%',
    title      = 'Chemotherapy Stratification'
)

df['treatment_chemo'].value_counts().plot.pie(**kwargs)

enter image description here

Answered By: Kermit
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.