seaborn bar plot from dataframe with multiple columns and datetime index

Question:

I have a dataframe with two columns (Recovered and Confirmed) and the date as its index.

My goal is to draw a grouped bar plot with seaborn, like the one I made in Excel.

I followed the discussion in "seaborn multiple variables group bar plot".

I know that I must use melt, but when I run the following code, the date index disappears (it is replaced by integers) and the structure of the dataframe changes:

# pd.melt(df, id_vars=['A'], value_vars=['B'])
premier_melt = pd.melt(final_mada_df, id_vars=["Confirmed"], value_vars=["Recovered"])


How can I solve this so that seaborn plots the bars correctly?
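A minimal sketch of why the dates disappear, assuming a small frame shaped like the question's df2 (the names `wide`, `lost` and `kept` are illustrative, and the values are copied from the rows shown further down). `pd.melt` returns a fresh 0..n-1 index by default, so the dates are lost unless they are first moved into a column with `reset_index()` and passed as `id_vars`. Note also that `id_vars=["Confirmed"]` keeps Confirmed as an identifier column instead of melting it.

```python
import pandas as pd

# Illustrative wide frame modeled on the question's df2 (dates as the index)
wide = pd.DataFrame(
    {"Recovered": [0, 0, 0, 0], "Confirmed": [3, 0, 0, 9]},
    index=pd.Index(["3/20/20", "3/21/20", "3/22/20", "3/23/20"], name="date"),
)

# melt discards the index, so the dates vanish and a RangeIndex appears
lost = pd.melt(wide, value_vars=["Recovered", "Confirmed"])
print(lost.head())

# Moving the index into a column first keeps the dates as an id variable
kept = pd.melt(
    wide.reset_index(),
    id_vars=["date"],
    value_vars=["Recovered", "Confirmed"],
    var_name="Status",
    value_name="Values",
)
print(kept)
```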


Following the advice in the answer below, I tried this:

# main dataframe
df2

         Recovered  Confirmed
3/20/20          0          3
3/21/20          0          0
3/22/20          0          0
3/23/20          0          9
 
df2.stack()

out :

3/20/20  Recovered    0
         Confirmed    3
3/21/20  Recovered    0
         Confirmed    0
3/22/20  Recovered    0
                     ..
5/4/20   Confirmed    0
5/5/20   Recovered    2
         Confirmed    2
5/6/20   Recovered    0
         Confirmed    7
Length: 96, dtype: int64

df2.rename(columns={'level_1':'Status',0:'Values'})

out :

         Recovered  Confirmed
3/20/20          0          3
3/21/20          0          0
3/22/20          0          0
3/23/20          0          9
3/24/20          0          5

But when I run the following code, I get an error:

# plot 
ax = sns.barplot(x=df2.index,y='Values',data=df2,hue='Status')

ValueError: Could not interpret input 'Values'

Answers:

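  • Use .stack() or .melt() to transform the dataframe from wide to long format, as shown below. The ValueError in the question occurs because the result of df2.stack() is never assigned, so .rename() runs on the original wide df2, which has no 'Values' or 'Status' columns.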
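Applied to the four rows of df2 shown in the question (values copied from there; the name `long_df` is illustrative), the missing steps are assigning the stacked result and calling `reset_index()`, so that 'Status' and 'Values' become real columns seaborn can look up. Because df2's index is unnamed, the date level comes out as 'level_0' and is renamed here.

```python
import pandas as pd

# The first rows of the question's df2; the index holds the dates as strings
df2 = pd.DataFrame(
    {"Recovered": [0, 0, 0, 0], "Confirmed": [3, 0, 0, 9]},
    index=["3/20/20", "3/21/20", "3/22/20", "3/23/20"],
)

# stack() returns a Series with a two-level index (date, column name);
# reset_index() turns both levels and the values into ordinary columns
long_df = (
    df2.stack()
    .reset_index()
    .rename(columns={"level_0": "date", "level_1": "Status", 0: "Values"})
)
print(long_df)
# long_df now has 'date', 'Status' and 'Values' columns, so
# sns.barplot(x="date", y="Values", hue="Status", data=long_df) can find them
```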

Imports and Sample Data

import pandas as pd
import seaborn as sns
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# optional graph format parameters
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')

# data
np.random.seed(365)
data = {'Confirmed': [np.random.randint(10) for _ in range(25)],
        'date': pd.bdate_range(datetime.today(), freq='d', periods=25).tolist()}

# dataframe
df = pd.DataFrame(data)

# add recovered
df['Recovered'] = df['Confirmed'].div(2)

| date                |   Confirmed |   Recovered |
|:--------------------|------------:|------------:|
| 2020-05-12 00:00:00 |           4 |         2   |
| 2020-05-13 00:00:00 |           1 |         0.5 |
| 2020-05-14 00:00:00 |           5 |         2.5 |
| 2020-05-15 00:00:00 |           1 |         0.5 |
| 2020-05-16 00:00:00 |           9 |         4.5 |

# verify datetime format and set index
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

Transform DataFrame

Using .stack

  • seaborn needs the data in this long format: one column ('Values') for the bar heights and one column ('Status') to use as the hue
df1 = df.stack().reset_index().set_index('date').rename(columns={'level_1': 'Status', 0: 'Values'})
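To see where the 'level_1' and 0 labels in the rename come from, here is a sketch that prints the intermediate result. It assumes a small stand-in for df with fixed dates and values instead of the random sample data.

```python
import pandas as pd

# Small stand-in for df: a named datetime index and two numeric columns
df = pd.DataFrame(
    {"Confirmed": [4, 1, 5], "Recovered": [2.0, 0.5, 2.5]},
    index=pd.date_range("2020-05-12", periods=3, freq="D", name="date"),
)

# After stack(), the index levels are 'date' and an unnamed level (the old
# column labels); reset_index() names that level 'level_1' and the values 0
step = df.stack().reset_index()
print(step.columns.tolist())  # ['date', 'level_1', 0]

# The chain from the answer: restore 'date' as the index and rename the rest
df1 = step.set_index("date").rename(columns={"level_1": "Status", 0: "Values"})
print(df1)
```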

Using .melt (the ignore_index parameter requires pandas 1.1 or later)

df1 = df.melt(ignore_index=False, var_name='Status', value_name='Values')
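The two approaches return the same rows in a different order: .stack keeps each date's statuses together, while .melt lists every Confirmed row before every Recovered row. A quick sketch to check this, assuming a small stand-in for df with fixed values; the helper `canonical` is hypothetical and only sorts the rows for comparison.

```python
import pandas as pd

# Stand-in for df with fixed values instead of random ones
df = pd.DataFrame(
    {"Confirmed": [4, 1, 5], "Recovered": [2.0, 0.5, 2.5]},
    index=pd.date_range("2020-05-12", periods=3, freq="D", name="date"),
)

via_stack = (
    df.stack().reset_index().set_index("date")
    .rename(columns={"level_1": "Status", 0: "Values"})
)
via_melt = df.melt(ignore_index=False, var_name="Status", value_name="Values")

# Row order differs: stack interleaves statuses per date, melt groups by status
print(via_stack["Status"].tolist())
print(via_melt["Status"].tolist())


def canonical(frame):
    # Hypothetical helper: sort by date and status, return plain row lists
    return frame.reset_index().sort_values(["date", "Status"]).values.tolist()


# After sorting, both transformations contain the same rows
print(canonical(via_stack) == canonical(via_melt))
```

Row order does not matter for the seaborn plot, since barplot groups the rows by date and hue.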

Transformation Result

               Status  Values
date                         
2022-06-24  Confirmed     2.0
2022-06-25  Confirmed     4.0
2022-06-26  Confirmed     1.0
2022-06-27  Confirmed     5.0
2022-06-28  Confirmed     2.0

Seaborn plot

  • Build the x-axis tick labels from df, not df1. As the result above shows, each date appears once per status in df1, so df1.index.to_series() would produce a list of repeated dates.
ax = sns.barplot(x=df1.index, y='Values', data=df1, hue='Status')

# format the x-axis tick labels uses df, not df1
ax.xaxis.set_major_formatter(plt.FixedFormatter(df.index.to_series().dt.strftime("%Y-%m-%d")))

# alternatively, use the following to format the labels
# _, labels = plt.xticks()
# labels = [label.get_text()[:10] for label in labels]
# ax.xaxis.set_major_formatter(plt.FixedFormatter(labels))

plt.xticks(rotation=90)
plt.show()
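A pandas-only sketch of the reasoning behind using df for the labels (no plot is drawn, and the stand-in frames use fixed, assumed values): df1 has one row per date and status, so labels built from its index repeat, whereas df yields exactly one label per group of bars.

```python
import pandas as pd

# Stand-in for df (wide) and df1 (long), with fixed values for illustration
df = pd.DataFrame(
    {"Confirmed": [4, 1, 5], "Recovered": [2.0, 0.5, 2.5]},
    index=pd.date_range("2020-05-12", periods=3, freq="D", name="date"),
)
df1 = df.melt(ignore_index=False, var_name="Status", value_name="Values")

# Labels from df1 repeat because every date appears once per status
from_df1 = df1.index.to_series().dt.strftime("%Y-%m-%d").tolist()
# Labels from df match the bar groups seaborn draws: one per unique date
from_df = df.index.to_series().dt.strftime("%Y-%m-%d").tolist()

print(from_df1)
print(from_df)
```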

Alternatively, use df.plot.bar()

  • Produces the same graph as above, without transforming df into df1.
  • pandas uses the datetime index of df for the x-axis and draws one bar per column for each date.
ax = df.plot.bar()
ax.xaxis.set_major_formatter(plt.FixedFormatter(df.index.to_series().dt.strftime("%Y-%m-%d")))
plt.show()
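If the full timestamps on the x-axis are the only problem, another option (a swapped-in technique, not the formatter approach used above) is to convert the index to 'YYYY-MM-DD' strings before plotting, so no tick formatter is needed. A minimal sketch, assuming the non-interactive Agg backend and a small stand-in for df with fixed values; the name `labeled` is illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

df = pd.DataFrame(
    {"Confirmed": [4, 1, 5], "Recovered": [2.0, 0.5, 2.5]},
    index=pd.date_range("2020-05-12", periods=3, freq="D", name="date"),
)

# Replace the datetime index with date strings; pandas uses them as tick labels
labeled = df.set_axis(df.index.strftime("%Y-%m-%d"), axis=0)
ax = labeled.plot.bar(rot=90)
ax.figure.canvas.draw()  # make sure the tick label text is populated

print([t.get_text() for t in ax.get_xticklabels()])
```

The trade-off is that the x values become plain strings, so any later date arithmetic should still be done on the original df.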


Answered By: Trenton McKinney