How can plot categorical stacked bar over periodic time?

Question:

I’m trying to extract a stacked bar chart over periodic time (5 years):

import pandas as pd
categorical = ["RL","CD(others)","DL","ML","ML","ML","DL","ML","DL","DL"]
year = [2014,2014,2015,2015,2016,2017,2019,2021,2022,2022]
df = pd.DataFrame({'year':year,
                   'keywords':categorical})
df

I tried relevant post1, post2, post3 to resolve the problem:

#solution1:Pivot table
df.pivot_table(index='year',
               columns='keywords',
              # values='paper_count',
               aggfunc='sum')
#df.plot(x='year', y='paper_count', kind='bar')


#solution2: groupby
# reset_index() gives a column for counting after groupby uses year and category
ctdf = (df.reset_index()
          .groupby(['year'], as_index=False)
          .count()
          # rename isn't strictly necessary here; it's just for readability
          .rename(columns={'index':'paper_count'})
       )
ctdf.plot(x='year', y='paper_count', kind='bar')

At the end, I couldn’t figure out how can plot this periodically by counting every 5 yrs:

2000-2005, 2005-2010, 2015-2020, 2020-2025.

expected output:

enter image description here

Asked By: Mario

||

Answers:

I don’t understand the full logic if the provided example is supposed to match the data, but you can use pandas.cut to form bins, then cumsum to get the cumulated sum (remove this if you just want a simple sum):

years = list(range(2000, 2030, 5))
# [2000, 2005, 2010, 2015, 2020, 2025]
labels = [f'{a}-{b}' for a,b in zip(years, years[1:])]
# ['2000-2005', '2005-2010', '2010-2015', '2015-2020', '2020-2025']

(df.assign(year=pd.cut(df['year'], bins=years, labels=labels))
   .groupby(['year', 'keywords'])['year'].count()
   .unstack()
   .plot.bar(stacked=True)
)

With the red line:

years = list(range(2000, 2030, 5))
# [2000, 2005, 2010, 2015, 2020, 2025]
labels = [f'{a}-{b}' for a,b in zip(years, years[1:])]
# ['2000-2005', '2005-2010', '2010-2015', '2015-2020', '2020-2025']

df2 = (df
 .assign(year=pd.cut(df['year'], bins=years, labels=labels))
 .groupby(['year', 'keywords'])['year'].count()
 .unstack()
)

ax = df2.plot.bar(stacked=True)
# adding arbitrary shift (0.1)
df2.sum(axis=1).add(0.1).plot(ax=ax, color='red', marker='s', label='paper count')
ax.legend()

output:

enter image description here

Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.