Python multiple lines lineplot, unable to access correct data from dataframe

Question:

I have a dataset with multiple columns. From these columns, I want to access YEAR and EVENT_TYPE.

Year

1          1997
2          1997
3          1997
4          1997
           ...
1673110    2023
1673111    2023
1673112    2023
1673113    2023
1673114    2023

Event_type

1          Violence against civilians
2          Violence against civilians
3          Violence against civilians
4          Violence against civilians
                      ...
1673110                      Protests
1673111                      Protests
1673112                      Protests
1673113                      Protests
1673114                         Riots

I want to plot a line graph with years on the x-axis and, on the y-axis, the number of times an event occurred in that year. Each line should represent one EVENT_TYPE.

How can I do this?

Asked By: Malhar

||

Answers:

The code below uses a minimal dataset to show how to count the events with groupby and then plot one line per event type with matplotlib. This very basic plot can be refined to suit your needs.

import pandas as pd
import matplotlib.pyplot as plt

# Minimal sample data: one row per recorded event
year = [1998, 1998, 1998, 1998, 2023, 2023, 2023, 2023]
event = ['violence', 'violence', 'protests', 'riots', 'riots', 'riots', 'protests', 'violence']

df = pd.DataFrame({'year': year, 'event': event})

# Count the rows for each (event, year) pair; the result is indexed by event and year
df2 = df.groupby(['event', 'year'])['event'].count().to_frame('count')

# Plot a line for each type of event
for event_name, new_df in df2.groupby(level=0):
    plt.plot(new_df.index.get_level_values('year'), new_df['count'], label=event_name)

plt.xlabel('Year')
plt.ylabel('Number of events')
plt.ylim(0)
plt.legend()
plt.show()

The data after grouping (df2) becomes:

               count
event    year       
protests 1998      1
         2023      1
riots    1998      1
         2023      2
violence 1998      2
         2023      1
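
As a possible variation on the groupby approach, the counts can also be built as a wide table (one row per year, one column per event type) with pd.crosstab and plotted in a single call. This is only a sketch: it assumes the real columns are named exactly YEAR and EVENT_TYPE as in the question, the helper names count_events_by_year and plot_event_counts are made up for illustration, and the sample rows are invented stand-ins for the real dataset. One practical difference is that a year with no events of a given type shows up as 0 instead of being skipped.

```python
import pandas as pd
import matplotlib.pyplot as plt


def count_events_by_year(df, year_col="YEAR", event_col="EVENT_TYPE"):
    """Return a table with one row per year and one column per event type.

    The default column names are assumptions taken from the question.
    """
    # crosstab counts how often each (year, event type) pair occurs;
    # pairs that never occur are filled with 0
    return pd.crosstab(df[year_col], df[event_col])


def plot_event_counts(counts):
    """Draw one line per event type from the wide table built above."""
    ax = counts.plot(marker="o")  # DataFrame.plot draws one line per column
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of events")
    ax.set_ylim(bottom=0)
    ax.legend(title="Event type")
    return ax


# Invented sample rows standing in for the real dataset
sample = pd.DataFrame({
    "YEAR": [1997, 1997, 1997, 2022, 2023, 2023, 2023],
    "EVENT_TYPE": [
        "Violence against civilians", "Violence against civilians",
        "Protests", "Riots", "Protests", "Protests", "Riots",
    ],
})
counts = count_events_by_year(sample)

if __name__ == "__main__":
    plot_event_counts(counts)
    plt.show()
```

With the real data, passing the full dataframe to count_events_by_year in place of sample should be enough, provided the column names match.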
Answered By: user19077881