How to avoid graphing duplicate rows in Pandas .plot()

Question:

I have a Pandas DataFrame that looks like this, with many entries for all 50 US states:

State Name   School District       Schools Per District
Alabama      Alabama District 1    21
Alabama      Alabama District 2    5
Alaska       Alaska District 1     3
Alaska       Alaska District 2     4
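For experimenting, the four sample rows above can be rebuilt as a small DataFrame. This is a minimal sketch: the variable name school_data matches the question, but the full 50-state dataset is not shown, so only these rows are used.

```python
import pandas as pd

# Rebuild only the four sample rows shown above (the real data covers all 50 states)
school_data = pd.DataFrame({
    "State Name": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "School District": ["Alabama District 1", "Alabama District 2",
                        "Alaska District 1", "Alaska District 2"],
    "Schools Per District": [21, 5, 3, 4],
})

print(school_data)
```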

I want to use Pandas to graph the number of schools vs. state, and so far I have the following code:

school_data.plot(kind='bar', 
                     x="State Name", 
                     xlabel="State",
                     y="Schools Per District",
                     ylabel="Number of Schools",
                     rot=0,
                     width=10,
                     figsize=(15, 5),
                     title="Number of Schools per District vs. US State"
                     );

However, I believe the resulting graph plots every single school district instead of summing the districts by state, so it shows far too much data.
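A quick way to check this suspicion, sketched on the sample rows: with x="State Name", DataFrame.plot(kind='bar') draws one bar per row and does not aggregate, so every repeated state name produces another bar. Comparing the row count with the number of distinct states shows how many bars are drawn versus how many are wanted.

```python
import pandas as pd

# Sample rows from the question (the real data covers all 50 states)
school_data = pd.DataFrame({
    "State Name": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "School District": ["Alabama District 1", "Alabama District 2",
                        "Alaska District 1", "Alaska District 2"],
    "Schools Per District": [21, 5, 3, 4],
})

# plot(kind='bar') draws one bar per row, so this is the number of bars produced
bars_drawn = len(school_data)
# ...but only one bar per distinct state is wanted
bars_wanted = school_data["State Name"].nunique()

print(bars_drawn, bars_wanted)  # 4 2
```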

[Image: graph of school districts generated by Pandas]

How would I fix this so that there are only 50 bars on the graph, where each bar represents the total number of schools per state?

Asked By: FJJ


Answers:

You can group by 'State Name', sum 'Schools Per District' within each group, and then create the bar chart from the aggregated data:
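Applied to the four sample rows from the question, the grouping step alone produces one row per state (groupby sorts the group keys alphabetically by default):

```python
import pandas as pd

# Sample rows from the question (the real data covers all 50 states)
school_data = pd.DataFrame({
    "State Name": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "School District": ["Alabama District 1", "Alabama District 2",
                        "Alaska District 1", "Alaska District 2"],
    "Schools Per District": [21, 5, 3, 4],
})

grouped_data = (
    school_data.groupby("State Name")["Schools Per District"]
    .sum()          # add up the schools of every district in a state
    .reset_index()  # turn the State Name index back into a column for .plot(x=...)
)

print(grouped_data["State Name"].tolist())            # ['Alabama', 'Alaska']
print(grouped_data["Schools Per District"].tolist())  # [26, 7]
```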

# Group the data by 'State Name' and sum the 'Schools Per District' values
grouped_data = school_data.groupby('State Name')['Schools Per District'].sum().reset_index()

# Plot the aggregated data: one bar per state
ax = grouped_data.plot(kind='bar',
                       x='State Name',
                       xlabel='State',
                       y='Schools Per District',
                       ylabel='Number of Schools',
                       rot=90,     # vertical labels so 50 state names don't overlap
                       width=0.8,  # fraction of each slot a bar fills; width=10 makes bars overlap
                       figsize=(15, 5),
                       title='Total Number of Schools per US State'
                      )

The resulting graph has only 50 bars, one per state, each showing the total number of schools in that state.
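Because groupby(...).sum() already returns a Series indexed by state, an equivalent sketch can skip reset_index() and plot the Series directly; its index supplies the x-axis labels. Sorting by total is an optional extra, not part of the answer above. This sketch assumes matplotlib is installed and selects the non-interactive Agg backend so it also runs headless; in a notebook, drop the matplotlib.use line and call plt.show() or fig.savefig(...) as usual.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question (the real data covers all 50 states)
school_data = pd.DataFrame({
    "State Name": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "School District": ["Alabama District 1", "Alabama District 2",
                        "Alaska District 1", "Alaska District 2"],
    "Schools Per District": [21, 5, 3, 4],
})

# Series indexed by state: one total per state
totals = (
    school_data.groupby("State Name")["Schools Per District"]
    .sum()
    .sort_values(ascending=False)  # optional: largest total first
)

ax = totals.plot(
    kind="bar",
    xlabel="State",
    ylabel="Number of Schools",
    rot=90,          # vertical labels so many state names do not overlap
    width=0.8,       # fraction of each slot a bar fills (values > 1 overlap)
    figsize=(15, 5),
    title="Total Number of Schools per US State",
)
plt.tight_layout()

print(len(ax.patches))  # one bar per state: 2 for the sample rows
```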

Answered By: Hasan Patel