How to avoid graphing duplicate rows in Pandas .plot()
Question:
I have a Pandas DataFrame that looks like this with many entries for all 50 US states:
State Name | School District | Schools Per District |
---|---|---|
Alabama | Alabama District 1 | 21 |
Alabama | Alabama District 2 | 5 |
Alaska | Alaska District 1 | 3 |
Alaska | Alaska District 2 | 4 |
I want to use Pandas to graph the number of schools per state, and so far I have the following code:
school_data.plot(kind='bar',
                 x="State Name",
                 xlabel="State",
                 y="Schools Per District",
                 ylabel="Number of Schools",
                 rot=0,
                 width=10,
                 figsize=(15, 5),
                 title="Number of Schools per District vs. US State"
                 );
However, I believe the resulting graph draws a bar for every single school district instead of summing the districts by state, and it therefore shows too much data.
How would I fix this so that there are only 50 bars on the graph, where each bar represents the total number of schools per state?
Answers:
You can group by State Name, sum Schools Per District within each group, and then create a bar chart from the aggregated data:
# Group the data by 'State Name' and sum the 'Schools Per District' values
grouped_data = school_data.groupby('State Name')['Schools Per District'].sum().reset_index()
# Plot the aggregated data: one bar per state
ax = grouped_data.plot(kind='bar',
                       x='State Name',
                       xlabel='State',
                       y='Schools Per District',
                       ylabel='Number of Schools',
                       rot=90,     # rotate the state labels so 50 names do not overlap
                       width=0.8,  # fraction of each slot; width=10 would make the bars overlap
                       figsize=(15, 5),
                       title='Number of Schools per US State'
                       )
The resulting graph has only 50 bars, one per state, each showing the total number of schools in that state.
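To check the aggregation step on its own, here is a minimal sketch that rebuilds `school_data` from the four sample rows shown in the question (the real DataFrame would cover all 50 states) and sums the districts per state. The variable name `totals` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question's table (two states only, for illustration)
school_data = pd.DataFrame({
    "State Name": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "School District": [
        "Alabama District 1",
        "Alabama District 2",
        "Alaska District 1",
        "Alaska District 2",
    ],
    "Schools Per District": [21, 5, 3, 4],
})

# Collapse the district rows into one total per state
totals = school_data.groupby("State Name")["Schools Per District"].sum()

print(totals)  # Alabama -> 26, Alaska -> 7
```

Because `totals` is a Series indexed by state name, calling `totals.plot(kind="bar")` should also draw one bar per state without the `reset_index()` step, if you prefer not to pass `x` and `y` explicitly.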