Incorrect labels for bars in bar plot

Question:

I’m taking a biostatistics class and we’ve been asked to manipulate some data from a CSV into various different types of plots. I’m having issues getting each bar on a bar plot to show the correct categorical variable. I’m following an example the professor provided and not getting what I want. I’m totally new to this, so my apologies for formatting errors.

I’ve created the dataframe variable and am now trying to plot it as a bar graph (and later on other variables in the CSV as other types of plots). Not sure if I’m providing the code in the correct manner, but here’s what I have so far. We’re supposed to create a bar plot of PET using the number of cases (number of each pet/type of pet).

This is the data for this particular question. In the CSV it’s shown as just the type of pet each student has (not sure how to share the CSV, but if it’d help I can post it).

I’m editing the post to show the code I’ve run to get the plot, and include the CSV info (hope I’m doing this right):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
HW2 = pd.read_csv("/Path/to/file")
HW2Grouped = HW2.groupby('Pet').count()
HW2Grouped['Pet'] = HW2Grouped.index
HW2Grouped.columns = ['Pet', 'Count', 'col_1', 'col_2', 'col_3', 'col_4']
%matplotlib inline
HW2bar = HW2Grouped.plot.bar(x = 'Pet', y = 'Count', title = "Pet count for students")
HW2bar.set_xlabel('Pet Type')
t = HW2bar.set_ylabel('Count')

This is the data I have to work with (sorry it’s just a screenshot).

This is the bar plot I got from the code I ran.

Asked By: Nelson Badia


Answers:

It seems to me that when you added the new column, Pet, it became the last column. Then you renamed the columns of HW2Grouped by position, so the first column (which holds the results of the count aggregation) was renamed to Pet, and the actual Pet column became col_4.

Let me trace through the steps you tried to make clear what went wrong.

When you grouped your DataFrame with this code:

HW2Grouped = HW2.groupby('Pet').count()

You received this:

     Height  Ice Cream  n of letters  Favorite TA  Minutes to Hometown 
Pet 
Cat       1         1             1            1                   1
Dog      17        17            17           17                  17 
Horse     2         2             2            2                   2  
None      4         4             4            4                   4

After you added a new column Pet to HW2Grouped (which you may have thought was creating a variable), it looked like this:

     Height  Ice Cream  n of letters  Favorite TA  Minutes to Hometown   Pet  
Pet 
Cat       1         1             1             1                    1    Cat
Dog      17        17            17            17                   17    Dog      
Horse     2         2             2             2                    2    Horse     
None      4         4             4             4                    4    None      

Then, when you assigned new names to the .columns attribute, your grouped DataFrame looked like this:

        Pet   Count   col_1   col_2    col_3   col_4
Pet 
Cat       1       1      1       1        1     Cat
Dog      17      17     17      17       17     Dog      
Horse     2       2      2       2        2     Horse     
None      4       4      4       4        4     None      

Then, when plotting HW2Grouped, you passed Pet as x. After the renaming, though, the column named Pet no longer held the pet names: it was the former Height column, which holds counts. That is why the bars got the wrong labels.
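If it helps to see the mix-up in isolation, here is a minimal sketch that reproduces it. The data is made up (I don't have your CSV), and the names df and grouped are just placeholders for HW2 and HW2Grouped:

```python
import pandas as pd

# Made-up stand-in for the CSV: 3 dogs and 1 cat, plus two other columns.
df = pd.DataFrame({
    "Pet": ["Dog", "Dog", "Cat", "Dog"],
    "Height": [60, 62, 65, 70],
    "Ice Cream": ["Vanilla", "Mint", "Vanilla", "Mint"],
})

grouped = df.groupby("Pet").count()   # columns: Height, Ice Cream (Pet is the index)
grouped["Pet"] = grouped.index        # the new column is appended LAST
print(list(grouped.columns))          # ['Height', 'Ice Cream', 'Pet']

# Assigning to .columns renames by position, not by meaning.
grouped.columns = ["Pet", "Count", "col_1"]
print(grouped)
# 'Pet' now holds the counts from the former Height column,
# and the real pet names ended up in 'col_1'.
```

Printing the column list before assigning to .columns is a quick way to check that the new names line up with what you expect.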

You may try:

%matplotlib inline
HW2bar = HW2Grouped.plot.bar(x = 'col_4', y = 'Count', title = "Pet count for students")
HW2bar.set_xlabel('Pet Type')
t = HW2bar.set_ylabel('Count') 

I think what you originally intended to do was this (except you didn’t indicate the column to perform the count on):

HW2Grouped = HW2.groupby('Pet')['Pet'].count()

However, this won’t sort the bars in descending order; the groups come out ordered by pet name.
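If you want to keep the groupby version, one option is to sort the counts yourself before plotting. A minimal sketch, again with made-up data in place of your CSV (df, counts and sorted_counts are placeholder names):

```python
import pandas as pd

# Made-up stand-in for the Pet column of the CSV.
df = pd.DataFrame({"Pet": ["Dog", "Cat", "Dog", "Horse", "Dog", "Horse"]})

# Counting per group gives a Series ordered by the group key (alphabetical here).
counts = df.groupby("Pet")["Pet"].count()
print(list(counts.index))             # ['Cat', 'Dog', 'Horse']

# Sort by the counts themselves, largest first.
sorted_counts = counts.sort_values(ascending=False)
print(list(sorted_counts.index))      # ['Dog', 'Horse', 'Cat']

# sorted_counts.plot.bar() would then draw the bars in that order.
```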

There is a shorter way that needs no added or renamed columns, and the bars come out sorted:

HW2['Pet'].value_counts().plot.bar()
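The one-liner drops the title and axis labels from your original plot. Here is a sketch of how they could be added back; plot_pet_counts is a hypothetical helper name, the data is made up, and the plotting call assumes matplotlib is installed:

```python
import pandas as pd

def plot_pet_counts(df):
    """Draw a labelled bar chart of how many rows have each pet type."""
    counts = df["Pet"].value_counts()   # sorted by count, largest first
    ax = counts.plot.bar(title="Pet count for students")  # requires matplotlib
    ax.set_xlabel("Pet Type")
    ax.set_ylabel("Count")
    return ax

# Made-up stand-in for the Pet column of the CSV.
df = pd.DataFrame({"Pet": ["Dog", "Cat", "Dog", "Horse", "Dog", "Horse"]})

counts = df["Pet"].value_counts()
print(counts)                           # Dog 3, Horse 2, Cat 1
```

In a notebook you would call plot_pet_counts(HW2) after reading the CSV; the index of the counts Series supplies the bar labels, so there is no column to mislabel.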
Answered By: n.shabankin