Sum of values of a column is different when plotted on a seaborn barplot

Question:

I have a dataframe with a column holding the number of vehicles (count) at a certain station. The code gives the correct sum of this column, but when I plot it on a seaborn barplot it shows a different value.

# this is the dataframe
ac = df_ras_1.loc[(df_ras_1['ENTRY_STATION'] == 'Ras')
                  & (df_ras_1['EXIT_STATION'] == 'Ps')
                  & (df_ras_1['VEHICLETYPE'] == 'Car')]

# sum of VEH_COUNT
ac['VEH_COUNT'].sum()  # gives the right value
391992

# but the bar on the plot shows a different value
plt.figure(figsize=(20, 10))
plt.xticks(rotation=90)

cp = sns.barplot(x="EXIT_STATION", y='VEH_COUNT', hue="VEHICLETYPE", data=ac)

[link to the graph] The bar should show 391992, but it shows a different value.

Sample of the dataframe (df_ras_1) from which 'ac' is built:

DATE        ENTRY_STATION    EXIT_STATION   VEHICLETYPE  VEH_COUNT
2016-01-01  Ras              Ps             Car          1  
2016-01-02  Ras              Ps             Car          9  
2016-01-03  Ras              Ps             Car          12 
2016-01-04  Ras              Ps             Car          3  
Asked By: Chowdry

||

Answers:

Say you have the following dataframe:

import pandas as pd
import seaborn as sns

data = {'ENTRY': ['Ras', 'Ras', 'Ras', 'Mordor', 'Mordor'],
        'EXIT': ['Ps', 'Ps', 'Ps', 'Ps', 'Sky'],
        'TYPE': ['Car', 'Car', 'Car', 'Car', 'Car'],
        'VEH_COUNT': [1, 9, 12, 3, 7]}

df = pd.DataFrame(data)
df.head()


#    ENTRY  EXIT    TYPE    VEH_COUNT
#0  Ras      Ps      Car    1
#1  Ras      Ps      Car    9
#2  Ras      Ps      Car    12
#3  Mordor   Ps      Car    3
#4  Mordor   Sky     Car    7

I recommend using groupby() and sum() to aggregate the data first, which makes it clearer and easier to both analyse and visualise.

df_grp = df.groupby(['ENTRY', 'EXIT', 'TYPE'])['VEH_COUNT'].sum().reset_index()
df_grp.head()


#   ENTRY   EXIT    TYPE    VEH_COUNT
#0  Mordor  Ps      Car      3
#1  Mordor  Sky     Car      7
#2  Ras     Ps      Car      22

As a result, you can easily plot charts based on your requirements.

# Seaborn Visualisation
ax = sns.barplot(x='EXIT', y='VEH_COUNT', data=df_grp)


You might notice that the bar for Ps shows 12.5. That is because barplot plots the mean of VEH_COUNT per x category by default, here the mean of row 0 and row 2 of df_grp.

That is, 3 + 22 = 25, then 25 / 2 = 12.5

The bar for the EXIT Sky stays at 7, since it has only one row.
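The arithmetic above can be checked without plotting anything. This is only a sketch that mimics the per-category aggregation with pandas; the names bar_mean and bar_sum are made up for illustration, and df_grp is rebuilt by hand from the grouped output shown earlier.

```python
import pandas as pd

# Grouped frame from the answer (one row per ENTRY/EXIT/TYPE combination)
df_grp = pd.DataFrame({
    'ENTRY': ['Mordor', 'Mordor', 'Ras'],
    'EXIT': ['Ps', 'Sky', 'Ps'],
    'TYPE': ['Car', 'Car', 'Car'],
    'VEH_COUNT': [3, 7, 22],
})

# Default bar height: the mean of VEH_COUNT within each EXIT category
bar_mean = df_grp.groupby('EXIT')['VEH_COUNT'].mean()

# Bar height when summing instead: the total within each EXIT category
bar_sum = df_grp.groupby('EXIT')['VEH_COUNT'].sum()

print(bar_mean)
print(bar_sum)
```

Here bar_mean holds 12.5 for Ps and 7.0 for Sky, while bar_sum holds 25 for Ps and 7 for Sky.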


Finally, to solve your problem, pass estimator=sum so that each bar shows the sum instead of the mean:

ax = sns.barplot(x='EXIT', y='VEH_COUNT', data=df_grp, ci=None, estimator=sum)
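The same effect explains the numbers in the question. As a rough illustration, using only the four sample rows posted there (the real 'ac' frame has many more rows, and ac_sample is a made-up name), the total and the mean that a default bar would show are quite different:

```python
import pandas as pd

# The four sample rows from the question (all Ras -> Ps, Car)
ac_sample = pd.DataFrame({
    'DATE': pd.to_datetime(['2016-01-01', '2016-01-02',
                            '2016-01-03', '2016-01-04']),
    'ENTRY_STATION': ['Ras'] * 4,
    'EXIT_STATION': ['Ps'] * 4,
    'VEHICLETYPE': ['Car'] * 4,
    'VEH_COUNT': [1, 9, 12, 3],
})

total = ac_sample['VEH_COUNT'].sum()        # what .sum() reports
daily_mean = ac_sample['VEH_COUNT'].mean()  # what a mean-based bar would show

print(total, daily_mean)
```

For these rows the total is 25 and the mean is 6.25, so a bar built from one row per date will sit far below the sum unless estimator=sum is passed.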


Answered By: N. Arunoprayoch