Matplotlib pie chart label does not match value

Question:

I am working with this dataset: https://www.kaggle.com/edqian/twitter-climate-change-sentiment-dataset.

I have already converted the sentiment from its numeric code to a text description (0 becomes Neutral, 1 becomes Pro, -1 becomes Anti):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

tweets_df = pd.read_csv('twitter_sentiment_data.csv')

tweets_df.loc[tweets_df['sentiment'] == 0, 'twt_sentiment'] = 'Neutral'
tweets_df.loc[tweets_df['sentiment'] == -1, 'twt_sentiment'] = 'Anti'
tweets_df.loc[tweets_df['sentiment'] == 1, 'twt_sentiment'] = 'Pro'

tweets_df = tweets_df.drop(['sentiment'], axis=1) 

# display(tweets_df.head())
                                                                                                                                              message             tweetid twt_sentiment
0           @tiniebeany climate change is an interesting hustle as it was global warming but the planet stopped warming for 15 yes while the suv boom  792927353886371840          Anti
1  RT @NatGeoChannel: Watch #BeforeTheFlood right here, as @LeoDiCaprio travels the world to tackle climate change https://toco/LkDehj3tNn htt…  793124211518832641           Pro
2                               Fabulous! Leonardo #DiCaprio's film on #climate change is brilliant!!! Do watch. https://toco/7rV6BrmxjW via @youtube  793124402388832256           Pro
3     RT @Mick_Fanning: Just watched this amazing documentary by leonardodicaprio on climate change. We all think this… https://toco/kNSTE8K8im  793124635873275904           Pro
4         RT @cnalive: Pranita Biswasi, a Lutheran from Odisha, gives testimony on effects of climate change & natural disasters on the po…  793125156185137153           NaN

I want to create a figure with subplots that show the sentiment as counts and as percentages. The code I tried:

sns.set(font_scale=1.5)
plt.style.use("seaborn-poster")

fig, axes = plt.subplots(1, 2, figsize=(20, 10), dpi=100)

sns.countplot(x=tweets_df["twt_sentiment"], ax=axes[0])
labels = list(tweets_df["twt_sentiment"].unique())

axes[1].pie(tweets_df["twt_sentiment"].value_counts(),
            autopct="%1.0f%%",
            labels=labels,
            startangle=90,
            explode=tuple([0.1] * len(labels)))

fig.suptitle("Distribution of Tweets", fontsize=20)
plt.show()

The result is not what I wanted: the pie chart labels are wrong.

[Image: pie chart with wrong labelling]

After using sort=False in value_counts, the pie chart looks like this:

[Image: pie chart after sort=False]

Asked By: armyamy

||

Answers:

Try this instead:

axes[1].pie(tweets_df["twt_sentiment"].value_counts(sort=False),
            autopct="%1.0f%%",
            labels=labels,
            startangle=90,
            explode=tuple([0.1] * len(labels)))
Answered By: U12-Forward
  • labels = list(tweets_df["twt_sentiment"].unique()) does not put the labels in the same order as the index of tweets_df.twt_sentiment.value_counts(). The index determines the slice order. Therefore, it’s best to use the .value_counts() index as the labels.
  • Labels can easily be added to the bar plot, which makes the pie chart unnecessary.
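The ordering mismatch described in the first bullet can be reproduced on a small example. The following is a minimal sketch with made-up data: the Series `s` and its values are assumptions for illustration, not the Kaggle dataset. It also includes one missing value, since the question's sample output shows a NaN in twt_sentiment.

```python
import pandas as pd

# Hypothetical toy data, not the real tweets; the last entry is a missing value
s = pd.Series(["Anti", "Pro", "Pro", "Neutral", "Pro", "Neutral", None])

# unique() keeps the order of first appearance and includes the missing value
labels_unique = list(s.unique())

# value_counts() sorts by count (descending) and drops missing values by default
vc = s.value_counts()

print(labels_unique)  # Anti, Pro, Neutral, then the missing value
print(vc)             # Pro 3, Neutral 2, Anti 1

# Taking the labels from the counts themselves keeps each label with its slice
safe_labels = vc.index
```

Here unique() yields four entries in one order while value_counts() yields three in another, so pairing them positionally attaches labels to the wrong values.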
import pandas as pd
import matplotlib.pyplot as plt

tweets_df = pd.read_csv('data/kaggle/twitter_climate_change_sentiment/twitter_sentiment_data.csv')

tweets_df.loc[tweets_df['sentiment'] == -1, 'twt_sentiment'] = 'Anti'
tweets_df.loc[tweets_df['sentiment'] == 1, 'twt_sentiment'] = 'Pro'
tweets_df.loc[tweets_df['sentiment'] == 0, 'twt_sentiment'] = 'Neutral'

# assign value_counts to a variable; this is a pandas.Series
vc = tweets_df.twt_sentiment.value_counts()

# assign the value_counts index as the labels
labels = vc.index

# custom colors
colors = ['tab:blue', 'tab:orange', 'tab:green']

fig, axes = plt.subplots(1, 2, figsize=(10, 5), dpi=100)

# plot the pandas.Series directly with pandas.Series.plot
p1 = vc.plot(kind='bar', ax=axes[0], color=colors, rot=0, xlabel='Tweet Sentiment', width=.75)

# add count label
axes[0].bar_label(p1.containers[0], label_type='center')

# add percent labels
blabels = [f'{(v / vc.sum())*100:0.0f}%' for v in vc]
axes[0].bar_label(p1.containers[0], labels=blabels, label_type='edge')

# make space at the top of the bar plot
axes[0].margins(y=0.1)

# add the pie plot
axes[1].pie(vc, labels=labels, autopct="%1.0f%%", startangle=90, explode=tuple([0.1] * len(labels)), colors=colors)

fig.suptitle("Distribution of Tweets", fontsize=20)
plt.show()
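
If the pie chart itself should show both the count and the percentage, one possible approach is to pass a callable as autopct, which matplotlib calls with each slice's percentage. This is only a sketch under assumptions: the counts in `vc` are made up, and the helper name `count_and_percent` is hypothetical, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for tweets_df.twt_sentiment.value_counts()
vc = pd.Series({"Pro": 3, "Neutral": 2, "Anti": 1})


def count_and_percent(pct, total=vc.sum()):
    """Format a slice label from the percentage that matplotlib passes in."""
    count = int(round(pct * total / 100))  # recover the count from the percentage
    return f"{pct:.0f}%\n({count})"


fig, ax = plt.subplots(figsize=(5, 5))

# Labels come from the same Series as the values, so they cannot drift apart
ax.pie(vc, labels=vc.index, autopct=count_and_percent, startangle=90)
fig.savefig("pie_counts.png")
```

Recovering the count from the percentage relies on rounding, so for very large totals it may be safer to build the label strings directly from `vc`.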


Answered By: Trenton McKinney