Python: add data labels to a line chart from Matplotlib and Pandas GroupBy

Question:

I am hoping to add data labels to a line chart produced by Matplotlib from Pandas GroupBy.

import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO

csvfile = StringIO(
"""
Name    Year - Month    Score
Mike    2022-09 192
Mike    2022-08 708
Mike    2022-07 140
Mike    2022-05 144
Mike    2022-04 60
Mike    2022-03 108
Kate    2022-07 19850
Kate    2022-06 19105
Kate    2022-05 23740
Kate    2022-04 19780
Kate    2022-03 15495
Peter   2022-08 51
Peter   2022-07 39
Peter   2022-06 49
Peter   2022-05 49
Peter   2022-04 79
Peter   2022-03 13
Lily    2022-11 2
David   2022-11 3
David   2022-10 6
David   2022-08 2""")

df = pd.read_csv(csvfile, sep='\t', engine='python')  # the columns are tab-separated

for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')       # sort the data-frame by a column

        line_chart = sub_frame_sorted.plot("Year - Month", "Score")

        label = sub_frame_sorted['Score']
        line_chart.annotate(label, (sub_frame_sorted['Year - Month'], sub_frame_sorted['Score']), ha='center') 

plt.show()

The 2 lines for data labels throw an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I have them corrected?

Asked By: Mark K


Answers:

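The problem is inside your for loop: annotate() expects one label and one (x, y) point per call, but you are passing it whole Series.

You can replace your loop with this one, which annotates one point at a time: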

for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')

        line_chart = sub_frame_sorted.plot("Year - Month", "Score")
        for x, y in zip(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"]):
            label = "{:.0f}".format(y)  # format the label as a string
            line_chart.annotate(label, (x, y), textcoords="offset points", xytext=(0,10), ha='center') 

Also, if you get an error related to the 'Year - Month' column, convert it to datetime with the pd.to_datetime() function.
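
A minimal sketch of that conversion, using only Kate's rows from the question. Plotting with Matplotlib's own ax.plot (rather than the pandas .plot call used above) is an assumption of this sketch, chosen so that the line and the labels clearly share one date axis:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Kate's rows from the question, in the original (unsorted) order.
df = pd.DataFrame({
    "Year - Month": ["2022-07", "2022-06", "2022-05", "2022-04", "2022-03"],
    "Score": [19850, 19105, 23740, 19780, 15495],
})

# Parse the "YYYY-MM" strings into timestamps, then sort chronologically.
df["Year - Month"] = pd.to_datetime(df["Year - Month"], format="%Y-%m")
df = df.sort_values("Year - Month")

# Draw the line on a date axis.
fig, ax = plt.subplots()
ax.plot(df["Year - Month"], df["Score"], marker="o")

# One annotate() call per point: a single label and a single (x, y) pair.
for x, y in zip(df["Year - Month"], df["Score"]):
    ax.annotate(f"{y:.0f}", (x, y), textcoords="offset points",
                xytext=(0, 10), ha="center")

fig.autofmt_xdate()  # tilt the date tick labels so they do not overlap
```

Call plt.show() afterwards to display the figure.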

Please let me know if this helps. Thanks.

As the error indicates, the problem is in the annotate() call. sub_frame_sorted is a DataFrame, so you need a for loop that goes through its rows and calls annotate once per point. Also, the x-axis values are year-month strings, which will cause problems if you pass them as coordinates, so use the position of each point instead. I have used 0, 1, 2, ... via the counter i. This should work. You can add a small offset if the text overlaps the line.
Hope this is what you are looking for.

Updated code

for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')       # sort the data-frame by a column
        line_chart = sub_frame_sorted.plot("Year - Month", "Score", legend=False)
        # i counts the points 0, 1, 2, ... and serves as the x position of each label
        for i, (ix, vl) in enumerate(sub_frame_sorted.iterrows()):
            line_chart.annotate(vl['Score'], (i, vl['Score']), ha='center')
plt.show()
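
The small offset mentioned above could be added along these lines. This is only a sketch: plot_with_labels and its offset parameter are hypothetical names introduced here, and it relies on the same assumption as the code above, namely that the points sit at x positions 0, 1, 2, ... when the x column holds strings.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_labels(frame, x_col, y_col, offset=(0, 8)):
    """Plot y_col against x_col and label every point with its y value.

    Hypothetical helper for illustration. Labels are placed by position
    (0, 1, 2, ...) and shifted by `offset`, given in points, so the text
    does not sit on top of the line.
    """
    ordered = frame.sort_values(x_col)
    ax = ordered.plot(x_col, y_col, legend=False, marker="o")
    for pos, value in enumerate(ordered[y_col]):
        ax.annotate(str(value), (pos, value), textcoords="offset points",
                    xytext=offset, ha="center")
    return ax


# A few rows taken from the question's data.
df = pd.DataFrame({
    "Name": ["Peter", "Peter", "Peter", "Lily"],
    "Year - Month": ["2022-05", "2022-03", "2022-04", "2022-11"],
    "Score": [49, 13, 79, 2],
})

# One chart per name, skipping names with fewer than two rows.
axes = {
    name: plot_with_labels(group, "Year - Month", "Score")
    for name, group in df.groupby("Name")
    if len(group) >= 2
}
```

Call plt.show() afterwards to display the charts. Keeping the offset in points rather than data units means the gap looks the same for Kate's scores in the tens of thousands and for David's single-digit ones.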

Output plots

[four plot images, not reproduced here]

Answered By: Redox