Python to add data label on linechart from Matplotlib and Pandas GroupBy
Question:
I am hoping to add data labels to a line chart produced by Matplotlib from Pandas GroupBy.
import matplotlib.pyplot as plt
import pandas as pd
from io import StringIO
csvfile = StringIO(
"""
Name Year - Month Score
Mike 2022-09 192
Mike 2022-08 708
Mike 2022-07 140
Mike 2022-05 144
Mike 2022-04 60
Mike 2022-03 108
Kate 2022-07 19850
Kate 2022-06 19105
Kate 2022-05 23740
Kate 2022-04 19780
Kate 2022-03 15495
Peter 2022-08 51
Peter 2022-07 39
Peter 2022-06 49
Peter 2022-05 49
Peter 2022-04 79
Peter 2022-03 13
Lily 2022-11 2
David 2022-11 3
David 2022-10 6
David 2022-08 2""")
df = pd.read_csv(csvfile, sep='\t', engine='python')  # the columns in the data above are tab-separated
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')  # sort the data-frame by a column
        line_chart = sub_frame_sorted.plot("Year - Month", "Score")
        label = sub_frame_sorted['Score']
        line_chart.annotate(label, (sub_frame_sorted['Year - Month'], sub_frame_sorted['Score']), ha='center')
plt.show()
The 2 lines for data labels throw an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I have them corrected?
Answers:
The problem is inside your for loop: annotate() takes a single label and a single (x, y) point, but you are passing it whole Series. Loop over the points and annotate each one instead.
You can replace your loop with this one:
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')
        line_chart = sub_frame_sorted.plot("Year - Month", "Score")
        for x, y in zip(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"]):
            label = "{:.0f}".format(y)  # format the label as a string
            line_chart.annotate(label, (x, y), textcoords="offset points", xytext=(0, 10), ha='center')
Also, if you get an error related to the 'Year - Month' column, convert it with the to_datetime() method.
Please let me know if this helps. Thanks.
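For the to_datetime() suggestion above, here is a minimal sketch of what that conversion could look like. It is not the answerer's code: the three-row frame is a hypothetical cut-down of the question's data, the format string "%Y-%m" is my assumption about the column's layout, and I plot with Matplotlib's ax.plot() instead of DataFrame.plot() so that the line and the annotations share the same date axis. The Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cut-down of the question's data (Mike's three latest rows).
df = pd.DataFrame({
    "Name": ["Mike"] * 3,
    "Year - Month": ["2022-09", "2022-08", "2022-07"],
    "Score": [192, 708, 140],
})

# Parse the 'YYYY-MM' strings into timestamps (first day of each month).
df["Year - Month"] = pd.to_datetime(df["Year - Month"], format="%Y-%m")

fig, ax = plt.subplots()
for group_name, sub_frame in df.groupby("Name"):
    sub_frame_sorted = sub_frame.sort_values("Year - Month")
    # Plot directly with Matplotlib so the x-axis holds real dates.
    ax.plot(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"], label=group_name)
    # Annotate one point at a time: one label string, one (x, y) pair.
    for x, y in zip(sub_frame_sorted["Year - Month"], sub_frame_sorted["Score"]):
        ax.annotate(f"{y:.0f}", (x, y), textcoords="offset points",
                    xytext=(0, 10), ha="center")
```

With real dates on the x-axis, months that are missing from the data (for example Mike's 2022-06) show up as gaps in spacing rather than being squeezed together.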
As the error indicates, the problem is in the annotate() call. sub_frame_sorted is a DataFrame, so you need a for loop to get each row before calling annotate(). Also, the x-axis is year-month, which is treated as a string, and that will cause issues. So just use the position index instead: I have used 0, 1, 2, … via i. This should work. You can add a small offset if you think the text overlaps the line.
Hope this is what you are looking for.
Updated code
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values('Year - Month')  # sort the data-frame by a column
        line_chart = sub_frame_sorted.plot("Year - Month", "Score", legend=False)
        i = 0
        for ix, vl in sub_frame_sorted.iterrows():
            line_chart.annotate(vl['Score'], (i, vl['Score']), ha='center')
            i = i + 1
plt.show()
Output plots
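Regarding the "small offset" mentioned above, a possible variation is sketched here. It keeps the positional x-values 0, 1, 2, … from the updated code but uses enumerate() instead of a manual counter, and lifts each label a few points above its marker with textcoords="offset points". The 8-point offset, the three-row frame, and the axes_by_name dictionary are my own illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cut-down of the question's data (Peter's three earliest rows).
df = pd.DataFrame({
    "Name": ["Peter"] * 3,
    "Year - Month": ["2022-05", "2022-04", "2022-03"],
    "Score": [49, 79, 13],
})

axes_by_name = {}  # keep each chart's Axes so it can be inspected or saved later
for group_name, sub_frame in df.groupby("Name"):
    if sub_frame.shape[0] >= 2:
        sub_frame_sorted = sub_frame.sort_values("Year - Month")
        line_chart = sub_frame_sorted.plot("Year - Month", "Score", legend=False)
        # enumerate() supplies the positions 0, 1, 2, ... used as x-values.
        for i, score in enumerate(sub_frame_sorted["Score"]):
            line_chart.annotate(str(score), (i, score),
                                textcoords="offset points",  # offset measured in points
                                xytext=(0, 8),               # 8 points above the marker
                                ha="center")
        axes_by_name[group_name] = line_chart
```

Because the offset is given in points rather than data units, the gap between label and line stays the same whether the scores are in the tens (Peter) or the tens of thousands (Kate).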