Plotting the count of occurrences per date
Question:
I’m very new to pandas. I have a data frame that has a datetime column and a column that contains a string of text (headlines). Each headline is a new row.
I need to plot the date on the x-axis, and the y-axis needs to contain how many times a headline occurs on each date.
So for example, one date may contain 3 headlines.
What’s the simplest way to do this? I can’t figure out how to do it at all. Maybe add another column with a ‘1’ for each row? If so, how would you do this?
Please point me in the direction of anything that may help!
Thank you!
I have tried plotting the count on the y-axis but keep getting errors. I also tried creating a variable that counts the number of rows, but that didn’t return anything of use either.
I tried adding a column with the count of headlines:
df_data['headline_count'] = df_data['headlines'].count
and I tried the groupby method:
df_data['count'] = df.groupby('headlines')['headlines'].transform('count')
When I use groupby, I get this error:
KeyError: 'headlines'
The output should simply be a plot with the number of times each date is repeated in the data frame (which signals that there are multiple headlines) on the y-axis, and the date on which the observations occurred on the x-axis.
Answers:
Use Series.value_counts on the date column, followed by Series.sort_index, or use GroupBy.size:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['2019-10-10', '2019-10-10', '2019-10-09']),
                   'col1': ['a', 'b', 'c']})
# count rows per date, then sort by date so the x-axis is chronological
s = df['date'].value_counts().sort_index()
# alternative
# s = df.groupby('date').size()
print(s)
2019-10-09 1
2019-10-10 2
Name: date, dtype: int64
Finally, plot the result with Series.plot:
s.plot()
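Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. The column names `date` and `headlines` and the sample rows are assumptions made for illustration, as is the choice of a bar chart. If the datetime column also carries a time of day, it helps to reduce it to the calendar day first so that all headlines from one day fall into the same group.

```python
import pandas as pd

# Hypothetical sample data; the column names 'date' and 'headlines' are assumed.
df_data = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-10-09 08:15", "2019-10-10 09:00",
        "2019-10-10 13:30", "2019-10-10 18:45",
    ]),
    "headlines": ["Headline A", "Headline B", "Headline C", "Headline D"],
})

# Drop the time of day so rows from the same calendar day share one key.
day = df_data["date"].dt.normalize()

# Count rows (headlines) per day; groupby returns the days in sorted order.
counts = df_data.groupby(day).size()

# Use short date strings as tick labels instead of full timestamps.
counts.index = counts.index.strftime("%Y-%m-%d")

# Bar chart: x-axis is the date, y-axis is the number of headlines.
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("Date")
ax.set_ylabel("Number of headlines")
```

In a script, call `matplotlib.pyplot.show()` afterwards to display the figure; in a notebook the plot usually appears on its own.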
Try this:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
A = pd.DataFrame(columns=["Date", "Headlines"], data=[["01/03/2018","Cricket"],["01/03/2018","Football"],
["02/03/2018","Football"],["01/03/2018","Football"],
["02/03/2018","Cricket"],["02/03/2018","Cricket"]] )
Your data looks like this:
print (A)
Date Headlines
0 01/03/2018 Cricket
1 01/03/2018 Football
2 02/03/2018 Football
3 01/03/2018 Football
4 02/03/2018 Cricket
5 02/03/2018 Cricket
Now do a group by operation on it:
data = A.groupby(["Date","Headlines"]).size()
print(data)
Date Headlines
01/03/2018 Cricket 1
Football 2
02/03/2018 Cricket 2
Football 1
dtype: int64
You can now plot it using the code below:
# Set width of each bar
barWidth = 0.25
# Bar heights: one count per date for each headline
# (this assumes every headline appears on every date)
bars1 = data.loc[data.index.get_level_values('Headlines') == "Cricket"].values
bars2 = data.loc[data.index.get_level_values('Headlines') == "Football"].values
# Set position of bars on the x-axis
r1 = np.arange(len(bars1))
r2 = r1 + barWidth
# Make the plot
plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='Cricket')
plt.bar(r2, bars2, color='#557f2d', width=barWidth, edgecolor='white', label='Football')
# Put one tick per date in the middle of each group of bars
plt.xticks(r1 + barWidth / 2, data.index.get_level_values('Date').unique())
# Add axis labels and legend, then show the figure
plt.xlabel("Date")
plt.ylabel("Count")
plt.legend()
plt.show()
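A shorter route to a similar grouped bar chart, offered as a sketch rather than a replacement for the answer above, is to pivot the grouped counts into one column per headline with `unstack` and let pandas draw the bars. It reuses the sample frame `A` from this answer; `fill_value=0` is added on the assumption that some headline might be missing on some date in real data.

```python
import pandas as pd

A = pd.DataFrame(
    columns=["Date", "Headlines"],
    data=[["01/03/2018", "Cricket"], ["01/03/2018", "Football"],
          ["02/03/2018", "Football"], ["01/03/2018", "Football"],
          ["02/03/2018", "Cricket"], ["02/03/2018", "Cricket"]],
)

# Count rows per (Date, Headlines) pair, then pivot Headlines into columns.
# fill_value=0 covers dates on which a headline does not appear at all.
table = A.groupby(["Date", "Headlines"]).size().unstack(fill_value=0)

# One group of bars per date, one bar per headline.
ax = table.plot(kind="bar", rot=0)
ax.set_xlabel("Date")
ax.set_ylabel("Count")
```

Because the bar positions and tick labels come from the table's index and columns, nothing has to be computed by hand when more dates or headlines are added.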
Have you tried this?
df2 = df_data.groupby(['headlines']).count()
You should save the result in a new data frame (df2) rather than in another column, because the result of the groupby won’t have the same dimensions as the original data frame.
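The difference in dimensions can be seen in a small sketch. The frame and its column names `date` and `headlines` are assumptions for illustration, and the grouping is done by date because the question asks for counts per date. An aggregation such as `count` returns one row per group, whereas `transform('count')` returns one value per original row and so can be assigned as a column. As for the KeyError: 'headlines' in the question, one possible cause is that the attempt calls groupby on `df` rather than `df_data`, or that the column label is not spelled exactly 'headlines'; printing the column labels is a quick way to check.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumed.
df_data = pd.DataFrame({
    "date": ["2019-10-09", "2019-10-10", "2019-10-10"],
    "headlines": ["A", "B", "C"],
})

# Aggregation: one row per date, so the result is shorter than df_data.
per_date = df_data.groupby("date")["headlines"].count()

# transform: one value per original row, so it fits as a new column.
df_data["count"] = df_data.groupby("date")["headlines"].transform("count")

# Check the exact column labels when groupby raises a KeyError.
print(df_data.columns.tolist())
```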