How to create a cumulative line graph from a data frame containing datetime values
Question:
I have a large data frame with a `date_joined` column indicating the date each member joined a certain forum. I want to plot a line graph that cumulatively counts the members joining each day, with the date on the x-axis and, on the y-axis, the running total of members who have joined up to that date. What is the best way to do this?
I would also like to overlay this line graph on a bar chart or histogram of the number of members who joined on each day. Because my data set covers a period of five years, it would be better to group the bars by month.
My data looks something like this:
user_name | thread_name | post_text | date_joined
---|---|---|---
BoxCutter | Experience with my coworker | … | 2018-05-02
Mainlander | Experience with my coworker | … | 2022-03-25
Any help would be much appreciated!
Answers:
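It's a little unclear what you meant by laying one chart over the other, so here are two solutions. The first draws the cumulative line and a monthly bar chart as separate figures: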
```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data in the same shape as the question
data = {'user_name': ['BoxCutter', 'Mainlander', 'Bob', 'Alice', 'Eve'],
        'thread_name': ['Experience with my coworker', 'Experience with my boss', 'Experience with my manager', 'Experience with my team', 'Experience with my job'],
        'post_text': ['...', '...', '...', '...', '...'],
        'date_joined': ['2018-05-02', '2022-03-25', '2019-12-10', '2020-06-15', '2021-09-01']}
df = pd.DataFrame(data)
df['date_joined'] = pd.to_datetime(df['date_joined'])

# Number of members who joined on each date, then the running total
daily_count = df.groupby('date_joined').size()
cumulative_count = daily_count.cumsum()

plt.plot(cumulative_count.index, cumulative_count.values)
plt.xlabel('Date Joined')
plt.ylabel('Cumulative Count of Members')
plt.title('Cumulative Count of Members Joined Over Time')
plt.show()

# Members joined per calendar month; resample also creates bars (with 0)
# for months in which nobody joined, so the time axis has no gaps
monthly_count = df.resample('MS', on='date_joined').size()
monthly_count.index = monthly_count.index.strftime('%Y-%m')

plt.bar(monthly_count.index, monthly_count.values)
plt.xlabel('Year-Month Joined')
plt.ylabel('Count of Members Joined')
plt.title('Count of Members Joined Each Month')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
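This produces two separate figures: the cumulative line plot and the monthly bar chart.

The second draws both in one figure, with a monthly histogram on the left y-axis and the cumulative line on a secondary y-axis created with `twinx()`: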
```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'user_name': ['BoxCutter', 'Mainlander', 'Bob', 'Alice', 'Eve'],
        'thread_name': ['Experience with my coworker', 'Experience with my boss', 'Experience with my manager', 'Experience with my team', 'Experience with my job'],
        'post_text': ['...', '...', '...', '...', '...'],
        'date_joined': ['2018-05-02', '2022-03-25', '2019-12-10', '2020-06-15', '2021-09-01']}
df = pd.DataFrame(data)
df['date_joined'] = pd.to_datetime(df['date_joined'])
df['join_month'] = df['date_joined'].dt.to_period('M')

# Monthly bin edges: from the first day of the earliest month to the first day
# of the month after the latest one, so each histogram bin is one calendar month
month_edges = pd.date_range(df['join_month'].min().to_timestamp(),
                            (df['join_month'].max() + 1).to_timestamp(),
                            freq='MS')

fig, ax1 = plt.subplots(figsize=(10, 6))
# Histogram of join dates, one bar per month (left y-axis)
ax1.hist(df['date_joined'], bins=month_edges, alpha=0.5, color='blue', edgecolor='black')
ax1.set_xlabel('Date joined')
ax1.set_ylabel('Members joined per month')
ax1.set_title('Number of members joined per month')

# Second y-axis sharing the same x-axis for the cumulative line
ax2 = ax1.twinx()
df_line = df.groupby('date_joined').size().cumsum().reset_index(name='cumulative_count')
ax2.plot(df_line['date_joined'], df_line['cumulative_count'], color='red', linewidth=2)
ax2.set_ylabel('Cumulative count')
plt.show()
```
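This produces a single figure with the monthly histogram in blue and the cumulative count as a red line, each measured against its own y-axis.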
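If you also want the numbers behind both plots, for example to check them or to get a cumulative line with a value on every calendar day rather than only on days when someone joined, here is a minimal sketch using pandas `resample`. The helper name `join_counts` is hypothetical, and it assumes the join dates can be parsed by `pd.to_datetime`.

```python
import pandas as pd


def join_counts(dates):
    """Hypothetical helper: return (cumulative_daily, monthly) join counts.

    cumulative_daily: running total of members on every calendar day from the
    first to the last join date (days with no joins repeat the previous total).
    monthly: members who joined in each calendar month, labelled by the first
    day of the month, with months that had no joins filled in as 0.
    """
    # One row per member, indexed by join date (sorted for resampling)
    s = pd.Series(1, index=pd.to_datetime(list(dates))).sort_index()
    # Joins per calendar day; resample inserts missing days with a sum of 0
    daily = s.resample('D').sum()
    cumulative_daily = daily.cumsum()
    # Joins per calendar month ('MS' = bins labelled by month start)
    monthly = s.resample('MS').sum()
    return cumulative_daily, monthly


dates = ['2018-05-02', '2022-03-25', '2019-12-10', '2020-06-15', '2021-09-01']
cumulative_daily, monthly = join_counts(dates)
print(cumulative_daily.tail(3))
print(monthly.head(3))
```

You could then draw `monthly` with `ax1.bar(monthly.index, monthly.values, width=20, align='edge')` and `cumulative_daily` with `ax2.plot(cumulative_daily.index, cumulative_daily.values)`; the `width=20` is an arbitrary choice, interpreted in matplotlib's date units (days), that keeps neighbouring monthly bars from touching.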