How to create a cumulative line graph from a data frame containing a list of date time values

Question:

I have a large data frame with a ‘date_joined’ column, indicating the date that members joined a certain forum. I wish to plot a line graph that cumulatively counts the number of members joining each day, with date on the x axis and the y axis indicating the count value for each date (the number of people who joined on that date). What is the best way to do this?

I would also like to lay this line graph over a bar chart or histogram that counts the number of members who joined on each day – though my data set covers a period of five years so it would be good to group this based on months.

My data looks something like this:

user_name   thread_name                  post_text  date_joined
BoxCutter   Expierence with my coworker  ….         2018-05-02
Mainlander  Expierence with my coworker  …..        2022-03-25

Any help would be much appreciated!

Asked By: Connor95

Answers:

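It is a little unclear what you meant by laying one chart over the other, so here are two solutions. The first draws the cumulative line and the monthly bar chart as separate figures: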

import pandas as pd
import matplotlib.pyplot as plt

data = {'user_name': ['BoxCutter', 'Mainlander', 'Bob', 'Alice', 'Eve'],
        'thread_name': ['Experience with my coworker', 'Experience with my boss', 'Experience with my manager', 'Experience with my team', 'Experience with my job'],
        'post_text': ['...', '...', '...', '...', '...'],
        'date_joined': ['2018-05-02', '2022-03-25', '2019-12-10', '2020-06-15', '2021-09-01']}

df = pd.DataFrame(data)

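# Parse the join dates so they sort and plot chronologically
df['date_joined'] = pd.to_datetime(df['date_joined'])

# Number of members who joined on each date (groupby sorts by date)
daily_count = df.groupby('date_joined').size()

# Running total of members over time
cumulative_count = daily_count.cumsum()

plt.plot(cumulative_count.index, cumulative_count)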
plt.xlabel('Date Joined')
plt.ylabel('Cumulative Count of Members')
plt.title('Cumulative Count of Members Joined Over Time')
plt.show()

# Group by calendar month; months with no joins get no bar
df['year_month'] = df['date_joined'].dt.strftime('%Y-%m')
monthly_count = df.groupby('year_month')['user_name'].count()
plt.bar(monthly_count.index, monthly_count)
plt.xlabel('Year-Month Joined')
plt.ylabel('Count of Members Joined')
plt.title('Count of Members Joined Each Month')
plt.xticks(rotation=45)
plt.show()

will produce

[Figure: line plot of the cumulative member count over time]
[Figure: bar chart of members joined each month]
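
One caveat with grouping on the raw dates: days on which nobody joined do not appear in the result, so the line simply connects the dates that exist. Below is a minimal sketch of one way to fill those days in with `resample('D')`, assuming `date_joined` has already been parsed as datetimes. The sample names and dates are made up for illustration and are not from the question.

```python
import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame({
    'user_name': ['a', 'b', 'c', 'd'],
    'date_joined': pd.to_datetime(
        ['2018-05-02', '2018-05-02', '2018-05-05', '2018-05-06']),
})

# One entry per calendar day between the first and last join date;
# days without any joins get a count of 0
daily_count = df.set_index('date_joined')['user_name'].resample('D').size()

# Running total; flat stretches mark days without new members
cumulative_count = daily_count.cumsum()

print(cumulative_count)
```

The resulting series can be passed to `plt.plot(cumulative_count.index, cumulative_count)` exactly as before.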

The second solution overlays the cumulative line on a histogram, using a second y-axis:

import pandas as pd
import matplotlib.pyplot as plt

data = {'user_name': ['BoxCutter', 'Mainlander', 'Bob', 'Alice', 'Eve'],
        'thread_name': ['Experience with my coworker', 'Experience with my boss', 'Experience with my manager', 'Experience with my team', 'Experience with my job'],
        'post_text': ['...', '...', '...', '...', '...'],
        'date_joined': ['2018-05-02', '2022-03-25', '2019-12-10', '2020-06-15', '2021-09-01']}

df = pd.DataFrame(data)
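# Parse the join dates so they can be binned and plotted on a date axis
df['date_joined'] = pd.to_datetime(df['date_joined'])

fig, ax1 = plt.subplots(figsize=(10, 6))
# bins=60 gives roughly one bin per month over a five-year span
# (the bins are equal-width, not aligned to calendar months)
ax1.hist(df['date_joined'], bins=60, alpha=0.5, color='blue', edgecolor='black')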
ax1.set_xlabel('Date joined')
ax1.set_ylabel('Count')
ax1.set_title('Number of members joined per month')

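# Second y-axis on the same x-axis, so the two scales stay independent
ax2 = ax1.twinx()

# Members per date, accumulated into a running total
df_line = df.groupby('date_joined').size().cumsum().reset_index(name='cumulative_count')
ax2.plot(df_line['date_joined'], df_line['cumulative_count'], color='red', linewidth=2)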
ax2.set_ylabel('Cumulative count')

plt.show()

will produce

[Figure: histogram of join dates with the cumulative count overlaid in red on a second y-axis]
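
If the bars should follow calendar months exactly rather than equal-width histogram bins, one option is to resample by month start and draw the result with `ax.bar`. This is only a sketch under that assumption: the `'MS'` frequency, the bar width of 25 days and the variable names are illustrative choices, and the dates are the same sample dates used above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sample join dates as in the snippets above
dates = pd.to_datetime(['2018-05-02', '2022-03-25', '2019-12-10',
                        '2020-06-15', '2021-09-01'])
joined = pd.Series(1, index=dates).sort_index()

# Members per calendar month, indexed by the first day of each month;
# months without joins are kept with a count of 0
monthly_count = joined.resample('MS').sum()

# Running total per join date for the overlaid line
cumulative_count = joined.groupby(level=0).sum().cumsum()

fig, ax1 = plt.subplots(figsize=(10, 6))
# On a date axis the bar width is measured in days
ax1.bar(monthly_count.index, monthly_count.values, width=25,
        align='edge', alpha=0.5, color='blue', edgecolor='black')
ax1.set_xlabel('Date joined')
ax1.set_ylabel('Members joined per month')

# Second y-axis for the cumulative line
ax2 = ax1.twinx()
ax2.plot(cumulative_count.index, cumulative_count.values,
         color='red', linewidth=2)
ax2.set_ylabel('Cumulative count')
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to display or save the figure.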
