How to do a rolling count of transactions grouped by user
Question:
I have a dataset that contains userIds, datetime, transactionId, amount, and merchantDescription columns for the dates ranging between 2022-11-01 and 2022-12-31.
I need to count user transactions during a rolling 5, 10, and 30 day period but I am having difficulty.
Here is my process:
- Set the index by the datetime
df1 = df1.set_index('authorizationProcessedAt')
- Calculate the rolling count by userId
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
- I then rename and join the two dataframes together
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)
The results produced look like this and are not what I need:
Can anyone advise how to achieve a rolling count by user?
Answers:
Try this
# Set the index to datetime and sort it (time-based rolling windows need a monotonic index)
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt'])).sort_index()
# Group by userId and a rolling time window, and count the number of transactions
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
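# Rename, then merge on both userId and timestamp so counts from different users are never mixed
rolling_counts = rolling_counts.rename('transaction_count').reset_index()
df = pd.merge(df1.drop(columns='authorizationProcessedAt').reset_index(), rolling_counts, on=['userId', 'authorizationProcessedAt'])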
You can replace '5D' with '10D' or '30D' to get rolling counts for 10 or 30 days, respectively.
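To get all three windows in one pass, a loop over the window lengths may be simpler than repeating the merge. The sketch below is only an illustration: the sample rows are made up, the column names are taken from the question, and the output column names (transaction_count_5d and so on) are hypothetical. It assumes the frame is sorted by userId and then by timestamp, so that the grouped rolling result lines up with the rows by position.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df1 = pd.DataFrame({
    'userId': ['a', 'a', 'a', 'b', 'b'],
    'transactionId': [1, 2, 3, 4, 5],
    'authorizationProcessedAt': pd.to_datetime([
        '2022-11-01', '2022-11-03', '2022-11-20',
        '2022-11-02', '2022-11-10',
    ]),
})

# Sort by user, then by time, so the grouped result matches the row order.
df1 = df1.sort_values(['userId', 'authorizationProcessedAt']).reset_index(drop=True)

# A datetime index is required for offset windows such as '5D'.
indexed = df1.set_index('authorizationProcessedAt')

for days in (5, 10, 30):
    # An offset string gives a time-based window; an integer such as
    # rolling(5) would count the last 5 rows instead of the last 5 days.
    counts = indexed.groupby('userId')['transactionId'].rolling(f'{days}D').count()
    # The result is ordered by user and then by time within each user,
    # which is the sort order of df1, so assign the values by position.
    df1[f'transaction_count_{days}d'] = counts.to_numpy()

print(df1)
```

Each window includes the current transaction and looks back the given number of days, so the first transaction of every user always has a count of 1.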