How to do a rolling count of transactions grouped by user

Question:

I have a dataset with userId, datetime (authorizationProcessedAt), transactionId, amount, and merchantDescription columns, covering dates from 2022-11-01 to 2022-12-31.

I need to count each user's transactions over rolling 5-, 10-, and 30-day periods, but I am having difficulty.

Here is my process:

  1. Set the index to the datetime column:
df1 = df1.set_index('authorizationProcessedAt')
  2. Calculate the rolling count by userId:
transaction_counts = df1.groupby('userId')['transactionId'].rolling(5).count()
  3. Rename and join the two dataframes together:
transaction_counts = pd.DataFrame(transaction_counts)
transaction_counts.rename(columns={'transactionId':'transaction_count'}, inplace=True)
df1 = pd.concat([df1, transaction_counts], axis=0)

The results produced look like this and are not what I need:

[image: head of dataframe]

Can anyone advise how to achieve a rolling count by user?

Asked By: Munessain


Answers:

Try this:

# Set the index to datetime and sort it (time-based rolling windows need a sorted DatetimeIndex)
df1 = df1.set_index(pd.DatetimeIndex(df1['authorizationProcessedAt'])).sort_index()
# Group by userId and count the transactions in a rolling 5-day window
rolling_counts = df1.groupby('userId')['transactionId'].rolling('5D').count()
# Rename and merge
rolling_counts = rolling_counts.rename('transaction_count').reset_index().drop('userId', axis=1)
df = pd.merge(df1, rolling_counts, left_index=True, right_on='authorizationProcessedAt')

You can replace '5D' with '10D' or '30D' to get rolling counts for 10 days or 30 days, respectively.
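
As a rough sketch of how all three windows could be computed in one pass, the snippet below wraps the same groupby/rolling call in a helper. The function name add_rolling_counts, the output column names, and the sample data are made up for illustration. It assumes transactionId is numeric (if the IDs are strings, counting a numeric column such as amount may be needed instead) and that userId has no missing values. It assigns the counts back by position instead of merging, since a merge on the timestamp alone can pair rows from different users who share a timestamp.

```python
import pandas as pd


def add_rolling_counts(df, windows=("5D", "10D", "30D")):
    """Return a copy of df with one rolling transaction count column per window.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    out["authorizationProcessedAt"] = pd.to_datetime(out["authorizationProcessedAt"])
    # Sort by user, then time, so each group's timestamps are increasing and
    # the grouped result comes back in the same row order as `out`.
    out = out.sort_values(["userId", "authorizationProcessedAt"])
    out = out.set_index("authorizationProcessedAt")
    for window in windows:
        counts = out.groupby("userId")["transactionId"].rolling(window).count()
        # Positional assignment: both sides are ordered by userId, then time.
        out[f"transaction_count_{window}"] = counts.to_numpy()
    return out.reset_index()


# Made-up sample data, deliberately unsorted.
sample = pd.DataFrame({
    "userId": ["a", "b", "a", "b", "a"],
    "authorizationProcessedAt": [
        "2022-11-10", "2022-11-04", "2022-11-01", "2022-11-02", "2022-11-03",
    ],
    "transactionId": [5, 4, 1, 2, 3],
})
result = add_rolling_counts(sample)
print(result)
```

With pandas' default settings for offset windows, each count includes the current transaction, and a window such as '5D' covers the five days up to and including that transaction's timestamp.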

Answered By: amjad khatabi