How can I combine multiple rows (transactions) if the following transactions are closed within 30 days after the 1st transaction

Question:

this is my first post and I will try my best to describe the problem.

I have a table below to help illustrate. I am trying to combine some transactions for each Account ID. The logic I want is: Account AAAAA had their 1st transaction "Close Date" on 2019-12-16, if their following transactions are "closed" within 1 month after the 1st (i.e. window 2019-12-16 ~ 2020-01-16), then I want to combine those transactions to be just one single transaction, but use the "Created Date" from the 1st transaction (i.e 2019-12-15) and "Close Date" from the last transaction within the window (i.e 2020-01-02). Then after that window, the next transaction for Account AAAAA is "closed" on 2020-01-22, the window will be 2020-01-22 ~ 2020-02-22, but they only have one transaction in that window, so it just stays as a single transaction. I want to implement this logic for each Account ID.

Initial Table:

Account ID Created Date Close Date Product Code
AAAAAA 2019-12-15 2019-12-16 Apple; Banana
AAAAAA 2019-12-20 2019-12-26 Strawberry
AAAAAA 2020-01-02 2020-01-02 Apple; Banana
AAAAAA 2020-01-15 2020-01-22 Apple; Banana
BBBBBB 2020-02-04 2020-02-05 Watermelon
BBBBBB 2020-03-09 2020-03-10 Strawberry
BBBBBB 2020-03-10 2020-03-15 Apple; Banana
import pandas as pd

data = [['AAAAA', '2019-12-15', '2019-12-16', 'Apple; Banana'],
['AAAAA', '2019-12-20', '2019-12-26', 'Strawberry'],
['AAAAA', '2020-01-02', '2020-01-02', 'Apple; Banana'],
['AAAAA', '2020-01-15', '2020-01-22', 'Apple; Banana'],
['BBBBB', '2020-02-04', '2020-02-05', 'Watermelon'],
['BBBBB', '2020-03-09', '2020-03-10', 'Strawberry'],
['BBBBB', '2020-03-10', '2020-03-15', 'Apple; Banana']]

initial_df = pd.DataFrame(data, columns=['Account ID', 'Created Date', 'Close Date', 'Products'])

initial_df 

After:

Account ID Created Date Close Date Product Code
AAAAAA 2019-12-15 2020-01-02 Apple; Banana; Strawberry; Apple; Banana
AAAAAA 2020-01-15 2020-01-22 Apple; Banana
BBBBB 2020-02-04 2020-02-05 Watermelon
BBBBB 2020-03-09 2020-03-15 Strawberry; Apple; Banana

Is there a way I can achieve this? I will greatly appreciate you help!!

Asked By: minxyen

||

Answers:

I was able to figure it out myself using below codes, but if there is a better way, I would love to know as well.

from dateutil.relativedelta import relativedelta

grouped = initial_df.groupby(by=['Account ID'])
window_list = []
for a, b in grouped:
  window_end_date = list(b['Close Date'])[0] + dateutil.relativedelta.relativedelta(months=1)
  for i in b['Close Date']:
    if pd.to_datetime(i) <= window_end_date:
      window_list.append(window_end_date)
    elif pd.to_datetime(i) > window_end_date:
      window_end_date = i + dateutil.relativedelta.relativedelta(months=1)
      window_list.append(window_end_date)

initial_df['window'] = window_list
created_start_list = []
close_end_list = []
grouped = initial_df.groupby(['Account ID', 'window'])
for a, b in grouped:
  created_start = min(list(b['Created Date']))
  created_start_list.append(created_start)
  close_end = max(list(b['Close Date']))
  close_end_list.append(close_end)

df = grouped['Products'].apply('; '.join).reset_index()
df['created_start'] = created_start_list
df['closed_start'] = close_end_list
df
Answered By: minxyen
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.