Make a list of active subscriptions from a pandas database

Question

I’m trying to create a list of active subscriptions in pandas, but all of my solutions seem too convoluted and most likely very slow, so I was wondering if there is an easier and/or better way to make it do what I want it to do.

My data looks something like this:

User	Subscription Start	Subscription Length	Subscription End
John	2023-01-01	30	2023-01-31
John	2023-01-15	30	2023-02-14
Jack	2023-02-01	120	2023-06-01
Jack	2022-08-01	30	2023-08-31
Jeff	2023-01-01	30	2023-01-31

Users can buy multiple subscriptions and they add together, and users can also let their subscriptions stop and buy it at a later date. Ideally what I would like the output to be is to have one line for each ‘subscription instant’ for each user, so it would look something like this:

User	Subscription Start	Subscription Length	Subscription End
John	2023-01-01	60	2023-03-02
Jack	2023-02-01	120	2023-06-01
Jack	2022-08-01	30	2023-08-31
Jeff	2023-01-01	30	2023-01-31

I tried to this with a for loop, where if a user has not had a subscription then they would be added to the list and if they did have one before it would check if the new subscription’s start date falls within any of the subscription ranges already in the list and then adds on top of one of them, but I had way too many for and if statements, and I’m sure there is a more elegant way to do it.

Asked By: Zera

||

Source

Answer 1

So after some thinking, I managed to come up with this, it works good enough:

li = df.iloc[0:0]
for x in df.index:
    if df.iloc[x,0] not in li.User.unique():
        li = li.append(df.iloc[x])
    else:
        if len(li[(li.User == df.iloc[x,0]) & (li.end_date >= df.iloc[x,2]) & (li.start_date <= df.iloc[x,2])]) > 0:
            li.loc[(li.clientid == df.iloc[x,0]) & (li.end_date >= df.iloc[x,2]) & (li.start_date <= df.iloc[x,2]),'sub_length'] += df.iloc[x,3]
        else:
            li = li.append(df.iloc[x])

I’ve found that this only works perfectly if the data is in chronological order. I’ve also made a for loop to check through the list for any errors:

li = li.reset_index(drop=True)
log2 = pd.DataFrame(columns=['Outcome','value'])
log2 = log2.iloc[0:0]
li2 = li.iloc[0:0]
for x in li.index:
        if len(li[(li.User== li.iloc[x,0]) & (li.end_date >= li.iloc[x,2]) & (li.start_date <= li.iloc[x,2])]) > 1:
            li2 = li2.append(li.iloc[x])
            log2 = log2.append({'Outcome':'Bad','value':1},ignore_index=True)
        else:
            log2 = log2.append({'Outcome':'Good','value':1},ignore_index=True)
log2.groupby('Outcome').count()

For this exact code to work the columns should be (in order): User platform start_date sub_length end_date

Answered By: Zera

Make a list of active subscriptions from a pandas database

Question:

Answers: