How can I create from scratch a table that allows me to get the last order of each user using python?


Example Data

I am doing the exercise of getting a table with the last purchase of each user from an invented original table which contains the name of the UserID, date of purchase, ID of the purchased item, InvoiceID and other features. I need to create this new table in Python and so far I have not been able to find a way to do it.

I was expecting to get a table with only one invoice per user (and several items in each invoice), each invoice being the last one registered by each of them. The sample data can be downloaded in: link_github

I have used this code to get the last order for every user:

import pandas as pd
df = pd.DataFrame({'user': [1, 2, 3, 1, 1, 2, 2],
                   'product':['A','B','C', 'A','C','D','A'], 
                   'invoice': [101, 102, 103, 104, 104, 105, 105],
                   'date': ['1/02/2023', '1/03/2023', 

df['lastorder'] = df.groupby(['user'])['date'].
                                            apply(lambda x: x.shift())

After applying it I have tried to create a new table through filters but I can’t get it to combine the last items ordered by every user on their last purchase date aligned with the InvoiceID.

I’m expecting to get a table that contains, only the last purchases made by users 1, 2 and 3 (in dates 1/05/2023, 1/06/2023 and 1/04/2023, respectively) associated with invoices 103, 104 and 105 in the example given above.





df_dates = df.sort_values(['date']).groupby(['UserID']).tail(1)
df_dates = df_dates[['UserID', 'date']]
df_lastorder = df.merge(df_dates, how='inner', on=['UserID', 'date']).sort_values(['UserID'])

The first line merely takes the last 1 row (the tail) of each grouped UserID. Sorting it first means that the last row is the most recent.

The second line just takes the UserID and date columns because that is all we need. It returns a list of each client with the date of last order for each of them.

The final line is where the magic happens. The merge function uses an inner join (like in databases if you are familiar with sql). That basically means a link is made on UserID and dates. A list of all rows in df where the UserID and date match those in df_dates is returned. The sorting by UesrID at the end of the statement s not strictly necessary but makes it easier to read the data with UserID orders clustered together.

Answered By: Galo do Leste
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.