How to select user_id rows pandas

Question:

How can I calculate the first visited date and the last visited date before an order was placed by the user?

USER ID TYPE    DATE
1   Visited September 14, 2020
1   Visited October 4, 2020
1   Visited October 24, 2020
1   Ordered November 1, 2020
2   Visited September 14, 2020
2   Visited October 1, 2020
3   Visited September 1, 2020
3   Visited October 4, 2020
3   Visited October 4, 2020
3   Visited October 19, 2020
3   Ordered January 1, 2021
3   Visited February 11, 2021
3   Visited February 24, 2021
3   Visited March 1, 2021
3   Ordered April 21, 2021

Expected Output:

USER ID Ordered MIN DATE    MAX DATE
1   1   September 14, 2020  October 24, 2020
2   0   September 14, 2020  NAT
3   1   September 1, 2020   October 19, 2020
3   2   February 11, 2021   March 1, 2021
Asked By: Blogger22

||

Answers:

Try:

df['DATE'] = pd.to_datetime(df['DATE'])

df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())
           .set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']
           .unstack('TYPE')
           .groupby(['USER ID', 'grp'], sort=False)
           .agg(Ordered=('Ordered','count'), 
                MIN_DATE=('Visited','first'), 
                MAX_DATE=('Visited','last'))
           .reset_index('grp', drop=True)
           .reset_index()

df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()

df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')

Output:

   USER ID  Ordered            MIN_DATE          MAX_DATE
0        1        1  September 14, 2020  October 24, 2020
1        2        0  September 14, 2020               NaN
2        3        1  September 01, 2020  October 19, 2020
3        3        2   February 11, 2021    March 01, 2021
Answered By: Scott Boston

To calculate the first visited date and the last visited date before an order was placed by the user, you can sort the data by the user ID and the date, and then use a loop to iterate through each row of the data. For each user, you can keep track of the first visited date and the last visited date, and update them as you iterate through the rows. When you encounter an order for a user, you can output the first and last visited dates for that user before the order was placed.

import pandas as pd
data = pd.DataFrame({
'USER ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'TYPE': ['Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited',
         'Visited', 'Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited', 'Ordered'],
'DATE': ['September 14, 2020', 'October 4, 2020', 'October 24, 2020', 'November 1, 2020',
         'September 14, 2020', 'October 1, 2020', 'September 1, 2020', 'October 4, 2020',
         'October 4, 2020', 'October 19, 2020', 'January 1, 2021', 'February 11, 2021',
         'February 24, 2021', 'April 21, 2021']})

data['DATE'] = pd.to_datetime(data['DATE'])
grouped_data = data.groupby('USER ID')

for name, group in grouped_data:
    first_visited_date = None
    last_visited_date = None

    for index, row in group.iterrows():
        if row['TYPE'] == 'Visited':
              if first_visited_date is None:
                 first_visited_date = row['DATE']
              last_visited_date = row['DATE']
        elif row['TYPE'] == 'Ordered':
              print(f"User {name}: First visited date = {first_visited_date}, Last visited date = {last_visited_date}")

As date are already sorted sort() function is avoided.

Answered By: Hasan Seam