How to select user_id rows pandas
Question:
How can I calculate the first visited date and the last visited date before an order was placed by the user?
USER ID TYPE DATE
1 Visited September 14, 2020
1 Visited October 4, 2020
1 Visited October 24, 2020
1 Ordered November 1, 2020
2 Visited September 14, 2020
2 Visited October 1, 2020
3 Visited September 1, 2020
3 Visited October 4, 2020
3 Visited October 4, 2020
3 Visited October 19, 2020
3 Ordered January 1, 2021
3 Visited February 11, 2021
3 Visited February 24, 2021
3 Visited March 1, 2021
3 Ordered April 21, 2021
Expected Output:
USER ID Ordered MIN DATE MAX DATE
1 1 September 14, 2020 October 24, 2020
2 0 September 14, 2020 NAT
3 1 September 1, 2020 October 19, 2020
3 2 February 11, 2021 March 1, 2021
Answers:
Try:
df['DATE'] = pd.to_datetime(df['DATE'])
df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())
.set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']
.unstack('TYPE')
.groupby(['USER ID', 'grp'], sort=False)
.agg(Ordered=('Ordered','count'),
MIN_DATE=('Visited','first'),
MAX_DATE=('Visited','last'))
.reset_index('grp', drop=True)
.reset_index()
df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')
Output:
USER ID Ordered MIN_DATE MAX_DATE
0 1 1 September 14, 2020 October 24, 2020
1 2 0 September 14, 2020 NaN
2 3 1 September 01, 2020 October 19, 2020
3 3 2 February 11, 2021 March 01, 2021
To calculate the first visited date and the last visited date before an order was placed by the user, you can sort the data by the user ID and the date, and then use a loop to iterate through each row of the data. For each user, you can keep track of the first visited date and the last visited date, and update them as you iterate through the rows. When you encounter an order for a user, you can output the first and last visited dates for that user before the order was placed.
import pandas as pd
data = pd.DataFrame({
'USER ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'TYPE': ['Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited',
'Visited', 'Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited', 'Ordered'],
'DATE': ['September 14, 2020', 'October 4, 2020', 'October 24, 2020', 'November 1, 2020',
'September 14, 2020', 'October 1, 2020', 'September 1, 2020', 'October 4, 2020',
'October 4, 2020', 'October 19, 2020', 'January 1, 2021', 'February 11, 2021',
'February 24, 2021', 'April 21, 2021']})
data['DATE'] = pd.to_datetime(data['DATE'])
grouped_data = data.groupby('USER ID')
for name, group in grouped_data:
first_visited_date = None
last_visited_date = None
for index, row in group.iterrows():
if row['TYPE'] == 'Visited':
if first_visited_date is None:
first_visited_date = row['DATE']
last_visited_date = row['DATE']
elif row['TYPE'] == 'Ordered':
print(f"User {name}: First visited date = {first_visited_date}, Last visited date = {last_visited_date}")
As date are already sorted sort() function is avoided.
How can I calculate the first visited date and the last visited date before an order was placed by the user?
USER ID TYPE DATE
1 Visited September 14, 2020
1 Visited October 4, 2020
1 Visited October 24, 2020
1 Ordered November 1, 2020
2 Visited September 14, 2020
2 Visited October 1, 2020
3 Visited September 1, 2020
3 Visited October 4, 2020
3 Visited October 4, 2020
3 Visited October 19, 2020
3 Ordered January 1, 2021
3 Visited February 11, 2021
3 Visited February 24, 2021
3 Visited March 1, 2021
3 Ordered April 21, 2021
Expected Output:
USER ID Ordered MIN DATE MAX DATE
1 1 September 14, 2020 October 24, 2020
2 0 September 14, 2020 NAT
3 1 September 1, 2020 October 19, 2020
3 2 February 11, 2021 March 1, 2021
Try:
df['DATE'] = pd.to_datetime(df['DATE'])
df_out = df.assign(grp=(df['TYPE'] == 'Ordered')[::-1].cumsum())
.set_index(['USER ID', 'grp', 'TYPE'], append=True)['DATE']
.unstack('TYPE')
.groupby(['USER ID', 'grp'], sort=False)
.agg(Ordered=('Ordered','count'),
MIN_DATE=('Visited','first'),
MAX_DATE=('Visited','last'))
.reset_index('grp', drop=True)
.reset_index()
df_out['MAX_DATE'] = df_out['MAX_DATE'].mask(df_out['Ordered'] == 0)
df_out['Ordered'] = df_out['Ordered'].groupby(df_out['USER ID']).cumsum()
df_out['MIN_DATE'] = df_out['MIN_DATE'].dt.strftime('%B %d, %Y')
df_out['MAX_DATE'] = df_out['MAX_DATE'].dt.strftime('%B %d, %Y')
Output:
USER ID Ordered MIN_DATE MAX_DATE
0 1 1 September 14, 2020 October 24, 2020
1 2 0 September 14, 2020 NaN
2 3 1 September 01, 2020 October 19, 2020
3 3 2 February 11, 2021 March 01, 2021
To calculate the first visited date and the last visited date before an order was placed by the user, you can sort the data by the user ID and the date, and then use a loop to iterate through each row of the data. For each user, you can keep track of the first visited date and the last visited date, and update them as you iterate through the rows. When you encounter an order for a user, you can output the first and last visited dates for that user before the order was placed.
import pandas as pd
data = pd.DataFrame({
'USER ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'TYPE': ['Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited',
'Visited', 'Visited', 'Visited', 'Visited', 'Ordered', 'Visited', 'Visited', 'Ordered'],
'DATE': ['September 14, 2020', 'October 4, 2020', 'October 24, 2020', 'November 1, 2020',
'September 14, 2020', 'October 1, 2020', 'September 1, 2020', 'October 4, 2020',
'October 4, 2020', 'October 19, 2020', 'January 1, 2021', 'February 11, 2021',
'February 24, 2021', 'April 21, 2021']})
data['DATE'] = pd.to_datetime(data['DATE'])
grouped_data = data.groupby('USER ID')
for name, group in grouped_data:
first_visited_date = None
last_visited_date = None
for index, row in group.iterrows():
if row['TYPE'] == 'Visited':
if first_visited_date is None:
first_visited_date = row['DATE']
last_visited_date = row['DATE']
elif row['TYPE'] == 'Ordered':
print(f"User {name}: First visited date = {first_visited_date}, Last visited date = {last_visited_date}")
As date are already sorted sort() function is avoided.