How to find the last row in a dataframe that contains a specific value in a specific column?
Question:
I am looking for a Python function that will allow me to retrieve the information in the ‘date’ column for the last row in my dataframe for each person in my dataframe. This is because I need to know the last date that each person in the dataframe entered data.
I have tried splitting the dataframe by person, using the tail() function to get all columns of the last row, and then grabbing the date. However, this does not work for a large dataframe containing many people.
name score date
1 Mary 2 22-Feb-2022
2 Mary 1 16-Mar-2022
5 John 2 18-Dec-2022
6 Mary 3 01-Jan-2023
Answers:
A possible solution:
df.groupby('name')['date'].last()
Output:
name
John 2022-12-18
Mary 2023-01-01
Name: date, dtype: datetime64[ns]
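Note that the output above shows datetime64[ns], while the question's dates look like strings, so a conversion step is presumably implied. Here is a minimal sketch under that assumption, rebuilding the sample data from the question. The format string and the variable names (last_by_position, latest_by_value) are my own choices, not part of the original answer. It also contrasts last(), which depends on row order, with max(), which does not:

```python
import pandas as pd

# Sample data copied from the question, including its original index labels.
df = pd.DataFrame(
    {
        "name": ["Mary", "Mary", "John", "Mary"],
        "score": [2, 1, 2, 3],
        "date": ["22-Feb-2022", "16-Mar-2022", "18-Dec-2022", "01-Jan-2023"],
    },
    index=[1, 2, 5, 6],
)

# Assumption: dates are strings like "22-Feb-2022" with English month
# abbreviations, so parse them into real datetimes first.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y")

# last() takes the last row of each group in the current row order.
last_by_position = df.groupby("name")["date"].last()

# max() takes the latest date per group regardless of row order.
latest_by_value = df.groupby("name")["date"].max()

print(last_by_position)
```

If the rows are already in chronological order, both give the same result. If they might not be, max() is probably the safer choice.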
If you want to add the last date to the dataframe:
df['last_date'] = df.groupby('name')['date'].transform('last')
Output:
name score date last_date
1 Mary 2 2022-02-22 2023-01-01
2 Mary 1 2022-03-16 2023-01-01
5 John 2 2022-12-18 2022-12-18
6 Mary 3 2023-01-01 2023-01-01
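Since transform returns a result aligned to the original index, the new column can be compared directly against date. A small sketch, assuming the dates are already datetimes. It uses 'max' instead of 'last' so that it does not depend on row order, and the is_last_entry column name is just something I made up for illustration:

```python
import pandas as pd

# Sample data from the question, with dates already parsed as datetimes.
df = pd.DataFrame(
    {
        "name": ["Mary", "Mary", "John", "Mary"],
        "score": [2, 1, 2, 3],
        "date": pd.to_datetime(
            ["2022-02-22", "2022-03-16", "2022-12-18", "2023-01-01"]
        ),
    },
    index=[1, 2, 5, 6],
)

# transform broadcasts each group's latest date back onto every row of the group.
df["last_date"] = df.groupby("name")["date"].transform("max")

# Hypothetical helper column: True on the row holding each person's latest date.
df["is_last_entry"] = df["date"].eq(df["last_date"])

print(df)
```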
If you want the last row for each name, you can use drop_duplicates:
# Assume your dataframe is already sorted by date
>>> df.drop_duplicates('name', keep='last')
name score date
5 John 2 18-Dec-2022
6 Mary 3 01-Jan-2023
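If the dataframe is not guaranteed to be sorted, one option is to sort by date first. The sketch below assumes date is a real datetime column (sorting the raw strings would not give chronological order) and uses deliberately shuffled sample rows. The idxmax variant is an alternative I am adding for comparison, not something from the answer above:

```python
import pandas as pd

# Same people and dates as the question, but deliberately out of order.
df = pd.DataFrame(
    {
        "name": ["Mary", "John", "Mary", "Mary"],
        "score": [3, 2, 2, 1],
        "date": pd.to_datetime(
            ["2023-01-01", "2022-12-18", "2022-02-22", "2022-03-16"]
        ),
    }
)

# Sort chronologically, then keep only the final row seen for each name.
last_rows = df.sort_values("date").drop_duplicates("name", keep="last")

# Alternative without sorting: look up the index label of each person's latest date.
latest_idx = df.groupby("name")["date"].idxmax()
last_rows_alt = df.loc[latest_idx]

print(last_rows)
```

Both approaches return the full row (name, score, date) for each person's most recent entry.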