Find second most recent date in a dataframe column

Question:

I have data for example:

  1. Sampled_Date
  2. 8/29/2017
  3. 8/29/2017
  4. 8/29/2017
  5. 2/28/2016
  6. 2/28/2016
  7. 5/15/2014

Etc..
Now I can find max and min dates as

df.Sampled_Date.max()
df.Sampled_Date.min()

But how to find the second most recent date.
i.e 2/28/2016 in Python’s pandas data frame.

Asked By: MD Rijwan

||

Answers:

Make sure your dates are in datetime first:

df['Sampled_Date'] = pd.to_datetime(df['Sampled_Date'])

Then drop the duplicates, take the nlargest(2), and take the last value of that:

df['Sampled_Date'].drop_duplicates().nlargest(2).iloc[-1]

# Timestamp('2016-02-28 00:00:00')
Answered By: sacuL

You can also use .argsort()

import pandas as pd

# Generate dates
dates = pd.Series(pd.date_range(start='1/1/2017', periods=5, freq=pd.offsets.MonthEnd(3)))

# Random order
dates = dates.sample(frac=1, random_state=0)

# Get the second 'max' date
dates[dates.argsort() == (len(dates)-2)] # 3   2017-10-31
Answered By: Deena

I know this is an extension of the question, but it’s something I frequently need and sometimes forget, so I’m sharing here:

Let’s say instead of just wanting the second most recent or second earliest dates for an entire dataframe, you have a dataframe of users and dates, and you want to get the second earliest date for each user (e.g. their second transaction).

Example dataframe:

test = pd.DataFrame()
test['users'] = [1,2,3,2,3,2]
test['dates'] = pd.to_datetime(['2019-01-01','2019-01-01',
                                '2019-01-02','2019-01-02',
                                '2019-01-03','2019-01-04'])

The earliest date for user 2 is ‘2019-01-01′ and the second earliest date is ’20-19-01-02’. We can use groupby, apply, and nlargest/nsmallest:

test.groupby('users')['dates'].apply(lambda x: x.nsmallest(2).max())

which gives us this output:

users
1   2019-01-01
2   2019-01-02
3   2019-01-03
Name: dates, dtype: datetime64[ns]
Answered By: Paul Meinshausen
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.