return last date and value each month in pandas

Question:

I have a df in pandas with daily data. I would like to return the last value of each month. I thought the simple solution would be to .resample("M").apply(lambda ser: ser.iloc[-1,]) however, it seems as if resample actually computes the month end date rather than return the actual date that appears that month. Is this intended behavior? MWE:

import pandas as pd
import numpy as np
df = pd.Series(np.arange(100), index=pd.date_range(start="2000-01-02", periods=100)).to_frame()
df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
#             0
#2000-01-31  29
#2000-02-29  58
#2000-03-31  89
#2000-04-30  99

While the last date appearing in df is 2000-04-10

Asked By: Alex

||

Answers:

You may need look at groupby + tail

df.groupby(df.index.month).tail(1)
Out[18]: 
             0
2000-01-31  29
2000-02-29  58
2000-03-31  89
2000-04-10  99
Answered By: BENY

By using resample with the offset M, you are downsampling your sample to calendar month’s end (see the linked documentation on offsets), and then passing a function. So your indices will always be the last day of that month, and this is indeed the intended behaviour. The function you are applying (lambda ser: ser.iloc[-1,]) is just saying: for the calendar date that ends on this day, what is the last value found in the original data.

For an example, you could also resample to month start, using the offset MS instead of M, and the result will be the same, except that the index will be the first day of the calendar month instead of the last day:

# Resample to month end, as you had originally:
>>> df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-31  29
2000-02-29  58
2000-03-31  89
2000-04-30  99

# Resample to month start: same data, except index is month start instead of month end
>>> df.sort_index().resample("MS").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-01  29
2000-02-01  58
2000-03-01  89
2000-04-01  99

As pointed out by Wen, if you just want to show the actual last date found in your data, you’re better off with a groupby. Resample is useful if you want to up- or down- sample your data to a different time frequency, not so much to select real data from your original time frequency

Answered By: sacuL

As of 2023, you can use the following.

df.groupby([df.index.dt.year, df.index.dt.month])

If you only group by the month, you will get the last value of each month out of all the years in the data.

Answered By: BoomBoxBoy
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.