How to resample including the last value of the previous resampled set?

Question:

I need to compute different measures of performance from prices at different time resolutions e.g., yearly or monthly. An ideal approach is to use Pandas’ resample but I also need a way to pass the last value of the previous resampling set to the next i.e. due to intro-resampling set return calculations. This is apparently not supported by doing resample('M'):

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(1, 61), index=pd.date_range('2021-12-31', '2022-02-28'), columns=['price'])
print(df)
df.resample('M').apply(lambda x: (x[0], x[-1]))

I get the following result:

            price
2021-12-31  (1, 1)
2022-01-31  (2, 32)
2022-02-28  (33, 60)

I would like to instead get the following where the first element of the resampling set includes the last element from the previous resampling set:

            price
2021-12-31  (1, 1)
2022-01-31  (1, 32)
2022-02-28  (32, 60)
Asked By: SkyWalker

||

Answers:

You cannot easily access the other groups from within a group.

But an easy way is to shift and concat the output:

d = df.resample('M').last()

pd.concat([d.shift().bfill(), d], axis=1).astype(int).apply(tuple, axis=1)

output:

2021-12-31      (1, 1)
2022-01-31     (1, 32)
2022-02-28    (32, 60)
Freq: M, dtype: object

Or if you have multiple columns:

(pd.concat([d.shift().bfill(), d], axis=1).astype(int)
   .groupby(level=0, axis=1, sort=False)
   .apply(lambda d: d.apply(tuple, axis=1))
 )

output:

               price
2021-12-31    (1, 1)
2022-01-31   (1, 32)
2022-02-28  (32, 60)

with another example:

df = pd.DataFrame({'price': np.arange(1, 61), 'other': np.arange(1, 61)*10},
                  index=pd.date_range('2021-12-31', '2022-02-28'),
                 )

s = df.resample('M').last()

(pd.concat([s.shift().bfill(), s], axis=1).astype(int)
   .groupby(level=0, axis=1, sort=False)
   .apply(lambda d: d.apply(tuple, axis=1))
 )

#                price       other
# 2021-12-31    (1, 1)    (10, 10)
# 2022-01-31   (1, 32)   (10, 320)
# 2022-02-28  (32, 60)  (320, 600)
Answered By: mozway
df.resample('M').apply(lambda x: 
                       (df.loc[max(df.index.min(),x.index.min()-pd.Timedelta(days=1)),'price'], x[-1]))
    
    2021-12-31      (1, 1)
    2022-01-31     (1, 32)
    2022-02-28    (32, 60)
Answered By: G.G
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.