pandas calculate returns between two dates for multiple data points

Question

I have a dataframe with the following columns:

Date	Identifier	Price
28/02/2023	BBA LIBOR USD 3 MONTH	55
31/01/2023	BBA LIBOR USD 3 MONTH	63
28/02/2023	BBA LIBOR USD 1 Month	32
31/01/2023	BBA LIBOR USD 1 Month	59
28/02/2023	MSCI All Country World Index Net Total Return	16
31/01/2023	MSCI All Country World Index Net Total Return	17
28/02/2023	MSCI World Index Net Total Return	46
31/01/2023	MSCI World Index Net Total Return	12
28/02/2023	S&P500 Total Return Index	11
31/01/2023	S&P500 Total Return Index	45

I would like to calculate the percentage return from January to February as (Price_Feb / Price_Jan) - 1 and then collapse the dataframe to keep only the February rows. This is the dataframe I'd like to end up with:

Date	Identifier	Price	Returns
28/02/2023	BBA LIBOR USD 3 MONTH	55	-15.38%
28/02/2023	BBA LIBOR USD 1 Month	32	-45.76%
28/02/2023	MSCI All Country World Index Net Total Return	16	-5.88%
28/02/2023	MSCI World Index Net Total Return	46	283.33%
28/02/2023	S&P500 Total Return Index	11	-75.56%

So far I have tried this:

(df
    .sort_values(by=['Identifier', 'Date'], ascending=[True, False])
    .groupby(by='Identifier')
    .Price
    .pct_change()
)

This kinda works but it places the return on the 31/01/2023 date for all series.

Any ideas appreciated!

Asked By: Kronivar

Answer 1

Not exactly sure what the issue is with the solution you have, but I added a couple more steps to give you the dataframe format you are looking for:

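import pandas as pd

df = pd.DataFrame(columns=['Date', 'Identifier', 'Price'])

df['Date'] = ['28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023']
df['Identifier'] = ['BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 1 Month', 'BBA LIBOR USD 1 Month', 'MSCI All Country World Index Net Total Return',
                    'MSCI All Country World Index Net Total Return', 'MSCI World Index Net Total Return', 'MSCI World Index Net Total Return']
df['Price'] = [55,63,32,59,16,17,46,12]

# Parse the day-first strings so rows sort chronologically rather than alphabetically
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Sort oldest-first within each identifier so the change lands on the February row
group_series = (df
 .sort_values(by=['Identifier', 'Date'], ascending=[True, True])
 .groupby(by='Identifier')
 .Price
 .pct_change()
)

# Assignment aligns on the index; January rows have no earlier price and get NaN
df['pct_change'] = group_series
# Keep only the rows that have a return (February)
new_df = df.loc[~pd.isna(df['pct_change']), :]
new_df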
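
The return most likely lands on the 31/01/2023 row because of the sort direction. pct_change compares each row with the row before it in its group, so with real datetime values sorted newest-first, the January row comes second and receives Jan/Feb - 1. Here is a minimal sketch of the difference, assuming the Date column holds day-first dates; the two-identifier frame and the ret_desc / ret_asc column names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["28/02/2023", "31/01/2023", "28/02/2023", "31/01/2023"],
    "Identifier": ["BBA LIBOR USD 3 MONTH", "BBA LIBOR USD 3 MONTH",
                   "BBA LIBOR USD 1 Month", "BBA LIBOR USD 1 Month"],
    "Price": [55, 63, 32, 59],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Newest-first: February precedes January, so the change is written to January.
df["ret_desc"] = (
    df.sort_values(["Identifier", "Date"], ascending=[True, False])
      .groupby("Identifier")["Price"]
      .pct_change()
)

# Oldest-first: January precedes February, so the change is written to February.
df["ret_asc"] = (
    df.sort_values(["Identifier", "Date"], ascending=[True, True])
      .groupby("Identifier")["Price"]
      .pct_change()
)

print(df)
```

In ret_desc the non-null values sit on the January rows (and equal Jan/Feb - 1), while in ret_asc they sit on the February rows and equal Feb/Jan - 1, which is the return the question asks for.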

Answered By: P. Shroff

Answer 2

My percentage change for BBA LIBOR USD 3 MONTH differs from your expected output (55/63 - 1 is about -12.70%, not -15.38%):

import pandas as pd

df = pd.DataFrame(columns=['Date', 'Identifier', 'Price'])
df['Date'] = ['28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023']
df['Identifier'] = ['BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 1 Month', 'BBA LIBOR USD 1 Month', 'MSCI All Country World Index Net Total Return',
                    'MSCI All Country World Index Net Total Return', 'MSCI World Index Net Total Return', 'MSCI World Index Net Total Return']
df['Price'] = [55,63,32,59,16,17,46,12]

# The dates are day-first, so state the format explicitly instead of relying on inference
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Price'] = df['Price'].astype(float)
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year

# Oldest-first within each identifier, so the change is attached to the later (February) row
df = df.sort_values(by=['Identifier', 'Date'], ascending=[True, True])
df['Price_Change'] = df.groupby(['Identifier'])['Price'].pct_change() * 100
# First and last price per identifier (January and February after the sort)
min_max = df.groupby(['Identifier'])['Price'].agg(['first', 'last'])
#print(min_max)

#print(df)

# Drop the January rows, which have no earlier price to compare against
df = df.loc[~pd.isna(df['Price_Change']), :]
df = pd.merge(df, min_max, on=['Identifier'], how="inner")

print(df)

output:

        Date                                     Identifier  Price  Month  
0 2023-02-28                          BBA LIBOR USD 1 Month   32.0      2   
1 2023-02-28                          BBA LIBOR USD 3 MONTH   55.0      2   
2 2023-02-28  MSCI All Country World Index Net Total Return   16.0      2   
3 2023-02-28              MSCI World Index Net Total Return   46.0      2   

   Year  Price_Change  first  last  
0  2023    -45.762712   59.0  32.0  
1  2023    -12.698413   63.0  55.0  
2  2023     -5.882353   17.0  16.0  
3  2023    283.333333   12.0  46.0
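
To get the exact layout requested in the question (Date, Identifier, Price, Returns, February only), one possible sketch follows. It assumes the full ten-row sample from the question and day-first dates. Keeping the latest date per identifier via transform("max") and formatting Returns as a percent string are illustrative choices, not the only way to do it:

```python
import pandas as pd

rows = [
    ("28/02/2023", "BBA LIBOR USD 3 MONTH", 55),
    ("31/01/2023", "BBA LIBOR USD 3 MONTH", 63),
    ("28/02/2023", "BBA LIBOR USD 1 Month", 32),
    ("31/01/2023", "BBA LIBOR USD 1 Month", 59),
    ("28/02/2023", "MSCI All Country World Index Net Total Return", 16),
    ("31/01/2023", "MSCI All Country World Index Net Total Return", 17),
    ("28/02/2023", "MSCI World Index Net Total Return", 46),
    ("31/01/2023", "MSCI World Index Net Total Return", 12),
    ("28/02/2023", "S&P500 Total Return Index", 11),
    ("31/01/2023", "S&P500 Total Return Index", 45),
]
df = pd.DataFrame(rows, columns=["Date", "Identifier", "Price"])
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Oldest-first within each identifier, then Feb / Jan - 1 on the February row.
df = df.sort_values(["Identifier", "Date"])
df["Returns"] = df.groupby("Identifier")["Price"].pct_change()

# Keep only the latest date per identifier (February in this sample).
latest = df["Date"] == df.groupby("Identifier")["Date"].transform("max")
result = df.loc[latest].reset_index(drop=True)

# Presentation only: render dates and returns as strings like the question's table.
result["Date"] = result["Date"].dt.strftime("%d/%m/%Y")
result["Returns"] = result["Returns"].map("{:.2%}".format)

print(result)
```

The rows come out sorted by Identifier rather than in the question's original order. If the returns are needed for further calculation, it is probably better to skip the last formatting step and keep Returns as a float.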

Answered By: Golden Lion
