pandas calculate returns between two dates for multiple data points
Question:
I have a dataframe with the following columns:
Date | Identifier | Price |
---|---|---|
28/02/2023 | BBA LIBOR USD 3 MONTH | 55 |
31/01/2023 | BBA LIBOR USD 3 MONTH | 63 |
28/02/2023 | BBA LIBOR USD 1 Month | 32 |
31/01/2023 | BBA LIBOR USD 1 Month | 59 |
28/02/2023 | MSCI All Country World Index Net Total Return | 16 |
31/01/2023 | MSCI All Country World Index Net Total Return | 17 |
28/02/2023 | MSCI World Index Net Total Return | 46 |
31/01/2023 | MSCI World Index Net Total Return | 12 |
28/02/2023 | S&P500 Total Return Index | 11 |
31/01/2023 | S&P500 Total Return Index | 45 |
I would like to calculate the percentage return from January to February as (PriceFeb / PriceJan) - 1 and then collapse the dataframe to keep only the February rows. This is the dataframe I'd like to end up with:
Date | Identifier | Price | Returns |
---|---|---|---|
28/02/2023 | BBA LIBOR USD 3 MONTH | 55 | -15.38% |
28/02/2023 | BBA LIBOR USD 1 Month | 32 | -45.76% |
28/02/2023 | MSCI All Country World Index Net Total Return | 16 | -5.88% |
28/02/2023 | MSCI World Index Net Total Return | 46 | 283.33% |
28/02/2023 | S&P500 Total Return Index | 11 | -75.56% |
So far I have tried this:
(df
    .sort_values(by=['Identifier', 'Date'], ascending=[True, False])
    .groupby(by='Identifier')
    .Price
    .pct_change()
)
This kind of works, but it places the return on the 31/01/2023 row for every series instead of on the 28/02/2023 row.
Any ideas appreciated!
Answers:
Not exactly sure what the issue is with the solution you have, but I added a couple more steps to give you the dataframe format you are looking for:
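import pandas as pd

# Sample data from the question (first four identifiers only)
df = pd.DataFrame(columns=['Date', 'Identifier', 'Price'])
df['Date'] = ['28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023']
df['Identifier'] = ['BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 1 Month', 'BBA LIBOR USD 1 Month', 'MSCI All Country World Index Net Total Return',
                    'MSCI All Country World Index Net Total Return', 'MSCI World Index Net Total Return', 'MSCI World Index Net Total Return']
df['Price'] = [55, 63, 32, 59, 16, 17, 46, 12]
# Parse the day-first strings so the sort is chronological rather than alphabetical
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
# Sort oldest first so pct_change() puts each return on the February row
group_series = (df
    .sort_values(by=['Identifier', 'Date'], ascending=[True, True])
    .groupby(by='Identifier')
    .Price
    .pct_change()
)
# Assignment aligns on the index, so the original row order is kept
df['pct_change'] = group_series
# Drop the January rows: they have no earlier price, so their return is NaN
new_df = df.loc[~pd.isna(df['pct_change']), :]
new_df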
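A possible explanation for the return landing on the 31/01/2023 row, assuming the `Date` column in the original dataframe holds real datetimes rather than strings: `pct_change()` compares each row with the row before it inside its group, so after a newest-first sort the January row is the one that receives a value. The sketch below uses two of the identifiers from the question and a hypothetical helper, `returns_by_sort`, to show both sort directions side by side.

```python
import pandas as pd

# Two identifiers from the question, with dates parsed as real datetimes
# (an assumption about the original data, which the question does not state).
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["28/02/2023", "31/01/2023", "28/02/2023", "31/01/2023"],
        format="%d/%m/%Y",
    ),
    "Identifier": ["BBA LIBOR USD 3 MONTH"] * 2 + ["S&P500 Total Return Index"] * 2,
    "Price": [55, 63, 11, 45],
})


def returns_by_sort(frame, date_ascending):
    """Hypothetical helper: per-identifier pct_change for a given date sort order."""
    ordered = frame.sort_values(
        ["Identifier", "Date"], ascending=[True, date_ascending]
    )
    # pct_change() is computed against the previous row within each group
    out = ordered.assign(Returns=ordered.groupby("Identifier")["Price"].pct_change())
    # The first row of each group has no previous row, so its return is NaN
    return out.dropna(subset=["Returns"])


descending = returns_by_sort(df, date_ascending=False)  # values sit on January rows
ascending = returns_by_sort(df, date_ascending=True)    # values sit on February rows

print(descending[["Date", "Identifier", "Returns"]])
print(ascending[["Date", "Identifier", "Returns"]])
```

With the newest-first sort the surviving rows are dated 31/01/2023 and hold PriceJan / PriceFeb - 1; with the oldest-first sort they are dated 28/02/2023 and hold PriceFeb / PriceJan - 1, which is the return the question asks for.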
My percentage change differed from your expected output for BBA LIBOR USD 3 MONTH: 55/63 - 1 comes out at -12.70%, not -15.38%.
import pandas as pd

df = pd.DataFrame(columns=['Date', 'Identifier', 'Price'])
df['Date'] = ['28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023', '28/02/2023', '31/01/2023']
df['Identifier'] = ['BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 3 MONTH', 'BBA LIBOR USD 1 Month', 'BBA LIBOR USD 1 Month', 'MSCI All Country World Index Net Total Return',
                    'MSCI All Country World Index Net Total Return', 'MSCI World Index Net Total Return', 'MSCI World Index Net Total Return']
df['Price'] = [55, 63, 32, 59, 16, 17, 46, 12]
# The dates are day-first (dd/mm/yyyy), so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Price'] = df['Price'].astype(float)
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year
# Sort oldest first so each return lands on the later (February) row
df = df.sort_values(by=['Identifier', 'Date'], ascending=[True, True])
df['Price_Change'] = df.groupby(['Identifier'])['Price'].pct_change() * 100
# First and last price per identifier (January and February after the sort)
min_max = df.groupby(['Identifier'])['Price'].agg(['first', 'last'])
# Keep only the rows that have a return, i.e. the February rows
df = df.loc[~pd.isna(df['Price_Change']), :]
df = pd.merge(df, min_max, on=['Identifier'], how="inner")
print(df)
output:
Date Identifier Price Month
0 2023-02-28 BBA LIBOR USD 1 Month 32.0 2
1 2023-02-28 BBA LIBOR USD 3 MONTH 55.0 2
2 2023-02-28 MSCI All Country World Index Net Total Return 16.0 2
3 2023-02-28 MSCI World Index Net Total Return 46.0 2
Year Price_Change first last
0 2023 -45.762712 59.0 32.0
1 2023 -12.698413 63.0 55.0
2 2023 -5.882353 17.0 16.0
3 2023 283.333333 12.0 46.0
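As an alternative that does not depend on row order at all, the prices could be pivoted so that each date becomes a column, and the return computed directly as PriceFeb / PriceJan - 1. This is only a sketch and not part of either answer: the names `names`, `wide`, `returns` and `result` are illustrative, and it assumes there is exactly one price per identifier and date. It also includes the S&P500 rows from the question and formats the return as a percentage string, as in the requested output.

```python
import pandas as pd

names = [
    "BBA LIBOR USD 3 MONTH",
    "BBA LIBOR USD 1 Month",
    "MSCI All Country World Index Net Total Return",
    "MSCI World Index Net Total Return",
    "S&P500 Total Return Index",
]
# Same layout as the question: a February row followed by a January row per identifier
df = pd.DataFrame({
    "Date": ["28/02/2023", "31/01/2023"] * len(names),
    "Identifier": [n for n in names for _ in range(2)],
    "Price": [55, 63, 32, 59, 16, 17, 46, 12, 11, 45],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# One row per identifier, one column per date
wide = df.pivot(index="Identifier", columns="Date", values="Price")
jan, feb = pd.Timestamp("2023-01-31"), pd.Timestamp("2023-02-28")

# (PriceFeb / PriceJan) - 1, indexed by identifier
returns = wide[feb] / wide[jan] - 1

# Keep only the February rows and attach the return as a formatted string
result = df[df["Date"] == feb].copy()
result["Returns"] = result["Identifier"].map(returns).map("{:.2%}".format)
print(result)
```

Because the two dates are named explicitly, this version would need adapting for more than two dates per identifier; the sort-and-`pct_change()` approach in the answers generalises more naturally to longer series.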