Year to Date Returns in Pandas DataFrame

Question:

I’d like to have a running year to date pct change column in my pandas dataframe:

Here is the dataframe:

            dollar
Date    
2015-01-01  97264.15
2015-02-01  102849.06
2015-03-01  101660.56
2015-04-01  102286.16
2015-05-01  103613.20
... ...
2020-12-01  197212.20
2021-01-01  196553.61
2021-02-01  202724.09
2021-03-01  210113.78
2021-04-01  220696.22

I can get a dataframe with year ending values and run pct_change on the dataframe:

df = df.groupby(pd.Grouper(level='Date', freq='A')).nth(-1)
df['Year'] = df['dollar'].pct_change(1)

But what I’d like is to have the monthly dataframe with a running YTD column.

Update: This gets me close:

dfGrouped = df.groupby(pd.Grouper(level = 'Date', freq='A'))
df['YTD'] = dfGrouped['dollar'].transform(lambda x: x/x.iloc[0]-1.0)

            dollar      YTD
Date        
2020-12-01  197212.20   0.231018
2021-01-01  196553.61   0.000000
2021-02-01  202724.09   0.031393
2021-03-01  210113.78   0.068990
2021-04-01  220696.22   0.122830

But it is ‘off’ by one month. For example, the April 2021 YTD value uses the January 2021 value as its base instead of the December 2020 value.

Thanks. Any help is greatly appreciated.

Nina

Asked By: Wheelingit


Answers:

If I understand correctly, you want the running percent change relative to the last value of the previous year. It may not be the most elegant approach, but you can build this last-value-of-previous-year series explicitly.

First, build a series indexed by date whose values are the years:

>>> year = df.index.to_series().dt.year.rename('year')
>>> year
Date
2015-01-01    2015
2015-02-01    2015
2015-03-01    2015
2015-04-01    2015
2015-05-01    2015
2020-12-01    2020
2021-01-01    2021
2021-02-01    2021
2021-03-01    2021
2021-04-01    2021
Name: year, dtype: int64

Now we can pass this series to groupby. The result is indexed by year rather than by the last date of each year*:

>>> last_per_year = df['dollar'].groupby(year).last()
>>> last_per_year
year
2015    103613.20
2020    197212.20
2021    220696.22
Name: dollar, dtype: float64

So to get the previous year’s value you only have to shift(), and mapping year through the shifted series broadcasts these values back to the original shape:

>>> ref_dollar_yearly = year.map(last_per_year.shift())
>>> ref_dollar_yearly
Date
2015-01-01         NaN
2015-02-01         NaN
2015-03-01         NaN
2015-04-01         NaN
2015-05-01         NaN
2020-12-01    103613.2
2021-01-01    197212.2
2021-02-01    197212.2
2021-03-01    197212.2
2021-04-01    197212.2
Name: year, dtype: float64

Of course, the first year (here 2015) has no reference value from the previous year. Some kind of join or merge could work instead of map (year.reset_index().merge(last_per_year.shift(), how='left', on='year').set_index('Date')['dollar']). It’s uglier, but may be faster if there are many years.
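
The merge alternative mentioned above can be sketched as follows. This is only an illustration on a few rows from the question; the column name prev_year_last and the variable names ref_via_merge and ref_via_map are made up for the example, and no timing claim is made here.

```python
import pandas as pd

# A few rows from the question, enough to cover three calendar years.
df = pd.DataFrame(
    {"dollar": [103613.20, 197212.20, 196553.61, 220696.22]},
    index=pd.DatetimeIndex(
        ["2015-05-01", "2020-12-01", "2021-01-01", "2021-04-01"], name="Date"
    ),
)

year = df.index.to_series().dt.year.rename("year")
last_per_year = df["dollar"].groupby(year).last()

# Turn the shifted series into a two-column frame: year, prev_year_last
# ("prev_year_last" is a hypothetical column name chosen for this sketch).
prev = last_per_year.shift().rename("prev_year_last").reset_index()

# Left merge keeps one row per original date, then restore the Date index.
merged = year.reset_index().merge(prev, how="left", on="year").set_index("Date")
ref_via_merge = merged["prev_year_last"]

# The map-based version from the answer, for comparison.
ref_via_map = year.map(last_per_year.shift())
```

Both series hold the same reference values per date, so either one can be used as the denominator.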

You already know how to do the rest:

>>> df['YTD'] = df['dollar'] / ref_dollar_yearly - 1
>>> df
               dollar       YTD
Date                           
2015-01-01   97264.15       NaN
2015-02-01  102849.06       NaN
2015-03-01  101660.56       NaN
2015-04-01  102286.16       NaN
2015-05-01  103613.20       NaN
2020-12-01  197212.20  0.903350
2021-01-01  196553.61 -0.003339
2021-02-01  202724.09  0.027949
2021-03-01  210113.78  0.065420
2021-04-01  220696.22  0.119080
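
For convenience, the steps above can be collected into one self-contained script. This is a minimal sketch using the sample rows from the question; the helper name add_ytd and its column argument are made up for the example.

```python
import pandas as pd

# Sample rows copied from the question (the gap between 2015 and 2020 is kept).
df = pd.DataFrame(
    {"dollar": [97264.15, 102849.06, 101660.56, 102286.16, 103613.20,
                197212.20, 196553.61, 202724.09, 210113.78, 220696.22]},
    index=pd.DatetimeIndex(
        ["2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01", "2015-05-01",
         "2020-12-01", "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"],
        name="Date",
    ),
)


def add_ytd(frame, column="dollar"):
    """Return a copy of frame with a YTD column relative to the previous
    available year's last value (hypothetical helper for illustration)."""
    # Year of each row, indexed by date.
    year = frame.index.to_series().dt.year.rename("year")
    # Last value observed in each year, indexed by year.
    last_per_year = frame[column].groupby(year).last()
    # Shift by one row so each year points at the preceding entry,
    # then broadcast back to the original dates.
    reference = year.map(last_per_year.shift())
    out = frame.copy()
    out["YTD"] = out[column] / reference - 1
    return out


result = add_ytd(df)
print(result)
```

As in the transcript above, the 2020 row is still divided by the 2015 value because the intermediate years are missing from the sample.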

* Note that there is another subtlety here when some years are missing:

  • df['dollar'].groupby(year).last(), just like df['dollar'].groupby(year).nth(-1), does not return any value for missing years
  • df['dollar'].groupby(pd.Grouper(level='Date', freq='A')).last() returns nan for missing years

This matters because you want to divide by the previous year’s value. In the small example here, 2020’s results are divided by a value from 2015. To avoid this, reindex the yearly series over every year in the range before shift()ing:
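
The difference between the two groupings can be seen on a tiny series with a gap. This sketch assumes three rows taken from the question and passes a pd.offsets.YearEnd() object instead of the 'A' string alias, since the accepted alias strings differ between pandas versions; the variable names are made up for the example.

```python
import pandas as pd

# Three rows from the question; 2016 to 2019 are absent.
s = pd.Series(
    [103613.20, 197212.20, 220696.22],
    index=pd.DatetimeIndex(["2015-05-01", "2020-12-01", "2021-04-01"], name="Date"),
    name="dollar",
)

# Grouping by the year values: only years present in the data appear.
by_year_key = s.groupby(s.index.year).last()

# Grouping by a year-end frequency: every year in the range gets a bin,
# and empty bins come back as NaN.
by_grouper = s.groupby(pd.Grouper(level="Date", freq=pd.offsets.YearEnd())).last()

print(by_year_key)
print(by_grouper)
```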

>>> last_per_year.reindex(pd.RangeIndex(last_per_year.index.min(), last_per_year.index.max() + 1)).shift()
2015         NaN
2016    103613.2
2017         NaN
2018         NaN
2019         NaN
2020         NaN
2021    197212.2
Name: dollar, dtype: float64
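
Putting the reindexed series back into the YTD calculation might look like the sketch below. It assumes a few rows from the question; the names all_years and prev_year_last are made up, and the int() conversions are only a precaution for building the RangeIndex.

```python
import pandas as pd

df = pd.DataFrame(
    {"dollar": [103613.20, 197212.20, 196553.61, 220696.22]},
    index=pd.DatetimeIndex(
        ["2015-05-01", "2020-12-01", "2021-01-01", "2021-04-01"], name="Date"
    ),
)

year = df.index.to_series().dt.year.rename("year")
last_per_year = df["dollar"].groupby(year).last()

# Every calendar year between the first and last observed year, inclusive.
all_years = pd.RangeIndex(
    int(last_per_year.index.min()), int(last_per_year.index.max()) + 1
)

# Missing years become NaN, so shift() now moves by exactly one calendar year.
prev_year_last = last_per_year.reindex(all_years).shift()

df["YTD"] = df["dollar"] / year.map(prev_year_last) - 1
print(df)
```

With the gap filled in, the 2020 row gets NaN (there is no 2019 value to compare against) instead of being measured against 2015.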

Answered By: Cimbali

I had the same problem and managed to fix it with a one-liner.
Essentially, df['dollar'].groupby(df.index.year).transform('first') gives, for each date in the original data frame, the first value available for that year in the dataset.

import pandas as pd
import yfinance as yf

df = yf.download("SPY")

# Ratio to the first close of each year, minus 1 to express it as a return
df['ytd_return'] = df['Close'] / df['Close'].groupby(df.index.year).transform('first') - 1
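
To try the idea without downloading anything, here is a small sketch on made-up prices (the numbers and the column names ytd_vs_first and ytd_vs_prev_close are invented for illustration). It shows the first-value-of-year base used in the one-liner next to a base taken from the previous year's last close, which is what the question asks for.

```python
import pandas as pd

# Made-up closing prices spanning two calendar years.
prices = pd.DataFrame(
    {"Close": [100.0, 110.0, 121.0, 120.0, 132.0]},
    index=pd.DatetimeIndex(
        ["2020-01-02", "2020-06-30", "2020-12-31", "2021-01-04", "2021-06-30"]
    ),
)

# Base = first close of the same year, broadcast to every row of that year.
first_of_year = prices["Close"].groupby(prices.index.year).transform("first")
prices["ytd_vs_first"] = prices["Close"] / first_of_year - 1

# Base = last close of the previous year (NaN for the first year in the data).
year = pd.Series(prices.index.year, index=prices.index)
last_close = prices["Close"].groupby(year).last()
prices["ytd_vs_prev_close"] = prices["Close"] / year.map(last_close.shift()) - 1

print(prices)
```

With the first-of-year base, the first row of each year is always 0; with the previous-year base, the first row of 2021 already reflects the move since the end of 2020.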

Happy Coding!

Answered By: Daniel Gonçalves