Pandas ewm correlation – not rolling

Question:

I have the following pandas dataframe:

             a   b   c
2023-01-01  35  34  17
2023-01-02  85  54  31
2023-01-03  33   8  27
2023-01-04  95   9  45
2023-01-05  71  98   7

I want to calculate today’s (2023-01-05) EWM correlations between the 3 series.

I tried

correls = data.ewm(alpha=0.01, adjust=True).corr(method='pearson')

and it produced rolling correlations (calculated on all dates):

                     a         b         c
2023-01-01 a       NaN       NaN       NaN
           b       NaN       NaN       NaN
           c       NaN       NaN       NaN
2023-01-02 a  1.000000  1.000000  1.000000
           b  1.000000  1.000000  1.000000
           c  1.000000  1.000000  1.000000
2023-01-03 a  1.000000  0.845674  0.694635
           b  0.845674  1.000000  0.203512
           c  0.694635  0.203512  1.000000
2023-01-04 a  1.000000  0.177224  0.842738
           b  0.177224  1.000000 -0.362909
           c  0.842738 -0.362909  1.000000
2023-01-05 a  1.000000  0.209568  0.478477
           b  0.209568  1.000000 -0.748170
           c  0.478477 -0.748170  1.000000

I know I can slice the correls dataframe to get only the latest correlations. The problem is that the real "data" dataframe is very large, and computing rolling correlations takes a lot of time. Since I only need today’s correlations, how can I stop ewm().corr() from calculating rolling correlations in the first place?

To be clear, I’m looking for a fast way to get the following output:

          a         b         c
a  1.000000  0.209568  0.478477
b  0.209568  1.000000 -0.748170
c  0.478477 -0.748170  1.000000

Thanks

Asked By: younggotti

||

Answers:

IIUC, computing the EWMA requires past data, but you don’t need all of it because the oldest values carry very little weight. With alpha=0.01, about 300 days of history should be enough to get a good approximation of the correlation.

Why 300 days? The weights are computed as w_t = (1 - alpha)^t, so the 300th weight is only about 0.05, compared with 1 for the most recent value:

import numpy as np
import matplotlib.pyplot as plt

# Weights for alpha=0.01 over two years, and their cumulative sum
fig, ax1 = plt.subplots(figsize=(8, 6))
w = (1 - 0.01)**np.arange(365*2)
cs = np.cumsum(w)
ax1.plot(w, label='Coefficient')
ax2 = ax1.twinx()
ax2.plot(cs, 'k', label='Cumulative sum')
ax1.set_title(r'Weights for $\alpha=0.01$')
ax1.set_ylabel('Weight in EWMA')
ax1.set_xlabel('Days')
ax1.axvline(0, c='r', ls='--', lw=0.5)
# Mark where the weight drops to about 0.05 (around day 300)
ax1.axhline(0.05, c='g', ls='--', lw=0.5)
ax1.axvline(300, c='g', ls='--', lw=0.5)
ax1.text(300, 0.08, 'Weights > 0.05')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()


# For hist=300 days
>>> w[300], cs[300]
(0.04904089407128572, 95.14495148694262)

# For hist=500 days: better accuracy, with a cumulative sum close to 100
>>> w[500], cs[500]
(0.006570483042414603, 99.34952217880091)
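
The cutoff can also be computed rather than read off the plot. The sketch below is only an illustration: the helper name history_for_tail and the tail parameter are assumptions, not something pandas provides. It solves (1 - alpha)^t <= tail for t. The same number is also the share of the total weight that is thrown away by truncating at t, because the weights form a geometric series.

```python
import math


def history_for_tail(alpha: float, tail: float = 0.05) -> int:
    """Hypothetical helper: smallest t such that (1 - alpha)**t <= tail."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if not 0.0 < tail < 1.0:
        raise ValueError("tail must be strictly between 0 and 1")
    # (1 - alpha)**t <= tail  <=>  t >= log(tail) / log(1 - alpha)
    return math.ceil(math.log(tail) / math.log(1.0 - alpha))


hist = history_for_tail(0.01, tail=0.05)
print(hist)
```

With alpha=0.01 and a 5% tail this gives a history of roughly 300 rows, in line with the figures above. A stricter tail (for example 0.01) gives a longer history.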

How to use it:

hist = 300
# Keep only the last `hist` rows, then take the correlations for the final date
df.iloc[-hist:].ewm(alpha=0.01, adjust=True).corr().loc[df.index[-1]]

Note: the smaller your alpha, the longer the history you need.
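
If the goal is only the matrix for the last date, another option is to skip ewm() entirely and compute one weighted Pearson correlation directly. The following is a minimal sketch under some assumptions: the helper name last_ewm_corr is hypothetical, the data contain no NaN values, and the weights follow the adjust=True scheme (1 - alpha)^t counted back from the newest row. Under those assumptions it should reproduce the last block of ewm(alpha=..., adjust=True).corr().

```python
import numpy as np
import pandas as pd


def last_ewm_corr(df: pd.DataFrame, alpha: float) -> pd.DataFrame:
    """Hypothetical helper: EWM correlation matrix for the last row only.

    Assumes no missing values and adjust=True style weights.
    """
    x = df.to_numpy(dtype=float)
    n = x.shape[0]
    # Newest row gets weight 1, the one before (1 - alpha), and so on
    w = (1.0 - alpha) ** np.arange(n - 1, -1, -1)
    w /= w.sum()
    # Center each column on its weighted mean
    xc = x - w @ x
    # Weighted covariance matrix (biased form; the bias cancels in the ratio)
    cov = (xc * w[:, None]).T @ xc
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return pd.DataFrame(corr, index=df.columns, columns=df.columns)


# Sample data from the question
data = pd.DataFrame(
    {"a": [35, 85, 33, 95, 71], "b": [34, 54, 8, 9, 98], "c": [17, 31, 27, 45, 7]},
    index=pd.date_range("2023-01-01", periods=5),
)
result = last_ewm_corr(data, alpha=0.01)
print(result.round(6))
```

This does a single pass over the data instead of building a correlation matrix for every date, and it can be combined with the truncation above by passing df.iloc[-hist:] instead of the full frame.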

Answered By: Corralien