# Pandas ewm correlation – not rolling

## Question:

I have the following pandas dataframe:

```
             a   b   c
2023-01-01  35  34  17
2023-01-02  85  54  31
2023-01-03  33   8  27
2023-01-04  95   9  45
2023-01-05  71  98   7
```

I want to calculate today’s (2023-01-05) EWM correlations between the 3 series.

I tried

```python
correls = data.ewm(alpha=0.01, adjust=True).corr(method='pearson')
```

and it produced rolling correlations (calculated on all dates):

```
                     a         b         c
2023-01-01 a       NaN       NaN       NaN
           b       NaN       NaN       NaN
           c       NaN       NaN       NaN
2023-01-02 a  1.000000  1.000000  1.000000
           b  1.000000  1.000000  1.000000
           c  1.000000  1.000000  1.000000
2023-01-03 a  1.000000  0.845674  0.694635
           b  0.845674  1.000000  0.203512
           c  0.694635  0.203512  1.000000
2023-01-04 a  1.000000  0.177224  0.842738
           b  0.177224  1.000000 -0.362909
           c  0.842738 -0.362909  1.000000
2023-01-05 a  1.000000  0.209568  0.478477
           b  0.209568  1.000000 -0.748170
           c  0.478477 -0.748170  1.000000
```

I know I can slice the `correls` dataframe to get only the latest correlations. The problem is that the real `data` dataframe is very large, and computing the correlations for every date takes a lot of time. Since I only need today’s correlations, how can I stop `ewm().corr()` from calculating the correlations for all the earlier dates in the first place?
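For reference, the slow baseline described above (compute everything, then slice the last date) can be reproduced on the sample data roughly like this. It is only a sketch: the frame is rebuilt by hand from the table in the question, and the `method` argument is left out because `ewm().corr()` computes a Pearson correlation anyway and newer pandas versions may not accept extra keyword arguments.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question
data = pd.DataFrame(
    {
        "a": [35, 85, 33, 95, 71],
        "b": [34, 54, 8, 9, 98],
        "c": [17, 31, 27, 45, 7],
    },
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# Correlation matrices for every date (the expensive part on large data)
correls = data.ewm(alpha=0.01, adjust=True).corr()

# Keep only the block belonging to the most recent date
latest = correls.loc[data.index[-1]]
print(latest)
```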

To be clear, I’m looking for a fast way to get the following output:

```
          a         b         c
a  1.000000  0.209568  0.478477
b  0.209568  1.000000 -0.748170
c  0.478477 -0.748170  1.000000
```

Thanks

## Answer:

IIUC, to compute the EWMA you need past data, but you don’t need all of it, because the oldest values carry very little weight. I think about 300 days of history is enough to compute a good correlation.
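As a side note, with `adjust=True` the correlation for the last date appears to be an ordinary weighted Pearson correlation in which the observation `t` days back gets weight `(1 - alpha)**t`. Under that assumption it can be computed in one pass, without building the matrices for earlier dates. The sketch below is not the method this answer goes on to describe; `ewm_corr_last` is a hypothetical helper name, and it assumes the frame has no missing values. On the 5-row sample it reproduces the numbers from the question.

```python
import numpy as np
import pandas as pd


def ewm_corr_last(df: pd.DataFrame, alpha: float) -> pd.DataFrame:
    """Weighted Pearson correlation for the last row only (assumes no NaN)."""
    x = df.to_numpy(dtype=float)
    n = x.shape[0]
    # Newest row gets weight 1, the row before it (1 - alpha), and so on
    w = (1.0 - alpha) ** np.arange(n - 1, -1, -1)
    w /= w.sum()
    centered = x - w @ x                        # subtract weighted column means
    cov = (centered * w[:, None]).T @ centered  # weighted covariance matrix
    std = np.sqrt(np.diag(cov))
    return pd.DataFrame(cov / np.outer(std, std),
                        index=df.columns, columns=df.columns)


sample = pd.DataFrame(
    {"a": [35, 85, 33, 95, 71], "b": [34, 54, 8, 9, 98], "c": [17, 31, 27, 45, 7]},
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)
last_corr = ewm_corr_last(sample, alpha=0.01)
print(last_corr)
```

The normalisation of the weights cancels in the ratio, so only their relative sizes matter, which is also why dropping very old rows changes the result so little.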

Why 300 days? The weights are computed as `w_t = (1 - alpha)^t`, where `t` is the number of days back from today, so with `alpha = 0.01` the 300th value (`t = 299`) has a weight of only about 0.0495 in the moving average:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots(figsize=(8, 6))
w = (1 - 0.01)**np.arange(365*2)  # weight of the value t days back
cs = np.cumsum(w)                 # total weight covered by the last t days
ax1.plot(w, label='Coefficient')
ax2 = ax1.twinx()
ax2.plot(cs, 'k', label='Cumulative sum')
ax1.set_title(r'Weights for $\alpha=0.01$')
ax1.set_ylabel('Weight in EWMA')
ax1.set_xlabel('Days')
ax1.axvline(0, c='r', ls='--', lw=0.5)
ax1.axhline(0.05, c='g', ls='--', lw=0.5)
ax1.axvline(300, c='g', ls='--', lw=0.5)
ax1.text(300, 0.08, 'Weights > 0.05')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
```

```python
# For hist=300 days
>>> w[300], cs[300]
(0.04904089407128572, 95.14495148694262)

# For hist=500 days (better accuracy, cumsum ~= 100)
>>> w[500], cs[500]
(0.006570483042414603, 99.34952217880091)
```
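The cut-off can also be derived instead of read off the plot: the weight `(1 - alpha)**t` drops below a tolerance `tol` once `t >= ln(tol) / ln(1 - alpha)`, and the lags `0..hist` cover a fraction `1 - (1 - alpha)**(hist + 1)` of the total weight `1 / alpha`. A small sketch of that arithmetic; `required_history` and `covered_fraction` are hypothetical helper names, not pandas functions.

```python
import math


def required_history(alpha: float, tol: float) -> int:
    """Smallest lag t at which the weight (1 - alpha)**t is at or below tol."""
    return math.ceil(math.log(tol) / math.log(1.0 - alpha))


def covered_fraction(alpha: float, hist: int) -> float:
    """Share of the total (infinite-history) weight carried by lags 0..hist."""
    return 1.0 - (1.0 - alpha) ** (hist + 1)


print(required_history(0.01, 0.05))   # lag where the weight falls below 0.05
print(covered_fraction(0.01, 300))    # share of weight kept with hist=300
print(covered_fraction(0.01, 500))    # share of weight kept with hist=500
```

For `alpha = 0.01` and a tolerance of 0.05 this gives 299 lags, in line with the roughly 300 days suggested above, and the covered fractions agree with `cs[300] / 100` and `cs[500] / 100` from the output shown earlier.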

How to use it:

``````hist = 300