Pandas ewm correlation – not rolling
Question:
I have the following pandas dataframe:
             a   b   c
2023-01-01  35  34  17
2023-01-02  85  54  31
2023-01-03  33   8  27
2023-01-04  95   9  45
2023-01-05  71  98   7
I want to calculate today’s (2023-01-05) EWM correlations between the 3 series.
I tried
correls = data.ewm(alpha=0.01, adjust=True).corr(method='pearson')
and it produced rolling correlations (calculated on all dates):
                     a         b         c
2023-01-01 a       NaN       NaN       NaN
           b       NaN       NaN       NaN
           c       NaN       NaN       NaN
2023-01-02 a  1.000000  1.000000  1.000000
           b  1.000000  1.000000  1.000000
           c  1.000000  1.000000  1.000000
2023-01-03 a  1.000000  0.845674  0.694635
           b  0.845674  1.000000  0.203512
           c  0.694635  0.203512  1.000000
2023-01-04 a  1.000000  0.177224  0.842738
           b  0.177224  1.000000 -0.362909
           c  0.842738 -0.362909  1.000000
2023-01-05 a  1.000000  0.209568  0.478477
           b  0.209568  1.000000 -0.748170
           c  0.478477 -0.748170  1.000000
I know I can slice the correls dataframe to get only the latest correlations. The problem is that the real "data" dataframe is very large, and computing the correlations for every date takes a lot of time. Since I only need today’s correlations, how can I avoid having ewm().corr() calculate them for all dates in the first place?
To be clear, I’m looking for a fast way to get the following output:
          a         b         c
a  1.000000  0.209568  0.478477
b  0.209568  1.000000 -0.748170
c  0.478477 -0.748170  1.000000
Thanks
Answers:
IIUC, an EWMA depends on past data, but you don’t need all of it because the oldest values carry almost no weight. With alpha=0.01, about 300 days of history should be enough to get a good approximation of the correlation.
Why 300 days? With adjust=True the weights are w_t = (1 - alpha)^t, where t = 0 is the most recent observation,
so the 300th most recent observation (t = 299) has a weight of only about 0.0495 in the moving average:
import matplotlib.pyplot as plt
import numpy as np

fig, ax1 = plt.subplots(figsize=(8, 6))
# Weights for alpha=0.01 over two years; index 0 is the most recent day
w = (1 - 0.01)**np.arange(365*2)
cs = np.cumsum(w)
ax1.plot(w, label='Coefficient')
ax2 = ax1.twinx()
ax2.plot(cs, 'k', label='Cumulative sum')
ax1.set_title(r'Weights for $\alpha=0.01$')
ax1.set_ylabel('Weight in EWMA')
ax1.set_xlabel('Days')
ax1.axvline(0, c='r', ls='--', lw=0.5)
ax1.axhline(0.05, c='g', ls='--', lw=0.5)
ax1.axvline(300, c='g', ls='--', lw=0.5)
ax1.text(300, 0.08, 'Weights > 0.05')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
# For hist=300 days
>>> w[300], cs[300]
(0.04904089407128572, 95.14495148694262)
# For hist=500 days: better accuracy, with a cumulative sum ~= 100
>>> w[500], cs[500]
(0.006570483042414603, 99.34952217880091)
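The cutoff can also be computed instead of read off the plot. The sketch below is not part of the original answer: `history_for` and its `tol` parameter are hypothetical names, and it assumes the adjust=True weights (1 - alpha)^t shown above. It returns the smallest number of rows n for which the first dropped weight, (1 - alpha)^n, is at most `tol`. That quantity is also the share of the total weight of an infinitely long history that gets discarded.

```python
import math


def history_for(alpha: float, tol: float = 0.05) -> int:
    """Smallest n such that (1 - alpha)**n <= tol.

    Hypothetical helper: n is the number of most recent rows to keep
    so that every dropped row would have had a weight of at most `tol`.
    """
    return math.ceil(math.log(tol) / math.log(1.0 - alpha))


# alpha from the question with the 0.05 threshold used in the plot
print(history_for(0.01))
# a stricter threshold needs a longer history
print(history_for(0.01, tol=0.01))
```

For alpha=0.01 and a 0.05 threshold this lands at roughly the 300 days used above.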
How to use it:
hist = 300
# Keep only the last `hist` rows, then read off the correlations for the latest date
df.iloc[-hist:].ewm(alpha=0.01, adjust=True).corr().loc[df.index[-1]]
Note: the smaller your alpha, the longer the history you need.
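If only the last date matters, another option is to skip ewm() altogether and apply the weights (1 - alpha)^t directly in a single weighted-correlation pass. This is a different technique from the truncation above, sketched here under assumptions: `last_ewm_corr` is a hypothetical helper name, the data contain no NaN values, and the weighting is meant to mirror adjust=True. The sample frame is the one from the question.

```python
import numpy as np
import pandas as pd


def last_ewm_corr(data: pd.DataFrame, alpha: float) -> pd.DataFrame:
    """Exponentially weighted correlation matrix for the last row only.

    Hypothetical helper; assumes no missing values and adjust=True weighting.
    """
    x = data.to_numpy(dtype=float)
    n = x.shape[0]
    # The most recent row gets weight 1, the one before it (1 - alpha), and so on
    w = (1.0 - alpha) ** np.arange(n - 1, -1, -1)
    w /= w.sum()
    # Weighted column means, then the weighted covariance matrix
    centered = x - w @ x
    cov = (centered * w[:, None]).T @ centered
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    return pd.DataFrame(corr, index=data.columns, columns=data.columns)


data = pd.DataFrame(
    {"a": [35, 85, 33, 95, 71], "b": [34, 54, 8, 9, 98], "c": [17, 31, 27, 45, 7]},
    index=pd.date_range("2023-01-01", periods=5),
)
result = last_ewm_corr(data, alpha=0.01)
print(result)
```

This touches every row once and builds one matrix, rather than one matrix per date. It can be combined with the truncation idea by passing only the last `hist` rows. It is worth checking the result against data.ewm(alpha=0.01, adjust=True).corr() on a small sample before relying on it.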