cross correlation plot in statsmodels

Question:

Consider the simple example below, borrowed from "How to use the ccf() method in the statsmodels library?"

import pandas as pd
import numpy as np
import statsmodels.tsa.stattools as smt
import matplotlib.pyplot as plt

np.random.seed(123)
test = pd.DataFrame(np.random.randint(0,25,size=(79, 2)), columns=list('AB'))

I know how to compute the forward and backward lags of the cross-correlation function (see the SO link above), but the issue is how to get a proper dataframe with the lags in the correct order. I came up with the solution below:

# reversing both series yields the negative lags; [::-1] puts them in ascending lag order
backwards = smt.ccf(test['A'][::-1], test['B'][::-1], adjusted=False)[::-1]

# forwards[k] is the cross-correlation at lag +k, starting at lag 0
forwards = smt.ccf(test['A'], test['B'], adjusted=False)

# skip lag 0 in forwards: backwards already ends with the lag-0 value
a = pd.DataFrame({'lag': range(1, len(forwards)),
                  'value': forwards[1:]})

b = pd.DataFrame({'lag': [-i for i in list(range(0, len(forwards)))[::-1]],
                  'value': backwards})

full = pd.concat([a, b])
full.sort_values(by='lag', inplace=True)
full.set_index('lag').value.plot()

[Figure: the resulting cross-correlation plot, value against lag]

However, this seems like a lot of code for something that is conceptually very simple (just appending two lists). Can this code be streamlined?

Thanks!

Asked By: ℕʘʘḆḽḘ


Answers:

Well, you can try "just appending two lists":

# or, equivalently:
# cc = list(backwards) + list(forwards[1:])
cc = np.concatenate([backwards, forwards[1:]])
full = pd.DataFrame({'lag': np.arange(len(cc)) - len(backwards),
                     'value': cc})
full.plot(x='lag')

Also:

full = (pd.DataFrame({'value':np.concatenate([backwards, forwards[1:]])})
          .assign(lag=lambda x: x.index - len(backwards) )
       )

Output:

[Figure: plot of value against lag]

Note: if all you want is to plot the two arrays, this would do:

plt.plot(np.arange(1 - len(backwards), 1), backwards, c='C0')  # lags -(n-1) .. 0
plt.plot(forwards, c='C0')                                      # lags 0 .. n-1

Answered By: Quang Hoang

For Quang Hoang's answer, I suggest using np.arange(len(cc)) - (len(backwards) - 1) for the lag, because ccf returns the cross-correlation coefficients starting from lag 0, so lag 0 is the last element of backwards, at index len(backwards) - 1.

Answered By: Dongda Li