How to forecast time series using AutoReg in Python

Question:

I’m trying to build an old-school model using only the autoregression algorithm. I found out that there’s an implementation of it in the statsmodels package. I’ve read the documentation, and as I understand it, it should work like ARIMA. So, here’s my code:

import statsmodels.api as sm
model = sm.tsa.AutoReg(df_train.beer, 12).fit()

And when I want to predict new values, I’m trying to follow the documentation:

y_pred = model.predict(start=df_test.index.min(), end=df_test.index.max())
# or
y_pred = model.predict(start=100, end=1000)

Both return a list of NaNs.

Also, when I type model.predict(0, df_train.size - 1) it predicts real values, but model.predict(0, df_train.size) returns a list of NaNs.

Am I doing something wrong?


P.S. I know there are the ARIMA, ARMA and SARIMAX algorithms, which can be used as basic autoregression. But I need AutoReg specifically.

Asked By: Yoskutik


Answers:

You can use this code for forecasting:

import statsmodels.api as sm

model = sm.tsa.AutoReg(df_train.beer, 12).fit()
y_pred = model.model.predict(model.params, start=df_test.index.min(), end=df_test.index.max())

Answered By: Ivan Adanenko

We can do the forecasting in a couple of ways:

  1. directly, using the predict() function, and
  2. using the definition of the AR(p) process and the parameters learnt with AutoReg(): this will be helpful for short-term predictions, as we shall see.


Let’s start with a sample dataset from statsmodels, the data looks like the following:

import statsmodels.api as sm
import matplotlib.pyplot as plt

data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']
plt.plot(range(len(data)), data)

[plot of the SUNACTIVITY time series]

Let’s fit an AR(p) process to model the time series and use the partial autocorrelation (PACF) plot to find the order p, as shown below:

sm.graphics.tsa.plot_pacf(data, lags=30, method="ywm")

As seen from the above, the first few PACF values remain significant; let’s use p=10 for the AR(p) model.

Let’s divide the data into training and validation (test) datasets and fit an auto-regressive model of order 10 using the training data:

from statsmodels.tsa.ar_model import AutoReg
n = len(data)
ntrain = int(n*0.9)
ntest = n - ntrain
lag = 10
res = AutoReg(data[:ntrain], lags = lag).fit()

Now, use the predict() function for forecasting all values corresponding to the held-out dataset:

preds = res.model.predict(res.params, start=n-ntest, end=n-1)  # the ntest held-out points

Notice that we can get exactly the same predictions using the parameters from the trained model, as shown below:

import numpy as np

x = data[ntrain-lag:ntrain].values  # last `lag` training observations
preds1 = []
for t in range(ntrain, n):
    # params[0] is the intercept; params[1:] are the lag-1..lag-p coefficients
    pred = res.params[0] + np.sum(res.params[1:]*x[::-1])
    x = np.append(x[1:], pred)  # slide the window, feeding the forecast back in
    preds1.append(pred)

Note that the forecast values generated this way are the same as the ones obtained using the predict() function above:

np.allclose(preds.values, np.array(preds1))
# True

Now, let’s plot the forecast values for the test data:

[plot: long-term forecasts vs. actual values on the test data]

As can be seen, the quality of long-term forecasting is not that good, since each forecast value is fed back as an input for the next prediction.

Let’s instead go for short-term predictions now and use the last lag points from the dataset to forecast the next value, as shown in the next code snippet.

preds = []
for t in range(ntrain, n):
    # use the actual observed values for the last `lag` points, not earlier forecasts
    pred = res.params[0] + np.sum(res.params[1:]*data[t-lag:t].values[::-1])
    preds.append(pred)

As can be seen from the next plot, short term forecasting works way better:

[plot: short-term (one-step-ahead) forecasts vs. actual values on the test data]

Answered By: Sandipan Dey

from statsmodels.tsa.ar_model import AutoReg

model = AutoReg(dataset[''], lags=1)
ARFit = model.fit()
forecasted = ARFit.predict(start=len(dataset), end=len(dataset) + 12)

# visualization
dataset[''].plot(figsize=(12, 8), legend=True)
forecasted.plot(legend=True)
Answered By: judith angélica