How to forecast time series using AutoReg in Python

Question

I’m trying to build an old-school model using only an autoregressive algorithm. I found that the statsmodels package has an implementation of it. I’ve read the documentation, and as I understand it, it should work like ARIMA. Here’s my code:

import statsmodels.api as sm
model = sm.tsa.AutoReg(df_train.beer, 12).fit()

And when I want to predict new values, I’m trying to follow the documentation:

y_pred = model.predict(start=df_test.index.min(), end=df_test.index.max())
# or
y_pred = model.predict(start=100, end=1000)

Both return a list of NaNs.

Also, when I call model.predict(0, df_train.size - 1) it returns real values, but model.predict(0, df_train.size) returns a list of NaNs.

Am I doing something wrong?

P.S. I know that ARIMA, ARMA or SARIMAX models can be used for basic autoregression, but I need AutoReg specifically.

Asked By: Yoskutik


Answer 1

You can use this code for forecasting:

# statsmodels.api exposes the tsa namespace; a bare "import statsmodels" does not
import statsmodels.api as sm

model = sm.tsa.AutoReg(df_train.beer, 12).fit()
y_pred = model.model.predict(model.params, start=df_test.index.min(), end=df_test.index.max())

Answered By: Ivan Adanenko

Answer 2

We can do the forecasting in a couple of ways:

by directly using the predict() function, and
by using the definition of an AR(p) process together with the parameters learned by AutoReg(); this is helpful for short-term predictions, as we shall see.
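
For reference, the AR(p) definition in the second option amounts to a constant plus a weighted sum of the last p values. A minimal pure-Python sketch of that one-step formula follows; the helper name ar_one_step and the coefficients are made up for illustration and are not part of statsmodels:

```python
def ar_one_step(const, coefs, history):
    """One-step AR(p) forecast: const + sum_i coefs[i-1] * y[t-i].

    coefs[0] multiplies the most recent observation (lag 1),
    coefs[1] the one before it (lag 2), and so on.
    """
    p = len(coefs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    recent_first = history[::-1][:p]  # y[t-1], y[t-2], ..., y[t-p]
    return const + sum(c * y for c, y in zip(coefs, recent_first))


# Hypothetical AR(2) parameters, chosen only for illustration.
print(ar_one_step(1.0, [0.5, 0.25], [2.0, 4.0]))  # 1.0 + 0.5*4.0 + 0.25*2.0 = 3.5
```

Note the ordering: the coefficient of lag 1 pairs with the most recent value, which is why the history is reversed before the products are summed.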

Let’s start with a sample dataset from statsmodels; the data looks like the following:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# yearly sunspot activity as a pandas Series
data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']
plt.plot(range(len(data)), data)

Let’s fit an AR(p) process to model the time series and use the partial autocorrelation (PACF) plot to choose the order p.

The first few PACF values remain significant, so let’s use p=10 for the AR(p) model.
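
To make the "significant PACF values" criterion concrete, here is a dependency-free sketch that computes partial autocorrelations with the Durbin-Levinson recursion and compares them with the usual approximate 95% band of 1.96/sqrt(n). It is a stand-in for illustration, not the estimator statsmodels uses for its plot, and the series is a simulated AR(1) process rather than the sunspot data:

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(max_lag + 1)]


def pacf_durbin_levinson(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag via Durbin-Levinson."""
    r = autocorr(x, max_lag)
    pacf = [1.0]
    prev = []  # prev[j] holds the lag-(j+1) coefficient of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical AR(1) series with coefficient 0.8, simulated for illustration.
random.seed(1)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

values = pacf_durbin_levinson(series, max_lag=10)
band = 1.96 / math.sqrt(len(series))  # approximate 95% significance band
significant = [k for k in range(1, 11) if abs(values[k]) > band]
print("lags outside the band:", significant)
```

For an AR(1) series the lag-1 value should stand well outside the band while higher lags mostly fall inside it; on real data such as the sunspots, several lags can stay significant, which is what motivates a larger p.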

Let’s divide the data into training and validation (test) datasets and fit an autoregressive model of order 10 on the training data:

from statsmodels.tsa.ar_model import AutoReg
n = len(data)
ntrain = int(n*0.9)
ntest = n - ntrain
lag = 10
res = AutoReg(data[:ntrain], lags=lag).fit()

Now, use the predict() function to forecast all values in the held-out dataset:

# end is inclusive, so n-1 yields exactly ntest forecasts
preds = res.model.predict(res.params, start=ntrain, end=n-1)

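Notice that we get exactly the same predictions from the AR(p) definition and the parameters of the trained model, as shown below:

params = np.asarray(res.params)  # [const, coefficient of lag 1, ..., coefficient of lag p]
x = data[ntrain-lag:ntrain].to_numpy().copy()  # last lag training values; copy so data is not mutated
preds1 = []
for t in range(ntrain, n):
    pred = params[0] + np.sum(params[1:] * x[::-1])  # x[::-1] puts the most recent value first
    x = np.append(x[1:], pred)  # drop the oldest value, append the new forecast
    preds1.append(pred)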

Note that the forecast values generated this way are the same as the ones obtained using the predict() function above.

np.allclose(preds.values, np.array(preds1))
# True

Now, let’s plot the forecast values for the test data:

As can be seen, the forecast quality degrades over a long horizon: forecasts for later steps are computed from earlier forecasts rather than from observed values, so errors accumulate.
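
A minimal sketch of why this happens, using a hypothetical AR(1) model (the const and phi values are made up, not taken from the sunspot fit): when every forecast is fed back in as the next input, the path decays geometrically toward the long-run mean const / (1 - phi) and carries no further information about the data.

```python
def recursive_forecast(const, phi, last_value, steps):
    """Iterate y_hat[t+1] = const + phi * y_hat[t], feeding each forecast back in."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = const + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical AR(1) parameters, for illustration only.
const, phi = 2.0, 0.5
long_run_mean = const / (1 - phi)  # 4.0
path = recursive_forecast(const, phi, last_value=20.0, steps=30)

print(path[:3])  # [12.0, 8.0, 6.0]
print("distance from long-run mean after 30 steps:", abs(path[-1] - long_run_mean))
```

A higher-order model behaves similarly in spirit: its recursive forecast may oscillate for a while, but for a stationary fit it also settles toward the mean.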

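Let’s go for short-term predictions instead and use the last lag observed points from the dataset to forecast only the next value, as shown in the next code snippet.

params = np.asarray(res.params)  # [const, coefficient of lag 1, ..., coefficient of lag p]
preds = []
for t in range(ntrain, n):
    # one-step-ahead forecast from the lag actual observations just before t
    pred = params[0] + np.sum(params[1:] * data[t-lag:t].values[::-1])
    preds.append(pred)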

As can be seen from the next plot, short term forecasting works way better:
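
To put a number on the difference between the two strategies, one option is to compare their root mean squared errors. The sketch below does this on a simulated AR(1) series and, as a simplifying assumption, uses the true parameters instead of fitted ones; the rmse helper and all values are hypothetical:

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Simulate a hypothetical AR(1) series: y[t] = const + phi * y[t-1] + noise.
random.seed(0)
const, phi = 1.0, 0.9
series = [const / (1 - phi)]
for _ in range(999):
    series.append(const + phi * series[-1] + random.gauss(0.0, 1.0))

train, test = series[:500], series[500:]

# Rolling one-step forecasts: each prediction uses the actual previous value.
previous_actuals = [train[-1]] + test[:-1]
one_step = [const + phi * y for y in previous_actuals]

# Recursive forecasts: each prediction uses the previous prediction.
recursive, value = [], train[-1]
for _ in test:
    value = const + phi * value
    recursive.append(value)

print("one-step RMSE: ", rmse(test, one_step))
print("recursive RMSE:", rmse(test, recursive))
```

Keep in mind that the one-step numbers are only achievable when the true value at t-1 is already known at forecast time, so the two errors answer different questions.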

Answered By: Sandipan Dey

Answer 3

from statsmodels.tsa.ar_model import AutoReg

model = AutoReg(dataset[''], lags=1)
ARFit = model.fit()
# forecast 13 steps past the end of the data (end is inclusive)
forecasted = ARFit.predict(start=len(dataset), end=len(dataset)+12)

# visualization
dataset[''].plot(figsize=(12, 8), legend=True)
forecasted.plot(legend=True)

Answered By: judith angélica
