Fourier series columns don't appear in DeterministicProcess()
Question:
I have been refreshing my time-series skills and I’m having trouble with creating Fourier series. Here is the data (if you run everything together it will give you the same plots and final table):
import pandas as pd
from pandas import Period  # needed because the dict below uses Period(...) unqualified
import matplotlib.pyplot as plt
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess
from sklearn.linear_model import LinearRegression
df = pd.DataFrame({'Pax': {Period('1949-01', 'M'): 112, Period('1949-02', 'M'): 118, Period('1949-03', 'M'): 132, Period('1949-04', 'M'): 129, Period('1949-05', 'M'): 121, Period('1949-06', 'M'): 135, Period('1949-07', 'M'): 148, Period('1949-08', 'M'): 148, Period('1949-09', 'M'): 136, Period('1949-10', 'M'): 119, Period('1949-11', 'M'): 104, Period('1949-12', 'M'): 118, Period('1950-01', 'M'): 115, Period('1950-02', 'M'): 126, Period('1950-03', 'M'): 141, Period('1950-04', 'M'): 135, Period('1950-05', 'M'): 125, Period('1950-06', 'M'): 149, Period('1950-07', 'M'): 170, Period('1950-08', 'M'): 170, Period('1950-09', 'M'): 158, Period('1950-10', 'M'): 133, Period('1950-11', 'M'): 114, Period('1950-12', 'M'): 140, Period('1951-01', 'M'): 145, Period('1951-02', 'M'): 150, Period('1951-03', 'M'): 178, Period('1951-04', 'M'): 163, Period('1951-05', 'M'): 172, Period('1951-06', 'M'): 178, Period('1951-07', 'M'): 199, Period('1951-08', 'M'): 199, Period('1951-09', 'M'): 184, Period('1951-10', 'M'): 162, Period('1951-11', 'M'): 146, Period('1951-12', 'M'): 166, Period('1952-01', 'M'): 171, Period('1952-02', 'M'): 180, Period('1952-03', 'M'): 193, Period('1952-04', 'M'): 181, Period('1952-05', 'M'): 183, Period('1952-06', 'M'): 218, Period('1952-07', 'M'): 230, Period('1952-08', 'M'): 242, Period('1952-09', 'M'): 209, Period('1952-10', 'M'): 191, Period('1952-11', 'M'): 172, Period('1952-12', 'M'): 194, Period('1953-01', 'M'): 196, Period('1953-02', 'M'): 196, Period('1953-03', 'M'): 236, Period('1953-04', 'M'): 235, Period('1953-05', 'M'): 229, Period('1953-06', 'M'): 243, Period('1953-07', 'M'): 264, Period('1953-08', 'M'): 272, Period('1953-09', 'M'): 237, Period('1953-10', 'M'): 211, Period('1953-11', 'M'): 180, Period('1953-12', 'M'): 201, Period('1954-01', 'M'): 204, Period('1954-02', 'M'): 188, Period('1954-03', 'M'): 235, Period('1954-04', 'M'): 227, Period('1954-05', 'M'): 234, Period('1954-06', 'M'): 264, Period('1954-07', 'M'): 302, Period('1954-08', 'M'): 293, 
Period('1954-09', 'M'): 259, Period('1954-10', 'M'): 229, Period('1954-11', 'M'): 203, Period('1954-12', 'M'): 229, Period('1955-01', 'M'): 242, Period('1955-02', 'M'): 233, Period('1955-03', 'M'): 267, Period('1955-04', 'M'): 269, Period('1955-05', 'M'): 270, Period('1955-06', 'M'): 315, Period('1955-07', 'M'): 364, Period('1955-08', 'M'): 347, Period('1955-09', 'M'): 312, Period('1955-10', 'M'): 274, Period('1955-11', 'M'): 237, Period('1955-12', 'M'): 278, Period('1956-01', 'M'): 284, Period('1956-02', 'M'): 277, Period('1956-03', 'M'): 317, Period('1956-04', 'M'): 313, Period('1956-05', 'M'): 318, Period('1956-06', 'M'): 374, Period('1956-07', 'M'): 413, Period('1956-08', 'M'): 405, Period('1956-09', 'M'): 355, Period('1956-10', 'M'): 306, Period('1956-11', 'M'): 271, Period('1956-12', 'M'): 306, Period('1957-01', 'M'): 315, Period('1957-02', 'M'): 301, Period('1957-03', 'M'): 356, Period('1957-04', 'M'): 348, Period('1957-05', 'M'): 355, Period('1957-06', 'M'): 422, Period('1957-07', 'M'): 465, Period('1957-08', 'M'): 467, Period('1957-09', 'M'): 404, Period('1957-10', 'M'): 347, Period('1957-11', 'M'): 305, Period('1957-12', 'M'): 336, Period('1958-01', 'M'): 340, Period('1958-02', 'M'): 318, Period('1958-03', 'M'): 362, Period('1958-04', 'M'): 348, Period('1958-05', 'M'): 363, Period('1958-06', 'M'): 435, Period('1958-07', 'M'): 491, Period('1958-08', 'M'): 505, Period('1958-09', 'M'): 404, Period('1958-10', 'M'): 359, Period('1958-11', 'M'): 310, Period('1958-12', 'M'): 337, Period('1959-01', 'M'): 360, Period('1959-02', 'M'): 342, Period('1959-03', 'M'): 406, Period('1959-04', 'M'): 396, Period('1959-05', 'M'): 420, Period('1959-06', 'M'): 472, Period('1959-07', 'M'): 548, Period('1959-08', 'M'): 559, Period('1959-09', 'M'): 463, Period('1959-10', 'M'): 407, Period('1959-11', 'M'): 362, Period('1959-12', 'M'): 405, Period('1960-01', 'M'): 417, Period('1960-02', 'M'): 391, Period('1960-03', 'M'): 419, Period('1960-04', 'M'): 461, Period('1960-05', 'M'): 
472, Period('1960-06', 'M'): 535, Period('1960-07', 'M'): 622, Period('1960-08', 'M'): 606, Period('1960-09', 'M'): 508, Period('1960-10', 'M'): 461,Period('1960-11', 'M'): 390,Period('1960-12', 'M'): 432}})
df.head()
I then create a constant and a trend:
dp = DeterministicProcess(
index=df.index,
constant=True,
order=1,
seasonal=False,
#additional_terms=[fourier],
drop=True,
)
X = dp.in_sample()
y = df.squeeze()
I fit these with a linear regression, detrend the time series, and plot the results:
model_pax = LinearRegression().fit(X, y)
y_pred_pax = pd.Series(model_pax.predict(X), index=X.index)
y_detrended = y-y_pred_pax
fig, (ax1, ax2) = plt.subplots(2,1, sharex=True, figsize=(10, 4))
ax1 = y.plot(label='Pax', ax=ax1)
ax1 = y_pred_pax.plot(label='trend', ax=ax1)
ax1.legend()
ax2 = y_detrended.plot(label='Pax detrended', ax=ax2)
ax2.legend()
plt.show()
Now I want to capture the seasonality, and for this I need a Fourier series. However, when I create the deterministic process and include the Fourier terms, the Fourier columns don’t appear.
fourier = CalendarFourier(freq="M", order=4)
dp = DeterministicProcess(
index=y_detrended.index,
constant=True,
order=0,
seasonal=False,
additional_terms=[fourier],
drop=True,
)
dp.in_sample().head()
Only the constant appears, without the Fourier columns. Why? I have tried this with other datasets and it works perfectly, and I don’t see any difference here. What am I missing?
Answers:
I found the solution. I just had to change the frequency from "M":
CalendarFourier(freq="M", order=4)
to "Y":
CalendarFourier(freq="Y", order=4)
I can’t understand why or how it works specifically. It seems that CalendarFourier() checks whether the index passed to DeterministicProcess() is compatible with the frequency we give it, but I can’t be sure of this. I hope someone can offer a better explanation.
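One way to check the fix without the full dataset is to build both terms on a synthetic monthly index and compare the columns that come back. This is only a sketch: the index is made up to cover the same span as the question's data, design_columns is a hypothetical helper, and it assumes a statsmodels/pandas combination that still accepts the "M" and "Y" aliases used in this thread (newer pandas releases have been renaming some frequency aliases).

```python
import pandas as pd
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

# Synthetic monthly index covering the same span as the question's data
# (an assumption for illustration; the passenger counts are not needed here).
index = pd.period_range("1949-01", "1960-12", freq="M")


def design_columns(freq):
    """Hypothetical helper: column names built for a given Fourier freq."""
    fourier = CalendarFourier(freq=freq, order=4)
    dp = DeterministicProcess(
        index=index,
        constant=True,
        order=0,
        seasonal=False,
        additional_terms=[fourier],
        drop=True,
    )
    return list(dp.in_sample().columns)


monthly_cols = design_columns("M")  # same settings as in the question
yearly_cols = design_columns("Y")   # the change suggested in this answer

print(len(monthly_cols), monthly_cols)
print(len(yearly_cols), yearly_cols)
```

With "M" the list should contain only the constant, matching what the question describes; with "Y" the sine and cosine columns should show up next to it.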
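To expand on your own answer:
freq="M" builds Fourier terms whose cycle is one calendar month, and freq="Y" builds terms whose cycle is one calendar year. Your data has a single observation per month, so every observation sits at the same point of a monthly cycle: the sine terms are all 0 and the cosine terms are all 1. As far as I can tell, DeterministicProcess discards all-zero columns and columns that merely duplicate the constant, which is why only the constant is left.
With freq="Y" the twelve months fall at different points of the yearly cycle, so the terms vary and the columns are kept. So here you clearly want the yearly cycle.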
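The difference between a monthly and a yearly cycle can be sketched with plain arithmetic, no statsmodels needed. The sketch below assumes that each Fourier term is evaluated at the fraction of the cycle that has elapsed at each timestamp, and that monthly observations are stamped at the first day of the month; elapsed_fraction and fourier_row are hypothetical helpers written for this illustration, not library functions.

```python
import math
from datetime import date


def elapsed_fraction(d, cycle):
    """Fraction of the calendar cycle ("month" or "year") elapsed at date d."""
    if cycle == "month":
        start = date(d.year, d.month, 1)
        # First day of the following month (rolls over to January after December)
        end = date(d.year + (d.month == 12), d.month % 12 + 1, 1)
    elif cycle == "year":
        start = date(d.year, 1, 1)
        end = date(d.year + 1, 1, 1)
    else:
        raise ValueError(f"unknown cycle: {cycle!r}")
    return (d - start).days / (end - start).days


def fourier_row(frac, order):
    """sin/cos pairs for harmonics 1..order at the given cycle fraction."""
    row = []
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * frac
        row += [math.sin(angle), math.cos(angle)]
    return row


# Monthly observations stamped at the first day of each month (1949 only).
stamps = [date(1949, m, 1) for m in range(1, 13)]

monthly_terms = [fourier_row(elapsed_fraction(d, "month"), 1) for d in stamps]
yearly_terms = [fourier_row(elapsed_fraction(d, "year"), 1) for d in stamps]

for d, m_row, y_row in zip(stamps, monthly_terms, yearly_terms):
    print(d, [round(v, 3) for v in m_row], [round(v, 3) for v in y_row])
```

Every monthly-cycle row is the same sin/cos pair because the elapsed fraction is always zero, so those columns carry no information beyond the constant. The yearly-cycle rows change from month to month, which is what lets them describe the seasonality.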