Suppose \(s=12\), and your observation \(Y_t\) has linear trend, as well as the seasonality with period \(s=12\). \[ Y_t = a+b t + S_t + X_t. \] Then
\[\begin{align*} \bigtriangledown_s Y_{t} &= \hspace{3mm} Y_{t}-Y_{t-12}\\\\ &= \hspace{3mm} a+b (t) + S_{t} + X_{t} \\\\ & \hspace{10mm} - \Big( \hspace{3mm} a+b (t-12) \hspace{5mm} + S_{t-12} \hspace{5mm} + X_{t-12} \Big) \\\\ &= \hspace{3mm} b (12) + X_{t} - X_{t-12}\\ \end{align*}\]
This is statonary series with constant \(b(12)\). (called drift by
auto.arima())
If your observation \(Y_t\) has quadratic trend, as well as the seasonality. \[ Y_t = a+b t + c t^2 + S_t + X_t. \] Then, \[\begin{align*} \bigtriangledown_s Y_{t} &= \hspace{3mm} Y_{t}-Y_{t-12} \\\\ &= \hspace{3mm} a+b (t) + c(t)^2 + S_{t} + X_{t} \\\\ &\hspace{10mm} - \Big( \hspace{3mm} a+b (t-12) + c(t-12)^2 + S_{t-12} + X_{t-12} \Big) \\\\ &= \hspace{3mm} b (12) + c(24 t) + c(12^2) + X_{t} - X_{t-12}\\ \end{align*}\]
This is statonary series with linear trend \(b(12) + c(12^2) + c(24) t\).
If you take \(\bigtriangledown\) again, who will be left?
library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
#--- Linear trend only
= 2+5*(1:200)
X
layout(matrix(1:2, 1,2))
plot(X, type="o")
plot(diff(X, 12), type="o")
#--- Quadratic trend only
= 2+3*(1:200)^2
X layout(matrix(1:3,1,3))
plot(X, type="o")
plot(diff(X, 12), type="o")
plot(diff(diff(X, 12)), type="o")
Suppose \(s=12\), and your observation \(Y_t\) has linear trend, \[ Y_t = a+b t + S_t + \epsilon_t \hspace{10mm} \mbox{ where } \epsilon_t \sim N(0,1) \]
Taking \(\bigtriangledown_{12}\) yields, \[\begin{align*} Y_{t+12}-Y_{t} &= \hspace{3mm} a+b (t) + S_{t} + e_{t} \\\\ &- ( a+b (t-12) \hspace{15mm} + S_{t-12} \hspace{5mm} + e_{t-12} ) \\\\ &= \hspace{3mm} \hspace{10mm} b(12) + e_{t} - e_{t-12} \end{align*}\]
What kind of process is this? \[ K_t \hspace{3mm} = \hspace{3mm} e_{t} - e_{t-12} \]
Linar regression with Seasonal average will be a better approach.
library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
#--- Linear trend + WN
set.seed(3462)
= 2+.5*(1:200) + rnorm(200,0,5)
X
layout(matrix(1:2,1,2))
plot(X, type="o")
plot(diff(X, 12), type="o")
= ts(X, start=c(1,1), freq=12) # set freq=12
X
= auto.arima(X, D=1, stepwise=FALSE, approximation=FALSE)
Fit2 Fit2
## Series: X
## ARIMA(0,0,0)(2,1,0)[12] with drift
##
## Coefficients:
## sar1 sar2 drift
## -0.6780 -0.2089 0.5142
## s.e. 0.0719 0.0747 0.0181
##
## sigma^2 = 29.7: log likelihood = -586.82
## AIC=1181.63 AICc=1181.85 BIC=1194.58
Randomness.tests(Fit2$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.303 0.11 0.078 0.049 0.098 0.273 5.254
= Arima(X, order=c(0,0,0), seasonal=c(0,1,1))
Fit3 Fit3
## Series: X
## ARIMA(0,0,0)(0,1,1)[12]
##
## Coefficients:
## sma1
## 0.0767
## s.e. 0.0542
##
## sigma^2 = 82.46: log likelihood = -681.05
## AIC=1366.1 AICc=1366.17 BIC=1372.57
No sign of over-differencing
= Arima(X, order=c(0,0,0), seasonal=c(0,1,1), include.drift=TRUE)
Fit4 Fit4
## Series: X
## ARIMA(0,0,0)(0,1,1)[12] with drift
##
## Coefficients:
## sma1 drift
## -1.0000 0.5090
## s.e. 0.0933 0.0057
##
## sigma^2 = 21.79: log likelihood = -572.31
## AIC=1150.61 AICc=1150.74 BIC=1160.32
Now there’s clear sign of (seasonal) over-differencing.
Suppose \(s=12\), and your observation \(Y_t\) has random walk without drift as trend, together with the seasonality. \[ Y_t = W_t + S_t + X_t. \] Where \[ W_t \hspace{3mm} = \hspace{3mm} \sum_{i=1}^t e_i \hspace{10mm} e_i \sim_{iid} N(0,\sigma^2). \]
If you take seasonal difference, \[\begin{align*} \bigtriangledown_{12} Y_{t} &= \hspace{3mm} W_t \hspace{5mm} + S_t \hspace{5mm} + X_t \\ \\ &- \hspace{3mm} W_{t-12} - S_{t-12} - X_{t-12} \\ \\ &= \hspace{3mm} (W_t - W_{t-12}) + (X_t - X_{t-12}) \\ \end{align*}\]
The first part, \[ W_{t} - W_{t-12} \hspace{3mm} = \hspace{3mm} \sum_{i=0}^{t-1} e_{t-i} - \sum_{i=12}^{t-1} e_{t-i} \\ \hspace{3mm} = \hspace{3mm} \sum_{i=0}^{11} e_{t-i} \hspace{10mm} \sim N(0,12\sigma^2) \]
This is MA(11) with unit root. (non-invertible)
library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
#--- RW without drift ---
set.seed(5351)
= cumsum(rnorm(200, 0, 1))
RW1 layout(matrix(1:2, 1,2))
plot(RW1, type="l")
plot(diff(RW1, 12), type="o")
= ts(RW1, start=c(1,1), freq=12)
RW2 auto.arima(RW2, D=1)
## Series: RW2
## ARIMA(1,0,0)(1,1,0)[12] with drift
##
## Coefficients:
## ar1 sar1 drift
## 0.9458 -0.5853 -0.0473
## s.e. 0.0224 0.0590 0.0751
##
## sigma^2 = 1.301: log likelihood = -293.38
## AIC=594.75 AICc=594.97 BIC=607.7
acf(diff(RW2, 12))
pacf(diff(RW2, 12))
Arima(RW2, order=c(0,0,12), seasonal=c(0,1,0))
## Series: RW2
## ARIMA(0,0,12)(0,1,0)[12]
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.9835 0.9878 1.0476 1.0295 1.0445 1.0440 1.0296 1.0460
## s.e. 0.0877 0.0926 0.1073 0.1188 0.1250 0.1207 0.1203 0.1103
## ma9 ma10 ma11 ma12
## 0.9855 0.9827 0.9643 -0.0351
## s.e. 0.1063 0.0963 0.1055 0.0823
##
## sigma^2 = 0.9189: log likelihood = -269.31
## AIC=564.62 AICc=566.71 BIC=606.69
Arima(RW2, order=c(0,0,12), seasonal=c(0,1,0), include.drift=TRUE)
## Series: RW2
## ARIMA(0,0,12)(0,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.9804 0.9841 1.0436 1.0251 1.0400 1.0394 1.0251 1.0419
## s.e. 0.0877 0.0927 0.1070 0.1184 0.1244 0.1202 0.1198 0.1100
## ma9 ma10 ma11 ma12 drift
## 0.9816 0.9795 0.9610 -0.0382 -0.0471
## s.e. 0.1061 0.0964 0.1055 0.0826 0.0660
##
## sigma^2 = 0.9217: log likelihood = -269.06
## AIC=566.11 AICc=568.54 BIC=611.42
#--- RW with drift ---
set.seed(5351)
= ts(cumsum(rnorm(200, .5, 1)), start=c(1,1), freq=12)
RW.d plot(RW.d, type="l")
# Take seasonal difference
plot(diff(RW.d, 12), type="o")
auto.arima(RW.d, D=1) #- picks up the drift (.5)*(12)
## Series: RW.d
## ARIMA(1,0,0)(1,1,0)[12] with drift
##
## Coefficients:
## ar1 sar1 drift
## 0.9458 -0.5853 0.4527
## s.e. 0.0224 0.0590 0.0751
##
## sigma^2 = 1.301: log likelihood = -293.38
## AIC=594.75 AICc=594.97 BIC=607.7
Arima(RW.d, order=c(0,0,12), seasonal=c(0,1,0) )
## Series: RW.d
## ARIMA(0,0,12)(0,1,0)[12]
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 1.1366 1.1777 1.2771 1.2982 1.3251 1.3280 1.3023 1.2888
## s.e. 0.1262 0.0954 0.1498 0.1316 0.1749 0.1423 0.1709 0.1286
## ma9 ma10 ma11 ma12
## 1.1925 1.1437 1.1138 0.1110
## s.e. 0.1524 0.0976 0.1399 0.0733
##
## sigma^2 = 1.099: log likelihood = -286.3
## AIC=598.61 AICc=600.7 BIC=640.68
Arima(RW.d, order=c(0,0,12), seasonal=c(0,1,0), include.drift=TRUE)
## Series: RW.d
## ARIMA(0,0,12)(0,1,0)[12] with drift
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 0.9804 0.9841 1.0436 1.0251 1.0400 1.0395 1.0251 1.0419
## s.e. 0.0877 0.0927 0.1070 0.1184 0.1244 0.1202 0.1198 0.1100
## ma9 ma10 ma11 ma12 drift
## 0.9816 0.9795 0.9610 -0.0382 0.4529
## s.e. 0.1061 0.0964 0.1055 0.0826 0.0660
##
## sigma^2 = 0.9217: log likelihood = -269.06
## AIC=566.11 AICc=568.54 BIC=611.42
Suppose we have ARIMA\((1,0,1) \times (1,1,0)_{12}\) model, \[ (1-\phi_1 B) (1-\Phi_1 B^{12}) \bigtriangledown_{12} Y_t = (1-\theta_1 B) \epsilon_t \]
This is same as modeling \(\bigtriangledown_{12} Y_t\) with ARMA(13,1), \[ X_t \hspace{3mm} = \hspace{3mm} \bigtriangledown_{12} Y_t \\ \Big(1-\phi_1 B - \Phi_1 B^{12} + \phi_1 \Phi_1 B^{13} \Big) X_t \hspace{3mm} = \hspace{3mm} (1-\theta_1 B) \epsilon_t \]
We know how to get a prediction \(\hat
X(h)\) for ARMA(\(p,q\)), Our
predictor for \(Y_t\) will be \[\begin{align*}
Y_{n+1} &= \hspace{3mm} Y_{n+1-12} + \hat X(1) \\
Y_{n+2} &= \hspace{3mm} Y_{n+1-12} + \hat X(2) \\
&\vdots
\end{align*}\]
How do you check if you need to take a seasonal difference, instead of regular difference?
OCSB test (\(H_0\): Seasonal Unit Root Exists)
Default method in auto.arima()
Osborn-Chui-Smith-Birchenhall (1988)
CH test (\(H_0\): Deterministic Seasonality Exists)
Fore more detail: http://robjhyndman.com/hyndsight/forecast3/
Osborn-Chui-Smith-Birchenhall (1988) test for
Default choice in for choosing value of \(D\). \[ \left\{ \begin{array}{ll} H_0: \mbox{ Seasonal Unit Root Exists}\\ H_A: H_0 \mbox{ is false} \\ \end{array} \right. \]
Small p-value means “Take Seasonal Difference”.
set.seed(9823)
= 2+5*(1:200) + rnorm(200,0,5)
X = ts(X, start=1, freq=12)
X time(X)
## Jan Feb Mar Apr May Jun
## 1 1.000000 1.083333 1.166667 1.250000 1.333333 1.416667
## 2 2.000000 2.083333 2.166667 2.250000 2.333333 2.416667
## 3 3.000000 3.083333 3.166667 3.250000 3.333333 3.416667
## 4 4.000000 4.083333 4.166667 4.250000 4.333333 4.416667
## 5 5.000000 5.083333 5.166667 5.250000 5.333333 5.416667
## 6 6.000000 6.083333 6.166667 6.250000 6.333333 6.416667
## 7 7.000000 7.083333 7.166667 7.250000 7.333333 7.416667
## 8 8.000000 8.083333 8.166667 8.250000 8.333333 8.416667
## 9 9.000000 9.083333 9.166667 9.250000 9.333333 9.416667
## 10 10.000000 10.083333 10.166667 10.250000 10.333333 10.416667
## 11 11.000000 11.083333 11.166667 11.250000 11.333333 11.416667
## 12 12.000000 12.083333 12.166667 12.250000 12.333333 12.416667
## 13 13.000000 13.083333 13.166667 13.250000 13.333333 13.416667
## 14 14.000000 14.083333 14.166667 14.250000 14.333333 14.416667
## 15 15.000000 15.083333 15.166667 15.250000 15.333333 15.416667
## 16 16.000000 16.083333 16.166667 16.250000 16.333333 16.416667
## 17 17.000000 17.083333 17.166667 17.250000 17.333333 17.416667
## Jul Aug Sep Oct Nov Dec
## 1 1.500000 1.583333 1.666667 1.750000 1.833333 1.916667
## 2 2.500000 2.583333 2.666667 2.750000 2.833333 2.916667
## 3 3.500000 3.583333 3.666667 3.750000 3.833333 3.916667
## 4 4.500000 4.583333 4.666667 4.750000 4.833333 4.916667
## 5 5.500000 5.583333 5.666667 5.750000 5.833333 5.916667
## 6 6.500000 6.583333 6.666667 6.750000 6.833333 6.916667
## 7 7.500000 7.583333 7.666667 7.750000 7.833333 7.916667
## 8 8.500000 8.583333 8.666667 8.750000 8.833333 8.916667
## 9 9.500000 9.583333 9.666667 9.750000 9.833333 9.916667
## 10 10.500000 10.583333 10.666667 10.750000 10.833333 10.916667
## 11 11.500000 11.583333 11.666667 11.750000 11.833333 11.916667
## 12 12.500000 12.583333 12.666667 12.750000 12.833333 12.916667
## 13 13.500000 13.583333 13.666667 13.750000 13.833333 13.916667
## 14 14.500000 14.583333 14.666667 14.750000 14.833333 14.916667
## 15 15.500000 15.583333 15.666667 15.750000 15.833333 15.916667
## 16 16.500000 16.583333 16.666667 16.750000 16.833333 16.916667
## 17 17.500000 17.583333
plot(X)
lm(X ~ time(X)) # slope b is 60
##
## Call:
## lm(formula = X ~ time(X))
##
## Coefficients:
## (Intercept) time(X)
## -53.55 60.07
Arima(X, order=c(1,0,1), xreg=time(X)) # xreg tells you b is 60
## Series: X
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## -0.6822 0.5770 -53.5502 60.0739
## s.e. 0.2507 0.2791 0.6324 0.0604
##
## sigma^2 = 19.61: log likelihood = -579.39
## AIC=1168.78 AICc=1169.09 BIC=1185.27
mean( diff(X, 12) ) # supposed to be b(12) it's actually b(12)(1/12)
## [1] 59.96646
Arima(X, order=c(2,0,0), seasonal=c(0,1,0), include.drift=TRUE)
## Series: X
## ARIMA(2,0,0)(0,1,0)[12] with drift
##
## Coefficients:
## ar1 ar2 drift
## -0.1085 0.1250 4.9972
## s.e. 0.0722 0.0721 0.0377
##
## sigma^2 = 37.86: log likelihood = -606.85
## AIC=1221.7 AICc=1221.92 BIC=1234.65
# They actually tell you b/12 instead of b.
If you take \(\bigtriangledown_{12}\) of Linear trend + Seasonality, it eliminates the seasonality, and leaves constant \(12 (slope)\).
But if you do the same with (Linear trend + Sesonality + WN), then it turns into 12(slope) + (non-invertible sMA(1)). Which is troublesome. If this is the case, force fit sMA(1) will produce \(\Theta_1\) that is close to 1.
If you have (Linear Trend + Seasonality + WN), then it should be modeled by regression and seasonal average, not by \(\bigtriangledown_{12}\).
If you take \(\bigtriangledown_{12}\) of (Random Trend + Seasonality), then it leaves 12(drift) + (non-invertible MA(11)). Force fit MA(11) to find out.