Suppose \(Y_t\) is the observation. The Model says: \[ Y_t \hspace{3mm} = \hspace{3mm} m_t + S_t + X_t \]
\[\begin{align*} m_t &: \mbox{ Trend Component (linear, random)}\\ S_t &: \mbox{ Seasonal Component }\\ X_t &: \mbox{ Stationary Time Series (ARMA?)} \end{align*}\] where \(EX_t=0\).
Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]
Condition on Seasonal component: \(S_{t+s}=S_t\) and \(\sum_{j=1}^s S_j = 0\).
\(s\) is the seasonality frequency. (e.g. \(s=12\) for monthly seasonality).
Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]
Seasonality is not a trend. It’s a temporary deviation from overall
trend. \[
\sum_{j=1}^s S_j = 0.
\]
Accidental deaths in USA., 1973 to 1978 from Brockwell
= acf
acf1 library(TSA)
= acf1 # Load TSA package
acf
= read.csv("https://nmimoto.github.io/datasets/acci.csv", header=T)
D = ts(D, start=c(1973,1), freq=12) #- Turn D into ts object with frequency
Acci
plot(Acci, type='o', ylab="num of accidents")
Acci
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938
## 1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129
## 1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8433
## 1976 7717 7461 7776 7925 8634 8945 10078 9179 8037 8488
## 1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850
## 1978 7836 6892 7791 8129 9115 9434 10484 9827 9110 9070
## Nov Dec
## 1973 9161 8927
## 1974 8710 8680
## 1975 8160 8034
## 1976 7874 8647
## 1977 8265 8796
## 1978 8633 9240
#--- plot with month ---
plot(Acci, type="l", ylab="num of accidents")
points(y=Acci, x=time(Acci), pch=as.vector(season(Acci)))
Oil Filter Sales Data (Cryer p7) inside TSA package.
data(oilfilters) # load data inside TSA package
= oilfilters
Oil
is.ts(Oil) # is data in TS format?
## [1] TRUE
# look inside Oil
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1983 2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537
frequency(Oil) # what is the frequency?
## [1] 12
#--- plot with month ---
plot(Oil, type="l",ylab="Sales")
points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))
There are three poplular ways of removing seasonality:
MA filter
Seasonal Average
Seaonal Differencing
If \(s\) is odd, let it be \(2q+1\). Then use linear MA filter. \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q Y_{t-j} \] If \(s\) is even, let it be \(2q\). Then use \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{2q} \Big(.5 Y_{t-q} + \sum_{i=-q-1}^{q-1} Y_{t-j} + .5 Y_{t+q}\Big) \]
Because seasonality should sum up to zero for each seasonal cycle, we have estimated trend: \[ Y_t = m_t + S_t + X_t \] \[ \hat m_t \hspace{3mm} = \hspace{3mm} \frac{1}{(2q+1)} \sum_{i=-q}^q m_{t-j} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q S_{t-j} }_{0} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q X_{t-j} }_{\mbox{ small }}. \] Then estimate for (\(S_t\) + \(Y_t\)) is \[ w_k \hspace{3mm} = \hspace{3mm} X_{t} - \hat m_{t} \hspace{10mm} q < t < n-q. \]
Use this to estimate the seasonal part, \[
\hat S_k \hspace{3mm} = \hspace{3mm} w_k - \frac{1}{s} \sum_{i=1}^s
w_i.
\] Now we have deseasonalized data (\(m_t\) + \(X_t\)) \[
d_t \hspace{3mm} = \hspace{3mm} Y_t - \hat S_t \hspace{10mm} t=1
\cdots n
\] Now go back and re-estimate trend using \(d_t\).
= filter(Acci, c(.5, rep(1,10), .5)/12)
m.hat plot(Acci, type="o")
lines(m.hat, col="red")
Filter reveals quadtaric trend
Suppose we have monthly seasonality, \(s=12\). Then take average for each month.
For example, average for January will be \[
\bar S_1 \hspace{3mm} = \hspace{3mm} \sum_{j=0} X_{1+12j}
\] and July average will be \[
\bar S_7 \hspace{3mm} = \hspace{3mm} \sum_{j=0} X_{7+12j}.
\] Note that this seasonal average will take out the trend part
as well.
plot(Acci, type="o")
#--- Take Monthly Averages
= aggregate(c(Acci), list(month=cycle(Acci)), mean)$x #- 1yr long Mtly Av
Mav1 = ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci)) #- Mtly Av
M.av1 = Acci-M.av1 #- Subtract from original
Acci.DS
plot(M.av1, type="o") # Monthly Average
plot(Acci, type="o", main="Acci (Blk) vs Mtly Av (Blue)")
lines(M.av1, col="blue")
plot(Acci.DS, type="o", main="De-seasonalized Acci (Blk - Blue)")
abline(h=0)
Quadratic trend can be found here too.
source('https://nmimoto.github.io/R/TS-00.txt')
Stationarity.tests(Acci.DS)
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.885 0.397
Small-Large-Large means all of them points to non-stationarity.
= auto.arima(Acci.DS)
Fit1 Fit1
## Series: Acci.DS
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.4056
## s.e. 0.1082
##
## sigma^2 = 66457: log likelihood = -494.53
## AIC=993.07 AICc=993.24 BIC=997.59
Randomness.tests(Fit1$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.755 0.53 0.351 0.557 0.632 0.768 255.614
= auto.arima(Acci.DS, d=0)
Fit2 Fit2
## Series: Acci.DS
## ARIMA(1,0,1) with zero mean
##
## Coefficients:
## ar1 ma1
## 0.9471 -0.3656
## s.e. 0.0451 0.1205
##
## sigma^2 = 66208: log likelihood = -501.55
## AIC=1009.1 AICc=1009.46 BIC=1015.93
Randomness.tests(Fit2$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.597 0.31 0.231 0.823 0.799 0.926 255.441
After we subtracted the seasonality (Monthly average \(S_t\)), We either have ARIMA(0,1,1) or ARIMA(2,0,0).
\[\begin{align*}
Y_t &= \hspace{3mm} S_t + X_t \\\\
X_t &= \hspace{3mm} \phi_1 X_{t-1} + \phi_2 X_{t-2} + \epsilon_t
\hspace{10mm} \mbox{(if ARIMA(2,0,0))}
\end{align*}\]
(Box-Jenkins Method) For guessed seasonality \(s\), define \[
\bigtriangledown_s = (1-B^s)
\] (Don’t confuse this with \(\bigtriangledown^s\)). Then we have \[
\bigtriangledown_s X_t
\hspace{3mm} = \hspace{3mm} (1-B^s) X_t \hspace{3mm} =
\hspace{3mm} X_t - X_{t-s} \\\\
\hspace{3mm} = \hspace{3mm} m_t - m_{t-s} +
\underbrace{ S_t - S_{t-s} }_{0} + Y_t - Y_{t-s}.
\] since \(S_t=S_{t-s}\), this
eliminates the seasonality. Now fit ARIMA to \(\bigtriangledown_s X_t\).
plot(Acci, type='o', ylab="num of accidents")
plot(diff(Acci, 12)) # take seasonal difference (Del_12)
plot(diff(diff(Acci, 12))) # take (Del)(Del_12)
Stationarity.tests(diff(diff(Acci, 12)))
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
= auto.arima(diff(diff(Acci, 12)))
Fit3 Fit3
## Series: diff(diff(Acci, 12))
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 mean
## -0.4963 -0.6146 21.0220
## s.e. 0.1351 0.1932 11.9921
##
## sigma^2 = 98283: log likelihood = -424.27
## AIC=856.53 AICc=857.27 BIC=864.84
Randomness.tests(Fit3$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.588 0.457 0.453 0.385 0.497 0.355 307.622
We took \(\bigtriangledown_{12}\),
then \(\bigtriangledown\), then fit
MA(1). Therefore our model is \[
\bigtriangledown \bigtriangledown_{12} Y_t \hspace{3mm} =
\hspace{3mm} X_t \\
X_t \hspace{3mm} = \hspace{3mm} \epsilon_t + \theta_1 \epsilon_{t-1}
\] This is called \[
\mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\
\mbox{ sARIMA}(0,1,1) (0,1,0)_{12} \hspace{3mm} \mbox{ model }
\]
plot(Oil, type="o")
#--- Take Monthly Averages
= Oil
D0 = aggregate(c(D0), list(month=cycle(D0)), mean)$x #- 1yr long Mtly Av
Mav1 = ts(Mav1[cycle(D0)], start=start(D0), freq=frequency(D0))
M.av = D0-M.av #- Subtract from original
Ds.Oil
plot(M.av, type="o")
plot(Oil, type="o", main="Oil (Blk) vs Mtly Av (Blue)")
lines(M.av, col="blue")
plot(Ds.Oil, type="o", main="De-seasonalized Oil (Blk - Blue)")
abline(h=0)
Stationarity.tests(Ds.Oil)
## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.025 0.039 0.01
= auto.arima(Ds.Oil, stepwise=FALSE, approximation=FALSE)
Fit3 Fit3
## Series: Ds.Oil
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.8626
## s.e. 0.0696
##
## sigma^2 = 378095: log likelihood = -368.67
## AIC=741.35 AICc=741.62 BIC=745.05
= auto.arima(Ds.Oil, d=0, stepwise=FALSE, approximation=FALSE)
Fit4 Fit4
## Series: Ds.Oil
## ARIMA(0,0,0) with zero mean
##
## sigma^2 = 379437: log likelihood = -376.42
## AIC=754.85 AICc=754.93 BIC=756.72
Randomness.tests(Fit4$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.518 0.525 0.241 0.818 0.804 0.173 622.503
After we subtracted the seasonality (Monthly average \(S_t\)), it was WN. \[\begin{align*}
Y_t &= \hspace{3mm} S_t + X_t \\\\
X_t &= \hspace{3mm} \epsilon_t
\end{align*}\]
plot(Oil)
plot(diff(Oil, 12)) # take seasonal difference (Del_12)
Stationarity.tests(diff(Oil, 12))
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.07 0.01 0.01
= auto.arima(diff(Oil, 12), stepwise=FALSE, approximation=FALSE)
Fit4 Fit4
## Series: diff(Oil, 12)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## -217.3889
## s.e. 132.7486
##
## sigma^2 = 652522: log likelihood = -291.57
## AIC=587.14 AICc=587.5 BIC=590.31
Randomness.tests(Fit4$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.71 0.645 0.502 0.967 0.993 0.055 807.789
We took \(\bigtriangledown_{12}\),
then it looks like a WN. Therefore our model is \[
\bigtriangledown_{12} Y_t \hspace{3mm} = \hspace{3mm} X_t \\ \\
X_t \hspace{3mm} = \hspace{3mm} \epsilon_t
\] This is called \[
\mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace{3mm} \mbox{ model } \\ \\
\mbox{ sARIMA}(0,0,0) (0,1,0)_{12} \hspace{3mm} \mbox{ model }
\]
Seasonalilty may be eliminated by MA filter or Seaonal Average
Sesonal ARIMA means that you take seasonal difference before fitting ARIMA