library(forecast)
source('https://nmimoto.github.io/R/TS-00.txt')
D    = read.csv("https://nmimoto.github.io/datasets/lake.csv")
Lake = ts(D[,1], start=1875, freq=1)
plot(Lake, type="o")
layout(matrix(c(1,2), 1, 2))
acf(Lake)
pacf(Lake)
layout(1)
Stationarity.tests(Lake)
## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.24 0.025
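Stationarity.tests() is a helper from the sourced TS-00.txt file. As a rough sketch, the same three tests can be run directly from the tseries package (the exact options used inside the helper may differ):
library(tseries)
kpss.test(Lake)   # H0: the series is level-stationary
adf.test(Lake)    # H0: the series has a unit root (non-stationary)
pp.test(Lake)     # H0: the series has a unit root (non-stationary)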
The stationarity test results are mixed.
If we treat the series as “stationary”, we can try to fit an ARMA(p,q) model, as we did in earlier slides.
# find best ARMA(p,q) by AICc
Fit1 = auto.arima(Lake, d=0, stepwise=FALSE, approximation=FALSE)
Fit1
## Series: Lake
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.7665 0.3393 9.1290
## s.e. 0.0773 0.1123 0.3861
##
## sigma^2 = 0.4784: log likelihood = -101.09
## AIC=210.18 AICc=210.62 BIC=220.48
Randomness.tests(Fit1$residuals)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.963 0.952 0.934 0.567 0.641 0.89 0.684
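Randomness.tests() is another helper from TS-00.txt. A sketch of the underlying checks on the residuals (the lag choices here are my assumptions, not necessarily what the helper uses):
Box.test(Fit1$residuals,   lag=15, type="Ljung-Box")   # B-L: are residuals uncorrelated?
Box.test(Fit1$residuals^2, lag=15, type="Ljung-Box")   # M-L: are squared residuals uncorrelated?
tseries::jarque.bera.test(Fit1$residuals)              # J-B: Normality of residuals
sd(Fit1$residuals)                                     # SD of residuals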
auto.arima() chooses ARMA(1,1) with a non-zero mean by minimizing AICc. With observations \(Y_t\), the fitted model is \[\begin{align*} Y_t &= \mu + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} \\ \epsilon_t &\sim WN(0, \sigma^2) \end{align*}\]
Fit1
## Series: Lake
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.7665 0.3393 9.1290
## s.e. 0.0773 0.1123 0.3861
##
## sigma^2 = 0.4784: log likelihood = -101.09
## AIC=210.18 AICc=210.62 BIC=220.48
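As a quick sketch of how this fit would be used (the object name Fcst1 and the horizon h=10 are my choices), forecasts from an ARMA(1,1) with mean revert toward the estimated mean of 9.129:
Fcst1 = forecast(Fit1, h=10)   # forecast 10 steps ahead
plot(Fcst1)                    # point forecasts revert toward the estimated mean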
If we treat the series as “non-stationary”, we can instead fit a linear downward trend with ARMA errors.
We force the linear-trend regression by passing xreg=time(Lake) to auto.arima().
Reg2 = lm(Lake~time(Lake))
plot(Lake, type="o")
abline(Reg2, col="red") # overlay regression line in red
plot(Reg2$residuals, type="o") # plot of the residuals from regression
Stationarity.tests(Reg2$residuals)
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.24 0.025
# fit the residuals with ARMA(p,q)
Fit2 = auto.arima(Lake, d=0, xreg=time(Lake), stepwise=FALSE, approximation=FALSE)
Fit2
## Series: Lake
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## 0.6682 0.3817 55.5443 -0.0242
## s.e. 0.0936 0.1136 18.0324 0.0094
##
## sigma^2 = 0.4612: log likelihood = -98.66
## AIC=207.33 AICc=207.98 BIC=220.2
Randomness.tests(Fit2$resid)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.985 0.974 0.969 0.227 0.217 0.783 0.668
Fit2
## Series: Lake
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## 0.6682 0.3817 55.5443 -0.0242
## s.e. 0.0936 0.1136 18.0324 0.0094
##
## sigma^2 = 0.4612: log likelihood = -98.66
## AIC=207.33 AICc=207.98 BIC=220.2
With observations \(Y_t\), a linear trend was fitted, and the deviations from the trend were modeled as ARMA(1,1).
\[\begin{align*} Y_t &= a + bt + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} \\ \epsilon_t &\sim WN(0, \sigma^2) \end{align*}\]
Fit2
## Series: Lake
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## 0.6682 0.3817 55.5443 -0.0242
## s.e. 0.0936 0.1136 18.0324 0.0094
##
## sigma^2 = 0.4612: log likelihood = -98.66
## AIC=207.33 AICc=207.98 BIC=220.2
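As a sketch of how “the slope is significant” can be checked from the fit itself (a normal approximation; the ratio is roughly -0.0242/0.0094, about -2.6):
z = coef(Fit2)["xreg"] / sqrt(diag(Fit2$var.coef))["xreg"]   # approximate z-statistic for the slope
z
2 * pnorm(-abs(z))                                           # two-sided p-value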
This time we again treat the series as “non-stationary”, but we use Box-Jenkins differencing to stationarize the data.
Note that auto.arima() allows a drift term by default (allowdrift=TRUE), so drift is included whenever it improves the AICc.
Arima() does NOT include a drift term by default when d=1.
To force the drift term in Arima(), use include.drift=TRUE.
plot(diff(Lake), type='o')
layout(matrix(1:2, 1, 2))
acf(diff(Lake))
pacf(diff(Lake))
layout(1)
Stationarity.tests(diff(Lake)) # See if diff(Lake) is stationary
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
# Fit the difference of Lake with ARMA
Fit3 = auto.arima(Lake, d=1, stepwise=FALSE, approximation=FALSE)
Fit3
## Series: Lake
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.6385 -0.5349 -0.3514
## s.e. 0.1345 0.1445 0.1055
##
## sigma^2 = 0.4812: log likelihood = -99.88
## AIC=207.76 AICc=208.2 BIC=218.02
Randomness.tests(Fit3$residuals)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.979 0.956 0.942 0.524 0.495 0.577 0.677
#-- Exactly the same model as above, fit directly with Arima()
Fit3b = Arima(Lake, order=c(1,1,2))
Fit3b
## Series: Lake
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.6385 -0.5349 -0.3514
## s.e. 0.1345 0.1445 0.1055
##
## sigma^2 = 0.4812: log likelihood = -99.88
## AIC=207.76 AICc=208.2 BIC=218.02
#-- Force the drift term into the model
Fit3c = Arima(Lake, order=c(1,1,2), include.drift=TRUE)
Fit3c
## Series: Lake
## ARIMA(1,1,2) with drift
##
## Coefficients:
## ar1 ma1 ma2 drift
## 0.6911 -0.6271 -0.3729 -0.0241
## s.e. 0.0956 0.1251 0.1157 0.0100
##
## sigma^2 = 0.4663: log likelihood = -99
## AIC=208.01 AICc=208.67 BIC=220.83
Randomness.tests(Fit3c$residuals)
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.989 0.979 0.973 0.276 0.239 0.465 0.666
With observations \(Y_t\), a first difference was taken, and the differenced series appears to be ARMA(1,2).
In other words, \(Y_t\) is ARIMA(1,1,2) with drift, meaning \[\begin{align*} Y_t &= M_t + Z_t \hspace{5mm} \mbox{ where } M_t \mbox{ is a RW with drift and } Z_t \mbox{ is a stationary series,}\\ \bigtriangledown Y_t &= Y_t - Y_{t-1} = \delta + e_t + (Z_t-Z_{t-1}) = \delta + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} \\ \epsilon_t &\sim WN(0,\sigma^2) \end{align*}\]
Fit3c
## Series: Lake
## ARIMA(1,1,2) with drift
##
## Coefficients:
## ar1 ma1 ma2 drift
## 0.6911 -0.6271 -0.3729 -0.0241
## s.e. 0.0956 0.1251 0.1157 0.0100
##
## sigma^2 = 0.4663: log likelihood = -99
## AIC=208.01 AICc=208.67 BIC=220.83
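To see the drift term concretely, here is a sketch (the object name Fit3c_alt is mine): fitting ARMA(1,2) with a non-zero mean to diff(Lake) should give essentially the same AR/MA estimates as Fit3c, with the estimated mean playing the role of the drift.
Fit3c_alt = Arima(diff(Lake), order=c(1,0,2), include.mean=TRUE)   # ARMA(1,2) with mean on the differences
Fit3c_alt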
Finally, let’s look at the differenced series itself:
plot(diff(Lake), type='o')
Randomness.tests(diff(Lake))
## B-L test H0: the series is uncorrelated
## M-L test H0: the square of the series is uncorrelated
## J-B test H0: the series came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.176 0.175 0.132 0.172 0.117 0.384 0.737
Fit4 = Arima(Lake, order=c(0,1,0))
Fit4
## Series: Lake
## ARIMA(0,1,0)
##
## sigma^2 = 0.5383: log likelihood = -106.49
## AIC=214.98 AICc=215.02 BIC=217.54
Fit4b = Arima(Lake, order=c(0,1,0), include.drift=TRUE)
Fit4b
## Series: Lake
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## -0.0198
## s.e. 0.0749
##
## sigma^2 = 0.5435: log likelihood = -106.45
## AIC=216.91 AICc=217.03 BIC=222.03
The difference of \(Y_t\) seems to be Normal white noise, and the drift estimate in Fit4b is not significant (-0.0198 with s.e. 0.0749). That means \(Y_t\) is a RW without drift.
In other words, \(Y_t\) is ARIMA(0,1,0), meaning \[\begin{align*} \bigtriangledown Y_t &= Y_t - Y_{t-1} = X_t \\ X_t &= \epsilon_t \\ \epsilon_t &\sim WN(0,\sigma^2) \end{align*}\]
Fit4
## Series: Lake
## ARIMA(0,1,0)
##
## sigma^2 = 0.5383: log likelihood = -106.49
## AIC=214.98 AICc=215.02 BIC=217.54
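A sketch of what Model (4) implies for prediction: under a RW without drift, every point forecast equals the last observation (h=5 is an arbitrary choice).
forecast(Fit4, h=5)   # point forecasts are flat at the last observed value
tail(Lake, 1)         # the last observation, which the forecasts should match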
Fit1
## Series: Lake
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.7665 0.3393 9.1290
## s.e. 0.0773 0.1123 0.3861
##
## sigma^2 = 0.4784: log likelihood = -101.09
## AIC=210.18 AICc=210.62 BIC=220.48
Fit2
## Series: Lake
## Regression with ARIMA(1,0,1) errors
##
## Coefficients:
## ar1 ma1 intercept xreg
## 0.6682 0.3817 55.5443 -0.0242
## s.e. 0.0936 0.1136 18.0324 0.0094
##
## sigma^2 = 0.4612: log likelihood = -98.66
## AIC=207.33 AICc=207.98 BIC=220.2
Fit3c
## Series: Lake
## ARIMA(1,1,2) with drift
##
## Coefficients:
## ar1 ma1 ma2 drift
## 0.6911 -0.6271 -0.3729 -0.0241
## s.e. 0.0956 0.1251 0.1157 0.0100
##
## sigma^2 = 0.4663: log likelihood = -99
## AIC=208.01 AICc=208.67 BIC=220.83
Fit4
## Series: Lake
## ARIMA(0,1,0)
##
## sigma^2 = 0.5383: log likelihood = -106.49
## AIC=214.98 AICc=215.02 BIC=217.54
Model (1): ARIMA(1,0,1) with a non-zero mean fits adequately. This suggests that Lake is stationary (around a constant mean) without any transformation.
\[{\large Y_t \hspace{3mm} = \hspace{3mm} \mu + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,1)} }\]
Model (2): ARIMA(1,0,1) errors with a linear trend fits adequately. The intercept and slope are both significant. However, the regression residuals are deemed non-stationary by the ADF test.
\[{\large Y_t \hspace{3mm} = \hspace{3mm} a + bt + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,1)} }\]
Model (3): ARIMA(1,1,2) with drift fits adequately. The drift is significant and essentially equal to the slope in Model (2).
\[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = \delta + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,2)} }\]
Model (4): ARIMA(0,1,0) fits adequately. This suggests that Lake is a RW without drift. Its AICc is higher than that of Model (3).
\[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t = \epsilon_t \hspace{10mm} \epsilon_t \mbox{ is ARMA(0,0) = WN} }\]
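For a side-by-side look at the AICc values already printed above, they can be collected directly from the fitted objects (the $aicc component of each fit):
c(Fit1=Fit1$aicc, Fit2=Fit2$aicc, Fit3c=Fit3c$aicc, Fit4=Fit4$aicc)   # AICc of the four candidates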
Model (3) is the best candidate.