5f: Lake Huron Modeling

0. Preliminary

  library(forecast)
  source('https://nmimoto.github.io/R/TS-00.txt')

  D = read.csv("https://nmimoto.github.io/datasets/lake.csv")
  Lake  = ts(D[,1], start=1875, freq=1)

  plot(Lake, type="o")

  layout(matrix(c(1,2), 1, 2))
  acf(Lake)
  pacf(Lake)

  layout(1)

  Stationarity.tests(Lake)

## Warning in kpss.test(A): p-value smaller than printed p-value

##        KPSS  ADF    PP
## p-val: 0.01 0.24 0.025

Stationarity tests results are mixed.

1. Direct ARMA fit

If we treat the series as “stationary”, we can try to fit ARMA(p,q) series. We’ve done this in earlier slides.

  # find best ARMA(p,q) by AICc
  Fit1 = auto.arima(Lake, d=0, stepwise=FALSE, approximation=FALSE)
  Fit1

## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 = 0.4784:  log likelihood = -101.09
## AIC=210.18   AICc=210.62   BIC=220.48

  Randomness.tests(Fit1$residuals)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series

##       BL15  BL20  BL25  ML15  ML20   JB    SD
## [1,] 0.963 0.952 0.934 0.567 0.641 0.89 0.684

Model 1 (ARIMA(1,0,1))

auto.arima() chooses ARMA(1,1) with mean via minimum AICc. With observation \(Y_t\), \[\begin{align*} Y_t &= \mu + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} \\ \epsilon_t &\sim WN(0, \sigma^2) \\ \end{align*}\]

  Fit1

## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 = 0.4784:  log likelihood = -101.09
## AIC=210.18   AICc=210.62   BIC=220.48

2. Linear Trend + ARMA

If we treat the series as “non-stationary”, we can try to fit a linear downward trend with ARMA errors.

Force linear regression by xreg=time(Lake)

  Reg2 = lm(Lake~time(Lake))

  plot(Lake, type="o")
  abline(Reg2, col="red")         # overlay regression line in red

  plot(Reg2$residuals, type="o")      # plot of the residuals from regression

  Stationarity.tests(Reg2$residuals)

## Warning in kpss.test(A): p-value greater than printed p-value

##        KPSS  ADF    PP
## p-val:  0.1 0.24 0.025

  # fit the residuals with ARMA(p,q)
  Fit2 = auto.arima(Lake, d=0, xreg=time(Lake), stepwise=FALSE, approximation=FALSE)
  Fit2

## Series: Lake 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept     xreg
##       0.6682  0.3817    55.5443  -0.0242
## s.e.  0.0936  0.1136    18.0324   0.0094
## 
## sigma^2 = 0.4612:  log likelihood = -98.66
## AIC=207.33   AICc=207.98   BIC=220.2

  Randomness.tests(Fit2$resid)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series

##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.985 0.974 0.969 0.227 0.217 0.783 0.668

  Fit2

## Series: Lake 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept     xreg
##       0.6682  0.3817    55.5443  -0.0242
## s.e.  0.0936  0.1136    18.0324   0.0094
## 
## sigma^2 = 0.4612:  log likelihood = -98.66
## AIC=207.33   AICc=207.98   BIC=220.2

Model 2 (linear trend + ARMA)

With observations \(Y_t\), a linear trend was fit first, then the residuals were fitted with ARMA(1,1).

\[\begin{align*} Y_t &= a + bt + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} \\ \epsilon_t &\sim WN(0, \sigma^2) \end{align*}\]

  Fit2

## Series: Lake 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept     xreg
##       0.6682  0.3817    55.5443  -0.0242
## s.e.  0.0936  0.1136    18.0324   0.0094
## 
## sigma^2 = 0.4612:  log likelihood = -98.66
## AIC=207.33   AICc=207.98   BIC=220.2

3. ARIMA modeling

We treat the series as “non-stationary”, but this time we use Box-Jenkins differencing to stationarize the data.

Note that auto.arima() INCLUDES drift term by default.

Arima() DOES NOT include drift term by default when d=1.

To force drift term in Arima(), use include.drift=TRUE

  plot(diff(Lake), type='o')

  layout(matrix(1:2, 1, 2))
  acf(diff(Lake))
  pacf(diff(Lake))

  layout(1)

  Stationarity.tests(diff(Lake))       # See if diff(Lake) is stationary

## Warning in adf.test(A): p-value smaller than printed p-value

## Warning in pp.test(A): p-value smaller than printed p-value

## Warning in kpss.test(A): p-value greater than printed p-value

##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01

  # Fit the difference of Lake with ARMA
  Fit3 = auto.arima(Lake, d=1, stepwise=FALSE, approximation=FALSE)
  Fit3

## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 = 0.4812:  log likelihood = -99.88
## AIC=207.76   AICc=208.2   BIC=218.02

  Randomness.tests(Fit3$residuals)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series

##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.979 0.956 0.942 0.524 0.495 0.577 0.677

  #-- Exactly same as above
  Fit3b = Arima(Lake, order=c(1,1,2)) 
  Fit3b

## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 = 0.4812:  log likelihood = -99.88
## AIC=207.76   AICc=208.2   BIC=218.02

  #-- force drift term in the model 
  Fit3c = Arima(Lake, order=c(1,1,2), include.drift=TRUE)
  Fit3c

## Series: Lake 
## ARIMA(1,1,2) with drift 
## 
## Coefficients:
##          ar1      ma1      ma2    drift
##       0.6911  -0.6271  -0.3729  -0.0241
## s.e.  0.0956   0.1251   0.1157   0.0100
## 
## sigma^2 = 0.4663:  log likelihood = -99
## AIC=208.01   AICc=208.67   BIC=220.83

  Randomness.tests(Fit3c$residuals)

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series

##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.989 0.979 0.973 0.276 0.239 0.465 0.666

Model 3 (ARIMA(1,1,2))

With observations \(Y_t\), a difference was taken. The difference seems to be ARMA(1,2).

In other words, \(Y_t\) is ARIMA(1,1,2), meaning \[\begin{align*} Y_t &= M_t + Z_t \hspace{5mm} \mbox{ where } M_t \mbox{is RW w drift, } Z_t \mbox{ is stationary series.}\\ \bigtriangledown Y_t &= Y_t - Y_{t-1} = e_t + (Z_t-Z_{t-1}) = \delta + X_t \\ X_t &= \phi_1 X_{t-1} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} \\ \epsilon_t &\sim WN(0,\sigma^2) \end{align*}\]

  Fit3c

## Series: Lake 
## ARIMA(1,1,2) with drift 
## 
## Coefficients:
##          ar1      ma1      ma2    drift
##       0.6911  -0.6271  -0.3729  -0.0241
## s.e.  0.0956   0.1251   0.1157   0.0100
## 
## sigma^2 = 0.4663:  log likelihood = -99
## AIC=208.01   AICc=208.67   BIC=220.83

4. Another ARIMA model

Let’s look at the difference between the observations:

  plot(diff(Lake), type='o')

  Randomness.tests(diff(Lake))

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series

##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.176 0.175 0.132 0.172 0.117 0.384 0.737

  Fit4 = Arima(Lake, order=c(0,1,0))
  Fit4

## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 = 0.5383:  log likelihood = -106.49
## AIC=214.98   AICc=215.02   BIC=217.54

  Fit4b = Arima(Lake, order=c(0,1,0), include.drift=TRUE)
  Fit4b

## Series: Lake 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##         drift
##       -0.0198
## s.e.   0.0749
## 
## sigma^2 = 0.5435:  log likelihood = -106.45
## AIC=216.91   AICc=217.03   BIC=222.03

Model 4 (ARIMA(0,1,0))

Difference of \(Y_t\) seems to be Normal noise. That means \(Y_t\) is RW without drift.

In other words, \(Y_t\) is ARIMA(0,1,0), meaning \[\begin{align*} \bigtriangledown Y_t &= Y_t - Y_{t-1} = X_t \\ X_t &= \epsilon_t \\ \epsilon_t &\sim WN(0,\sigma^2) \end{align*}\]

    Fit4

## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 = 0.5383:  log likelihood = -106.49
## AIC=214.98   AICc=215.02   BIC=217.54

5. Model Summary

  Fit1

## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 = 0.4784:  log likelihood = -101.09
## AIC=210.18   AICc=210.62   BIC=220.48

  Fit2

## Series: Lake 
## Regression with ARIMA(1,0,1) errors 
## 
## Coefficients:
##          ar1     ma1  intercept     xreg
##       0.6682  0.3817    55.5443  -0.0242
## s.e.  0.0936  0.1136    18.0324   0.0094
## 
## sigma^2 = 0.4612:  log likelihood = -98.66
## AIC=207.33   AICc=207.98   BIC=220.2

  Fit3c

## Series: Lake 
## ARIMA(1,1,2) with drift 
## 
## Coefficients:
##          ar1      ma1      ma2    drift
##       0.6911  -0.6271  -0.3729  -0.0241
## s.e.  0.0956   0.1251   0.1157   0.0100
## 
## sigma^2 = 0.4663:  log likelihood = -99
## AIC=208.01   AICc=208.67   BIC=220.83

  Fit4

## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 = 0.5383:  log likelihood = -106.49
## AIC=214.98   AICc=215.02   BIC=217.54

6. Conclusion

Model (1): ARIMA(1,0,1) with mean is fitting adequately. This suggests that Lake is stationary (at constant mean) without any transformation.

\[{\large Y_t \hspace{3mm} = \hspace{3mm} \mu + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,1)} }\]
Model (2): ARIMA(1,0,1) with linear trend is fitting adequately. Intercept and slope are significant. However, the regression residuals are deemed non-stationary by ADF test.

\[{\large Y_t \hspace{3mm} = \hspace{3mm} a + bt + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,1)} }\]
Model (3): ARIMA(1,1,2) with drift is fitting adequately. Drift is significant, and same as the slope in (b).

\[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = \delta + X_t, \hspace{10mm} X_t \mbox{ is ARMA(1,2)} }\]
Model (4): ARIMA(0,1,0) is fitting adequately. This suggests that Lake is RW w/o drift. AICc was higher than that of Model 3.

\[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t = \epsilon_t \hspace{10mm} \epsilon_t \mbox{ is ARMA(0,0) = WN} }\]
Model (3) is the best candidate.