


1. Modeling Procedure


0. Stationarity and Autocorrelation

  • We start with stationary data.
    If the data are not stationary, they need to be transformed first (for example, by differencing).
  • Assume that the ACF and PACF show some autocorrelation.
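
A quick way to see both points is to compare the sample ACF before and after a transformation. The sketch below is illustrative only: a simulated random walk (an assumption, not course data) stands in for a non-stationary series, and differencing is the transformation.

```r
# Illustrative only: a simulated random walk stands in for non-stationary data.
set.seed(1)
rw <- ts(cumsum(rnorm(300)))   # non-stationary: ACF decays very slowly
dx <- diff(rw)                 # first difference is white noise here

acf_rw <- acf(rw, lag.max = 20, plot = FALSE)
acf_dx <- acf(dx, lag.max = 20, plot = FALSE)

# Lag-1 sample autocorrelation before and after differencing
r1_rw <- acf_rw$acf[2]         # element 1 is lag 0
r1_dx <- acf_dx$acf[2]
print(c(before = r1_rw, after = r1_dx))

# pacf() works the same way; drop plot = FALSE to draw the correlograms
pacf_dx <- pacf(dx, lag.max = 20, plot = FALSE)
```

For a series that is already stationary, the same acf() and pacf() calls show whether there is any autocorrelation left to model.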

1. Check for mean

  • Is the mean of this process 0?
  • We usually de-mean, unless we have a reason to believe the true mean is equal to 0.
  • Use the include.mean=TRUE/FALSE option for Arima(), and the allowmean=TRUE/FALSE option for auto.arima().
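
The effect of the mean option can be seen on a simulated series. The sketch below uses base R's arima(), which Arima() wraps and which takes the same include.mean argument. The AR(1) series with true mean 2 is an illustrative assumption.

```r
set.seed(2)
x <- arima.sim(n = 500, list(ar = 0.6)) + 2   # AR(1) with true mean 2

fit_mean   <- arima(x, order = c(1, 0, 0), include.mean = TRUE,  method = "ML")
fit_nomean <- arima(x, order = c(1, 0, 0), include.mean = FALSE, method = "ML")

# stats::arima() labels the mean "intercept"; forecast::Arima() labels it "mean"
mu_hat <- unname(coef(fit_mean)["intercept"])
mu_se  <- sqrt(fit_mean$var.coef["intercept", "intercept"])
print(c(estimate = mu_hat, se = mu_se, z = mu_hat / mu_se))

# Forcing a zero mean on data whose mean is not zero costs fit
aic_mean   <- AIC(fit_mean)
aic_nomean <- AIC(fit_nomean)
print(c(AIC_mean = aic_mean, AIC_nomean = aic_nomean))
```

If the estimated mean is small relative to its standard error, dropping it is reasonable; otherwise keep it.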

2. Model Selection

  • Now we have ARMA(p,q) as a general blanket model.

3. Order selection

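  • Sometimes the shapes of the ACF and PACF are informative.
  • Use auto.arima() to select the best values of \(p\) and \(q\) by AICc (default), AIC, or BIC, all of which are based on the log likelihood.
  • Use the stepwise=FALSE and approximation=FALSE options so that the search is not restricted to a stepwise path and does not use approximate likelihoods.
  • For an ARMA(p,q) model, the option d=0 must be used.
  • Check the significance of the parameters in the selected model.
  • Check the significance of the parameters in the selected model + 1 (the same model with \(p\) and \(q\) each increased by one).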
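
One way to carry out the two significance checks above is to compare each estimate with its standard error. The sketch below is an approximate z-test under stated assumptions: it uses base R's arima() on a simulated ARMA(1,1) series, and coef_table is a hypothetical helper, not a function from the forecast package. Fits returned by Arima() and auto.arima() should carry the same coef and var.coef components.

```r
# Hypothetical helper: approximate z statistics and two-sided p-values
coef_table <- function(fit) {
  est <- coef(fit)
  se  <- sqrt(diag(fit$var.coef))
  z   <- est / se
  data.frame(estimate = est, se = se, z = z,
             p.value = 2 * pnorm(-abs(z)))
}

set.seed(3)
x <- arima.sim(n = 600, list(ar = 0.7, ma = 0.4))   # illustrative data

fit_given <- arima(x, order = c(1, 0, 1), include.mean = FALSE)  # selected model
fit_plus  <- arima(x, order = c(2, 0, 1), include.mean = FALSE)  # one extra AR term

tab_given <- coef_table(fit_given)
tab_plus  <- coef_table(fit_plus)
print(round(tab_given, 4))
print(round(tab_plus, 4))   # check whether the added ar2 term is significant
```

If the added parameter is not significant, that supports staying with the smaller model.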

4. Parameter Estimation and Significance Check

  • The CSS-ML method (conditional sum of squares for starting values, then maximum likelihood) is the default.
  • If MLE gives an error, change the initial values.
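
If the default fit fails, the estimation method and starting values can be set by hand. The sketch below uses base R's arima(); Arima() passes extra arguments such as init through to it. The simulated series and the starting values are arbitrary assumptions.

```r
set.seed(4)
x <- arima.sim(n = 400, list(ar = c(0.5, -0.3), ma = 0.4))

# Default: conditional sum of squares for starting values, then maximum likelihood
fit_default <- arima(x, order = c(2, 0, 1), include.mean = FALSE,
                     method = "CSS-ML")

# Full maximum likelihood from user-supplied starting values (ar1, ar2, ma1)
fit_init <- arima(x, order = c(2, 0, 1), include.mean = FALSE,
                  method = "ML", init = c(0.1, 0.1, 0.1))

print(rbind(default = coef(fit_default), init = coef(fit_init)))
print(c(default = fit_default$loglik, init = fit_init$loglik))
```

The starting values for the AR part must describe a stationary model, otherwise arima() stops with an error.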

5. Residual Analysis (Randomness.tests() function)

  • The Ljung-Box test needs to have a large p-value
  • The McLeod-Li test needs to have a large p-value
  • The Jarque-Bera test checks for normality of the residuals
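
Randomness.tests() is a course helper and its source is not shown here. As a rough base-R sketch of the three tests listed above (not the helper's actual implementation), assuming a simulated AR(1) series and lag 15 chosen for illustration:

```r
set.seed(5)
x   <- arima.sim(n = 500, list(ar = 0.6))
fit <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
res <- as.numeric(residuals(fit))
n   <- length(res)

# Ljung-Box: autocorrelation left in the residuals (fitdf = p + q = 1 here)
lb <- Box.test(res, lag = 15, type = "Ljung-Box", fitdf = 1)

# McLeod-Li: Ljung-Box applied to the squared residuals
ml <- Box.test(res^2, lag = 15, type = "Ljung-Box")

# Jarque-Bera: joint test of skewness = 0 and kurtosis = 3
z       <- res - mean(res)
skew    <- mean(z^3) / mean(z^2)^1.5
kurt    <- mean(z^4) / mean(z^2)^2
jb_stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
jb_p    <- pchisq(jb_stat, df = 2, lower.tail = FALSE)

print(c(LjungBox = lb$p.value, McLeodLi = ml$p.value, JarqueBera = jb_p))
```

Large p-values in all three mean the residuals show no evidence against uncorrelated, homoscedastic, normal noise.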

6. Forecast

  • Rolling 1-step prediction (retrospective)
  • Static h-step prediction (prospective)
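
The two kinds of forecast can be sketched with base R's predict(). The example below assumes a simulated AR(1) series with the last 20 points held out, so that both forecasts can be scored against the same actual values. In practice the static forecast would start from the end of the full series.

```r
set.seed(6)
x <- arima.sim(n = 220, list(ar = 0.7))
n_train <- 200
h <- 20

# Static h-step forecast: fit once, predict h steps beyond the training data
fit_static <- arima(x[1:n_train], order = c(1, 0, 0), include.mean = FALSE)
static_fc  <- as.numeric(predict(fit_static, n.ahead = h)$pred)

# Rolling 1-step forecast: at each step refit on all data so far, predict the next point
rolling_fc <- numeric(h)
for (i in seq_len(h)) {
  fit_i <- arima(x[1:(n_train + i - 1)], order = c(1, 0, 0), include.mean = FALSE)
  rolling_fc[i] <- predict(fit_i, n.ahead = 1)$pred[1]
}

actual <- x[(n_train + 1):(n_train + h)]
rmse <- function(e) sqrt(mean(e^2))
print(c(rolling = rmse(actual - rolling_fc), static = rmse(actual - static_fc)))
```

The static forecast of a zero-mean AR(1) shrinks toward 0 as the horizon grows, while the rolling forecast keeps using the newest observation.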

7. Model Diagnostics

  • Is the prediction power as expected?
  • When there is more than one candidate, is one model significantly better than the other? (prediction power, interpretability, simplicity)

2. Ex: Wave tank data

(Cowpertwait Ch. 6.5) The data in the file wave.dat are the surface height of water (mm), relative to the still water level, measured using a capacitance probe positioned at the centre of a wave tank. The continuous voltage signal from this capacitance probe was sampled every 0.1 second over a 39.6-second period. The objective is to fit a suitable ARMA(p, q) model that can be used to generate a realistic wave input to a mathematical model for an ocean-going tugboat in a computer simulation. The results of the computer simulation will be compared with tests using a physical model of the tugboat in the wave tank.

  # Load the wave tank data (surface height in mm, sampled every 0.1 s)
  D = read.csv("https://nmimoto.github.io/datasets/wave.csv", header=TRUE)
  Wave = ts(D, start=c(1,1), freq=1)

  library(forecast)
  plot(Wave)

  source('https://nmimoto.github.io/R/TS-00.txt')

  # Default search (stepwise, approximate), restricted to d=0
  Fit01 = auto.arima(Wave, d=0)
  Fit01
  Randomness.tests(Fit01$resid)


  # Turn off stepwise and approximation
  Fit02 = auto.arima(Wave, d=0, stepwise=FALSE, approximation=FALSE)
  Fit02
  Randomness.tests(Fit02$resid)


  # ARMA(2,3) with mean, fit directly with Arima()
  Fit12 = Arima(Wave, order=c(2, 0, 3))
  Fit12
  Randomness.tests(Fit12$resid)


  # ARMA(5,6) with mean
  Fit13 = Arima(Wave, order=c(5, 0, 6))
  Fit13
  Randomness.tests(Fit13$resid)


  # Turn off stepwise and approximation, and do not allow a mean
  Fit03 = auto.arima(Wave, d=0, stepwise=FALSE, approximation=FALSE, allowmean=FALSE)
  Fit03
  Randomness.tests(Fit03$resid)


  # Same as auto.arima above
  Fit04 = Arima(Wave, order=c(2,0,3), include.mean=FALSE)
  Fit04

  # Increase p and q
  Fit05 = Arima(Wave, order=c(3,0,4), include.mean=FALSE)
  Fit05
  Randomness.tests(Fit05$resid)


  # Increase p and q again (stored as Fit05b so the ARMA(3,4) fit above is kept)
  Fit05b = Arima(Wave, order=c(4,0,5), include.mean=FALSE)
  Fit05b
  Randomness.tests(Fit05b$resid)


  # Remove MA5
  Fit06 = Arima(Wave, order=c(4,0,4), include.mean=FALSE)
  Fit06
  Randomness.tests(Fit06$resid)


  # Increase p to 5
  Fit07 = Arima(Wave, order=c(5,0,4), include.mean=FALSE)
  Fit07
  Randomness.tests(Fit07$resid)


  # Increase p to 6
  Fit08 = Arima(Wave, order=c(6,0,4), include.mean=FALSE)
  Fit08
  Randomness.tests(Fit08$resid)

  # Increase p to 7
  Fit09 = Arima(Wave, order=c(7,0,4), include.mean=FALSE)
  Fit09
  Randomness.tests(Fit09$resid)

  # Increase p to 8
  Fit10 = Arima(Wave, order=c(8,0,4), include.mean=FALSE)
  Fit10
  Randomness.tests(Fit10$resid)

  # Increase q to 5 (stored as Fit10b so the ARMA(8,4) fit above is kept)
  Fit10b = Arima(Wave, order=c(8,0,5), include.mean=FALSE)
  Fit10b
  Randomness.tests(Fit10b$resid)


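  #- Simulating Waves ---
  # The indices assume Fit01 is ARMA(3,5) with a mean:
  # coef[1:3] = AR, coef[4:8] = MA, coef[9] = mean. Adjust them if Fit01 has a different order.
  X = arima.sim( n=400, list(ar=Fit01$coef[1:3], ma=Fit01$coef[4:8]),
                  sd=sqrt(Fit01$sigma2) )  +  Fit01$coef[9]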
  ts.plot(X,Wave, col=c("black","blue"), main="Actual Wave vs Simulated Wave")
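
The simulation above picks coefficients by position, which only works for one specific order. A possible alternative is to pick them by name. In the sketch below, sim_from_fit is a hypothetical helper and the simulated series y is a stand-in for Wave, so the block runs without the course data.

```r
# Hypothetical helper: simulate from a fitted non-seasonal ARMA model of any order
sim_from_fit <- function(fit, n) {
  cf <- coef(fit)
  ar <- unname(cf[grepl("^ar", names(cf))])
  ma <- unname(cf[grepl("^ma", names(cf))])
  # The mean is named "mean" by forecast::Arima() and "intercept" by stats::arima()
  mu <- cf[names(cf) %in% c("mean", "intercept")]
  mu <- if (length(mu) == 1) unname(mu) else 0
  model <- list()
  if (length(ar) > 0) model$ar <- ar
  if (length(ma) > 0) model$ma <- ma
  arima.sim(n = n, model = model, sd = sqrt(fit$sigma2)) + mu
}

set.seed(7)
y   <- arima.sim(n = 400, list(ar = c(0.5, -0.3), ma = 0.4)) + 3   # stand-in for Wave
fit <- arima(y, order = c(2, 0, 1))
X   <- sim_from_fit(fit, n = 400)
ts.plot(X, y, col = c("black", "blue"), main = "Original vs simulated series")
```

With the wave data, the call would be sim_from_fit(Fit01, n = 400), whatever order auto.arima() selected.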



3. How to turn off middle parameter

This is not needed for the wave data, but it is possible to keep the ARMA(3,5) structure while fixing \(\theta_2\) and \(\theta_4\) at 0.

  D = read.csv("https://nmimoto.github.io/datasets/wave.csv")
  Wave = ts(D, start=c(1,1), freq=1)

  library(forecast)
  plot(Wave)

  # Fit ARMA(3,5) with theta2 = theta4 = 0.
  # Order of entries in fixed: phis, thetas, mean. NA = estimate, 0 = fixed at zero.
  Fit2 = Arima(Wave, order=c(3,0,5), fixed=c(NA,NA,NA,  NA,0,NA,0,NA,  NA ) )
  Fit2
## Series: Wave 
## ARIMA(3,0,5) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     ar3      ma1  ma2     ma3  ma4      ma5
##       1.8743  -1.395  0.3101  -1.6210    0  0.9117    0  -0.2695
## s.e.  0.0629   0.092  0.0571   0.0343    0  0.0650    0   0.0369
##          mean
##       -4.9952
## s.e.   0.7309
## 
## sigma^2 = 19933:  log likelihood = -2521.46
## AIC=5058.93   AICc=5059.3   BIC=5090.78
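
Counting positions in the fixed vector by hand is error-prone for larger models. As a minimal sketch, the vector can be built from coefficient names instead; make_fixed is a hypothetical helper, not part of the forecast package.

```r
# Hypothetical helper: NA = estimate the parameter, 0 = switch it off
make_fixed <- function(p, q, off = character(0), include.mean = TRUE) {
  nm <- c(if (p > 0) paste0("ar", seq_len(p)),
          if (q > 0) paste0("ma", seq_len(q)),
          if (include.mean) "mean")
  stopifnot(all(off %in% nm))          # guard against misspelled names
  fixed <- setNames(rep(NA_real_, length(nm)), nm)
  fixed[off] <- 0
  fixed
}

fx <- make_fixed(p = 3, q = 5, off = c("ma2", "ma4"))
print(fx)
```

The result can then be passed as Arima(Wave, order=c(3,0,5), fixed=unname(fx)), which is the same vector as the one typed out above.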