The ARMA(1,1) model is defined as \[
X_t - \phi_1 X_{t-1} \hspace{3mm} = \hspace{3mm} \epsilon_t + \theta_1
\epsilon_{t-1}
\] where \(\epsilon_t\sim
WN(0,\sigma^2)\). Using the backshift operator \(B\) (defined by \(B X_t = X_{t-1}\)), this is the same as
\[
(1- \phi_1 B) \, X_t \hspace{3mm} = \hspace{3mm} (1 + \theta_1 B) \,
\epsilon_t
\] \[
{\large
\Phi(B) \, X_t \hspace{3mm} = \hspace{3mm} \Theta(B) \, \epsilon_t
}
\]
Similarly, we can write
\[
X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p}
\hspace{3mm} = \hspace{3mm} \epsilon_t + \theta_1 \epsilon_{t-1} +
\theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}
\] \[
(1- \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) \,\, X_t
\hspace{3mm} = \hspace{3mm} (1 + \theta_1 B + \theta_2 B^2 + \cdots +
\theta_q B^q) \,\, \epsilon_t
\] \[
{\large
\Phi(B) \,\, X_t \hspace{3mm} = \hspace{3mm} \Theta(B) \,\,
\epsilon_t
}
\]
For example, an ARMA(3,2) model looks like \[ {\large X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} - \phi_3 X_{t-3} \hspace{3mm} = \hspace{3mm} \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} } \]
\[ {\large (1- \phi_1 B - \phi_2 B^2 - \phi_3 B^3) \, X_t \hspace{3mm} = \hspace{3mm} (1 + \theta_1 B + \theta_2 B^2) \, \epsilon_t } \]
\[
{\large
\Phi(B) X_t \hspace{3mm} = \hspace{3mm} \Theta(B) \epsilon_t
}
\]
If the AR characteristic polynomial \(\Phi(z)\) has all of its roots outside the
unit circle, the ARMA(\(p,q\)) process is causal and can be represented as \[
{\large
X_t \hspace{3mm} = \hspace{3mm} \sum_{i=0}^\infty \psi_i \, \epsilon_{t-i}
}
\] with an absolutely summable sequence \(\psi_i\). Such a process is stationary.
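For instance, for a causal ARMA(1,1) the \(\psi\)-weights can be read off by expanding \(\Theta(B)/\Phi(B)\) as a geometric series (shown here for concreteness): \[
\psi(B) \;=\; \frac{1+\theta_1 B}{1-\phi_1 B}
\;=\; (1+\theta_1 B)\sum_{j=0}^\infty \phi_1^{\,j} B^{\,j}
\quad\Longrightarrow\quad
\psi_0 = 1, \qquad \psi_j = (\phi_1+\theta_1)\,\phi_1^{\,j-1} \;\; (j\ge 1),
\] which is absolutely summable whenever \(|\phi_1|<1\).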
If the MA characteristic polynomial \(\Theta(z)\) has all of its roots outside the unit circle, the ARMA(\(p,q\)) process is invertible and can be represented as \[ {\large \epsilon_t \hspace{3mm} = \hspace{3mm} \sum_{i=0}^\infty \pi_i \, X_{t-i} } \] with an absolutely summable sequence \(\pi_i\).
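As a sketch, the ARMA(1,1) weights can be checked numerically with stats::ARMAtoMA(), using assumed illustration values \(\phi_1=0.5\), \(\theta_1=0.4\); the closed forms \(\psi_j=(\phi_1+\theta_1)\phi_1^{j-1}\) and \(\pi_j=(-1)^j(\phi_1+\theta_1)\theta_1^{j-1}\) come from expanding \(\Theta(B)/\Phi(B)\) and \(\Phi(B)/\Theta(B)\) respectively:

```r
# Numerical check of the ARMA(1,1) psi- and pi-weight formulas
phi <- 0.5; theta <- 0.4                               # assumed example values
psi <- ARMAtoMA(ar = phi, ma = theta, lag.max = 5)     # psi_1, ..., psi_5
all.equal(psi, (phi + theta) * phi^(0:4))              # TRUE
# pi-weights of (phi, theta) are the psi-weights of (-theta, -phi),
# since Phi(B)/Theta(B) = (1 + (-phi)B) / (1 - (-theta)B)
pi_wts <- ARMAtoMA(ar = -theta, ma = -phi, lag.max = 5)      # pi_1, ..., pi_5
all.equal(pi_wts, (-1)^(1:5) * (phi + theta) * theta^(0:4))  # TRUE
```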
We assume from now on that all ARMA(\(p,q\)) processes are
causal and invertible. Again, nothing is lost in this assumption.
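Both conditions are easy to check numerically with polyroot(); a quick sketch, with assumed example coefficients:

```r
# Check causality/invertibility from the roots of the characteristic polynomials
phi   <- c(.4, .2)                 # assumed AR coefficients
theta <- c(.6, .2)                 # assumed MA coefficients
ar_roots <- polyroot(c(1, -phi))   # roots of Phi(z) = 1 - phi1 z - phi2 z^2
ma_roots <- polyroot(c(1, theta))  # roots of Theta(z) = 1 + theta1 z + theta2 z^2
all(Mod(ar_roots) > 1)             # TRUE -> causal
all(Mod(ma_roots) > 1)             # TRUE -> invertible
```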
For an ARMA(\(p,q\)) process, both the ACF and the PACF tail off gradually rather than cutting off after a fixed lag (unlike the pure AR and MA cases).
phis = c(.4, .2)
thetas = c(.6, .2)
# sign is like [Brockwell]: (1-phi B) X_t = (1+theta B) \epsilon_t
Th.acf = ARMAacf(ar=phis, ma=thetas, lag.max=20) # Theoretical ACF
Th.pacf = ARMAacf(ar=phis, ma=thetas, lag.max=20, pacf=T) # Theoretical PACF
#--- Basic Simulation with ARMA(p,q) ---
mu = 5
x = arima.sim(n = 250, list(ar = phis, ma = thetas)) + mu # Simulate ARMA(2,2)
plot(x, type="o");
layout(matrix(1:2, 1, 2))
acf(x); lines(0:20, Th.acf, type='p', col="red") # sample ACF; Theo ACF
pacf(x); lines(1:20, Th.pacf, type='p', col="red") # sample PACF; Theo PACF
layout(1)
A bias-corrected version of AIC was suggested by Hurvich and Tsai (1989): \[ \mbox{AICC} = - 2 \log(\mbox{maximized likelihood}) + \frac{2(k+1)(k+2)}{n-k-2} \] where \(k=p+q+1\) if a non-zero mean is in the model, and \(k=p+q\) if no mean is in the model.
Compare this to AIC, which was designed to be an approximately unbiased estimate of the Kullback–Leibler index of the fitted model relative to the true model: \[ {\large \mbox{AIC} = - 2 \log(\mbox{maximized likelihood}) + 2k } \]
The Bayesian Information Criterion: \[
{\large
\mbox{BIC} = - 2 \log(\mbox{maximized likelihood}) + k \log(n)
}
\] where \(k=p+q+1\) if a non-zero
mean is in the model, and \(k=p+q\) if
no mean is in the model.
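A minimal sketch of computing the three criteria by hand from a fitted model's log-likelihood, using the formulas above. The seed, simulated series, and ARMA(2,2) fit are assumptions for illustration; package-reported values may differ slightly because stats::arima() and forecast count parameters a bit differently:

```r
# Compute AIC, AICC, and BIC by hand from the formulas above
set.seed(42)                                    # assumed seed
y   <- arima.sim(n = 250, list(ar = c(.6, .2), ma = c(.4, .2))) + 5
fit <- arima(y, order = c(2, 0, 2))             # ARMA(2,2) with a mean term
ll  <- as.numeric(logLik(fit))
n   <- length(y)
k   <- 2 + 2 + 1                                # p + q + 1 (non-zero mean)
aic  <- -2*ll + 2*k
aicc <- -2*ll + 2*(k+1)*(k+2)/(n-k-2)
bic  <- -2*ll + k*log(n)
c(AIC = aic, AICC = aicc, BIC = bic)
```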
How much better is AICC than AIC? In what scenario?
Simulate an ARMA(2,1) with non-zero mean.
Pick the ARMA(\(p,q\)) order based on AICc, AIC, and BIC.
Repeat 1000 times and see how many times each criterion picked the right \((p,q)\).
library(forecast) #if not installed, do: install.packages("forecast")
# initialize the object that saves result
Result1 = Result2 = Result3 = matrix(0, 1000, 7)
Result4 = Result5 = Result6 = matrix(0, 1000, 7)
for (i in 1:1000){
Y = arima.sim(list(ar = c(.6, -.6), ma = c(.8)), 100) + 10 # case 1
#Y = arima.sim(list(ar = c(.6, .3), ma = c(.5)), 100) + 10 # case 2
#- picks model based on AICC, AIC, and BIC
Fit1 = auto.arima(Y, max.order=6, max.d=0, max.D=0, ic=c("aicc"))
Fit2 = auto.arima(Y, max.order=6, max.d=0, max.D=0, ic=c("aic"))
Fit3 = auto.arima(Y, max.order=6, max.d=0, max.D=0, ic=c("bic"))
#- all combo method (takes time)
Fit4 = auto.arima(Y, max.order=6, max.d=0, max.D=0, ic=c("aicc"), stepwise=FALSE)
print(i) #- print order on screen (optional)
Result1[i,] = Fit1$arma # checks if it picked ARMA(2,1) with mean
Result2[i,] = Fit2$arma
Result3[i,] = Fit3$arma
Result4[i,] = Fit4$arma
}
R1 = apply( Result1, 1, function(x){ all(x == c(2,1,0,0,1,0,0)) } ) # exact match; setequal() would also accept permuted orders
R2 = apply( Result2, 1, function(x){ all(x == c(2,1,0,0,1,0,0)) } )
R3 = apply( Result3, 1, function(x){ all(x == c(2,1,0,0,1,0,0)) } )
R4 = apply( Result4, 1, function(x){ all(x == c(2,1,0,0,1,0,0)) } )
c(mean(R1), mean(R2), mean(R3), mean(R4))
# Results
# AICc AIC BIC AICc w All
# 0.867 0.835 0.943 0.716 for Case 1 ar=(.6, -.6) ma=c(.8) mu=10
# 0.352 0.374 0.172 0.293 for Case 2 ar=(.6, .5) ma=c(.8) mu=10
In these two cases the stepwise method picked the correct \((p,q)\) more frequently than the all-combinations method.
Still, use stepwise=FALSE when time allows.